[Paper Reading] [Skim] S-NeRF: Neural Radiance Fields for Street Views

Table of Contents

    • 0. Abstract
    • 1. Introduction
    • 2. Related work
    • 3. Methods: NeRF for Street Views
      • 3.1 CAMERA POSE PROCESSING
      • 3.2 REPRESENTATION OF STREET SCENES
      • 3.3 DEPTH SUPERVISION
      • 3.4 Loss function
    • 4. EXPERIMENTS
    • 5. Conclusion
    • Reference

0. Abstract

Problem introduction:

However, we conjecture that this paradigm does not fit the nature of street views, which are collected by self-driving cars in large-scale, unbounded scenes. Also, the onboard cameras perceive the scene without much overlap.

Solutions:

  • Consider novel view synthesis of both the large-scale background scenes and the foreground moving vehicles jointly.

  • Improve the scene parameterization function and the camera poses.

  • We also use the noisy and sparse LiDAR points to boost the training and learn a robust geometry and reprojection-based confidence to address the depth outliers.

  • Extend our S-NeRF for reconstructing moving vehicles

Effect:

Reduces the mean-squared error in street-view synthesis by 7∼40% and yields a 45% PSNR gain for moving-vehicle rendering.

1. Introduction

Overcome the shortcomings of the work of predecessors:

  • MipNeRF-360 (Barron et al., 2022) is designed for training on unbounded scenes, but it still needs enough intersecting camera rays.

  • BlockNeRF (Tancik et al., 2022) proposes a block-combination strategy with refined poses, appearances, and exposure on the MipNeRF (Barron et al., 2021) base model for processing large-scale outdoor scenes. But it needs a special platform to collect the data and is hard to apply to existing datasets.

  • Urban-NeRF (Rematas et al., 2022) takes accurate dense LiDAR depth as supervision for the reconstruction of urban scenes. But accurate dense LiDAR is too expensive; S-NeRF only needs noisy, sparse LiDAR signals.

2. Related work

There is a large body of work on large-scale NeRF and depth-supervised NeRF:

[Figure: the related-work landscape of large-scale NeRF and depth-supervised NeRF]

This paper belongs to both fields.

3. Methods: NeRF for Street Views

3.1 CAMERA POSE PROCESSING

The SfM used in previous NeRFs fails on street views. Therefore, two different methods are proposed to reconstruct the camera poses for the static background and the foreground moving vehicles.

  1. Background scenes

    For the static background, we use the camera parameters obtained by the sensor-fusion SLAM and IMU of the self-driving cars (Caesar et al., 2019; Sun et al., 2020), and further reduce the inconsistency between multiple cameras with a learning-based pose refinement network.

  2. Moving vehicles
    [Figure: pose transformation that fixes the moving vehicle at the origin]

    We now transform the coordinate system by setting the target object's center as the origin.
    $$\hat{P}_i=(P_iP_b^{-1})^{-1}=P_bP_i^{-1},\qquad P^{-1}=\begin{bmatrix}R^T&-R^TT\\\mathbf{0}^T&1\end{bmatrix}$$

    Here $P=\begin{bmatrix}R&T\\\mathbf{0}^T&1\end{bmatrix}$ represents the old pose of the camera or the target object.

    After the transformation, only the camera is moving, which is favorable for training NeRFs.

    How to convert the pose matrices in practice is shown in the sketch after this list.
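A minimal NumPy sketch of this pose conversion, assuming 4×4 homogeneous pose matrices (the helper names are mine, not the paper's):

```python
import numpy as np

def invert_pose(P: np.ndarray) -> np.ndarray:
    """Invert a rigid-body pose P = [R T; 0 1] with the closed form
    P^{-1} = [R^T  -R^T T; 0  1] (cheaper and more stable than np.linalg.inv)."""
    R, T = P[:3, :3], P[:3, 3]
    P_inv = np.eye(4)
    P_inv[:3, :3] = R.T
    P_inv[:3, 3] = -R.T @ T
    return P_inv

def camera_pose_in_object_frame(P_i: np.ndarray, P_b: np.ndarray) -> np.ndarray:
    """Re-express camera pose P_i in a frame centered on the object with pose P_b:
    P_hat_i = (P_i P_b^{-1})^{-1} = P_b P_i^{-1}. Afterwards the object sits
    statically at the origin and only the camera moves."""
    return P_b @ invert_pose(P_i)
```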

3.2 REPRESENTATION OF STREET SCENES

  1. Background scenes

    constrain the whole scene into a bounded range

    [Figure: bounded scene parameterization]

    This part follows Mip-NeRF 360; a sketch of its contraction function appears after this list.

  2. Moving Vehicles

    Compute dense depth maps for the moving cars as extra supervision:

    • We follow GeoSim (Chen et al., 2021b) to reconstruct coarse mesh from multi-view images and the sparse LiDAR points.

    • After that, a differentiable neural renderer (Liu et al., 2019) is used to render the corresponding depth map with the camera parameter (Section 3.2).

    • The backgrounds are masked during the training by an instance segmentation network (Wang et al., 2020).

    Note that three different prior works are combined in this step.
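For reference, a sketch of the Mip-NeRF 360 contraction that the background parameterization builds on (this is the published Mip-NeRF 360 formula; S-NeRF's exact variant may differ):

```python
import numpy as np

def contract(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Mip-NeRF 360 scene contraction: points inside the unit ball are unchanged,
    points outside are squashed into the shell 1 < ||x|| < 2, mapping an
    unbounded scene into a bounded range."""
    norm = np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)
    return np.where(norm <= 1.0, x, (2.0 - 1.0 / norm) * (x / norm))
```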

3.3 DEPTH SUPERVISION

[Figure: depth supervision pipeline]

To provide credible depth supervision from defective LiDAR depths, we first propagate the sparse depths and then construct a confidence map to address the depth outliers.

  1. LiDAR depth completion

    Use NLSPN (Park et al., 2020) to propagate the depth information from LiDAR points to surrounding pixels.

  2. Reprojection confidence

    Measure the accuracy of the depths and locate the outliers.

    The warping operation can be represented as:

    $$\mathbf{X}_t=\psi(\psi^{-1}(\mathbf{X}_s,P_s),P_t)$$

    And we use three features to measure the similarity:

    $$\mathcal{C}_{\mathrm{rgb}}=1-|\mathcal{I}_s-\hat{\mathcal{I}}_s|,\quad\mathcal{C}_{\mathrm{ssim}}=\mathrm{SSIM}(\mathcal{I}_s,\hat{\mathcal{I}}_s),\quad\mathcal{C}_{\mathrm{vgg}}=1-\|\mathcal{F}_s-\hat{\mathcal{F}}_s\|$$

  3. Geometry confidence

    Measure the geometry consistency of the depths and flows across different views.

    The depth:

    $$\mathcal{C}_{\mathrm{depth}}=\gamma\big(|d_t-\hat{d}_t|/d_s\big),\qquad\gamma(x)=\begin{cases}0,&\text{if } x\geq\tau,\\1-x/\tau,&\text{otherwise}.\end{cases}$$

    The flow:

    $$\mathcal{C}_{\mathrm{flow}}=\gamma\!\left(\frac{\|\Delta_{x,y}-f_{s\to t}(x_s,y_s)\|}{\|\Delta_{x,y}\|}\right),\qquad\Delta_{x,y}=(x_t-x_s,\;y_t-y_s)$$

  4. Learnable confidence combination

    The final confidence map is learned as $\hat{\mathcal{C}}=\sum_i\omega_i\,\mathcal{C}_i$, where $\sum_i\omega_i=1$ and $i\in\{\mathrm{rgb},\mathrm{ssim},\mathrm{vgg},\mathrm{depth},\mathrm{flow}\}$.
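Putting the reprojection, geometry, and combination steps together, a minimal PyTorch sketch; it assumes the warped image, warped depth, and flow error ratio are precomputed, omits the SSIM/VGG terms for brevity, and the τ thresholds are my placeholders, not the paper's values:

```python
import torch
import torch.nn.functional as F

def gamma(x: torch.Tensor, tau: float) -> torch.Tensor:
    """gamma(x) = 0 if x >= tau, else 1 - x/tau (linear fall-off)."""
    return torch.clamp(1.0 - x / tau, min=0.0)

def confidence_terms(I_s, I_s_hat, d_t, d_t_hat, d_s, flow_err_ratio,
                     tau_depth=0.1, tau_flow=0.5):
    """Per-pixel confidences. I_s/I_s_hat: (H, W, 3) source and warped images;
    d_t/d_t_hat: (H, W) target and warped depths; d_s: (H, W) source depth;
    flow_err_ratio: (H, W) precomputed ||Delta - f_{s->t}|| / ||Delta||."""
    c_rgb = 1.0 - (I_s - I_s_hat).abs().mean(dim=-1)         # photometric agreement
    c_depth = gamma((d_t - d_t_hat).abs() / d_s, tau_depth)  # depth consistency
    c_flow = gamma(flow_err_ratio, tau_flow)                 # flow consistency
    return [c_rgb, c_depth, c_flow]

class ConfidenceCombiner(torch.nn.Module):
    """Learnable combination C_hat = sum_i w_i C_i with sum_i w_i = 1,
    enforced here via a softmax over per-term logits."""
    def __init__(self, n_terms: int):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(n_terms))

    def forward(self, terms):
        w = F.softmax(self.logits, dim=0)
        return sum(w_i * c_i for w_i, c_i in zip(w, terms))
```

The softmax is one convenient way to satisfy the sum-to-one constraint; the paper does not specify how the weights are normalized.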

3.4 Loss function

An RGB loss, a confidence-weighted depth loss, and an edge-aware smoothness constraint to penalize large variances in depth:
$$\begin{aligned}\mathcal{L}_{\mathrm{color}}&=\sum_{\mathbf{r}\in\mathcal{R}}\|I(\mathbf{r})-\hat{I}(\mathbf{r})\|_2^2\\\mathcal{L}_{\mathrm{depth}}&=\sum\hat{\mathcal{C}}\cdot|\mathcal{D}-\hat{\mathcal{D}}|\\\mathcal{L}_{\mathrm{smooth}}&=\sum|\partial_x\hat{\mathcal{D}}|\,e^{-|\partial_x I|}+|\partial_y\hat{\mathcal{D}}|\,e^{-|\partial_y I|}\\\mathcal{L}_{\mathrm{total}}&=\mathcal{L}_{\mathrm{color}}+\lambda_1\mathcal{L}_{\mathrm{depth}}+\lambda_2\mathcal{L}_{\mathrm{smooth}}\end{aligned}$$
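A minimal PyTorch sketch of these losses, assuming full-image renders so the smoothness gradients can be taken along pixel axes; the λ values are placeholders, not the paper's:

```python
import torch

def s_nerf_loss(rgb_pred, rgb_gt, depth_pred, depth_gt, conf, image,
                lambda1=0.1, lambda2=0.01):
    """rgb_pred/rgb_gt: (N, 3) per-ray colors; depth_pred/depth_gt, conf: (H, W);
    image: (H, W, 3) ground-truth image for edge-aware weighting."""
    # RGB reconstruction loss over sampled rays
    l_color = ((rgb_pred - rgb_gt) ** 2).sum(dim=-1).mean()
    # Confidence-weighted L1 depth loss
    l_depth = (conf * (depth_pred - depth_gt).abs()).mean()
    # Edge-aware smoothness: allow depth gradients where the image has edges
    dx_d = (depth_pred[:, 1:] - depth_pred[:, :-1]).abs()
    dy_d = (depth_pred[1:, :] - depth_pred[:-1, :]).abs()
    dx_i = (image[:, 1:] - image[:, :-1]).abs().mean(dim=-1)
    dy_i = (image[1:, :] - image[:-1, :]).abs().mean(dim=-1)
    l_smooth = (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()
    return l_color + lambda1 * l_depth + lambda2 * l_smooth
```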

4. EXPERIMENTS

  1. Dataset: nuScenes and Waymo
  • For the foreground vehicles, we extract car crops from nuScenes and Waymo video sequences.
  • For the large-scale background scenes, we use scenes with 90∼180 images.
  • In each scene, the ego vehicle moves around 10∼40 meters, and the whole scene spans more than 200 m.
  2. Foreground Vehicles

    Different experiments on static and moving vehicles, compared with the original NeRF and GeoSim (the latest non-NeRF car reconstruction method).

    [Figure: comparison with NeRF and GeoSim on foreground vehicles]

    There is still large room for improvement in PSNR.

  3. Background scenes

    Compared with the state-of-the-art methods Mip-NeRF (Barron et al., 2021), Urban-NeRF (Rematas et al., 2022), and Mip-NeRF 360.
    [Figure: comparison with Mip-NeRF, Urban-NeRF, and Mip-NeRF 360 on background scenes]
    A 360-degree panorama is also shown to emphasize some details:
    [Figure: 360-degree panorama rendering]

  4. BACKGROUND AND FOREGROUND FUSION
    Depth-guided placement and inpainting (e.g., GeoSim, Chen et al. (2021b)) and joint NeRF rendering (e.g., GIRAFFE, Niemeyer & Geiger (2021)) heavily rely on accurate depth maps and 3D geometry information.
    (The fusion method used here appears without a detailed introduction in the paper.)

  5. Ablation study
    [Figure: ablation results]

    The ablation separates the RGB loss, the depth confidence, and the smoothness loss.

5. Conclusion

In the future, we will use the block merging proposed in Block-NeRF (Tancik et al., 2022) to learn a larger, city-level neural representation.

Reference

[1] S-NeRF (ziyang-xie.github.io)
