【读论文】【泛读】三篇生成式自动驾驶场景生成: Bevstreet, DisCoScene, BerfScene

文章目录

  • 1. Street-View Image Generation from a Bird’s-Eye View Layout
    • 1.1 Problem introduction
    • 1.2 Why
    • 1.3 How
    • 1.4 My takeaway
  • 2. DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis
    • 2.1 What
    • 2.2 Why
    • 2.3 How
    • 2.4 My takeaway
  • 3. BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation(Follow DisCoScene)
    • 3.1 What
    • 3.2 Why
    • 3.3 How
    • 3.4 My takeaway

1. Street-View Image Generation from a Bird’s-Eye View Layout

1.1 Problem introduction

From the title of this paper, we know it bound a relation from Bev(Bird’s-Eye View) to Street view image.

在这里插入图片描述

Concretely, the input (Bev) is a two-dimensional representation of a three-dimensional environment from a top perspective. In the BEV diagram, squares of different colors represent different objects or road features, such as vehicles, pedestrians, lane lines, etc. And green square means an ego vehicle that has three cameras in front.

The task is to generate three street-view images aligned to the Bev according to the relative position among these square objects.

As for the concept of “layout”, it should consider the effects of these factors:

  • Cameras with an overlapping field-of-view (FoV) must ensure overlapping content is correctly shown
  • The visual styling of the scene also needs to be consistent such that all virtual views appear to be created in the same geographical area (e.g., urban vs. rural), at the same time of day, with the same weather conditions, and so on.
  • In addition to this consistency, the images must correspond to the HD
    map, faithfully reproducing the specified road layout, lane lines, and vehicle locations.

1.2 Why

It is the first attempt to explore the generative side of BEV perception for driving scenes.

1.3 How

  1. Methods
    在这里插入图片描述
    As shown in this pipeline, the Bev layout and source images were encoded as an input of the autoregressive transformer collaborating with direction and camera information to help the understanding of space. New mv-images were output.

  2. Experiments
    Three metrics are used.
    在这里插入图片描述
    FID represents the diversity and quality of generated images. Road mIoU and Vehicle mIoU can be used to represent the overlapping to verify the relative position in the Bev inputs.
    Scene edit was achieved by the change of Bev layout:
    在这里插入图片描述

1.4 My takeaway

  1. How to utilize the ability of an autoregressive transformer!!! Why do we use it other than others?
  2. I have known about what is Bev.

2. DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis

2.1 What

An editable 3D generative model using object bounding boxes without semantic annotation as layout prior, allowing for high-quality scene synthesis and flexible user control of both the camera and scene objects.

2.2 Why

  • Existing generative models focus on individual objects, lacking the ability to handle non-trivial scenes.

  • Some works like GSN can only generate scenes, without object-level editing. That is because of the lack of explicit object definition in NeRF.

  • GIRAFFE explicitly composites object-centric radiance fields to support object-level control. Yet, it works poorly on mixed scenes due to the absence of proper spatial priors.

  • Interesting refer:

    17: Layout-transformer: Layout generation and completion with self-attention.

    26: Layout-gan: Generating graphic layouts with wireframe discriminators.

    58: Blockplanner: City block generation with vectorized graph representation.

2.3 How

在这里插入图片描述

Bounding boxes as layout priors to generate the objects, combined with the generated background were used in neural rendering. Meanwhile, an extra object discriminator for local discrimination is added, leading to better object-level supervision.

2.4 My takeaway

  1. Is it possible to cancel the manually marked bbox and automatically identify and regenerate the corresponding area in Gaussian?

3. BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation(Follow DisCoScene)

3.1 What

Incorporating an equivariant radiance field with the guidance of a BEV map, this method allows us to produce large-scale, even infinite-scale, 3D scenes via synthesizing local scenes and then stitching them with smooth consistency.

Understood as the superposition of patches in a bev:
在ddd

3.2 Why

  1. Generating large-scale 3D scenes cannot simply apply existing 3D object synthesis techniques since 3D scenes usually hold complex spatial configurations and consist of many objects at varying scales.
  2. Previous approaches often relied on scene graphs, facing limitations in processing due to unstructured topology.
  3. DiscoScene introduces complexity in interpreting the entire scene and
    faces scalability challenges when using Bbox.
  4. BEV maps could specify the composition and scales of objects clearly but lack insights into the detailed visual appearance of the objects. Recent attempts like InfiniCity and SceneDreamer try to avoid the ambiguity of BEV maps, but they are inefficiency.

3.3 How

在这里插入图片描述

To integrate the prior information provided by the BEV map into the radiation field, the researchers introduced a generator U U U, which can generate a 2D feature map based on BEV map conditions. Builder U U U adopts a network structure that combines U-Net architecture and StyleGAN blocks.

3.4 My takeaway

  1. Confused about how to use this U-Net, need some other time to supplement background knowledge. 🤡

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.hqwc.cn/news/624764.html

如若内容造成侵权/违法违规/事实不符,请联系编程知识网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

如何在PPT中获得网页般的互动效果

如何在PPT中获得网页般的互动效果 效果可以看视频 PPT中插入网页有互动效果 当然了,获得网页般的互动效果,最简单的方法就是在 PPT 中插入网页呀。 那么如何插入呢? 接下来为你讲解如何获得(此方法在 PowerPoint中行得通&#…

Unity 点击次数统计功能

介绍 💡.调用方便,发生点击事件后直接通过"xxx".CacheClick缓存 💡. 在允许的时间间隔内再次点击会累计点击次数,直到超出后触发事件 传送门👈

记录一下hive跑spark的insert,update语句报类找不到的问题

我hive能正常启动,建表没问题,我建了一个student表,没问题,但执行了下面一条insert语句后报如下错误: hive (default)> insert into table student values(1,abc); Query ID atguigu_20240417184003_f9d459d7-199…

【GD32】_时钟架构及系统时钟频率配置

文章目录 一、有关时钟源二、系统时钟架构三、时钟树分析四、修改参数步骤1、设置外部晶振2、选择外部时钟源。3、 设置系统主频率大小4、修改PLL分频倍频系数 学习系统时钟架构和时钟树,验证及学习笔记如下,如有错误,欢迎指正。主要记录了总…

基于springboot的扶贫助农系统

文章目录 项目介绍主要功能截图:部分代码展示设计总结项目获取方式 🍅 作者主页:超级无敌暴龙战士塔塔开 🍅 简介:Java领域优质创作者🏆、 简历模板、学习资料、面试题库【关注我,都给你】 &…

用海外云手机高效率运营TikTok!

很多做国外社媒运营的公司,想要快速引流,往往一个账号是不够的,多数都是矩阵养号的方式,运营多个TikToK、Facebook、Instagram等账号,慢慢沉淀流量变现,而他们都在用海外云手机这款工具! 海外云…

二级综合医院云HIS系统源码,B/S架构,采用JAVA编程,集成相关医保接口

二级医院云HIS系统源码 云HIS系统是一款满足基层医院各类业务需要的健康云产品。该产品能帮助基层医院完成日常各类业务,提供病患预约挂号支持、病患问诊、电子病历、开药发药、会员管理、统计查询、医生工作站和护士工作站等一系列常规功能,还能与公卫…

【muzzik 分享】关于 MKFramework 的设计想法

MKFramework是我个人维护持续了几年的项目(虽然公开只有一年左右),最开始由于自己从事QP类游戏开发,我很喜欢MVVM,于是想把他做成 MVVM 框架,在论坛第一个 MVVM 框架出来的时候,我的框架已经快完…

PySpark预计算ClickHouse Bitmap实践

1. 背景 ClickHouse全称是Click Stream,Data WareHouse,是一款高性能的OLAP数据库,既使用了ROLAP模型,又拥有着比肩MOLAP的性能。我们可以用ClickHouse用来做分析平台快速出数。其中的bitmap结构方便我们对人群进行交并。Bitmap位…

KDD 2023 | 时空数据(Spatial-Temporal)论文总结

2023 KDD论文接收情况:Research track(研究赛道)接收率:22.1%(313/1416),ADS Track(应用数据科学赛道)接收率:25.4%(184/725) &#…

4个步骤:如何使用 SwiftSoup 和爬虫代理获取网站视频

摘要/导言 在本文中,我们将探讨如何使用 SwiftSoup 库和爬虫代理技术来获取网站上的视频资源。我们将介绍一种简洁、可靠的方法,以及实现这一目标所需的步骤。 背景/引言 随着互联网的迅速发展,爬虫技术在今天的数字世界中扮演着越来越重要…

汽车抗疲劳驾驶测试铸铁试验底座技术要求有哪些

铸铁平台试验台底座的主要技术参数要求 1、 试验台底座设计制造符合JB/T794-1999《铸铁平板》标准。 2、 试验铁底板及所有附件的计量单位全部采用 单位(SI)标准。 3、铸铁平台平板材质:用细密的灰口铸铁HT250或HT200,强度符…