GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

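Summary

3D semantic occupancy prediction aims to recover the fine-grained 3D geometry and semantics of the surrounding scene, and is an important task for the robustness of vision-centric autonomous driving.
Most existing methods adopt dense grids such as voxels as the scene representation, which ignores the sparsity of occupancy and the diversity of object scales and therefore leads to an unbalanced allocation of resources.
To address this, we propose an object-centric representation that describes a 3D scene with sparse 3D semantic Gaussians, where each Gaussian represents a flexible region of interest and its semantic features.
We aggregate information from images through an attention mechanism and iteratively refine the properties of the 3D Gaussians, including position, covariance, and semantics.
We then propose an efficient Gaussian-to-voxel splatting method for generating 3D occupancy predictions, which aggregates only the Gaussians neighboring a given position.
We conduct extensive experiments on the widely adopted nuScenes and KITTI-360 datasets.
The experimental results show that GaussianFormer achieves performance comparable to state-of-the-art methods with only 17.8% to 24.8% of their memory consumption.
The code is available at https://github.com/huang-yh/GaussianFormer.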

Abstract (original)

3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene and is an important task for the robustness of vision-centric autonomous driving. Most existing methods employ dense grids such as voxels as scene representations, which ignore the sparsity of occupancy and the diversity of object scales and thus lead to unbalanced allocation of resources. To address this, we propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians where each Gaussian represents a flexible region of interest and its semantic features. We aggregate information from images through the attention mechanism and iteratively refine the properties of 3D Gaussians including position, covariance, and semantics. We then propose an efficient Gaussian-to-voxel splatting method to generate 3D occupancy predictions, which only aggregates the neighboring Gaussians for a certain position. We conduct extensive experiments on the widely adopted nuScenes and KITTI-360 datasets. Experimental results demonstrate that GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% – 24.8% of their memory consumption. Code is available at: https://github.com/huang-yh/GaussianFormer.
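
The Gaussian-to-voxel splatting idea described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes axis-aligned Gaussians (a per-axis scale rather than a full covariance), a hypothetical 3-sigma cutoff as the definition of "neighboring", and per-class semantic scores weighted by the unnormalized Gaussian density. The function names and dictionary keys are illustrative only.

```python
import math


def gaussian_weight(point, mean, scale):
    """Unnormalized density of an axis-aligned 3D Gaussian at `point`."""
    d2 = sum(((p - m) / s) ** 2 for p, m, s in zip(point, mean, scale))
    return math.exp(-0.5 * d2)


def splat_to_voxels(gaussians, grid_shape, voxel_size, cutoff=3.0):
    """Accumulate per-class scores, visiting only voxels near each Gaussian.

    Returns a sparse dict mapping (i, j, k) -> list of per-class scores.
    Voxels that no Gaussian reaches are never touched (assumed empty).
    """
    scores = {}
    for g in gaussians:
        mean, scale, sem = g["mean"], g["scale"], g["semantics"]
        # Index range of voxels overlapping the box mean +/- cutoff * scale,
        # clipped to the grid. This is the assumed notion of "neighboring".
        ranges = []
        for axis, n in enumerate(grid_shape):
            lo = math.floor((mean[axis] - cutoff * scale[axis]) / voxel_size)
            hi = math.floor((mean[axis] + cutoff * scale[axis]) / voxel_size)
            ranges.append(range(max(lo, 0), min(hi, n - 1) + 1))
        for i in ranges[0]:
            for j in ranges[1]:
                for k in ranges[2]:
                    center = tuple((idx + 0.5) * voxel_size for idx in (i, j, k))
                    w = gaussian_weight(center, mean, scale)
                    acc = scores.setdefault((i, j, k), [0.0] * len(sem))
                    for c, value in enumerate(sem):
                        acc[c] += w * value
    return scores


def predict_labels(scores):
    """Pick the highest-scoring class index for every touched voxel."""
    return {idx: max(range(len(s)), key=s.__getitem__) for idx, s in scores.items()}


# Toy scene: two small Gaussians in opposite corners of a 4 x 4 x 4 grid.
gaussians = [
    {"mean": (0.5, 0.5, 0.5), "scale": (0.2, 0.2, 0.2), "semantics": [1.0, 0.0]},
    {"mean": (3.5, 3.5, 3.5), "scale": (0.2, 0.2, 0.2), "semantics": [0.0, 1.0]},
]
scores = splat_to_voxels(gaussians, grid_shape=(4, 4, 4), voxel_size=1.0)
labels = predict_labels(scores)
```

In this toy grid each Gaussian touches only the 8 voxels around its mean, so 16 of the 64 voxels are ever visited. That locality is what a sparse representation can exploit; the memory figures quoted in the abstract come from the authors' method, not from this sketch.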

arXiv information

Authors	Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang, Jie Zhou, Jiwen Lu
Published	2024-05-27 17:59:51+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
