GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation

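Abstract

The bird's-eye view (BeV) representation is widely used for 3D perception from multi-view camera images. It merges features from different cameras into a common space, providing a unified representation of the 3D scene. The key component is the view transformer, which transforms image views into the BeV. However, existing view transformer methods based on geometry or cross-attention do not represent the scene in sufficient detail, because they sub-sample the 3D space in a way that is poorly suited to modeling the fine structures of the environment. This paper proposes GaussianBeV, a novel method that transforms image features to BeV by finely representing the scene with a set of 3D Gaussians positioned and oriented in 3D space. This representation is then splatted to produce the BeV feature map, adapting recent advances in 3D representation rendering based on Gaussian splatting. GaussianBeV is the first approach to use this 3D Gaussian modeling and 3D scene rendering process online, that is, without optimizing it on a specific scene, and to integrate it directly into a single-stage model for BeV scene understanding. Experiments show that the proposed representation is highly effective and establish GaussianBeV as the new state of the art on the BeV semantic segmentation task on the nuScenes dataset.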

Abstract (original)

The Bird’s-eye View (BeV) representation is widely used for 3D perception from multi-view camera images. It allows to merge features from different cameras into a common space, providing a unified representation of the 3D scene. The key component is the view transformer, which transforms image views into the BeV. However, actual view transformer methods based on geometry or cross-attention do not provide a sufficiently detailed representation of the scene, as they use a sub-sampling of the 3D space that is non-optimal for modeling the fine structures of the environment. In this paper, we propose GaussianBeV, a novel method for transforming image features to BeV by finely representing the scene using a set of 3D gaussians located and oriented in 3D space. This representation is then splattered to produce the BeV feature map by adapting recent advances in 3D representation rendering based on gaussian splatting. GaussianBeV is the first approach to use this 3D gaussian modeling and 3D scene rendering process online, i.e. without optimizing it on a specific scene and directly integrated into a single stage model for BeV scene understanding. Experiments show that the proposed representation is highly effective and place GaussianBeV as the new state-of-the-art on the BeV semantic segmentation task on the nuScenes dataset.
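The splatting step described in the abstract can be illustrated with a minimal sketch. It assumes a top-down orthographic projection, under which a 3D Gaussian reduces to a 2D Gaussian whose covariance is the top-left 2x2 block of the 3D covariance, and it assumes a simple additive accumulation of opacity-weighted features. The function name `splat_to_bev`, the dictionary keys, and the grid parameters are hypothetical and chosen for illustration; this is not the authors' renderer, which the abstract does not specify.

```python
import math


def splat_to_bev(gaussians, grid_size, cell_m, feat_dim):
    """Accumulate Gaussian features on a square BeV grid centred on the origin.

    Hypothetical input layout (an assumption, not the paper's format):
      'mean'    -> (x, y, z) in metres
      'cov_xy'  -> ((sxx, sxy), (sxy, syy)), top-left 2x2 block of the 3D covariance
      'opacity' -> float in [0, 1]
      'feat'    -> list of feat_dim floats
    """
    half = grid_size * cell_m / 2.0
    bev = [[[0.0] * feat_dim for _ in range(grid_size)] for _ in range(grid_size)]
    for g in gaussians:
        (sxx, sxy), (_, syy) = g["cov_xy"]
        det = sxx * syy - sxy * sxy
        if det <= 0.0:
            continue  # skip degenerate covariances
        # Closed-form inverse of the symmetric 2x2 covariance.
        ixx, ixy, iyy = syy / det, -sxy / det, sxx / det
        mx, my, _ = g["mean"]  # z is dropped by the top-down projection
        for row in range(grid_size):
            cy = -half + (row + 0.5) * cell_m  # cell-centre y coordinate
            for col in range(grid_size):
                cx = -half + (col + 0.5) * cell_m  # cell-centre x coordinate
                dx, dy = cx - mx, cy - my
                maha = dx * dx * ixx + 2.0 * dx * dy * ixy + dy * dy * iyy
                weight = g["opacity"] * math.exp(-0.5 * maha)
                cell = bev[row][col]
                for k in range(feat_dim):
                    cell[k] += weight * g["feat"][k]
    return bev


# Usage: one Gaussian sitting exactly on a cell centre of a 4x4 grid (1 m cells).
example = [{
    "mean": (0.5, 0.5, 0.0),
    "cov_xy": ((0.25, 0.0), (0.0, 0.25)),
    "opacity": 0.8,
    "feat": [1.0, 2.0],
}]
bev = splat_to_bev(example, grid_size=4, cell_m=1.0, feat_dim=2)
```

The loop visits every cell for every Gaussian, which is only practical for toy inputs; a real rasterizer would restrict each Gaussian to the cells it overlaps and run on the GPU.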

arXiv information

Authors	Florian Chabot, Nicolas Granger, Guillaume Lapouge
Published	2024-12-04 16:43:00+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
