GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding

Summary

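3D semantic occupancy prediction is fundamental to spatial understanding, as it provides comprehensive semantic perception of the surrounding environment.
However, prevalent approaches rely mainly on large amounts of labeled data and computationally intensive voxel-based modeling, which limits the scalability and generalizability of 3D representation learning.
This paper introduces GaussTR, a novel Gaussian Transformer that leverages alignment with foundation models to advance self-supervised 3D spatial understanding.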
GaussTR adopts a Transformer architecture to predict, in a feed-forward manner, a sparse set of 3D Gaussians that represents the scene.
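To make the idea of a sparse Gaussian scene representation concrete, here is a minimal sketch, assuming each Gaussian is described by a mean, a per-axis (diagonal) scale, an opacity, and a feature vector. It reads a feature out of the scene at any 3D point (for example, a voxel center) as the opacity- and density-weighted sum of Gaussian features. The `Gaussian` class and the `density` and `query_features` functions are hypothetical illustrations, not GaussTR's code; the paper's Gaussians and aggregation scheme may differ.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Gaussian:
    """Hypothetical axis-aligned 3D Gaussian carrying a feature vector."""
    mean: Sequence[float]     # (x, y, z) center
    scale: Sequence[float]    # per-axis standard deviation (diagonal covariance)
    opacity: float            # weight in [0, 1]
    feature: Sequence[float]  # semantic feature vector of any dimension


def density(g: Gaussian, p: Sequence[float]) -> float:
    """Unnormalized density exp(-0.5 * squared Mahalanobis distance) at point p."""
    m2 = sum(((pi - mi) / si) ** 2 for pi, mi, si in zip(p, g.mean, g.scale))
    return math.exp(-0.5 * m2)


def query_features(gaussians: List[Gaussian], p: Sequence[float]) -> List[float]:
    """Sum of Gaussian features at point p, each weighted by opacity * density."""
    dim = len(gaussians[0].feature)
    out = [0.0] * dim
    for g in gaussians:
        w = g.opacity * density(g, p)
        for k in range(dim):
            out[k] += w * g.feature[k]
    return out


g1 = Gaussian((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 1.0, (1.0, 0.0))
g2 = Gaussian((10.0, 0.0, 0.0), (1.0, 1.0, 1.0), 0.5, (0.0, 1.0))
print(query_features([g1, g2], (0.0, 0.0, 0.0)))  # dominated by g1's feature
```

Querying at a Gaussian's center returns roughly its opacity-scaled feature, while distant Gaussians contribute almost nothing, which is why a sparse set can describe a scene without a dense voxel grid. A diagonal covariance keeps the sketch short; 3D Gaussian splatting usually builds a full covariance from a rotation and per-axis scales.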
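By aligning rendered Gaussian features with diverse knowledge from pre-trained foundation models, GaussTR facilitates the learning of versatile 3D representations and enables open-vocabulary occupancy prediction without explicit annotations.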
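The alignment and open-vocabulary steps can be illustrated with a minimal sketch under two stated assumptions: alignment is measured as a cosine-similarity loss between rendered Gaussian features and a foundation model's features at the same pixels, and, in a CLIP-style setup, each feature is labeled with the class whose text embedding it is most similar to. The function names and toy vectors are hypothetical; this summary does not specify the paper's actual losses or which foundation models supply the targets.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def alignment_loss(rendered: List[Sequence[float]],
                   target: List[Sequence[float]]) -> float:
    """Mean (1 - cosine similarity) over pixels; 0 when every pair points the same way."""
    return sum(1.0 - cosine(r, t) for r, t in zip(rendered, target)) / len(rendered)


def open_vocab_label(feature: Sequence[float],
                     text_embeddings: Dict[str, Sequence[float]]) -> str:
    """Return the class name whose text embedding is most similar to the feature."""
    return max(text_embeddings, key=lambda name: cosine(feature, text_embeddings[name]))


text = {"car": (1.0, 0.0), "road": (0.0, 1.0)}  # hypothetical 2-D text embeddings
print(open_vocab_label((0.9, 0.1), text))
```

Because the label is chosen by similarity to text embeddings rather than by a fixed classifier head, new class names can be added at inference time without retraining, which is what makes such prediction "open-vocabulary".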
Empirical evaluations on the Occ3D-nuScenes dataset demonstrate GaussTR's state-of-the-art zero-shot performance: it achieves 11.70 mIoU while reducing training time by approximately 50%.
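For readers unfamiliar with the reported metric, mean Intersection-over-Union (mIoU) averages, over semantic classes, the ratio TP / (TP + FP + FN) computed from per-voxel predictions; values such as 11.70 are conventionally reported as percentages. Below is a minimal sketch of that standard computation over flat label lists. It does not model Occ3D-nuScenes protocol details (which classes are evaluated, camera-visibility masks, how free space is handled), and the `ignore_index` parameter is a hypothetical convenience.

```python
from typing import List, Optional, Sequence


def mean_iou(pred: Sequence[int], gt: Sequence[int], num_classes: int,
             ignore_index: Optional[int] = None) -> float:
    """Mean IoU over classes present in pred or gt.

    IoU_c = TP_c / (TP_c + FP_c + FN_c); classes with an empty union are skipped.
    """
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, g in zip(pred, gt):
        if ignore_index is not None and g == ignore_index:
            continue  # skip unlabeled voxels
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p where the truth was g
            fn[g] += 1  # missed a voxel of class g
    ious: List[float] = []
    for c in range(num_classes):
        union = tp[c] + fp[c] + fn[c]
        if union > 0:
            ious.append(tp[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the usage line, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12 (about 0.583, or 58.3 as a percentage).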
These results highlight GaussTR's significant potential for scalable, holistic 3D spatial understanding, with promising implications for autonomous driving and embodied agents.
Code is available at https://github.com/hustvl/GaussTR.

Summary (original)

3D Semantic Occupancy Prediction is fundamental for spatial understanding as it provides a comprehensive semantic cognition of surrounding environments. However, prevalent approaches primarily rely on extensive labeled data and computationally intensive voxel-based modeling, restricting the scalability and generalizability of 3D representation learning. In this paper, we introduce GaussTR, a novel Gaussian Transformer that leverages alignment with foundation models to advance self-supervised 3D spatial understanding. GaussTR adopts a Transformer architecture to predict sparse sets of 3D Gaussians that represent scenes in a feed-forward manner. Through aligning rendered Gaussian features with diverse knowledge from pre-trained foundation models, GaussTR facilitates the learning of versatile 3D representations and enables open-vocabulary occupancy prediction without explicit annotations. Empirical evaluations on the Occ3D-nuScenes dataset showcase GaussTR’s state-of-the-art zero-shot performance, achieving 11.70 mIoU while reducing training duration by approximately 50%. These experimental results highlight the significant potential of GaussTR for scalable and holistic 3D spatial understanding, with promising implications for autonomous driving and embodied agents. Code is available at https://github.com/hustvl/GaussTR.

arXiv information

Authors	Haoyi Jiang, Liu Liu, Tianheng Cheng, Xinjie Wang, Tianwei Lin, Zhizhong Su, Wenyu Liu, Xinggang Wang
Published	2024-12-17 18:59:46+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
