Query3D: LLM-Powered Open-Vocabulary Scene Segmentation with Language Embedded 3D Gaussian

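Summary

This paper introduces a new method for open-vocabulary 3D scene querying in autonomous driving that combines language-embedded 3D Gaussians with large language models (LLMs).
We propose using LLMs to generate both contextually appropriate canonical phrases and helping positive words, which improve segmentation and scene interpretation.
Our method uses GPT-3.5 Turbo as an expert model to create a high-quality text dataset, which is then used to fine-tune smaller, more efficient LLMs for on-device deployment.
A comprehensive evaluation on the WayveScenes101 dataset shows that LLM-guided segmentation significantly outperforms conventional approaches based on predefined canonical phrases.
Notably, the fine-tuned smaller models achieve performance comparable to the larger expert model, with faster inference times.
Ablation studies show that the benefit of helping positive words correlates with model scale: larger models are better able to exploit the additional semantic information.
This work is a significant step toward more efficient, context-aware autonomous driving systems, bridging 3D scene representation and high-level semantic querying while keeping practical deployment considerations in view.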

Summary (original)

This paper introduces a novel method for open-vocabulary 3D scene querying in autonomous driving by combining Language Embedded 3D Gaussians with Large Language Models (LLMs). We propose utilizing LLMs to generate both contextually canonical phrases and helping positive words for enhanced segmentation and scene interpretation. Our method leverages GPT-3.5 Turbo as an expert model to create a high-quality text dataset, which we then use to fine-tune smaller, more efficient LLMs for on-device deployment. Our comprehensive evaluation on the WayveScenes101 dataset demonstrates that LLM-guided segmentation significantly outperforms traditional approaches based on predefined canonical phrases. Notably, our fine-tuned smaller models achieve performance comparable to larger expert models while maintaining faster inference times. Through ablation studies, we discover that the effectiveness of helping positive words correlates with model scale, with larger models better equipped to leverage additional semantic information. This work represents a significant advancement towards more efficient, context-aware autonomous driving systems, effectively bridging 3D scene representation with high-level semantic querying while maintaining practical deployment considerations.

arXiv information

Authors	Amirhosein Chahe, Lifeng Zhou
Published	2024-12-16 20:54:56+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
