PerLA: Perceptive 3D Language Assistant

Summary

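Enabling large language models (LLMs) to understand the 3D physical world is an emerging but challenging research direction.
Current strategies for processing point clouds typically either downsample the scene or split it into smaller parts that are analyzed separately.
However, both approaches risk losing important local details or global contextual information.
In this paper, we introduce PerLA, a 3D language assistant designed to be more perceptive of both details and context, which makes the visual representations more informative for the LLM.
PerLA captures high-resolution (local) details in parallel from different regions of the point cloud and integrates them with (global) context obtained from a lower-resolution version of the whole point cloud.
We propose a novel algorithm that preserves point cloud locality through the Hilbert curve and effectively aggregates local-to-global information via cross-attention and a graph neural network.
Finally, we introduce a novel loss for local representation consensus to promote training stability.
PerLA outperforms state-of-the-art 3D language assistants, with gains of up to +1.34 CIDEr on ScanQA for question answering, and +4.22 on ScanRefer and +3.88 on Nr3D for dense captioning.
https://gfmei.github.io/PerLA/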
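
The summary says that point cloud locality is preserved through the Hilbert curve, but it does not say how points are quantized or ordered. The following is a minimal sketch of one common way to do this, not PerLA's actual implementation: points are assumed to be quantized onto a grid of 2^bits cells per axis and then sorted by a Hilbert index computed with Skilling's transpose algorithm. The function names `hilbert_index` and `hilbert_order` and the `bits` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def hilbert_index(coords: Sequence[int], bits: int) -> int:
    """Map integer grid coordinates (each in [0, 2**bits)) to a Hilbert index.

    Uses Skilling's transpose algorithm. This is an illustrative choice,
    not necessarily what PerLA implements.
    """
    x = list(coords)
    n = len(x)
    top = 1 << (bits - 1)

    # Undo the excess rotations/reflections, from the highest bit down.
    q = top
    while q > 1:
        p = q - 1
        for i in range(n):
            if x[i] & q:
                x[0] ^= p  # invert the low bits of x[0]
            else:
                t = (x[0] ^ x[i]) & p  # exchange low bits of x[0] and x[i]
                x[0] ^= t
                x[i] ^= t
        q >>= 1

    # Gray-encode across dimensions.
    for i in range(1, n):
        x[i] ^= x[i - 1]
    t = 0
    q = top
    while q > 1:
        if x[n - 1] & q:
            t ^= q - 1
        q >>= 1
    x = [v ^ t for v in x]

    # Interleave the bits of all axes, most significant bit first.
    index = 0
    for b in range(bits - 1, -1, -1):
        for i in range(n):
            index = (index << 1) | ((x[i] >> b) & 1)
    return index


def hilbert_order(points: Sequence[Sequence[float]], bits: int = 10) -> List[int]:
    """Return point indices sorted along a Hilbert curve over the bounding box."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    max_cell = (1 << bits) - 1

    def quantize(p: Sequence[float]) -> List[int]:
        # Scale each axis to [0, max_cell]; a degenerate axis maps to cell 0.
        return [
            int((p[d] - lo[d]) / (hi[d] - lo[d]) * max_cell) if hi[d] > lo[d] else 0
            for d in range(dims)
        ]

    return sorted(range(len(points)), key=lambda i: hilbert_index(quantize(points[i]), bits))
```

Under these assumptions, points that are consecutive in the returned order tend to be close in space, so splitting the ordered sequence into contiguous chunks gives spatially compact regions. Whether PerLA partitions the scene this way is not stated in the summary.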

Abstract (original)

Enabling Large Language Models (LLMs) to understand the 3D physical world is an emerging yet challenging research direction. Current strategies for processing point clouds typically downsample the scene or divide it into smaller parts for separate analysis. However, both approaches risk losing key local details or global contextual information. In this paper, we introduce PerLA, a 3D language assistant designed to be more perceptive to both details and context, making visual representations more informative for the LLM. PerLA captures high-resolution (local) details in parallel from different point cloud areas and integrates them with (global) context obtained from a lower-resolution whole point cloud. We present a novel algorithm that preserves point cloud locality through the Hilbert curve and effectively aggregates local-to-global information via cross-attention and a graph neural network. Lastly, we introduce a novel loss for local representation consensus to promote training stability. PerLA outperforms state-of-the-art 3D language assistants, with gains of up to +1.34 CIDEr on ScanQA for question answering, and +4.22 on ScanRefer and +3.88 on Nr3D for dense captioning. https://gfmei.github.io/PerLA/

arXiv information

Authors	Guofeng Mei, Wei Lin, Luigi Riz, Yujiao Wu, Fabio Poiesi, Yiming Wang
Published	2024-11-29 15:20:29+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
