EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding

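Summary

Title: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding

Summary:
– 3D visual grounding aims to locate the object in a point cloud that a free-form natural language description, rich in semantic cues, refers to.
– Existing methods either extract sentence-level features that couple all words together or focus mainly on object names, so they risk losing word-level information or ignoring other attributes.
– To alleviate these issues, the authors propose EDA, which explicitly decouples the textual attributes in a sentence and densely aligns this fine-grained language with point cloud objects.
– Specifically, they first propose a text decoupling module that produces textual features for every semantic component. They then design two losses to supervise the dense matching between the two modalities: a position alignment loss and a semantic alignment loss.
– They also introduce a new visual grounding task, locating objects without object names, to thoroughly evaluate the model's dense alignment capacity.
– In experiments, EDA achieves state-of-the-art performance on two widely adopted 3D visual grounding datasets, ScanRefer and SR3D/NR3D, and takes an absolute lead on the newly proposed task. The source code is available at https://github.com/yanmin-wu/EDA.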
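
To make the idea of decoupled components and two alignment losses concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the summary does not give the loss formulas or list the semantic components, so the component names, the token indices, the helper names (`position_label`, `position_alignment_loss`, `semantic_alignment_loss`), and the temperature value are all assumptions for illustration. The position loss is written as a plain cross-entropy over token positions, and the semantic loss as a generic InfoNCE-style contrastive loss standing in for whatever the paper actually uses.

```python
import math


def position_label(num_tokens, indices):
    """Uniform target distribution over the token positions of one component."""
    idx = set(indices)
    if not idx or min(idx) < 0 or max(idx) >= num_tokens:
        raise ValueError("indices must be non-empty and within the sentence")
    return [1.0 / len(idx) if i in idx else 0.0 for i in range(num_tokens)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def position_alignment_loss(logits, label):
    """Cross-entropy between a target over token positions and softmax(logits).

    `logits` are one object's scores over the text positions (assumed input).
    """
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(label, probs) if t > 0)


def semantic_alignment_loss(obj_feat, text_feats, positive_idx, temperature=0.07):
    """Generic InfoNCE-style loss: pull an object feature toward its matching
    text-component feature and away from the others (illustrative stand-in)."""
    sims = [sum(a * b for a, b in zip(obj_feat, t)) / temperature for t in text_feats]
    return -math.log(softmax(sims)[positive_idx])


# Hypothetical decoupling of one description into semantic components,
# given as token indices (the component names are assumptions).
tokens = "the brown chair next to the table".split()
components = {
    "main_object": [2],
    "attribute": [1],
    "relation": [3, 4],
    "auxiliary_object": [6],
}
labels = {name: position_label(len(tokens), idx) for name, idx in components.items()}

# An object whose scores peak on "chair" is penalized less than a uniform guess.
uniform_loss = position_alignment_loss([0.0] * len(tokens), labels["main_object"])
peaked_loss = position_alignment_loss([0, 0, 5, 0, 0, 0, 0], labels["main_object"])
```

Because each component gets its own target, an object can be supervised against its attribute or relation words even when the object name is absent from the sentence, which is the situation the new task in the summary is meant to test.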

Abstract (original)

3D visual grounding aims to find the object within point clouds mentioned by free-form natural language descriptions with rich semantic cues. However, existing methods either extract the sentence-level features coupling all words or focus more on object names, which would lose the word-level information or neglect other attributes. To alleviate these issues, we present EDA that Explicitly Decouples the textual attributes in a sentence and conducts Dense Alignment between such fine-grained language and point cloud objects. Specifically, we first propose a text decoupling module to produce textual features for every semantic component. Then, we design two losses to supervise the dense matching between two modalities: position alignment loss and semantic alignment loss. On top of that, we further introduce a new visual grounding task, locating objects without object names, which can thoroughly evaluate the model’s dense alignment capacity. Through experiments, we achieve state-of-the-art performance on two widely-adopted 3D visual grounding datasets, ScanRefer and SR3D/NR3D, and obtain absolute leadership on our newly-proposed task. The source code is available at https://github.com/yanmin-wu/EDA.

arXiv information

Authors	Yanmin Wu, Xinhua Cheng, Renrui Zhang, Zesen Cheng, Jian Zhang
Published	2023-04-24 13:16:57+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
