Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding

Abstract

The 3D visual grounding task has been explored with visual and language streams that comprehend referential language to identify target objects in 3D scenes. However, most existing methods devote the visual stream to capturing 3D visual clues using off-the-shelf point cloud encoders. The main question we address in this paper is: "Can we consolidate the 3D visual stream with 2D clues synthesized from point clouds and efficiently utilize them in training and testing?" The main idea is to assist the 3D encoder by incorporating rich 2D object representations without requiring extra 2D inputs. To this end, we leverage 2D clues, synthetically generated from 3D point clouds, and empirically show their aptitude to boost the quality of the learned visual representations. We validate our approach through comprehensive experiments on the Nr3D, Sr3D, and ScanRefer datasets and show consistent performance gains compared to existing methods. Our proposed module, dubbed Look Around and Refer (LAR), significantly outperforms state-of-the-art 3D visual grounding techniques on all three benchmarks. The code is available at https://eslambakr.github.io/LAR.github.io/.

arXiv information

Authors	Eslam Mohamed Bakr, Yasmeen Alsaedy, Mohamed Elhoseiny
Published	2022-11-25 17:12:08+00:00
