Instance-Level Semantic Maps for Vision Language Navigation

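Summary

Humans have a natural ability to form semantic associations with the objects around them in an environment.
This lets them build a mental map of the environment that helps them navigate on demand when given a language instruction.
A natural goal of Vision Language Navigation (VLN) research is to give autonomous agents similar capabilities.
The recently introduced VL Maps \cite{huang23vlmaps} took a step toward this goal by creating a semantic spatial map representation of the environment without labelled data.
However, these representations do not distinguish between different instances of the same object, which limits their practical applicability.
This work addresses that limitation by integrating instance-level information into the spatial map representation using a community detection algorithm, and by using the word ontology learned by large language models (LLMs) to perform open-set semantic associations in the mapping representation.
Compared with VL Maps, the resulting map representation improves navigation performance two-fold (233\%) on realistic language commands that contain instance-specific descriptions.
We validate the practicality and effectiveness of the approach through extensive qualitative and quantitative experiments.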

Summary (original)

Humans have a natural ability to perform semantic associations with the surrounding objects in the environment. This allows them to create a mental map of the environment which helps them to navigate on-demand when given a linguistic instruction. A natural goal in Vision Language Navigation (VLN) research is to impart autonomous agents with similar capabilities. Recently introduced VL Maps \cite{huang23vlmaps} take a step towards this goal by creating a semantic spatial map representation of the environment without any labelled data. However, their representations are limited for practical applicability as they do not distinguish between different instances of the same object. In this work, we address this limitation by integrating instance-level information into spatial map representation using a community detection algorithm and by utilizing word ontology learned by large language models (LLMs) to perform open-set semantic associations in the mapping representation. The resulting map representation improves the navigation performance by two-fold (233\%) on realistic language commands with instance-specific descriptions compared to VL Maps. We validate the practicality and effectiveness of our approach through extensive qualitative and quantitative experiments.
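
To make the instance-level idea concrete, here is a minimal sketch of how a community detection step could separate two objects of the same class on a 2D semantic grid. The abstract names neither the algorithm nor the graph the authors use, so everything here is an assumption for illustration: the toy `grid`, the 8-connected cell graph, the helper name `instance_ids`, and the choice of plain label propagation as the community detection method.

```python
from collections import Counter


def instance_ids(grid, target, max_iters=100):
    """Give every cell of class `target` an instance id (hypothetical helper)."""
    cells = [(r, c) for r, row in enumerate(grid)
             for c, lab in enumerate(row) if lab == target]
    cell_set = set(cells)

    def neighbours(cell):
        # 8-connected neighbours that carry the same class label.
        r, c = cell
        return [(r + dr, c + dc)
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0) and (r + dr, c + dc) in cell_set]

    # Label propagation: every cell starts in its own community and then
    # repeatedly adopts the most common label among its neighbours.
    label = {cell: i for i, cell in enumerate(cells)}
    for _ in range(max_iters):
        changed = False
        for cell in cells:
            counts = Counter(label[n] for n in neighbours(cell))
            if not counts:
                continue  # an isolated cell keeps its own label
            best = max(counts.values())
            new = min(l for l, k in counts.items() if k == best)  # tie: smallest
            if new != label[cell]:
                label[cell] = new
                changed = True
        if not changed:
            break

    # Renumber communities as 0, 1, 2, ... in order of first appearance.
    remap = {}
    return {cell: remap.setdefault(label[cell], len(remap)) for cell in cells}


# Toy semantic map: two separate "chair" regions on a "floor" background.
grid = [
    ["chair", "chair", "floor", "floor", "floor"],
    ["chair", "chair", "floor", "floor", "floor"],
    ["floor", "floor", "floor", "chair", "chair"],
    ["floor", "floor", "floor", "chair", "chair"],
]
ids = instance_ids(grid, "chair")
print(sorted(set(ids.values())))  # [0, 1]
```

A class-level map would label all eight cells simply as "chair"; with the instance ids, a command that refers to one particular chair can be resolved to a single region. On real maps a community detection step can also split one connected region into several groups, which a plain connected-components pass cannot do.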

arXiv information

Authors	Laksh Nanwani, Anmol Agarwal, Kanishk Jain, Raghav Prabhakar, Aaron Monis, Aditya Mathur, Krishna Murthy, Abdul Hafez, Vineet Gandhi, K. Madhava Krishna
Published	2023-05-23 08:57:06+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
