Is Generative Communication between Embodied Agents Good for Zero-Shot ObjectNav?

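Summary

In Zero-Shot ObjectNav, an embodied ground agent is expected to navigate to a target object specified by a natural-language label, without any environment-specific fine-tuning.
This is difficult because the ground agent has a limited field of view and explores independently.
To address these issues, we consider an assistive overhead agent with a bounded global view operating alongside the ground agent, and present two coordinated navigation schemes for judicious exploration.
We establish the influence of Generative Communication (GC) between embodied agents equipped with Vision-Language Models (VLMs) in improving zero-shot ObjectNav, achieving a 10% improvement in the ground agent's ability to find the target object compared with an unassisted setup in simulation.
We further analyze the GC for unique traits that quantify the presence of hallucination and cooperation.
In particular, we identify a trait specific to our embodied setting, "preemptive hallucination", in which the overhead agent assumes in the dialogue that the ground agent has already executed an action when it has not yet moved.
Finally, we run real-world inference with GC and present qualitative examples in which countering preemptive hallucination through prompt fine-tuning improves real-world ObjectNav performance.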

Summary (original)

In Zero-Shot ObjectNav, an embodied ground agent is expected to navigate to a target object specified by a natural language label without any environment-specific fine-tuning. This is challenging, given the limited view of a ground agent and its independent exploratory behavior. To address these issues, we consider an assistive overhead agent with a bounded global view alongside the ground agent and present two coordinated navigation schemes for judicious exploration. We establish the influence of the Generative Communication (GC) between the embodied agents equipped with Vision-Language Models (VLMs) in improving zero-shot ObjectNav, achieving a 10% improvement in the ground agent’s ability to find the target object in comparison with an unassisted setup in simulation. We further analyze the GC for unique traits quantifying the presence of hallucination and cooperation. In particular, we identify a unique trait of ‘preemptive hallucination’ specific to our embodied setting, where the overhead agent assumes that the ground agent has executed an action in the dialogue when it is yet to move. Finally, we conduct real-world inferences with GC and showcase qualitative examples where countering pre-emptive hallucination via prompt finetuning improves real-world ObjectNav performance.

arXiv information

Authors	Vishnu Sashank Dorbala, Vishnu Dutt Sharma, Pratap Tokekar, Dinesh Manocha
Published	2024-08-11 21:26:41+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
