L3MVN: Leveraging Large Language Models for Visual Target Navigation

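Summary

Visual target navigation in unknown environments is a key problem in robotics.
Although classical and learning-based approaches have been studied extensively, robots still lack common-sense knowledge about household objects and layouts.
Prior state-of-the-art approaches to this task learn these priors during training, which typically demands considerable time and expensive resources.
To address this, we propose a new framework for visual target navigation that uses large language models (LLMs) to supply common sense for object search.
Specifically, we introduce two paradigms, (i) a zero-shot approach and (ii) a feed-forward approach, both of which use language to select the relevant frontier from the semantic map as a long-term goal so that the environment is explored efficiently.
Our analysis shows notable zero-shot generalization and transfer capabilities arising from the use of language.
Experiments on Gibson and Habitat-Matterport 3D (HM3D) show that the proposed framework significantly outperforms existing map-based methods in success rate and generalization.
An ablation analysis also indicates that common-sense knowledge from the language model leads to more efficient semantic exploration.
Finally, we provide a real-robot experiment to verify the applicability of the framework in real-world scenarios.
The supplementary video and code are available at https://sites.google.com/view/l3mvn.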

Abstract (original)

Visual target navigation in unknown environments is a crucial problem in robotics. Despite extensive investigation of classical and learning-based approaches in the past, robots lack common-sense knowledge about household objects and layouts. Prior state-of-the-art approaches to this task rely on learning the priors during the training and typically require significant expensive resources and time for learning. To address this, we propose a new framework for visual target navigation that leverages Large Language Models (LLM) to impart common sense for object searching. Specifically, we introduce two paradigms: (i) zero-shot and (ii) feed-forward approaches that use language to find the relevant frontier from the semantic map as a long-term goal and explore the environment efficiently. Our analysis demonstrates the notable zero-shot generalization and transfer capabilities from the use of language. Experiments on Gibson and Habitat-Matterport 3D (HM3D) demonstrate that the proposed framework significantly outperforms existing map-based methods in terms of success rate and generalization. Ablation analysis also indicates that the common-sense knowledge from the language model leads to more efficient semantic exploration. Finally, we provide a real robot experiment to verify the applicability of our framework in real-world scenarios. The supplementary video and code can be accessed via the following link: https://sites.google.com/view/l3mvn.
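The abstract describes using language to pick the most relevant frontier from the semantic map as a long-term goal. The following is only a minimal sketch of that idea, not the authors' implementation: the `Frontier` type, the `score_frontier` and `select_long_term_goal` functions, and the `TOY_RELEVANCE` table are all hypothetical names introduced here, and the hand-written table merely stands in for whatever common-sense score a language model would provide.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a language model's common-sense judgment:
# how plausible it is to find the target near a given object.
# The numbers are invented for illustration only.
TOY_RELEVANCE: Dict[Tuple[str, str], float] = {
    ("toilet", "sink"): 0.9,
    ("toilet", "bed"): 0.1,
    ("toilet", "sofa"): 0.05,
    ("tv", "sofa"): 0.8,
    ("tv", "bed"): 0.4,
    ("tv", "sink"): 0.05,
}


@dataclass
class Frontier:
    """A boundary between explored and unexplored space on the map."""
    position: Tuple[int, int]   # map cell of the frontier
    nearby_objects: List[str]   # object labels observed near the frontier


def score_frontier(target: str, frontier: Frontier,
                   relevance: Dict[Tuple[str, str], float] = TOY_RELEVANCE) -> float:
    """Score a frontier by its most target-relevant nearby object."""
    if not frontier.nearby_objects:
        return 0.0  # nothing observed nearby, so no language-based evidence
    return max(relevance.get((target, obj), 0.0)
               for obj in frontier.nearby_objects)


def select_long_term_goal(target: str,
                          frontiers: List[Frontier]) -> Optional[Frontier]:
    """Return the highest-scoring frontier, or None if there are none."""
    if not frontiers:
        return None
    return max(frontiers, key=lambda f: score_frontier(target, f))


if __name__ == "__main__":
    candidates = [
        Frontier((2, 3), ["bed"]),
        Frontier((10, 4), ["sink"]),
        Frontier((7, 7), []),
    ]
    goal = select_long_term_goal("toilet", candidates)
    print(goal.position)  # prints (10, 4)
```

In this toy setup the frontier next to a sink wins when the target is a toilet, which is the kind of common-sense prior the abstract attributes to the language model. A real system would replace the lookup table with model queries and would also have to handle ties and frontiers with no nearby objects more carefully.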

arXiv information

Authors	Bangguo Yu, Hamidreza Kasaei, Ming Cao
Published	2023-12-25 07:44:28+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
