Interactive Navigation in Environments with Traversable Obstacles Using Large Language and Vision-Language Models

Summary

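This paper proposes an interactive navigation framework that uses large language and vision-language models to let robots navigate environments containing traversable obstacles.
It uses a large language model (GPT-3.5) and an open-set vision-language model (Grounding DINO) to build an action-aware costmap and plan paths effectively without fine-tuning.
With these large models, the system runs end to end, from a textual instruction such as "Can you pass through the curtains to deliver medicines to me?" to bounding boxes (e.g., curtains) carrying action-aware attributes.
These boxes are used to segment the LiDAR point cloud into traversable and untraversable parts, and an action-aware costmap is then constructed to generate a feasible path.
Because the pre-trained large models generalize well and need no additional annotated training data, the framework can be deployed quickly for interactive navigation tasks.
For verification, the robot was instructed to traverse several traversable objects, such as curtains and grass.
Traversing curtains in a medical scenario was also tested.
The experimental results demonstrated the framework's effectiveness and its adaptability to diverse environments.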
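
The segmentation and costmap step described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the LiDAR points are already expressed in the camera frame (x right, y down, z forward), that a simple pinhole camera model applies, and that the bounding boxes passed in are the ones the language model has tagged as traversable. The function names, parameters, and the cost value 100 are all hypothetical. As a further simplification, every point whose projection falls inside a box is treated as traversable, including points that lie behind the object.

```python
import math


def split_points_by_boxes(points, fx, fy, cx, cy, boxes):
    """Split 3D points into (traversable, untraversable) lists.

    A point counts as traversable when its pinhole projection (u, v) falls
    inside any box given as (u_min, v_min, u_max, v_max). Points at or behind
    the image plane (z <= 0) cannot be projected and are kept as obstacles.
    """
    traversable, untraversable = [], []
    for x, y, z in points:
        hit = False
        if z > 0:
            u = fx * x / z + cx
            v = fy * y / z + cy
            hit = any(u0 <= u <= u1 and v0 <= v <= v1
                      for u0, v0, u1, v1 in boxes)
        (traversable if hit else untraversable).append((x, y, z))
    return traversable, untraversable


def build_costmap(obstacles, x_min, z_min, n_cols, n_rows, resolution, lethal=100):
    """Rasterize obstacle points onto a ground-plane grid (rows: z, cols: x).

    Only untraversable points are passed in, so cells covered solely by
    traversable objects keep cost 0 and stay open to the planner.
    """
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, _, z in obstacles:
        col = math.floor((x - x_min) / resolution)
        row = math.floor((z - z_min) / resolution)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] = lethal
    return grid


# Toy scene: one point on a curtain straight ahead, one on a wall to the right.
points = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
curtain_boxes = [(40, 40, 60, 60)]  # hypothetical detector output, in pixels
passable, blocked = split_points_by_boxes(
    points, 100.0, 100.0, 50.0, 50.0, curtain_boxes)
costmap = build_costmap(
    blocked, x_min=-2.0, z_min=0.0, n_cols=8, n_rows=8, resolution=0.5)
```

In this toy scene the curtain point is dropped before rasterization, so its grid cell stays free, while the wall point marks its cell as an obstacle. A path planner run on such a grid would be allowed to route through the curtain but not through the wall.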

Summary (original)

This paper proposes an interactive navigation framework by using large language and vision-language models, allowing robots to navigate in environments with traversable obstacles. We utilize the large language model (GPT-3.5) and the open-set Vision-language Model (Grounding DINO) to create an action-aware costmap to perform effective path planning without fine-tuning. With the large models, we can achieve an end-to-end system from textual instructions like ‘Can you pass through the curtains to deliver medicines to me?’, to bounding boxes (e.g., curtains) with action-aware attributes. They can be used to segment LiDAR point clouds into two parts: traversable and untraversable parts, and then an action-aware costmap is constructed for generating a feasible path. The pre-trained large models have great generalization ability and do not require additional annotated data for training, allowing fast deployment in the interactive navigation tasks. We choose to use multiple traversable objects such as curtains and grasses for verification by instructing the robot to traverse them. Besides, traversing curtains in a medical scenario was tested. All experimental results demonstrated the proposed framework’s effectiveness and adaptability to diverse environments.

arXiv information

Authors	Zhen Zhang, Anran Lin, Chun Wai Wong, Xiangyu Chu, Qi Dou, K. W. Samuel Au
Published	2024-03-13 02:53:30+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
