Co-NavGPT: Multi-Robot Cooperative Visual Semantic Navigation Using Vision Language Models

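Abstract

Visual target navigation is a critical capability for autonomous robots operating in unknown environments, particularly in human-robot interaction scenarios.
Classical and learning-based methods have shown promise, but most existing approaches lack common-sense reasoning and are typically designed for single-robot settings, which reduces efficiency and robustness in complex environments.
To address these limitations, we introduce Co-NavGPT, a novel framework that integrates a Vision Language Model (VLM) as a global planner to enable common-sense multi-robot visual target navigation.
Co-NavGPT aggregates sub-maps from multiple robots with diverse viewpoints into a unified global map that encodes robot states and frontier regions.
The VLM uses this information to assign frontiers across the robots, facilitating coordinated and efficient exploration.
Experiments on Habitat-Matterport 3D (HM3D) show that Co-NavGPT outperforms existing baselines in success rate and navigation efficiency without requiring task-specific training.
Ablation studies further confirm the importance of semantic priors from the VLM.
We also validate the framework in real-world scenarios using quadrupedal robots.
Supplementary video and code are available at https://sites.google.com/view/co-navgpt2.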

Abstract (original)

Visual target navigation is a critical capability for autonomous robots operating in unknown environments, particularly in human-robot interaction scenarios. While classical and learning-based methods have shown promise, most existing approaches lack common-sense reasoning and are typically designed for single-robot settings, leading to reduced efficiency and robustness in complex environments. To address these limitations, we introduce Co-NavGPT, a novel framework that integrates a Vision Language Model (VLM) as a global planner to enable common-sense multi-robot visual target navigation. Co-NavGPT aggregates sub-maps from multiple robots with diverse viewpoints into a unified global map, encoding robot states and frontier regions. The VLM uses this information to assign frontiers across the robots, facilitating coordinated and efficient exploration. Experiments on the Habitat-Matterport 3D (HM3D) demonstrate that Co-NavGPT outperforms existing baselines in terms of success rate and navigation efficiency, without requiring task-specific training. Ablation studies further confirm the importance of semantic priors from the VLM. We also validate the framework in real-world scenarios using quadrupedal robots. Supplementary video and code are available at: https://sites.google.com/view/co-navgpt2.
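
The pipeline described in the abstract (merge per-robot sub-maps, find frontier regions, assign one frontier per robot) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper or its code: the grid encoding (`UNKNOWN`, `FREE`, `OCCUPIED`), the function names, and above all the assignment step, which uses a greedy nearest-frontier rule as a stand-in where Co-NavGPT queries a VLM.

```python
from math import hypot

# Hypothetical cell labels for a 2D occupancy grid (not the paper's encoding).
UNKNOWN, FREE, OCCUPIED = -1, 0, 1


def merge_submaps(submaps):
    """Fuse same-sized grids cell by cell: OCCUPIED beats FREE, FREE beats UNKNOWN."""
    rows, cols = len(submaps[0]), len(submaps[0][0])
    merged = [[UNKNOWN] * cols for _ in range(rows)]
    for grid in submaps:
        for r in range(rows):
            for c in range(cols):
                # The label values are ordered, so max() applies the priority rule.
                merged[r][c] = max(merged[r][c], grid[r][c])
    return merged


def find_frontiers(grid):
    """Return FREE cells that touch at least one UNKNOWN cell (4-connectivity)."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers


def assign_frontiers(robots, frontiers):
    """Greedy placeholder for the global planner (NOT a VLM):
    each robot in turn takes the nearest frontier nobody has claimed yet."""
    remaining = list(frontiers)
    assignment = {}
    for name, (rr, rc) in robots.items():
        if not remaining:
            break
        best = min(remaining, key=lambda f: hypot(f[0] - rr, f[1] - rc))
        assignment[name] = best
        remaining.remove(best)
    return assignment


if __name__ == "__main__":
    map_a = [[0, 0, -1, -1], [0, 1, -1, -1], [0, 0, -1, -1]]
    map_b = [[-1, -1, -1, -1], [-1, -1, 0, 0], [-1, -1, 0, -1]]
    global_map = merge_submaps([map_a, map_b])
    frontiers = find_frontiers(global_map)
    print(assign_frontiers({"robot_a": (1, 0), "robot_b": (2, 2)}, frontiers))
```

In a system closer to the one the abstract describes, the merged map, robot states, and frontier candidates would be passed to the VLM, and its answer would replace `assign_frontiers`; the distance-based rule here only makes the data flow concrete and ignores the semantic priors that the ablation studies highlight.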

arXiv information

Authors	Bangguo Yu, Qihao Yuan, Kailai Li, Hamidreza Kasaei, Ming Cao
Published	2025-05-06 14:06:58+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
