Hierarchical Language Models for Semantic Navigation and Manipulation in an Aerial-Ground Robotic System

Abstract

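Heterogeneous multi-robot systems show great potential in complex tasks that require hybrid cooperation.
However, traditional approaches that rely on static models often struggle with task diversity and dynamic environments.
This highlights the need for generalizable intelligence that can bridge high-level reasoning and low-level execution across heterogeneous agents.
To address this, we propose a hierarchical framework that integrates a prompted Large Language Model (LLM) with a GridMask-enhanced, fine-tuned Vision Language Model (VLM).
The LLM decomposes tasks and constructs a global semantic map, while the VLM extracts task-specified semantic labels and 2D spatial information from aerial images to support local planning.
Within this framework, the aerial robot follows an optimized global semantic path and continuously provides bird-view images, guiding the ground robot's local semantic navigation and manipulation.
Experiments on real-world cube or object arrangement tasks demonstrate the framework's adaptability and robustness in dynamic environments.
To the best of our knowledge, this is the first demonstration of an aerial-ground heterogeneous system that integrates VLM-based perception with LLM-driven task reasoning and motion planning.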

Abstract (original)

Heterogeneous multi-robot systems show great potential in complex tasks requiring hybrid cooperation. However, traditional approaches relying on static models often struggle with task diversity and dynamic environments. This highlights the need for generalizable intelligence that can bridge high-level reasoning with low-level execution across heterogeneous agents. To address this, we propose a hierarchical framework integrating a prompted Large Language Model (LLM) and a GridMask-enhanced fine-tuned Vision Language Model (VLM). The LLM decomposes tasks and constructs a global semantic map, while the VLM extracts task-specified semantic labels and 2D spatial information from aerial images to support local planning. Within this framework, the aerial robot follows an optimized global semantic path and continuously provides bird-view images, guiding the ground robot’s local semantic navigation and manipulation, including target-absent scenarios where implicit alignment is maintained. Experiments on real-world cube or object arrangement tasks demonstrate the framework’s adaptability and robustness in dynamic environments. To the best of our knowledge, this is the first demonstration of an aerial-ground heterogeneous system integrating VLM-based perception with LLM-driven task reasoning and motion planning.

arXiv information

Authors	Haokun Liu, Zhaoqi Ma, Yunong Li, Junichiro Sugihara, Yicheng Chen, Jinjie Li, Moju Zhao
Published	2025-06-16 05:10:40+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
