Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation

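Summary

To make full use of mobile manipulation robots, they must be able to autonomously carry out long-horizon tasks in large, unexplored environments.
Large language models (LLMs) have shown emergent reasoning skills on arbitrary tasks, but existing work concentrates mainly on already-explored environments and typically treats navigation or manipulation in isolation.
This work proposes MoMa-LLM, a novel approach that grounds language models in structured representations derived from open-vocabulary scene graphs, which are updated dynamically as the environment is explored.
These representations are tightly interleaved with an object-centric action space.
Given object detections, the resulting approach is zero-shot and open-vocabulary, and it extends readily to a range of mobile manipulation and household robotic tasks.
The authors demonstrate the effectiveness of MoMa-LLM on a novel semantic interactive search task in large, realistic indoor environments.
Extensive experiments in both simulation and the real world show substantially improved search efficiency over conventional baselines and state-of-the-art approaches, as well as applicability to more abstract tasks.
The code is publicly available at http://moma-llm.cs.uni-freiburg.de.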

Abstract (original)

To fully leverage the capabilities of mobile manipulation robots, it is imperative that they are able to autonomously execute long-horizon tasks in large unexplored environments. While large language models (LLMs) have shown emergent reasoning skills on arbitrary tasks, existing work primarily concentrates on explored environments, typically focusing on either navigation or manipulation tasks in isolation. In this work, we propose MoMa-LLM, a novel approach that grounds language models within structured representations derived from open-vocabulary scene graphs, dynamically updated as the environment is explored. We tightly interleave these representations with an object-centric action space. Given object detections, the resulting approach is zero-shot, open-vocabulary, and readily extendable to a spectrum of mobile manipulation and household robotic tasks. We demonstrate the effectiveness of MoMa-LLM in a novel semantic interactive search task in large realistic indoor environments. In extensive experiments in both simulation and the real world, we show substantially improved search efficiency compared to conventional baselines and state-of-the-art approaches, as well as its applicability to more abstract tasks. We make the code publicly available at http://moma-llm.cs.uni-freiburg.de.
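
As a rough illustration of what grounding a language model in a dynamically updated scene graph could look like, the Python sketch below keeps a toy room-to-object graph, merges new detections into it, and serializes it into text alongside a list of object-centric actions. This is not the MoMa-LLM implementation: the `SceneGraph` class, its methods `observe`, `to_prompt`, and `actions`, the action names, and the prompt layout are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class SceneGraph:
    """Toy scene graph (hypothetical): rooms contain objects, some objects are closed."""
    rooms: Dict[str, List[str]] = field(default_factory=dict)
    closed: Set[str] = field(default_factory=set)
    frontiers: Set[str] = field(default_factory=set)  # rooms with unexplored area

    def observe(self, room: str, objects: Iterable[str],
                closed: Iterable[str] = (), has_frontier: bool = False) -> None:
        """Merge new detections for one room into the graph."""
        known = self.rooms.setdefault(room, [])
        for obj in objects:
            if obj not in known:
                known.append(obj)
        self.closed.update(closed)
        if has_frontier:
            self.frontiers.add(room)
        else:
            self.frontiers.discard(room)

    def to_prompt(self, target: str) -> str:
        """Serialize the current graph into structured text for a language model."""
        lines = [f"Task: find a {target}."]
        for room, objects in self.rooms.items():
            described = [f"{o} (closed)" if o in self.closed else o for o in objects]
            contents = ", ".join(described) if described else "nothing seen"
            suffix = " [unexplored area]" if room in self.frontiers else ""
            lines.append(f"- {room}{suffix}: {contents}")
        return "\n".join(lines)

    def actions(self) -> List[str]:
        """List object-centric actions that are valid for the current graph."""
        acts = []
        for room, objects in self.rooms.items():
            acts.append(f"navigate({room})")
            if room in self.frontiers:
                acts.append(f"explore({room})")
            acts.extend(f"open({o})" for o in objects if o in self.closed)
        return acts


# Example: the graph grows as the (simulated) robot explores.
graph = SceneGraph()
graph.observe("kitchen", ["fridge", "sink"], closed=["fridge"], has_frontier=True)
graph.observe("hallway", [])
print(graph.to_prompt("milk carton"))
print(graph.actions())
```

In a setup like this, the serialized text and the action list would be regenerated after every observation, so the language model always chooses among actions that refer to objects and rooms actually present in the graph.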

arXiv information

Authors	Daniel Honerkamp, Martin Büchner, Fabien Despinoy, Tim Welschehold, Abhinav Valada
Published	2024-07-31 13:59:56+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
