Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation

Summary

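Enabling mobile robots to carry out long-term tasks in dynamic real-world environments is very challenging, particularly when the environment changes frequently because of human-robot interaction or the robot's own actions.
Conventional methods usually assume a static scene, which limits their applicability to a continuously changing real world.
To overcome these limitations, we introduce DovSG, a new mobile manipulation framework that uses dynamic open-vocabulary 3D scene graphs and a language-guided task planning module for long-term task execution.
DovSG takes RGB-D sequences as input and uses vision-language models (VLMs) for object detection to obtain high-level semantic features of objects.
From the segmented objects, it generates a structured 3D scene graph that captures low-level spatial relationships.
In addition, an efficient mechanism for locally updating the scene graph lets the robot adjust parts of the graph dynamically during interaction, without reconstructing the entire scene.
This mechanism is especially valuable in dynamic environments: the robot can keep adapting to scene changes and so effectively support the execution of long-term tasks.
We validated the system in real environments under varying degrees of manual modification and demonstrated its effectiveness and superior performance on long-term tasks.
Our project page is available at https://BJHYZJ.github.io/DoviSG.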

Summary (original)

Enabling mobile robots to perform long-term tasks in dynamic real-world environments is a formidable challenge, especially when the environment changes frequently due to human-robot interactions or the robot’s own actions. Traditional methods typically assume static scenes, which limits their applicability in the continuously changing real world. To overcome these limitations, we present DovSG, a novel mobile manipulation framework that leverages dynamic open-vocabulary 3D scene graphs and a language-guided task planning module for long-term task execution. DovSG takes RGB-D sequences as input and utilizes vision-language models (VLMs) for object detection to obtain high-level object semantic features. Based on the segmented objects, a structured 3D scene graph is generated for low-level spatial relationships. Furthermore, an efficient mechanism for locally updating the scene graph, allows the robot to adjust parts of the graph dynamically during interactions without the need for full scene reconstruction. This mechanism is particularly valuable in dynamic environments, enabling the robot to continually adapt to scene changes and effectively support the execution of long-term tasks. We validated our system in real-world environments with varying degrees of manual modifications, demonstrating its effectiveness and superior performance in long-term tasks. Our project page is available at: https://BJHYZJ.github.io/DoviSG.
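
The idea of updating a scene graph locally instead of rebuilding it can be illustrated with a small sketch. This is not the DovSG implementation: the class names (ObjectNode, SceneGraph), the box-based "on" test, the 0.05 m tolerance, and the update_local method are all hypothetical and chosen only for illustration. The sketch stores objects as axis-aligned boxes, and when a few objects are re-observed or removed it recomputes only the edges that touch those objects.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    label: str                           # open-vocabulary label (assumed to come from a detector)
    center: tuple[float, float, float]   # box center in metres (x, y, z)
    half_size: tuple[float, float, float]


def is_on(a: ObjectNode, b: ObjectNode, tol: float = 0.05) -> bool:
    """Hypothetical 'on' test: footprints overlap and a's bottom meets b's top."""
    overlap_x = abs(a.center[0] - b.center[0]) <= a.half_size[0] + b.half_size[0]
    overlap_y = abs(a.center[1] - b.center[1]) <= a.half_size[1] + b.half_size[1]
    gap = (a.center[2] - a.half_size[2]) - (b.center[2] + b.half_size[2])
    return overlap_x and overlap_y and abs(gap) <= tol


@dataclass
class SceneGraph:
    nodes: dict[str, ObjectNode] = field(default_factory=dict)
    edges: set[tuple[str, str, str]] = field(default_factory=set)

    def update_local(self, observed: dict[str, ObjectNode], removed=()) -> set[str]:
        """Apply a partial observation and recompute only the affected edges."""
        changed = set(observed) | set(removed)
        for oid in removed:
            self.nodes.pop(oid, None)
        self.nodes.update(observed)
        # Drop every edge that touches a changed object; all other edges are kept.
        self.edges = {e for e in self.edges
                      if e[0] not in changed and e[2] not in changed}
        # Re-derive relations only between changed objects and the rest of the graph.
        for cid in changed:
            if cid not in self.nodes:
                continue
            for oid, other in self.nodes.items():
                if oid == cid:
                    continue
                if is_on(self.nodes[cid], other):
                    self.edges.add((cid, "on", oid))
                if is_on(other, self.nodes[cid]):
                    self.edges.add((oid, "on", cid))
        return changed


graph = SceneGraph()
graph.update_local({
    "table": ObjectNode("table", (0.0, 0.0, 0.4), (0.5, 0.5, 0.4)),
    "shelf": ObjectNode("shelf", (2.0, 0.0, 0.5), (0.3, 0.3, 0.5)),
    "cup": ObjectNode("cup", (0.0, 0.0, 0.85), (0.04, 0.04, 0.05)),
})
# Someone moves the cup to the shelf; only the cup is re-observed.
graph.update_local({"cup": ObjectNode("cup", (2.0, 0.0, 1.05), (0.04, 0.04, 0.05))})
print(graph.edges)
```

In this toy version the cost of an update scales with the number of changed objects times the graph size, rather than with all object pairs, which is the intuition behind avoiding full scene reconstruction.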

arXiv information

Authors	Zhijie Yan, Shufei Li, Zuoxu Wang, Lixiu Wu, Han Wang, Jun Zhu, Lijiang Chen, Jihong Liu
Published	2024-10-17 10:06:12+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
