Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation

Abstract

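Enabling mobile robots to carry out long-term tasks in dynamic real-world environments is a formidable challenge, especially when the environment changes frequently because of human-robot interaction or the robot's own actions.
Conventional methods usually assume a static scene, which limits their applicability in a real world that changes continuously.
To overcome these limitations, the authors present DovSG, a new mobile manipulation framework that combines dynamic open-vocabulary 3D scene graphs with a language-guided task planning module for long-term task execution.
DovSG takes RGB-D sequences as input and uses vision-language models (VLMs) for object detection to obtain high-level semantic features for each object.
From the segmented objects, it builds a structured 3D scene graph that captures low-level spatial relationships.
In addition, an efficient mechanism for updating the scene graph locally lets the robot adjust parts of the graph dynamically during interaction, without reconstructing the full scene.
This mechanism is particularly valuable in dynamic environments: it lets the robot adapt continuously to scene changes and effectively supports long-term task execution.
The authors validated the system in real-world environments with varying degrees of manual modification, demonstrating its effectiveness and superior performance on long-term tasks.
The project page is available at https://bjhyzj.github.io/dovsg-web.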
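
To make the idea of a local update concrete, here is a minimal Python sketch. It is not DovSG's implementation: the `SceneGraph` class, the `local_update` method, the spherical "observed region", and the distance thresholds for the toy "on" relation are all assumptions made for illustration. It only shows the general pattern of refreshing the nodes and edges inside the observed region while leaving the rest of the graph untouched.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class SceneGraph:
    """Toy scene graph (hypothetical, not the DovSG data structure)."""
    # object name -> (x, y, z) centroid in metres
    objects: dict = field(default_factory=dict)
    # set of (child, relation, parent) triples
    edges: set = field(default_factory=set)

    def _relate(self, name):
        """Recompute the 'on' edges that involve a single object."""
        # Drop every edge that mentions this object, then rebuild them.
        self.edges = {e for e in self.edges if name not in (e[0], e[2])}
        if name not in self.objects:
            return
        x, y, z = self.objects[name]
        for other, (ox, oy, oz) in self.objects.items():
            if other == name:
                continue
            # Assumed rule: horizontally close and slightly above means "on".
            close_xy = dist((x, y), (ox, oy)) < 0.3
            if close_xy and 0 < z - oz < 0.5:
                self.edges.add((name, "on", other))
            elif close_xy and 0 < oz - z < 0.5:
                self.edges.add((other, "on", name))

    def local_update(self, center, radius, detections):
        """Refresh only the part of the graph inside the observed sphere.

        detections maps object names to centroids seen in the new frame.
        Returns the names of objects removed as stale.
        """
        # Objects inside the observed region that were not re-detected.
        stale = [n for n, p in self.objects.items()
                 if dist(p, center) <= radius and n not in detections]
        for name in stale:
            del self.objects[name]
        self.objects.update(detections)
        # Only edges touching changed objects are recomputed.
        for name in stale + list(detections):
            self._relate(name)
        return stale


graph = SceneGraph()
# Initial observation covering the whole room.
graph.local_update((0, 0, 0), 5.0, {
    "table": (0, 0, 0.4), "cup": (0.1, 0, 0.5), "sofa": (3, 0, 0.3)})
# Later the robot looks only at the table and the cup is gone.
removed = graph.local_update((0, 0, 0.4), 1.0, {"table": (0, 0, 0.4)})
```

In this sketch the sofa lies outside the second observation, so its node is never touched. A real system would match detections by appearance and geometry rather than by name.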

Abstract (original)

Enabling mobile robots to perform long-term tasks in dynamic real-world environments is a formidable challenge, especially when the environment changes frequently due to human-robot interactions or the robot’s own actions. Traditional methods typically assume static scenes, which limits their applicability in the continuously changing real world. To overcome these limitations, we present DovSG, a novel mobile manipulation framework that leverages dynamic open-vocabulary 3D scene graphs and a language-guided task planning module for long-term task execution. DovSG takes RGB-D sequences as input and utilizes vision-language models (VLMs) for object detection to obtain high-level object semantic features. Based on the segmented objects, a structured 3D scene graph is generated for low-level spatial relationships. Furthermore, an efficient mechanism for locally updating the scene graph allows the robot to adjust parts of the graph dynamically during interactions without the need for full scene reconstruction. This mechanism is particularly valuable in dynamic environments, enabling the robot to continually adapt to scene changes and effectively support the execution of long-term tasks. We validated our system in real-world environments with varying degrees of manual modifications, demonstrating its effectiveness and superior performance in long-term tasks. Our project page is available at: https://bjhyzj.github.io/dovsg-web.

arXiv information

Authors	Zhijie Yan, Shufei Li, Zuoxu Wang, Lixiu Wu, Han Wang, Jun Zhu, Lijiang Chen, Jihong Liu
Published	2025-02-18 13:48:10+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
