Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation

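Summary

Enabling mobile robots to carry out long-term tasks in dynamic real-world environments is a formidable challenge, particularly when the environment changes frequently because of human-robot interaction or the robot's own actions. Conventional methods generally assume a static scene, which limits their use in a real world that never stops changing. To overcome these limits, we present DovSG, a new mobile manipulation framework that combines dynamic open-vocabulary 3D scene graphs with a language-guided task planning module for long-term task execution. DovSG takes RGB-D sequences as input and uses vision-language models (VLMs) for object detection to obtain high-level semantic features of objects. From the segmented objects it builds a structured 3D scene graph that captures low-level spatial relationships. In addition, an efficient mechanism for updating the scene graph locally lets the robot adjust parts of the graph during interaction without reconstructing the whole scene. This mechanism is especially useful in dynamic environments: it allows the robot to adapt continuously to scene changes and so supports long-term task execution. We validated the system in real environments with varying degrees of manual modification and demonstrated its effectiveness and superior performance on long-term tasks. The project page is at https://bjhyzj.github.io/dovsg-web.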

Summary (original)

Enabling mobile robots to perform long-term tasks in dynamic real-world environments is a formidable challenge, especially when the environment changes frequently due to human-robot interactions or the robot’s own actions. Traditional methods typically assume static scenes, which limits their applicability in the continuously changing real world. To overcome these limitations, we present DovSG, a novel mobile manipulation framework that leverages dynamic open-vocabulary 3D scene graphs and a language-guided task planning module for long-term task execution. DovSG takes RGB-D sequences as input and utilizes vision-language models (VLMs) for object detection to obtain high-level object semantic features. Based on the segmented objects, a structured 3D scene graph is generated for low-level spatial relationships. Furthermore, an efficient mechanism for locally updating the scene graph allows the robot to adjust parts of the graph dynamically during interactions without the need for full scene reconstruction. This mechanism is particularly valuable in dynamic environments, enabling the robot to continually adapt to scene changes and effectively support the execution of long-term tasks. We validated our system in real-world environments with varying degrees of manual modifications, demonstrating its effectiveness and superior performance in long-term tasks. Our project page is available at: https://bjhyzj.github.io/dovsg-web.
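
The local update idea described in the abstract can be illustrated with a toy sketch. This is not DovSG's implementation: the names `Node`, `SceneGraph` and `local_update`, the single "near" relation, the 0.5 m distance threshold, and the use of an axis-aligned box as the re-observed region are all assumptions made here for illustration. The sketch only shows the general pattern of replacing the nodes inside a re-observed region and recomputing the edges that touch them, while leaving the rest of the graph alone.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Node:
    label: str        # open-vocabulary label, e.g. from a detector (hypothetical)
    centroid: tuple   # (x, y, z) in metres


@dataclass
class SceneGraph:
    near_threshold: float = 0.5                  # assumed value, not from the paper
    nodes: dict = field(default_factory=dict)    # node id -> Node
    edges: set = field(default_factory=set)      # (id_a, "near", id_b) with id_a < id_b

    def _relink(self, node_id):
        """Recompute only the edges that touch node_id."""
        self.edges = {e for e in self.edges if node_id not in (e[0], e[2])}
        if node_id not in self.nodes:
            return  # node was removed, so it keeps no edges
        here = self.nodes[node_id].centroid
        for other_id, other in self.nodes.items():
            if other_id != node_id and dist(here, other.centroid) <= self.near_threshold:
                a, b = sorted((node_id, other_id))
                self.edges.add((a, "near", b))

    def local_update(self, region_min, region_max, observed):
        """Refresh the part of the graph inside an axis-aligned box.

        observed maps node id -> Node for everything currently seen in the box.
        Nodes outside the box that were not re-observed are left untouched.
        """
        def inside(point):
            return all(lo <= v <= hi for lo, v, hi in zip(region_min, point, region_max))

        # Nodes that used to be in the region but are no longer seen there.
        stale = [i for i, n in self.nodes.items()
                 if inside(n.centroid) and i not in observed]
        for i in stale:
            del self.nodes[i]
        self.nodes.update(observed)
        for i in list(observed) + stale:
            self._relink(i)


if __name__ == "__main__":
    graph = SceneGraph()
    graph.local_update((0, 0, 0), (5, 5, 2), {
        "table": Node("table", (1.0, 1.0, 0.4)),
        "cup": Node("cup", (1.0, 1.2, 0.8)),
        "sofa": Node("sofa", (4.0, 4.0, 0.4)),
    })
    # A person takes the cup away; the robot re-observes only the table area.
    graph.local_update((0, 0, 0), (2, 2, 2), {"table": Node("table", (1.0, 1.0, 0.4))})
    print(sorted(graph.nodes), graph.edges)
```

In this sketch the second call touches only the table region, so the sofa node is never revisited. A real system would also need data association between old and new detections, which is sidestepped here by assuming stable node ids.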

arXiv information

Authors	Zhijie Yan, Shufei Li, Zuoxu Wang, Lixiu Wu, Han Wang, Jun Zhu, Lijiang Chen, Jihong Liu
Published	2025-02-04 14:28:00+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
