Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation

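Summary

Recent open-vocabulary robot mapping methods enrich dense geometric maps with pre-trained vision-language features.
These maps can predict point-wise saliency maps when queried for a particular language concept, but large-scale environments and abstract queries beyond the object level remain a considerable hurdle, which ultimately limits language-grounded robot navigation.
This work presents HOV-SG, a hierarchical open-vocabulary 3D scene graph mapping approach for language-grounded robot navigation.
Leveraging open-vocabulary vision foundation models, the approach first obtains state-of-the-art open-vocabulary segment-level maps in 3D and then constructs a 3D scene graph hierarchy consisting of floor, room, and object concepts, each enriched with open-vocabulary features.
The approach can represent multi-story buildings and lets a robot traverse them using a cross-floor Voronoi graph.
HOV-SG is evaluated on three distinct datasets and surpasses previous baselines in open-vocabulary semantic accuracy at the object, room, and floor level, while reducing representation size by 75% compared to dense open-vocabulary maps.
To demonstrate the efficacy and generalization capabilities of HOV-SG, the authors showcase successful long-horizon language-conditioned robot navigation in real-world multi-story environments.
Code and trial video data are available at http://hovsg.github.io/.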
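
The floor, room, and object hierarchy described above can be pictured with a small data structure. The following is only a minimal sketch under stated assumptions, not the HOV-SG implementation: the `Node` class, the `resolve` function, and the hand-written 3-dimensional feature vectors are all hypothetical stand-ins. In a real system the features would come from a vision-language model and the query vectors from its text encoder. The sketch shows how a query split into floor, room, and object parts could be resolved top-down by cosine similarity.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the hierarchy: building, floor, room, or object (hypothetical)."""
    name: str
    feature: list[float]  # stand-in for an open-vocabulary embedding
    children: list[Node] = field(default_factory=list)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_child(parent: Node, query: list[float]) -> Node:
    """Return the child whose feature is most similar to the query vector."""
    return max(parent.children, key=lambda n: cosine(n.feature, query))


def resolve(building: Node, floor_q: list[float],
            room_q: list[float], object_q: list[float]) -> list[str]:
    """Descend floor -> room -> object, picking the best match at each level."""
    floor = best_child(building, floor_q)
    room = best_child(floor, room_q)
    obj = best_child(room, object_q)
    return [floor.name, room.name, obj.name]


def make_toy_building() -> Node:
    """Toy two-floor building with made-up features, for illustration only."""
    kitchen = Node("kitchen", [1, 0, 0], [Node("fridge", [1, 0, 0]),
                                          Node("sink", [0, 1, 0])])
    office = Node("office", [0, 0, 1], [Node("desk", [0, 0, 1])])
    bedroom = Node("bedroom", [0, 1, 0], [Node("bed", [0, 0, 1])])
    bathroom = Node("bathroom", [0, 0, 1], [Node("towel", [1, 0, 0]),
                                            Node("sink", [0, 1, 0])])
    floor0 = Node("floor 0", [1, 0, 0], [kitchen, office])
    floor1 = Node("floor 1", [0, 1, 0], [bedroom, bathroom])
    return Node("building", [0, 0, 0], [floor0, floor1])


if __name__ == "__main__":
    path = resolve(make_toy_building(),
                   floor_q=[0.1, 0.9, 0.0],
                   room_q=[0.0, 0.2, 0.8],
                   object_q=[0.2, 0.9, 0.0])
    print(path)  # ['floor 1', 'bathroom', 'sink']
```

Resolving level by level means only the children of the selected floor and room are compared against the query, which is one plausible reason a hierarchy can be cheaper to search than a dense point-wise map.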

Abstract (original)

Recent open-vocabulary robot mapping methods enrich dense geometric maps with pre-trained visual-language features. While these maps allow for the prediction of point-wise saliency maps when queried for a certain language concept, large-scale environments and abstract queries beyond the object level still pose a considerable hurdle, ultimately limiting language-grounded robotic navigation. In this work, we present HOV-SG, a hierarchical open-vocabulary 3D scene graph mapping approach for language-grounded robot navigation. Leveraging open-vocabulary vision foundation models, we first obtain state-of-the-art open-vocabulary segment-level maps in 3D and subsequently construct a 3D scene graph hierarchy consisting of floor, room, and object concepts, each enriched with open-vocabulary features. Our approach is able to represent multi-story buildings and allows robotic traversal of those using a cross-floor Voronoi graph. HOV-SG is evaluated on three distinct datasets and surpasses previous baselines in open-vocabulary semantic accuracy on the object, room, and floor level while producing a 75% reduction in representation size compared to dense open-vocabulary maps. In order to prove the efficacy and generalization capabilities of HOV-SG, we showcase successful long-horizon language-conditioned robot navigation within real-world multi-story environments. We provide code and trial video data at http://hovsg.github.io/.

arXiv information

Authors	Abdelrhman Werby, Chenguang Huang, Martin Büchner, Abhinav Valada, Wolfram Burgard
Published	2024-03-26 16:36:43+00:00
arXiv page	arxiv_id (pdf)

Sources and services used

arxiv.jp, Google
