Robot Navigation Using Physically Grounded Vision-Language Models in Outdoor Environments

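Summary

We present a new autonomous robot navigation algorithm for outdoor environments that handles a wide range of terrain traversability conditions.
Our approach, VLM-GroNav, combines vision-language models (VLMs) with physical grounding, which is used to assess intrinsic terrain properties such as deformability and slipperiness.
We use proprioception-based sensing, which measures these physical properties directly and strengthens the overall semantic understanding of the terrain.
Our formulation uses in-context learning to ground the VLM's semantic understanding in proprioceptive data, so that traversability estimates can be updated dynamically from the robot's real-time physical interactions with the environment.
The updated traversability estimates inform both the local and the global planner for real-time trajectory replanning.
We validate the method on a legged robot (Ghost Vision 60) and a wheeled robot (Clearpath Husky) in diverse real-world outdoor environments with a variety of deformable and slippery terrains.
In practice, we observe significant improvements over state-of-the-art methods, including an increase of up to 50% in navigation success rate.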
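
The grounding loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Observation` fields, the scoring rule in `measured_traversability`, the blending `weight`, and the prompt layout are all assumptions chosen for illustration. It only shows the general idea of turning proprioceptive measurements into in-context examples and using them to revise a vision-based traversability estimate.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One physical interaction with a terrain (all fields are hypothetical)."""
    terrain: str     # semantic label, e.g. "wet grass"
    slip: float      # 0 = no slip, 1 = full slip
    sinkage: float   # 0 = rigid ground, 1 = highly deformable


def measured_traversability(obs: Observation) -> float:
    """Map proprioceptive readings to a score in [0, 1]; 1 is easy to traverse.

    Assumed rule: the worse of the two readings dominates.
    """
    return 1.0 - max(obs.slip, obs.sinkage)


def update_estimate(prior: float, obs: Observation, weight: float = 0.5) -> float:
    """Blend a vision-based prior with the measurement (weight is an assumption)."""
    blended = (1.0 - weight) * prior + weight * measured_traversability(obs)
    return min(1.0, max(0.0, blended))


def build_prompt(history: list[Observation], query_terrain: str) -> str:
    """Format past measurements as in-context examples for a VLM query."""
    lines = [
        f"Terrain: {o.terrain}; slip={o.slip:.2f}; sinkage={o.sinkage:.2f}; "
        f"traversability={measured_traversability(o):.2f}"
        for o in history
    ]
    lines.append(f"Terrain: {query_terrain}; traversability=?")
    return "\n".join(lines)


if __name__ == "__main__":
    obs = Observation("wet grass", slip=0.6, sinkage=0.2)
    print(update_estimate(0.9, obs))   # optimistic prior lowered after slipping
    print(build_prompt([obs], "mud"))
```

In a full system, the revised estimates would be written into the cost maps that the local and global planners consume; that step is omitted here.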

Abstract (original)

We present a novel autonomous robot navigation algorithm for outdoor environments that is capable of handling diverse terrain traversability conditions. Our approach, VLM-GroNav, uses vision-language models (VLMs) and integrates them with physical grounding that is used to assess intrinsic terrain properties such as deformability and slipperiness. We use proprioceptive-based sensing, which provides direct measurements of these physical properties, and enhances the overall semantic understanding of the terrains. Our formulation uses in-context learning to ground the VLM’s semantic understanding with proprioceptive data to allow dynamic updates of traversability estimates based on the robot’s real-time physical interactions with the environment. We use the updated traversability estimations to inform both the local and global planners for real-time trajectory replanning. We validate our method on a legged robot (Ghost Vision 60) and a wheeled robot (Clearpath Husky), in diverse real-world outdoor environments with different deformable and slippery terrains. In practice, we observe significant improvements over state-of-the-art methods by up to 50% increase in navigation success rate.

arXiv information

Authors	Mohamed Elnoor, Kasun Weerakoon, Gershom Seneviratne, Ruiqi Xian, Tianrui Guan, Mohamed Khalid M Jaffar, Vignesh Rajagopal, Dinesh Manocha
Published	2024-09-30 16:03:44+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
