VL-TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments

Abstract

We present a multi-modal trajectory generation and selection algorithm for real-world mapless outdoor navigation in human-centered environments. Such environments contain rich features like crosswalks, grass, and curbs, which are easily interpretable by humans, but not by mobile robots. We aim to compute suitable trajectories that (1) satisfy the environment-specific traversability constraints and (2) generate human-like paths while navigating on crosswalks, sidewalks, etc. Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model enhanced with traversability constraints to generate multiple candidate trajectories for global navigation. We develop a visual prompting approach and leverage the Visual Language Model’s (VLM) zero-shot ability of semantic understanding and logical reasoning to choose the best trajectory given the contextual information about the task. We evaluate our method in various outdoor scenes with wheeled robots and compare the performance with other global navigation algorithms. In practice, we observe an average improvement of 20.81% in satisfying traversability constraints and 28.51% in terms of human-like navigation in four different outdoor navigation scenarios.
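The generate-then-select structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: `sample_candidates` is a hypothetical stand-in for the CVAE decoder, `blocked` is a hypothetical traversability predicate, and `score` is a placeholder for the VLM's choice among candidates. The paper builds traversability constraints into the generative model; applying them as a post-hoc filter here is a simplifying assumption.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def sample_candidates(goal: Point, n: int, rng: random.Random,
                      steps: int = 10) -> List[Trajectory]:
    """Hypothetical stand-in for a CVAE decoder: sample n curved paths
    from the origin to the goal, each bent by a random lateral offset."""
    candidates = []
    for _ in range(n):
        bend = rng.uniform(-1.0, 1.0)  # plays the role of a latent sample
        traj = []
        for i in range(steps + 1):
            t = i / steps
            # Straight line to the goal plus a sine-shaped offset in y
            # that vanishes at both endpoints.
            x = goal[0] * t
            y = goal[1] * t + bend * math.sin(math.pi * t)
            traj.append((x, y))
        candidates.append(traj)
    return candidates


def is_traversable(traj: Trajectory, blocked: Callable[[Point], bool]) -> bool:
    """A trajectory is traversable if none of its waypoints is blocked."""
    return not any(blocked(p) for p in traj)


def select_trajectory(candidates: List[Trajectory],
                      blocked: Callable[[Point], bool],
                      score: Callable[[Trajectory], float]) -> Optional[Trajectory]:
    """Drop non-traversable candidates, then return the highest-scoring one.
    `score` is a placeholder for the VLM-based selection step."""
    feasible = [t for t in candidates if is_traversable(t, blocked)]
    if not feasible:
        return None
    return max(feasible, key=score)


if __name__ == "__main__":
    rng = random.Random(0)
    cands = sample_candidates((10.0, 0.0), 8, rng)
    # Assumed scene: everything with y > 0.5 is non-traversable (e.g. grass).
    best = select_trajectory(
        cands,
        blocked=lambda p: p[1] > 0.5,
        score=lambda traj: -max(abs(y) for _, y in traj),  # prefer straighter paths
    )
    print("selected" if best is not None else "no feasible trajectory")
```

In a real system the `score` callable would be replaced by a query to a VLM over an annotated camera image, and the sampler by a trained generative model; both are outside the scope of this sketch.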

arXiv information

Authors	Daeun Song, Jing Liang, Xuesu Xiao, Dinesh Manocha
Published	2025-04-04 00:41:14+00:00

Source and services used

arxiv.jp, DeepL
