TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments

Abstract

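We present a multi-modal trajectory generation and selection algorithm for real-world mapless outdoor navigation in human-centered environments.
Such environments contain rich features such as crosswalks, grass, and curbs, which humans interpret easily but mobile robots do not.
We aim to compute suitable trajectories that (1) satisfy environment-specific traversability constraints and (2) follow human-like paths when navigating crosswalks, sidewalks, and similar areas.
Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model enhanced with traversability constraints to generate multiple candidate trajectories for global navigation.
We develop a visual prompting approach and leverage the zero-shot semantic understanding and logical reasoning of a Vision Language Model (VLM) to choose the best trajectory given contextual information about the task.
We evaluate the method with wheeled robots in various outdoor scenes and compare its performance with other global navigation algorithms.
In practice, across four different outdoor navigation scenarios, we observe an average improvement of 22.07% in satisfying traversability constraints and 30.53% in human-like navigation.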

Abstract (original)

We present a multi-modal trajectory generation and selection algorithm for real-world mapless outdoor navigation in human-centered environments. Such environments contain rich features like crosswalks, grass, and curbs, which are easily interpretable by humans, but not by mobile robots. We aim to compute suitable trajectories that (1) satisfy the environment-specific traversability constraints and (2) generate human-like paths while navigating on crosswalks, sidewalks, etc. Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model enhanced with traversability constraints to generate multiple candidate trajectories for global navigation. We develop a visual prompting approach and leverage the Visual Language Model’s (VLM) zero-shot ability of semantic understanding and logical reasoning to choose the best trajectory given the contextual information about the task. We evaluate our method in various outdoor scenes with wheeled robots and compare the performance with other global navigation algorithms. In practice, we observe an average improvement of 22.07% in satisfying traversability constraints and 30.53% in terms of human-like navigation in four different outdoor navigation scenarios.
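The generate-then-select structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the CVAE is replaced by a toy sampler that bends a straight line using one latent value per candidate, the traversability constraint is a hypothetical boolean grid, and the VLM query is replaced by a pluggable `selector` callback with a shortest-path fallback. All function names, parameters, and the grid layout are assumptions made for illustration.

```python
import numpy as np


def generate_candidates(start, goal, num_candidates=5, num_points=20,
                        spread=2.0, seed=0):
    """Toy stand-in for a learned generator (NOT the paper's CVAE).

    Each candidate is a straight start-to-goal line bent sideways by one
    latent sample. Returns an array of shape (num_candidates, num_points, 2).
    """
    rng = np.random.default_rng(seed)
    start = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    t = np.linspace(0.0, 1.0, num_points)[:, None]   # (num_points, 1)
    line = start + t * (goal - start)                 # straight path
    direction = goal - start
    normal = np.array([-direction[1], direction[0]])
    normal /= np.linalg.norm(normal)                  # unit sideways vector
    z = rng.normal(0.0, spread, size=num_candidates)  # latent samples
    bump = np.sin(np.pi * t)                          # zero at both endpoints
    return np.stack([line + zi * bump * normal for zi in z])


def traversable_fraction(trajectory, grid, cell_size=1.0):
    """Fraction of waypoints that fall in traversable cells.

    `grid[row, col]` is True where the robot may drive (hypothetical layout:
    x maps to columns, y maps to rows). Out-of-grid points count as blocked.
    """
    idx = np.floor(np.asarray(trajectory) / cell_size).astype(int)
    cols, rows = idx[:, 0], idx[:, 1]
    inside = ((rows >= 0) & (rows < grid.shape[0]) &
              (cols >= 0) & (cols < grid.shape[1]))
    ok = np.zeros(len(idx), dtype=bool)
    ok[inside] = grid[rows[inside], cols[inside]]
    return float(ok.mean())


def select_trajectory(candidates, grid, selector=None):
    """Return the index of the chosen candidate.

    Candidates are first ranked by traversability. `selector` is a
    hypothetical hook where a VLM query could go: it receives the
    shortlisted trajectories and returns a position in that shortlist.
    Without it, the shortest shortlisted path is chosen.
    """
    scores = np.array([traversable_fraction(c, grid) for c in candidates])
    shortlist = np.flatnonzero(scores == scores.max())
    if selector is not None:
        return int(shortlist[selector(candidates[shortlist])])
    lengths = [np.linalg.norm(np.diff(candidates[i], axis=0), axis=1).sum()
               for i in shortlist]
    return int(shortlist[int(np.argmin(lengths))])


# Example: a 10 x 10 area with a non-traversable patch (e.g. a lawn)
# lying on the straight line between start and goal.
grid = np.ones((10, 10), dtype=bool)
grid[4:6, 3:7] = False
candidates = generate_candidates(start=(0.5, 5.0), goal=(9.5, 5.0))
best = select_trajectory(candidates, grid)
```

In a real system the `selector` hook would render the shortlisted trajectories onto the camera image and ask the VLM to pick one; here it is left as a plain callback so the sketch stays self-contained.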

arXiv information

Authors	Daeun Song, Jing Liang, Xuesu Xiao, Dinesh Manocha
Published	2024-12-04 09:26:27+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
