Constraint-Aware Zero-Shot Vision-Language Navigation in Continuous Environments

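Summary

We address Vision-Language Navigation in Continuous Environments (VLN-CE) in the zero-shot setting.
Zero-shot VLN-CE is particularly challenging because there are no expert demonstrations to train on and only minimal structural priors about the environment to guide navigation.
To address these challenges, we propose the Constraint-Aware Navigator (CA-Nav), which reframes zero-shot VLN-CE as a sequential, constraint-aware sub-instruction completion process.
CA-Nav continuously translates sub-instructions into navigation plans using two core modules: the Constraint-Aware Sub-instruction Manager (CSM) and the Constraint-Aware Value Mapper (CVM).
CSM defines the completion criteria of each decomposed sub-instruction as constraints and tracks navigation progress by switching sub-instructions in a constraint-aware manner.
CVM, guided by CSM's constraints, generates a value map on the fly and refines it with superpixel clustering to improve navigation stability.
CA-Nav achieves state-of-the-art performance on two VLN-CE benchmarks, surpassing the previous best method in Success Rate on the validation unseen splits of R2R-CE and RxR-CE by 12 percent and 13 percent, respectively.
CA-Nav also proves effective in real-world robot deployments across a variety of indoor scenes and instructions.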
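
The constraint-aware switching attributed to CSM can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the dictionary-based observation format, and the idea of modelling each constraint as a boolean predicate are all assumptions made for illustration, since the abstract does not specify how constraints are represented or checked.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical per-step observation, e.g. {"room": "kitchen", "objects": [...]}.
Observation = Dict[str, object]
# Assumption: a constraint is a predicate that reports whether it is satisfied.
Constraint = Callable[[Observation], bool]


@dataclass
class SubInstruction:
    text: str
    constraints: List[Constraint]  # completion criteria for this sub-instruction


class SubInstructionManager:
    """Tracks progress by switching sub-instructions once their constraints hold."""

    def __init__(self, sub_instructions: List[SubInstruction]):
        self.sub_instructions = sub_instructions
        self.index = 0  # position of the sub-instruction currently being executed

    @property
    def done(self) -> bool:
        return self.index >= len(self.sub_instructions)

    def current(self) -> SubInstruction:
        return self.sub_instructions[self.index]

    def update(self, obs: Observation) -> bool:
        """Advance to the next sub-instruction if every constraint is satisfied.

        Returns True when a switch happened, False otherwise.
        """
        if self.done:
            return False
        if all(check(obs) for check in self.current().constraints):
            self.index += 1
            return True
        return False


# Usage: a two-step plan with one hypothetical constraint per step.
plan = [
    SubInstruction("walk past the sofa", [lambda o: "sofa" in o["objects"]]),
    SubInstruction("stop in the kitchen", [lambda o: o["room"] == "kitchen"]),
]
manager = SubInstructionManager(plan)
manager.update({"objects": ["sofa"], "room": "living room"})  # switches to step 2
```

In this sketch the manager only moves forward, one sub-instruction per update, which mirrors the sequential completion process described above. A real system would also need a fallback for constraints that are never satisfied, which the abstract does not describe.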

Summary (original)

We address the task of Vision-Language Navigation in Continuous Environments (VLN-CE) under the zero-shot setting. Zero-shot VLN-CE is particularly challenging due to the absence of expert demonstrations for training and minimal environment structural prior to guide navigation. To confront these challenges, we propose a Constraint-Aware Navigator (CA-Nav), which reframes zero-shot VLN-CE as a sequential, constraint-aware sub-instruction completion process. CA-Nav continuously translates sub-instructions into navigation plans using two core modules: the Constraint-Aware Sub-instruction Manager (CSM) and the Constraint-Aware Value Mapper (CVM). CSM defines the completion criteria for decomposed sub-instructions as constraints and tracks navigation progress by switching sub-instructions in a constraint-aware manner. CVM, guided by CSM’s constraints, generates a value map on the fly and refines it using superpixel clustering to improve navigation stability. CA-Nav achieves the state-of-the-art performance on two VLN-CE benchmarks, surpassing the previous best method by 12 percent and 13 percent in Success Rate on the validation unseen splits of R2R-CE and RxR-CE, respectively. Moreover, CA-Nav demonstrates its effectiveness in real-world robot deployments across various indoor scenes and instructions.
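
One way to picture the value-map refinement mentioned in the abstract is to smooth a per-cell value map over superpixel regions and then pick the best region as the next target. The sketch below is an assumption-laden illustration, not the paper's method: it assumes the superpixel labels are already computed (for example by an algorithm such as SLIC), it uses a plain per-region mean as the refinement, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

Grid = List[List[float]]   # per-cell navigation values
Labels = List[List[int]]   # per-cell superpixel id, assumed precomputed


def refine_value_map(values: Grid, labels: Labels) -> Grid:
    """Replace each cell's value with the mean value of its superpixel."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, label_row in zip(values, labels):
        for value, label in zip(value_row, label_row):
            sums[label] += value
            counts[label] += 1
    means = {label: sums[label] / counts[label] for label in sums}
    return [[means[label] for label in label_row] for label_row in labels]


def best_region_centroid(values: Grid, labels: Labels) -> Tuple[float, float]:
    """Return the (row, col) centroid of the superpixel with the highest mean."""
    refined = refine_value_map(values, labels)
    cells = [
        (refined[r][c], labels[r][c], r, c)
        for r in range(len(labels))
        for c in range(len(labels[r]))
    ]
    best_label = max(cells, key=lambda cell: cell[0])[1]
    members = [(r, c) for _, label, r, c in cells if label == best_label]
    mean_row = sum(r for r, _ in members) / len(members)
    mean_col = sum(c for _, c in members) / len(members)
    return mean_row, mean_col
```

Averaging over regions means a single noisy high-value cell cannot pull the target around from step to step, which is one plausible reading of how clustering could improve navigation stability.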

arXiv information

Authors	Kehan Chen, Dong An, Yan Huang, Rongtao Xu, Yifei Su, Yonggen Ling, Ian Reid, Liang Wang
Published	2024-12-13 13:38:41+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
