FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks

Abstract

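The goal of the Vision-and-Language Navigation (VLN) task has long been to develop an embodied agent with robust adaptability that can seamlessly transfer its navigation capabilities across various tasks.
Despite remarkable progress in recent years, most methods require dataset-specific training and therefore cannot generalize across diverse datasets containing different types of instructions.
Large language models (LLMs) have demonstrated exceptional reasoning and generalization abilities and show great potential in robot action planning.
In this paper, we propose FlexVLN, an innovative hierarchical approach to VLN that integrates the fundamental navigation ability of a supervised-learning-based Instruction Follower with the robust generalization ability of an LLM Planner, enabling effective generalization across diverse VLN datasets.
In addition, a verification mechanism and a multi-model integration mechanism are proposed to mitigate potential hallucinations by the LLM Planner and to improve the execution accuracy of the Instruction Follower.
To assess generalization ability, we use REVERIE, SOON, and CVDN-target as out-of-domain datasets.
The generalization performance of FlexVLN substantially exceeds that of all previous methods.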

Abstract (original)

The aspiration of the Vision-and-Language Navigation (VLN) task has long been to develop an embodied agent with robust adaptability, capable of seamlessly transferring its navigation capabilities across various tasks. Despite remarkable advancements in recent years, most methods necessitate dataset-specific training, thereby lacking the capability to generalize across diverse datasets encompassing distinct types of instructions. Large language models (LLMs) have demonstrated exceptional reasoning and generalization abilities, exhibiting immense potential in robot action planning. In this paper, we propose FlexVLN, an innovative hierarchical approach to VLN that integrates the fundamental navigation ability of a supervised-learning-based Instruction Follower with the robust generalization ability of the LLM Planner, enabling effective generalization across diverse VLN datasets. Moreover, a verification mechanism and a multi-model integration mechanism are proposed to mitigate potential hallucinations by the LLM Planner and enhance execution accuracy of the Instruction Follower. We take REVERIE, SOON, and CVDN-target as out-of-domain datasets for assessing generalization ability. The generalization performance of FlexVLN surpasses that of all the previous methods to a large extent.
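The hierarchical idea described in the abstract (an LLM Planner proposing guidance, a verification step filtering it, and an Instruction Follower executing it) can be pictured with a minimal sketch. The abstract does not specify any interfaces, so everything here is an assumption made for illustration: the names `run_episode`, `plan`, `verify`, `follow`, and `Step`, the "stop" convention, and the round limit are all hypothetical, and the multi-model integration mechanism is not modeled at all.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    guidance: str   # fine-grained guidance proposed by the (hypothetical) planner
    verified: bool  # whether the (hypothetical) verification check accepted it


def run_episode(
    instruction: str,
    plan: Callable[[str, List[str]], str],
    verify: Callable[[str, List[str]], bool],
    follow: Callable[[str], str],
    max_rounds: int = 5,
) -> List[Step]:
    """Alternate between planning, verifying, and following.

    Illustrative only: this is not the FlexVLN implementation, just one
    way a planner/verifier/follower loop could be wired together.
    """
    history: List[str] = []  # feedback gathered so far, passed back to the planner
    trace: List[Step] = []
    for _ in range(max_rounds):
        guidance = plan(instruction, history)
        if guidance == "stop":  # assumed convention for ending the episode
            break
        accepted = verify(guidance, history)
        trace.append(Step(guidance, accepted))
        if not accepted:
            # Rejected guidance is not executed; the rejection is recorded
            # so the planner can take it into account when replanning.
            history.append(f"rejected: {guidance}")
            continue
        history.append(follow(guidance))
    return trace
```

In this sketch, `plan`, `verify`, and `follow` are plain callables, so an LLM call, a rule-based check, or a trained navigation policy could each be substituted without changing the loop.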

arXiv information

Authors	Siqi Zhang, Yanyuan Qiao, Qunbo Wang, Longteng Guo, Zhihua Wei, Jing Liu
Published	2025-03-18 06:58:41+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
