Navigating Beyond Instructions: Vision-and-Language Navigation in Obstructed Environments

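Summary

Real-world navigation often involves dealing with unexpected obstructions such as closed doors, moved objects, and unpredictable entities.
However, mainstream Vision-and-Language Navigation (VLN) tasks typically assume that instructions align perfectly with fixed, predefined navigation graphs that contain no obstructions.
This assumption overlooks potential discrepancies between the actual navigation graph and the given instructions, which can cause major failures for both indoor and outdoor agents.
To address this issue, we integrate diverse obstructions into the R2R dataset by modifying both the navigation graphs and the visual observations, introducing an innovative dataset and task, R2R with UNexpected Obstructions (R2R-UNO).
R2R-UNO contains path obstructions of various types and numbers, which generate instruction-reality mismatches for VLN research.
Experiments on R2R-UNO show that state-of-the-art VLN methods inevitably struggle when facing such mismatches, indicating that they follow instructions rigidly rather than navigating adaptively.
We therefore propose a new method called ObVLN (Obstructed VLN), which includes a curriculum training strategy and virtual graph construction to help agents adapt effectively to obstructed environments.
Empirical results show that ObVLN not only maintains robust performance in unobstructed scenarios but also achieves a substantial performance advantage when unexpected obstructions are present.

Summary (original)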

Real-world navigation often involves dealing with unexpected obstructions such as closed doors, moved objects, and unpredictable entities. However, mainstream Vision-and-Language Navigation (VLN) tasks typically assume instructions perfectly align with the fixed and predefined navigation graphs without any obstructions. This assumption overlooks potential discrepancies in actual navigation graphs and given instructions, which can cause major failures for both indoor and outdoor agents. To address this issue, we integrate diverse obstructions into the R2R dataset by modifying both the navigation graphs and visual observations, introducing an innovative dataset and task, R2R with UNexpected Obstructions (R2R-UNO). R2R-UNO contains various types and numbers of path obstructions to generate instruction-reality mismatches for VLN research. Experiments on R2R-UNO reveal that state-of-the-art VLN methods inevitably encounter significant challenges when facing such mismatches, indicating that they rigidly follow instructions rather than navigate adaptively. Therefore, we propose a novel method called ObVLN (Obstructed VLN), which includes a curriculum training strategy and virtual graph construction to help agents effectively adapt to obstructed environments. Empirical results show that ObVLN not only maintains robust performance in unobstructed scenarios but also achieves a substantial performance advantage with unexpected obstructions.
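
The abstract describes obstructions as modifications to the navigation graph that make the instructed route unusable. As a rough illustration only, the sketch below blocks one edge of a toy graph and recomputes a route with breadth-first search. It is not the authors' R2R-UNO construction or the ObVLN method; the graph, the node names, and the helpers `shortest_path` and `block_edges` are all hypothetical and chosen for this example.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search over an undirected graph {node: set(neighbors)}."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sort neighbors so the result is deterministic.
        for nxt in sorted(graph[node]):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # goal is unreachable


def block_edges(graph, blocked):
    """Return a copy of the graph with the given edges removed in both directions."""
    new_graph = {node: set(nbrs) for node, nbrs in graph.items()}
    for a, b in blocked:
        new_graph[a].discard(b)
        new_graph[b].discard(a)
    return new_graph


# Hypothetical toy environment: two routes from the hall to the bedroom.
graph = {
    "hall": {"door", "kitchen"},
    "door": {"hall", "bedroom"},
    "kitchen": {"hall", "corridor"},
    "corridor": {"kitchen", "bedroom"},
    "bedroom": {"door", "corridor"},
}

# Route the instruction was written for (unobstructed graph).
instructed_path = shortest_path(graph, "hall", "bedroom")

# Obstruct the edge the instruction relies on, e.g. a closed door.
obstructed = block_edges(graph, [("door", "bedroom")])
detour = shortest_path(obstructed, "hall", "bedroom")

print("instructed:", instructed_path)
print("detour:    ", detour)
```

In this toy case the instruction still describes the route through the door, while the only feasible route now goes through the kitchen and corridor. That gap is the kind of instruction-reality mismatch the abstract refers to.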

arXiv information

Authors	Haodong Hong, Sen Wang, Zi Huang, Qi Wu, Jiajun Liu
Published	2024-07-31 08:55:57+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
