AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans

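Abstract

Visual Language Navigation (VLN) is a task in which a robot navigates a realistic environment by following natural language instructions.
Previous research has largely focused on static settings, but real-world navigation must often contend with dynamic human obstacles.
To narrow this gap, we propose an extension of the task called Adaptive Visual Language Navigation (AdaVLN).
AdaVLN requires a robot to navigate complex 3D indoor environments populated with dynamically moving human obstacles, which adds a further layer of complexity to navigation tasks that mimic the real world.
To support exploration of this task, we also introduce the AdaVLN simulator and the AdaR2R dataset.
The AdaVLN simulator makes it easy to place fully animated human models directly into common datasets such as Matterport3D.
We also introduce a "freeze-time" mechanism for both the navigation task and the simulator. It pauses world state updates during agent inference, which enables fair comparisons and experimental reproducibility across different hardware.
We evaluate several baseline models on this task, analyze the unique challenges introduced by AdaVLN, and demonstrate its potential to bridge the sim-to-real gap in VLN research.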

Abstract (original)

Visual Language Navigation is a task that challenges robots to navigate in realistic environments based on natural language instructions. While previous research has largely focused on static settings, real-world navigation must often contend with dynamic human obstacles. Hence, we propose an extension to the task, termed Adaptive Visual Language Navigation (AdaVLN), which seeks to narrow this gap. AdaVLN requires robots to navigate complex 3D indoor environments populated with dynamically moving human obstacles, adding a layer of complexity to navigation tasks that mimic the real-world. To support exploration of this task, we also present AdaVLN simulator and AdaR2R datasets. The AdaVLN simulator enables easy inclusion of fully animated human models directly into common datasets like Matterport3D. We also introduce a ‘freeze-time’ mechanism for both the navigation task and simulator, which pauses world state updates during agent inference, enabling fair comparisons and experimental reproducibility across different hardware. We evaluate several baseline models on this task, analyze the unique challenges introduced by AdaVLN, and demonstrate its potential to bridge the sim-to-real gap in VLN research.
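
The freeze-time idea in the abstract (world state updates pause while the agent runs inference) can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the AdaVLN simulator's API: `ToyWorld`, `run_episode`, the 1-D human motion, and the fixed step `dt` are all hypothetical names and choices made for illustration. The point it shows is that if the world advances only by a fixed amount of simulated time after each action, the final world state does not depend on how long inference took on a given machine.

```python
import time
from dataclasses import dataclass


@dataclass
class ToyWorld:
    """Hypothetical 1-D world with one human walking at constant speed."""
    human_pos: float = 0.0
    human_speed: float = 1.0  # metres per simulated second (assumed value)
    sim_time: float = 0.0

    def step(self, dt: float) -> None:
        # Advance the world by a fixed amount of simulated time.
        self.human_pos += self.human_speed * dt
        self.sim_time += dt


def run_episode(agent_latency_s: float, n_actions: int = 5, dt: float = 0.5) -> ToyWorld:
    """Run n_actions agent decisions with freeze-time semantics."""
    world = ToyWorld()
    for _ in range(n_actions):
        # The world is frozen here: step() is not called while the agent "thinks".
        time.sleep(agent_latency_s)  # stand-in for model inference
        # Only after an action is chosen does the world advance, by a fixed dt.
        world.step(dt)
    return world


# Two agents with different inference latency see the same world evolution.
fast = run_episode(agent_latency_s=0.0)
slow = run_episode(agent_latency_s=0.01)
```

Without the freeze, the human would keep moving during the `time.sleep` call, so a slower machine would face a different obstacle configuration for the same sequence of actions.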

arXiv information

Authors	Dillon Loh, Tomasz Bednarz, Xinxing Xia, Frank Guan
Published	2024-11-27 17:36:08+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
