BEVBert: Topo-Metric Map Pre-training for Language-guided Navigation

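Summary

Existing approaches to vision-and-language navigation (VLN) rely mainly on cross-modal reasoning over discrete views.
However, this scheme can hinder an agent's spatial and numerical reasoning, because objects within a single view are incomplete and observations are duplicated across views.
A potential solution is to map the discrete views into a unified bird's-eye view, which can aggregate partial and duplicate observations.
Existing metric maps can achieve this goal, but they suffer from less expressive semantics (typically predefined labels) and limited map size, which weakens the agent's language grounding and long-term planning ability.
Inspired by the robotics community, the authors introduce hybrid topo-metric maps into VLN, where a topological map is used for long-term planning and a metric map for short-term reasoning.
Beyond mapping with more expressive deep features, they further design a pre-training framework on top of the hybrid map to learn language-informed map representations, which enhances cross-modal grounding and facilitates the final language-guided navigation goal.
Extensive experiments demonstrate the effectiveness of the map-based route for VLN, and the proposed method sets a new state of the art on three VLN benchmarks.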

Summary (original)

Existing approaches for vision-and-language navigation (VLN) are mainly based on cross-modal reasoning over discrete views. However, this scheme may hamper an agent’s spatial and numerical reasoning because of incomplete objects within a single view and duplicate observations across views. A potential solution is mapping discrete views into a unified bird’s-eye view, which can aggregate partial and duplicate observations. Existing metric maps could achieve this goal, but they suffer from less expressive semantics (e.g. usually predefined labels) and limited map size, which weakens an agent’s language grounding and long-term planning ability. Inspired by the robotics community, we introduce hybrid topo-metric maps into VLN, where a topological map is used for long-term planning and a metric map for short-term reasoning. Beyond mapping with more expressive deep features, we further design a pre-training framework via the hybrid map to learn language-informed map representations, which enhances cross-modal grounding and facilitates the final language-guided navigation goal. Extensive experiments demonstrate the effectiveness of the map-based route for VLN, and the proposed method sets the new state-of-the-art on three VLN benchmarks.
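
To make the division of labour in a hybrid topo-metric map concrete, the following is a minimal sketch and not the BEVBert implementation. It assumes a 2D world, plain Python lists standing in for deep visual features, Dijkstra search for long-term planning on the topological graph, and simple per-cell averaging to aggregate duplicate observations in the local metric grid. The class name `TopoMetricMap`, its methods, and the default `cell_size` and `grid_dim` values are all hypothetical.

```python
import heapq
import math
from collections import defaultdict


class TopoMetricMap:
    """Hypothetical hybrid map: a graph for global planning, a grid for local reasoning."""

    def __init__(self, cell_size=0.5, grid_dim=11):
        self.positions = {}             # node id -> (x, y) position
        self.edges = defaultdict(dict)  # node id -> {neighbour id: distance}
        self.cell_size = cell_size      # side length of one grid cell (assumed units)
        self.grid_dim = grid_dim        # the local grid is grid_dim x grid_dim cells

    def add_node(self, node, xy):
        self.positions[node] = xy

    def add_edge(self, a, b):
        # Undirected edge weighted by Euclidean distance between the two nodes.
        d = math.dist(self.positions[a], self.positions[b])
        self.edges[a][b] = d
        self.edges[b][a] = d

    def plan(self, start, goal):
        """Long-term planning: shortest path on the topological graph (Dijkstra)."""
        queue = [(0.0, start, [start])]
        visited = set()
        while queue:
            cost, node, path = heapq.heappop(queue)
            if node == goal:
                return path
            if node in visited:
                continue
            visited.add(node)
            for nxt, d in self.edges[node].items():
                if nxt not in visited:
                    heapq.heappush(queue, (cost + d, nxt, path + [nxt]))
        return None  # goal is not reachable from start

    def local_grid(self, center, observations):
        """Short-term map: average features that fall into the same grid cell.

        observations is a list of ((x, y), feature_list) pairs; points outside
        the grid centred on `center` are dropped.
        """
        half = self.grid_dim // 2
        sums, counts = {}, {}
        for (x, y), feat in observations:
            col = math.floor((x - center[0]) / self.cell_size + 0.5) + half
            row = math.floor((y - center[1]) / self.cell_size + 0.5) + half
            if 0 <= row < self.grid_dim and 0 <= col < self.grid_dim:
                key = (row, col)
                prev = sums.get(key, [0.0] * len(feat))
                sums[key] = [s + f for s, f in zip(prev, feat)]
                counts[key] = counts.get(key, 0) + 1
        return {k: [s / counts[k] for s in v] for k, v in sums.items()}
```

In this sketch the graph can grow without bound as the agent explores, while the grid stays a fixed size around the current position, which mirrors the motivation given above for pairing the two map types. The learned, language-informed representations described in the abstract are outside the scope of this illustration.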

arXiv information

Authors	Dong An, Yuankai Qi, Yangguang Li, Yan Huang, Liang Wang, Tieniu Tan, Jing Shao
Published	2022-12-08 16:27:54+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
