Vision-Language Navigation with Embodied Intelligence: A Survey

Abstract

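A long-standing goal of artificial intelligence, embodied intelligence aims to improve how agents perceive, understand, and interact with their environment.
Vision-language navigation (VLN), a key research path toward embodied intelligence, studies how agents communicate with humans in natural language, receive and understand instructions, and ultimately rely on visual information to navigate accurately.
VLN integrates artificial intelligence, natural language processing, computer vision, and robotics.
The field faces technical challenges but shows potential for applications such as human-computer interaction.
Because the process from language understanding to action execution is complex, VLN still faces many challenges, including aligning visual information with language instructions and improving generalization.
This survey systematically reviews research progress in VLN and details research directions for VLN with embodied intelligence.
After summarizing in detail the system architecture, methods, and commonly used benchmark datasets, it comprehensively analyzes the problems and challenges facing current research and explores future directions for the field, aiming to provide a practical reference for researchers.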

Abstract (original)

As a long-term vision in the field of artificial intelligence, the core goal of embodied intelligence is to improve the perception, understanding, and interaction capabilities of agents and the environment. Vision-language navigation (VLN), as a critical research path to achieve embodied intelligence, focuses on exploring how agents use natural language to communicate effectively with humans, receive and understand instructions, and ultimately rely on visual information to achieve accurate navigation. VLN integrates artificial intelligence, natural language processing, computer vision, and robotics. This field faces technical challenges but shows potential for application such as human-computer interaction. However, due to the complex process involved from language understanding to action execution, VLN faces the problem of aligning visual information and language instructions, improving generalization ability, and many other challenges. This survey systematically reviews the research progress of VLN and details the research direction of VLN with embodied intelligence. After a detailed summary of its system architecture and research based on methods and commonly used benchmark datasets, we comprehensively analyze the problems and challenges faced by current research and explore the future development direction of this field, aiming to provide a practical reference for researchers.

arXiv information

Authors	Peng Gao, Peng Wang, Feng Gao, Fei Wang, Ruyue Yuan
Published	2024-02-22 05:45:17+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
