Lana: A Language-Capable Navigator for Instruction Following and Generation

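Abstract

Visual-language navigation (VLN), in which robot agents follow navigation instructions, has recently made great progress.
However, the existing literature focuses mostly on interpreting instructions into actions, which yields only "dumb" wayfinding agents.
This article presents LANA, a language-capable navigation agent that can not only execute human-written navigation commands but also provide route descriptions to humans.
This is achieved by learning instruction following and instruction generation simultaneously with a single model.
More specifically, two encoders, one for route encoding and one for language encoding, are built and shared by two decoders, one for action prediction and one for instruction generation, so that the model exploits cross-task knowledge while still capturing task-specific characteristics.
Throughout both pretraining and fine-tuning, instruction following and instruction generation are both set as optimization objectives.
The authors verify empirically that, compared with recent advanced task-specific solutions, LANA performs better on both instruction following and route description, with nearly half the complexity.
In addition, because it can generate language, LANA can explain its behavior to humans and assist humans with wayfinding.
This work is expected to foster future efforts toward building more trustworthy and socially intelligent navigation robots.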
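
The shared-encoder, two-decoder design and the joint objective described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class `ToyNavigator`, the sizes `DIM`, `N_ACTIONS` and `VOCAB`, the random linear layers, and the unweighted sum of two cross-entropy terms are all assumptions made purely for illustration. The only point it tries to show is that both tasks read the same two encoders and that a single loss covers both.

```python
import math
import random

# All sizes below are hypothetical; the abstract does not specify them.
DIM = 8          # feature size of a route or language input
N_ACTIONS = 4    # size of the action space
VOCAB = 16       # size of the word vocabulary


def linear(rng, n_in, n_out):
    """Random weight matrix standing in for a learned layer."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


class ToyNavigator:
    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Two encoders, shared by both tasks.
        self.route_enc = linear(rng, DIM, DIM)
        self.lang_enc = linear(rng, DIM, DIM)
        # Two task-specific decoders, each reading both encodings.
        self.action_dec = linear(rng, 2 * DIM, N_ACTIONS)
        self.word_dec = linear(rng, 2 * DIM, VOCAB)

    def encode(self, route_feat, lang_feat):
        # Concatenate the route encoding and the language encoding.
        return matvec(self.route_enc, route_feat) + matvec(self.lang_enc, lang_feat)

    def follow(self, route_feat, instr_feat):
        """Instruction following: logits over the next action."""
        return matvec(self.action_dec, self.encode(route_feat, instr_feat))

    def generate(self, route_feat, prefix_feat):
        """Instruction generation: logits over the next word."""
        return matvec(self.word_dec, self.encode(route_feat, prefix_feat))


def joint_loss(model, route_feat, instr_feat, prefix_feat, gold_action, gold_word):
    # Both tasks contribute to one objective (equal weighting is an assumption).
    return (cross_entropy(model.follow(route_feat, instr_feat), gold_action)
            + cross_entropy(model.generate(route_feat, prefix_feat), gold_word))
```

In this sketch a gradient step on `joint_loss` would update the two encoders from both tasks at once, which is one plausible reading of how a single model can exploit cross-task knowledge while the separate decoders capture task-specific behavior.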

Abstract (original)

Recently, visual-language navigation (VLN) — entailing robot agents to follow navigation instructions — has shown great advance. However, existing literature put most emphasis on interpreting instructions into actions, only delivering ‘dumb’ wayfinding agents. In this article, we devise LANA, a language-capable navigation agent which is able to not only execute human-written navigation commands, but also provide route descriptions to humans. This is achieved by simultaneously learning instruction following and generation with only one single model. More specifically, two encoders, respectively for route and language encoding, are built and shared by two decoders, respectively, for action prediction and instruction generation, so as to exploit cross-task knowledge and capture task-specific characteristics. Throughout pretraining and fine-tuning, both instruction following and generation are set as optimization objectives. We empirically verify that, compared with recent advanced task-specific solutions, LANA attains better performances on both instruction following and route description, with nearly half complexity. In addition, endowed with language generation capability, LANA can explain to humans its behaviors and assist human’s wayfinding. This work is expected to foster future efforts towards building more trustworthy and socially-intelligent navigation robots.

arXiv information

Authors	Xiaohan Wang, Wenguan Wang, Jiayi Shao, Yi Yang
Published	2023-03-15 07:21:28+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
