DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation

Abstract

Following language instructions to navigate in unseen environments is a challenging task for autonomous embodied agents. With their strong representation capabilities, pretrained vision-and-language models are widely used in vision-and-language navigation (VLN). However, most of them are trained on web-crawled general-purpose datasets, which incurs a considerable domain gap when they are used for VLN tasks. To address this problem, we propose a novel, model-agnostic domain-aware prompt learning (DAP) framework. To equip pretrained models with the object-level and scene-level cross-modal alignment specific to VLN tasks, DAP applies a low-cost prompt tuning paradigm to learn soft visual prompts for extracting in-domain image semantics. Specifically, we first generate a set of in-domain image-text pairs with the help of the CLIP model. We then introduce soft visual prompts in the input space of the visual encoder of a pretrained model. In this way, DAP efficiently injects in-domain visual knowledge into the visual encoder of the pretrained model. Experimental results on both R2R and REVERIE show the superiority of DAP compared to existing state-of-the-art methods.
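
The idea of placing soft visual prompts in the input space of a visual encoder can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `init_soft_prompts` and `prepend_prompts`, the token layout ([CLS], then prompts, then patches), and the initialization scale are all assumptions made for illustration. In prompt tuning generally, only the prompt vectors would be updated during training while the pretrained encoder stays frozen; the training loop is omitted here.

```python
import random


def init_soft_prompts(num_prompts, dim, seed=0):
    """Create `num_prompts` prompt vectors of width `dim` (hypothetical small random init)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_prompts)]


def prepend_prompts(tokens, prompts):
    """Insert prompt vectors between the leading [CLS] token and the patch tokens.

    `tokens` is the visual encoder's input sequence: tokens[0] is assumed to be
    the [CLS] embedding and tokens[1:] the patch embeddings.
    """
    if any(len(p) != len(tokens[0]) for p in prompts):
        raise ValueError("prompt width must match token width")
    return [tokens[0]] + list(prompts) + list(tokens[1:])


# Toy input: 1 [CLS] token + 4 patch tokens, each of width 8.
dim = 8
tokens = [[0.0] * dim for _ in range(5)]
prompts = init_soft_prompts(num_prompts=3, dim=dim)
extended = prepend_prompts(tokens, prompts)
print(len(extended))  # 1 [CLS] + 3 prompts + 4 patches = 8
```

Because the prompts live only in the input sequence, the encoder's own weights need no modification, which is consistent with the low-cost, model-agnostic framing of the abstract.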

arXiv information

Authors	Ting Liu, Yue Hu, Wansen Wu, Youkai Wang, Kai Xu, Quanjun Yin
Published	2023-12-28 13:59:45+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
