Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents

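Summary

For embodied reinforcement learning (RL) agents that interact with an environment, rapid policy adaptation to unseen visual observations is desirable, but achieving zero-shot adaptation is considered a hard problem in the RL setting.
To address this problem, we present a new contrastive prompt ensemble (ConPE) framework that uses a pretrained vision-language model and a set of visual prompts, enabling efficient policy learning and adaptation across the wide range of environmental and physical changes that embodied agents encounter.
Specifically, we devise a guided-attention-based ensemble approach that uses multiple visual prompts on the vision-language model to build robust state representations.
Each prompt is learned contrastively with respect to an individual domain factor that significantly affects the agent's egocentric perception and observation.
For a given task, the attention-based ensemble and the policy are learned jointly, so the resulting state representations both generalize across domains and are optimized for learning the task.
Through experiments, we show that ConPE outperforms other state-of-the-art algorithms on several embodied agent tasks, including navigation in AI2THOR, manipulation in egocentric-Metaworld, and autonomous driving in CARLA, while also improving the sample efficiency of policy learning and adaptation.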

Abstract (original)

For embodied reinforcement learning (RL) agents interacting with the environment, it is desirable to have rapid policy adaptation to unseen visual observations, but achieving zero-shot adaptation capability is considered as a challenging problem in the RL context. To address the problem, we present a novel contrastive prompt ensemble (ConPE) framework which utilizes a pretrained vision-language model and a set of visual prompts, thus enabling efficient policy learning and adaptation upon a wide range of environmental and physical changes encountered by embodied agents. Specifically, we devise a guided-attention-based ensemble approach with multiple visual prompts on the vision-language model to construct robust state representations. Each prompt is contrastively learned in terms of an individual domain factor that significantly affects the agent’s egocentric perception and observation. For a given task, the attention-based ensemble and policy are jointly learned so that the resulting state representations not only generalize to various domains but are also optimized for learning the task. Through experiments, we show that ConPE outperforms other state-of-the-art algorithms for several embodied agent tasks including navigation in AI2THOR, manipulation in egocentric-Metaworld, and autonomous driving in CARLA, while also improving the sample efficiency of policy learning and adaptation.
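As a rough illustration of the attention-based ensemble idea described in the abstract, the sketch below combines several per-prompt embeddings into a single state vector using softmax weights. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`cosine`, `softmax`, `attention_ensemble`), the `guide` vector, the `temperature` parameter, and the toy embeddings are all hypothetical, and the weights come from a fixed cosine-similarity score, whereas the abstract says the ensemble is learned jointly with the policy.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # epsilon avoids division by zero


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_ensemble(
    prompt_embeddings: Sequence[Sequence[float]],
    guide: Sequence[float],
    temperature: float = 0.1,
) -> Tuple[List[float], List[float]]:
    """Combine per-prompt embeddings into one state vector.

    Assumption (not from the source): each prompt embedding is scored by
    its cosine similarity to a guide vector, and the scores are turned
    into weights with a temperature-scaled softmax.
    """
    scores = [cosine(e, guide) / temperature for e in prompt_embeddings]
    weights = softmax(scores)
    dim = len(guide)
    # Weighted sum of the prompt embeddings, one dimension at a time.
    state = [
        sum(w * e[i] for w, e in zip(weights, prompt_embeddings))
        for i in range(dim)
    ]
    return state, weights


# Toy example: three hypothetical prompt embeddings of the same observation.
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
guide = [1.0, 0.0]
state, weights = attention_ensemble(embeddings, guide)
```

Here the prompt embedding closest to the guide vector receives the largest weight. In a learned version, the scoring function would be trained together with the policy rather than fixed in advance.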

arXiv information

Authors	Wonje Choi, Woo Kyung Kim, SeungHyun Kim, Honguk Woo
Published	2024-12-16 06:53:00+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
