Task-conditioned adaptation of visual features in multi-task policy learning

Summary

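Successfully handling a wide variety of tasks is a core ability of autonomous agents. It requires flexibly adapting the underlying decision-making strategy and, as this work argues, the underlying perception modules as well.
An analogous argument comes from the human visual system, which uses top-down signals to focus attention according to the current task.
Similarly, this work adapts pre-trained large vision models, conditioned on specific downstream tasks, in the context of multi-task policy learning.
It introduces task-conditioned adapters that do not require fine-tuning any pre-trained weights, combined with a single policy that is trained with behavior cloning and can address multiple tasks.
Both the policy and the visual adapters are conditioned on task embeddings, which can be selected at inference time if the task is known, or otherwise inferred from a set of example demonstrations.
For this purpose, the authors propose a new optimization-based estimator.
The method is evaluated on a wide variety of tasks from the CortexBench benchmark, and the results show that, compared with existing work, these tasks can be addressed with a single policy.
In particular, the authors show that adapting visual features is a key design choice and that the method generalizes to unseen tasks given visual demonstrations.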
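
To make "task-conditioned adapters on a frozen backbone" concrete, here is a minimal sketch. The summary does not describe the adapter architecture, so this assumes a FiLM-style modulation: a per-feature scale and shift predicted linearly from the task embedding and applied to features from a backbone whose weights are never updated. `frozen_backbone`, `TaskConditionedAdapter`, and the hand-set weights are hypothetical illustrations, not the authors' code, and the training loop is omitted.

```python
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def frozen_backbone(pixels: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a pre-trained vision model.
    Its 'weights' are fixed; nothing here is ever trained."""
    return [float(sum(pixels)), float(max(pixels)), float(min(pixels))]


class TaskConditionedAdapter:
    """Hypothetical FiLM-style adapter (an assumption, not the paper's design):
    adapted = (1 + W_scale @ e) * features + W_shift @ e."""

    def __init__(self, feat_dim: int, task_dim: int,
                 w_scale: Optional[Matrix] = None,
                 w_shift: Optional[Matrix] = None) -> None:
        # Only these two matrices would be trained; zero initialisation
        # makes the adapter start as the identity map on frozen features.
        zeros = lambda: [[0.0] * task_dim for _ in range(feat_dim)]
        self.w_scale = w_scale if w_scale is not None else zeros()
        self.w_shift = w_shift if w_shift is not None else zeros()

    def __call__(self, features: Vector, task_emb: Vector) -> Vector:
        scale = matvec(self.w_scale, task_emb)
        shift = matvec(self.w_shift, task_emb)
        return [(1.0 + s) * f + b for f, s, b in zip(features, scale, shift)]


feats = frozen_backbone([1.0, 2.0, 3.0])  # [6.0, 3.0, 1.0]
identity = TaskConditionedAdapter(feat_dim=3, task_dim=2)
adapter = TaskConditionedAdapter(
    feat_dim=3, task_dim=2,
    w_scale=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    w_shift=[[0.0, 0.0], [0.0, 0.0], [0.5, -0.5]],
)
print(identity(feats, [1.0, 0.0]))  # [6.0, 3.0, 1.0]
print(adapter(feats, [1.0, 0.0]))   # [12.0, 3.0, 1.5]
print(adapter(feats, [0.0, 1.0]))   # [6.0, 6.0, 0.5]
```

The same frozen features are reshaped differently for each task embedding, which is the property the summary attributes to task-conditioned adaptation; the backbone itself is untouched.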

Summary (original)

Successfully addressing a wide variety of tasks is a core ability of autonomous agents, which requires flexibly adapting the underlying decision-making strategies and, as we argue in this work, also adapting the underlying perception modules. An analogical argument would be the human visual system, which uses top-down signals to focus attention determined by the current task. Similarly, in this work, we adapt pre-trained large vision models conditioned on specific downstream tasks in the context of multi-task policy learning. We introduce task-conditioned adapters that do not require finetuning any pre-trained weights, combined with a single policy trained with behavior cloning and capable of addressing multiple tasks. We condition the policy and visual adapters on task embeddings, which can be selected at inference if the task is known, or alternatively inferred from a set of example demonstrations. To this end, we propose a new optimization-based estimator. We evaluate the method on a wide variety of tasks of the CortexBench benchmark and show that, compared to existing work, it can be addressed with a single policy. In particular, we demonstrate that adapting visual features is a key design choice and that the method generalizes to unseen tasks given visual demonstrations.
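
The abstract says task embeddings can be inferred from example demonstrations with an optimization-based estimator, but does not give its details. One generic way to do this, sketched below under that assumption, is to keep the trained policy frozen and run gradient descent on the task embedding alone, minimizing the behavior-cloning (mean squared error) loss on the demonstrations. The linear `policy`, its weights, and `estimate_task_embedding` are hypothetical stand-ins chosen so the gradient can be written by hand.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def policy(obs: Vector, task_emb: Vector, w_obs: Matrix, w_task: Matrix) -> Vector:
    """Hypothetical frozen linear policy: action = W_obs @ obs + W_task @ task_emb."""
    return [a + b for a, b in zip(matvec(w_obs, obs), matvec(w_task, task_emb))]


def estimate_task_embedding(demos: Sequence[Tuple[Vector, Vector]],
                            w_obs: Matrix, w_task: Matrix, task_dim: int,
                            steps: int = 500, lr: float = 0.1) -> Vector:
    """Fit only the task embedding by gradient descent on the mean squared
    behavior-cloning loss; the policy weights are never modified."""
    emb = [0.0] * task_dim
    n = len(demos)
    for _ in range(steps):
        grad = [0.0] * task_dim
        for obs, action in demos:
            pred = policy(obs, emb, w_obs, w_task)
            residual = [p - a for p, a in zip(pred, action)]
            # Gradient of ||W_task @ emb + const - action||^2 is 2 * W_task^T @ residual.
            for j in range(task_dim):
                grad[j] += 2.0 * sum(w_task[k][j] * residual[k]
                                     for k in range(len(residual))) / n
        emb = [e - lr * g for e, g in zip(emb, grad)]
    return emb


W_OBS = [[0.5, -1.0], [2.0, 0.0]]
W_TASK = [[1.0, 0.5], [0.0, 1.0]]
true_emb = [0.3, -0.7]
observations = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Noise-free demonstrations generated by the frozen policy for an "unseen" task.
demos = [(o, policy(o, true_emb, W_OBS, W_TASK)) for o in observations]
est = estimate_task_embedding(demos, W_OBS, W_TASK, task_dim=2)
print([round(x, 4) for x in est])  # prints values close to [0.3, -0.7]
```

Because only the embedding is optimized, identifying a new task requires no update to the policy or the vision model; with a nonlinear policy the hand-written gradient would be replaced by automatic differentiation.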

arXiv information

Authors	Pierre Marza, Laetitia Matignon, Olivier Simonin, Christian Wolf
Published	2024-02-12 15:57:31+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
