CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning

Abstract (original)

Offline reinforcement learning (RL) aims to learn an effective policy from a pre-collected dataset. Most existing works focus on developing sophisticated learning algorithms, with less emphasis on improving the data collection process. Moreover, it is even more challenging to go beyond the single-task setting and collect a task-agnostic dataset that allows an agent to perform multiple downstream tasks. In this paper, we propose a Curiosity-driven Unsupervised Data Collection (CUDC) method that expands the feature space using adaptive temporal distances for task-agnostic data collection, ultimately improving learning efficiency and capabilities for multi-task offline RL. To achieve this, CUDC estimates the probability that k-step future states are reachable from the current states, and adapts how many steps into the future the dynamics model should predict. With this adaptive reachability mechanism in place, the feature representation can be diversified, and the agent can navigate itself to collect higher-quality data with curiosity. Empirically, CUDC surpasses existing unsupervised methods in efficiency and learning performance on various downstream offline RL tasks in the DeepMind Control Suite.
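To make the adaptive-k idea in the abstract more concrete, here is a minimal sketch under loudly stated assumptions. It is not the authors' implementation. The abstract does not say how reachability is estimated or how k is updated, so this sketch swaps in a hypothetical exponential distance kernel (`reachability_prob`) in place of a learned estimator, plus a hypothetical threshold rule (`adapt_k`, with made-up `threshold`, `k_min`, `k_max` parameters) that lengthens the prediction horizon when most sampled k-step future states look reachable and shortens it otherwise. The curiosity signal (`curiosity_reward`) is likewise assumed here to be the squared prediction error of a k-step dynamics model.

```python
import math


def reachability_prob(z_t, z_tk, scale=1.0):
    """Hypothetical reachability estimate (not from the paper).

    Maps the Euclidean distance between the current feature vector z_t and a
    k-step future feature vector z_tk to a value in (0, 1]: identical features
    give 1.0, and the value decays toward 0 as the features move apart.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(z_t, z_tk)))
    return math.exp(-dist / scale)


def adapt_k(k, pairs, threshold=0.5, k_min=1, k_max=10):
    """Hypothetical rule for adapting the prediction horizon k.

    `pairs` is a list of (current_features, k_step_future_features). If the
    mean estimated reachability exceeds `threshold`, predict one step further
    into the future; otherwise shrink the horizon. k stays in [k_min, k_max].
    """
    if not pairs:
        raise ValueError("need at least one (current, future) feature pair")
    mean_p = sum(reachability_prob(a, b) for a, b in pairs) / len(pairs)
    if mean_p > threshold:
        return min(k + 1, k_max)
    return max(k - 1, k_min)


def curiosity_reward(predicted, actual):
    """Assumed intrinsic reward: squared error of a k-step dynamics prediction."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


if __name__ == "__main__":
    near = [([0.0, 0.0], [0.1, 0.0])]  # future state close in feature space
    far = [([0.0, 0.0], [5.0, 5.0])]   # future state far in feature space
    print(adapt_k(3, near))  # horizon grows
    print(adapt_k(3, far))   # horizon shrinks
    print(curiosity_reward([1.0, 2.0], [1.0, 0.0]))
```

The threshold rule is only one plausible reading of "adapts how many steps into the future the dynamics model should predict"; a learned reachability classifier and a different update schedule would fit the abstract equally well.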

arXiv information

Authors	Chenyu Sun, Hangwei Qian, Chunyan Miao
Published	2023-12-19 14:26:23+00:00
arXiv site	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
