Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay

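Summary

We study continual offline reinforcement learning, a practical paradigm that facilitates forward transfer and mitigates catastrophic forgetting when tackling sequential offline tasks.
We propose a dual generative replay framework that retains previous knowledge by concurrently replaying generated pseudo-data.
First, we decouple the continual learning policy into a diffusion-based generative behavior model and a multi-head action evaluation model, so that the policy inherits the distributional expressivity needed to cover a progressively wider range of diverse behaviors.
Second, we train a task-conditioned diffusion model to mimic the state distributions of past tasks.
Generated states are paired with the corresponding responses from the behavior generator, so that old tasks are represented by high-fidelity replayed samples.
Finally, by interleaving pseudo samples with real samples from the new task, we continually update the state and behavior generators to model progressively diverse behaviors, and we regularize the multi-head critic via behavior cloning to mitigate forgetting.
Experiments demonstrate that our method achieves better forward transfer with less forgetting, and that its high-fidelity replay of the sample space brings its results close to those obtained with the previous ground-truth data.
Our code is available at https://github.com/NJU-RL/CuGRO.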

Summary (original)

We study continual offline reinforcement learning, a practical paradigm that facilitates forward transfer and mitigates catastrophic forgetting to tackle sequential offline tasks. We propose a dual generative replay framework that retains previous knowledge by concurrent replay of generated pseudo-data. First, we decouple the continual learning policy into a diffusion-based generative behavior model and a multi-head action evaluation model, allowing the policy to inherit distributional expressivity for encompassing a progressive range of diverse behaviors. Second, we train a task-conditioned diffusion model to mimic state distributions of past tasks. Generated states are paired with corresponding responses from the behavior generator to represent old tasks with high-fidelity replayed samples. Finally, by interleaving pseudo samples with real ones of the new task, we continually update the state and behavior generators to model progressively diverse behaviors, and regularize the multi-head critic via behavior cloning to mitigate forgetting. Experiments demonstrate that our method achieves better forward transfer with less forgetting, and closely approximates the results of using previous ground-truth data due to its high-fidelity replay of the sample space. Our code is available at https://github.com/NJU-RL/CuGRO.
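
The interleaving step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the CuGRO repository for that): the two generator functions below are hypothetical Gaussian stand-ins for the trained diffusion models, and all function names, dimensions, and the per-task replay count are assumptions made for illustration only.

```python
import math
import random


def sample_replay_states(rng, task_id, n, state_dim):
    """Hypothetical stand-in for the task-conditioned state generator.

    A real system would sample from a trained diffusion model; here we draw
    Gaussian states centred on the task index purely for illustration.
    """
    return [[rng.gauss(float(task_id), 1.0) for _ in range(state_dim)]
            for _ in range(n)]


def sample_replay_actions(rng, states, action_dim):
    """Hypothetical stand-in for the behavior generator.

    Returns a toy response to each generated state (assumes
    action_dim <= state_dim): a bounded function of the state plus noise.
    """
    return [[math.tanh(x) + rng.gauss(0.0, 0.01) for x in s[:action_dim]]
            for s in states]


def build_mixed_batch(rng, real_samples, new_task_id, past_task_ids,
                      n_replay_per_task, state_dim, action_dim):
    """Interleave real (state, action) pairs of the new task with
    pseudo pairs generated for every past task."""
    # Real samples are tagged with the new task's id.
    batch = [(s, a, new_task_id) for s, a in real_samples]
    for task_id in past_task_ids:
        # Pair generated states with the behavior generator's responses.
        states = sample_replay_states(rng, task_id, n_replay_per_task, state_dim)
        actions = sample_replay_actions(rng, states, action_dim)
        batch.extend((s, a, task_id) for s, a in zip(states, actions))
    rng.shuffle(batch)  # mix pseudo and real samples within the batch
    return batch


rng = random.Random(0)
# Eight placeholder real samples for the new task (task id 2).
real = [([0.0] * 4, [0.0] * 2) for _ in range(8)]
batch = build_mixed_batch(rng, real, new_task_id=2, past_task_ids=[0, 1],
                          n_replay_per_task=4, state_dim=4, action_dim=2)
print(len(batch))  # 16: 8 real samples plus 4 pseudo samples for each of 2 past tasks
```

In a training loop along these lines, such a mixed batch would presumably be what the state generator, the behavior generator, and the multi-head critic are updated on, so that old tasks stay represented without access to their original data.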

arXiv information

Authors	Jinmei Liu, Wenbin Li, Xiangyu Yue, Shilin Zhang, Chunlin Chen, Zhi Wang
Published	2024-04-16 15:39:11+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
