Flow Q-Learning

Abstract

We present flow Q-learning (FQL), a simple and performant offline reinforcement learning (RL) method that leverages an expressive flow-matching policy to model arbitrarily complex action distributions in data. Training a flow policy with RL is a tricky problem, due to the iterative nature of the action generation process. We address this challenge by training an expressive one-step policy with RL, rather than directly guiding an iterative flow policy to maximize values. This way, we can completely avoid unstable recursive backpropagation, eliminate costly iterative action generation at test time, yet still mostly maintain expressivity. We experimentally show that FQL leads to strong performance across 73 challenging state- and pixel-based OGBench and D4RL tasks in offline RL and offline-to-online RL. Project page: https://seohong.me/projects/fql/
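The contrast the abstract draws, between iterative action generation and a one-step policy, can be illustrated with a small NumPy sketch. This is not the authors' implementation. The toy velocity field, the linear one-step policy, and the squared-error fitting loss are all assumptions made for illustration, and the value-maximization part of the method is left out entirely.

```python
import numpy as np

def euler_flow_action(velocity, state, noise, num_steps=10):
    """Integrate a velocity field from noise (t=0) to an action (t=1) with Euler steps."""
    x = noise.copy()
    for k in range(num_steps):
        t = k / num_steps
        x = x + velocity(t, state, x) / num_steps  # one Euler step of size 1/num_steps
    return x

def toy_velocity(t, state, x):
    """Hypothetical velocity field that transports any noise sample to the action 2 * state."""
    target = 2.0 * state
    return (target - x) / (1.0 - t)

def one_step_action(weights, state, noise):
    """Hypothetical one-step policy: a single linear map of (state, noise), no iteration."""
    return np.concatenate([state, noise], axis=-1) @ weights

def fitting_loss(weights, state, noise, flow_action):
    """Mean squared distance between one-step actions and flow actions (an assumed loss)."""
    diff = one_step_action(weights, state, noise) - flow_action
    return np.mean(np.sum(diff ** 2, axis=-1))

rng = np.random.default_rng(0)
state = rng.normal(size=(256, 3))
noise = rng.normal(size=(256, 3))

# Iterative generation: num_steps evaluations of the velocity field per action.
flow_action = euler_flow_action(toy_velocity, state, noise)

# Fit the one-step policy to the flow outputs with plain gradient descent.
inputs = np.concatenate([state, noise], axis=-1)
weights = np.zeros((6, 3))
initial_loss = fitting_loss(weights, state, noise, flow_action)
for _ in range(500):
    diff = inputs @ weights - flow_action
    grad = 2.0 * inputs.T @ diff / len(inputs)  # gradient of the mean squared error
    weights -= 0.05 * grad
final_loss = fitting_loss(weights, state, noise, flow_action)
```

After fitting, `one_step_action` produces an action with a single evaluation, while `euler_flow_action` needs `num_steps` evaluations of the velocity field. Gradients for the one-step policy also never pass through the Euler loop, which is the kind of recursive backpropagation the abstract says the method avoids.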

arXiv information

Authors	Seohong Park, Qiyang Li, Sergey Levine
Published	2025-02-04 18:04:05+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
