ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

Summary

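We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control.
Derived from rigorous RL theory, ReinFlow injects learnable noise into a flow policy's deterministic path, converting the flow into a discrete-time Markov process for exact and straightforward likelihood computation.
This conversion facilitates exploration and ensures training stability, enabling ReinFlow to fine-tune diverse flow model variants, including Rectified Flow [35] and Shortcut Models [19].
We benchmark ReinFlow on representative locomotion and manipulation tasks, including long-horizon planning with visual input and sparse rewards.
After fine-tuning on challenging legged locomotion tasks, the episode reward of Rectified Flow policies achieved an average net growth of 135.36%, while saving denoising steps and 82.63% of wall time compared to the state-of-the-art diffusion RL fine-tuning method DPPO [43].
The success rate of Shortcut Model policies on state and visual manipulation tasks achieved an average net increase of 40.34% after fine-tuning with ReinFlow at four or even one denoising step.
Project webpage: https://reinflow.github.io/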

Summary (original)

We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. Derived from rigorous RL theory, ReinFlow injects learnable noise into a flow policy’s deterministic path, converting the flow into a discrete-time Markov Process for exact and straightforward likelihood computation. This conversion facilitates exploration and ensures training stability, enabling ReinFlow to fine-tune diverse flow model variants, including Rectified Flow [35] and Shortcut Models [19], particularly at very few or even one denoising step. We benchmark ReinFlow in representative locomotion and manipulation tasks, including long-horizon planning with visual input and sparse reward. The episode reward of Rectified Flow policies obtained an average net growth of 135.36% after fine-tuning in challenging legged locomotion tasks while saving denoising steps and 82.63% of wall time compared to state-of-the-art diffusion RL fine-tuning method DPPO [43]. The success rate of the Shortcut Model policies in state and visual manipulation tasks achieved an average net increase of 40.34% after fine-tuning with ReinFlow at four or even one denoising step, whose performance is comparable to fine-tuned DDIM policies while saving computation time for an average of 23.20%. Project webpage: https://reinflow.github.io/
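The core mechanism in the abstract (learnable noise injected into each step of the flow's deterministic path, turning sampling into a discrete-time Markov process with an exact likelihood) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `velocity` and `noise_std` are hypothetical stand-ins for the learned velocity network and the learnable noise network, the integrator is assumed to be plain Euler, and the injected noise is assumed to be isotropic Gaussian. Under those assumptions every transition is Gaussian, so the log-likelihood of the sampled chain is an exact sum of Gaussian log densities, which is the kind of quantity a likelihood-based policy-gradient update consumes.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def gaussian_log_prob(x: Vector, mean: Vector, std: float) -> float:
    """Log density of an isotropic Gaussian N(mean, std^2 I) evaluated at x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq / std**2 - d * math.log(std) - 0.5 * d * math.log(2 * math.pi)


def noisy_flow_rollout(
    velocity: Callable[[float, Vector], Vector],   # hypothetical stand-in for the velocity network
    noise_std: Callable[[float, Vector], float],   # hypothetical stand-in for the learnable noise net
    dim: int,
    num_steps: int,
    rng: random.Random,
) -> Tuple[Vector, float]:
    """Sample an action by Euler-integrating a flow with Gaussian noise added
    at every step, and return it with the exact log-likelihood of the chain."""
    dt = 1.0 / num_steps
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # a_0 ~ N(0, I)
    log_prob = gaussian_log_prob(a, [0.0] * dim, 1.0)
    for k in range(num_steps):
        t = k * dt
        v = velocity(t, a)
        mean = [ai + vi * dt for ai, vi in zip(a, v)]  # deterministic Euler step
        std = noise_std(t, a)
        a_next = [m + std * rng.gauss(0.0, 1.0) for m in mean]  # injected noise
        log_prob += gaussian_log_prob(a_next, mean, std)  # exact Gaussian transition term
        a = a_next
    return a, log_prob
```

For example, `noisy_flow_rollout(lambda t, a: [1.0 - x for x in a], lambda t, a: 0.1, dim=2, num_steps=4, rng=random.Random(0))` returns a 2-D action together with its chain log-likelihood; setting `num_steps=1` gives the one-step case. The `a_0` term does not depend on any learned parameters, so it cancels in a likelihood ratio, but it is kept here so the returned value is the full joint density of the sampled chain.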

arXiv information

Authors	Tonghe Zhang, Chao Yu, Sichang Su, Yu Wang
Published	2025-05-29 02:18:07+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
