IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning

Summary

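Imitation learning (IL) and reinforcement learning (RL) each offer distinct advantages for robotics policy learning: IL provides stable learning from demonstrations, while RL promotes generalization through exploration.
Although existing robot learning approaches that use IL-based pre-training followed by RL-based fine-tuning are promising, this two-stage paradigm often suffers from instability and poor sample efficiency during the RL fine-tuning phase.
In this work, we introduce IN-RIL (INterleaved Reinforcement learning and Imitation Learning) for policy fine-tuning. IN-RIL periodically injects IL updates after multiple RL updates, so it benefits from the stability of IL and from the guidance of expert data for more efficient exploration throughout the entire fine-tuning process.
Because IL and RL involve different optimization objectives, we develop gradient separation mechanisms that prevent destructive interference during IN-RIL fine-tuning by separating possibly conflicting gradient updates into orthogonal subspaces.
Furthermore, we conduct a rigorous analysis that sheds light on why interleaving IL with RL stabilizes learning and improves sample efficiency.
Extensive experiments on 14 robot manipulation and locomotion tasks across 3 benchmarks (FurnitureBench, OpenAI Gym, and Robomimic) show that IN-RIL can significantly improve sample efficiency and mitigate performance collapse during online fine-tuning in both long- and short-horizon tasks, with either sparse or dense rewards.
As a general plug-in compatible with various state-of-the-art RL algorithms, IN-RIL can significantly improve RL fine-tuning; for example, it raises the success rate on Robomimic Transport from 12% to 88%, a 6.3x improvement.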
Project page: https://github.com/ucd-dare/IN-RIL.
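The interleaving idea summarized above can be sketched as a simple update schedule. This is a minimal sketch, not the authors' implementation: the names `interleaved_schedule`, `fine_tune`, and `rl_per_il` are hypothetical, and the fixed number of RL updates per IL update is an assumption (the summary only says that IL updates are injected after multiple RL updates).

```python
from typing import Callable, List, TypeVar

P = TypeVar("P")  # policy parameters of any type


def interleaved_schedule(total_updates: int, rl_per_il: int) -> List[str]:
    """Return the update order: `rl_per_il` RL updates, then one IL update, repeated.

    `rl_per_il` is a hypothetical fixed interleaving ratio used for illustration.
    """
    if rl_per_il < 1:
        raise ValueError("rl_per_il must be >= 1")
    schedule = []
    for step in range(1, total_updates + 1):
        # Every (rl_per_il + 1)-th update is an IL (behavior-cloning) update.
        schedule.append("IL" if step % (rl_per_il + 1) == 0 else "RL")
    return schedule


def fine_tune(
    params: P,
    rl_step: Callable[[P], P],
    il_step: Callable[[P], P],
    total_updates: int,
    rl_per_il: int,
) -> P:
    """Apply caller-supplied RL and IL update functions in interleaved order."""
    for kind in interleaved_schedule(total_updates, rl_per_il):
        params = il_step(params) if kind == "IL" else rl_step(params)
    return params
```

For example, `interleaved_schedule(8, 3)` yields `['RL', 'RL', 'RL', 'IL', 'RL', 'RL', 'RL', 'IL']`. Keeping the schedule separate from the update functions means the step of any RL algorithm can be passed in as `rl_step`, in the spirit of the plug-in claim.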

Summary (original)

Imitation learning (IL) and reinforcement learning (RL) each offer distinct advantages for robotics policy learning: IL provides stable learning from demonstrations, and RL promotes generalization through exploration. While existing robot learning approaches using IL-based pre-training followed by RL-based fine-tuning are promising, this two-step learning paradigm often suffers from instability and poor sample efficiency during the RL fine-tuning phase. In this work, we introduce IN-RIL, INterleaved Reinforcement learning and Imitation Learning, for policy fine-tuning, which periodically injects IL updates after multiple RL updates and hence can benefit from the stability of IL and the guidance of expert data for more efficient exploration throughout the entire fine-tuning process. Since IL and RL involve different optimization objectives, we develop gradient separation mechanisms to prevent destructive interference during IN-RIL fine-tuning, by separating possibly conflicting gradient updates in orthogonal subspaces. Furthermore, we conduct rigorous analysis, and our findings shed light on why interleaving IL with RL stabilizes learning and improves sample efficiency. Extensive experiments on 14 robot manipulation and locomotion tasks across 3 benchmarks, including FurnitureBench, OpenAI Gym, and Robomimic, demonstrate that IN-RIL can significantly improve sample efficiency and mitigate performance collapse during online fine-tuning in both long- and short-horizon tasks with either sparse or dense rewards. IN-RIL, as a general plug-in compatible with various state-of-the-art RL algorithms, can significantly improve RL fine-tuning, e.g., from 12% to 88% with 6.3x improvement in the success rate on Robomimic Transport. Project page: https://github.com/ucd-dare/IN-RIL.
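One way to keep a conflicting RL gradient from undoing IL progress is to project it onto the subspace orthogonal to the IL gradient. The sketch below borrows that projection rule from PCGrad-style gradient surgery; whether IN-RIL's gradient separation uses exactly this rule is an assumption, and all function names are hypothetical. Both gradients are treated as gradients of losses to be minimized, so a negative inner product signals a conflict.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out(g: Vector, ref: Vector) -> Vector:
    """Remove from g its component along ref, leaving the part orthogonal to ref."""
    denom = dot(ref, ref)
    if denom == 0.0:
        return list(g)  # nothing to project against
    scale = dot(g, ref) / denom
    return [x - scale * r for x, r in zip(g, ref)]


def separate_rl_gradient(g_rl: Vector, g_il: Vector) -> Vector:
    """Hypothetical separation rule (PCGrad-style, an assumption):
    if the RL gradient conflicts with the IL gradient (negative inner product),
    keep only the RL component orthogonal to the IL gradient; otherwise keep it as is."""
    if dot(g_rl, g_il) < 0.0:
        return project_out(g_rl, g_il)
    return list(g_rl)
```

With `g_il = [1.0, 0.0]` and `g_rl = [-1.0, 2.0]`, the inner product is negative, so the result is `[0.0, 2.0]`, which is orthogonal to `g_il`; a non-conflicting RL gradient such as `[1.0, 1.0]` passes through unchanged.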

arXiv information

Authors	Dechen Gao, Hang Wang, Hanchu Zhou, Nejib Ammar, Shatadal Mishra, Ahmadreza Moradipari, Iman Soltani, Junshan Zhang
Published	2025-05-15 16:01:21+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
