FDPP: Fine-tune Diffusion Policy with Human Preference

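Summary

Imitation learning from human demonstrations enables robots to perform complex manipulation tasks and has recently seen great success.
However, these methods often struggle to adapt their behavior to new preferences or to changes in the environment.
To address these limitations, we propose Fine-tuning Diffusion Policy with Human Preference (FDPP).
FDPP learns a reward function through preference-based learning.
This reward is then used to fine-tune the pre-trained policy with reinforcement learning (RL), aligning the pre-trained policy with new human preferences while it still solves the original task.
Experiments across various robotic tasks and preferences demonstrate that FDPP effectively customizes policy behavior without compromising performance.
In addition, we show that incorporating Kullback-Leibler (KL) regularization during fine-tuning prevents over-fitting and helps maintain the competencies of the initial policy.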

Abstract (original)

Imitation learning from human demonstrations enables robots to perform complex manipulation tasks and has recently witnessed huge success. However, these techniques often struggle to adapt behavior to new preferences or changes in the environment. To address these limitations, we propose Fine-tuning Diffusion Policy with Human Preference (FDPP). FDPP learns a reward function through preference-based learning. This reward is then used to fine-tune the pre-trained policy with reinforcement learning (RL), resulting in alignment of pre-trained policy with new human preferences while still solving the original task. Our experiments across various robotic tasks and preferences demonstrate that FDPP effectively customizes policy behavior without compromising performance. Additionally, we show that incorporating Kullback-Leibler (KL) regularization during fine-tuning prevents over-fitting and helps maintain the competencies of the initial policy.
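The abstract does not spell out how the reward is learned from preferences or how the KL term enters the fine-tuning objective. As a minimal sketch, assuming the commonly used Bradley-Terry preference model (an assumption here, not confirmed by the abstract), the reward-learning loss for one labeled pair of trajectory segments could look like the following. The names `segment_return`, `preference_loss`, `kl_penalized_objective`, and the weight `beta` are hypothetical and chosen only for illustration.

```python
import numpy as np


def segment_return(reward_fn, segment):
    """Sum the predicted reward over the (state, action) steps of one segment."""
    return float(sum(reward_fn(s, a) for s, a in segment))


def preference_loss(reward_fn, seg_a, seg_b, label):
    """Bradley-Terry negative log-likelihood for one labeled segment pair.

    label is 1.0 if the human preferred seg_a, 0.0 if seg_b, 0.5 for a tie.
    (Assumed formulation; the paper may use a different preference model.)
    """
    ra = segment_return(reward_fn, seg_a)
    rb = segment_return(reward_fn, seg_b)
    # P(a preferred over b) = sigmoid(ra - rb); log-sigmoid computed stably.
    log_p_a = -np.logaddexp(0.0, rb - ra)
    log_p_b = -np.logaddexp(0.0, ra - rb)
    return float(-(label * log_p_a + (1.0 - label) * log_p_b))


def kl_penalized_objective(preference_return, kl_to_pretrained, beta):
    """Illustrative fine-tuning objective: learned-reward return minus a
    KL penalty that keeps the policy close to the pre-trained one.
    beta is a hypothetical trade-off weight."""
    return preference_return - beta * kl_to_pretrained


# Toy usage: a hand-written reward that prefers small actions.
toy_reward = lambda state, action: -abs(action)
gentle = [(0.0, 0.1), (0.0, 0.2)]
abrupt = [(0.0, 1.0), (0.0, 2.0)]
loss_if_gentle_preferred = preference_loss(toy_reward, gentle, abrupt, 1.0)
loss_if_abrupt_preferred = preference_loss(toy_reward, gentle, abrupt, 0.0)
```

In this toy setup the loss is lower when the label agrees with the reward's own ranking, which is the signal a learned reward model would be trained on. A larger `beta` keeps the fine-tuned policy closer to the pre-trained one, at the cost of slower adaptation to the new preference.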

arXiv information

Authors	Yuxin Chen, Devesh K. Jha, Masayoshi Tomizuka, Diego Romeres
Published	2025-01-14 17:15:27+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
