FLoRA: Sample-Efficient Preference-based RL via Low-Rank Style Adaptation of Reward Functions

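Summary

Preference-based reinforcement learning (PbRL) is well suited to style adaptation of pre-trained robotic behavior: adapting the robot's policy to follow a human user's preferences while keeping it able to perform the original task.
However, collecting preferences for the adaptation process in robotics is often difficult and time-consuming.
In this work, we explore the adaptation of pre-trained robots in the low-preference-data regime.
We show that, in this regime, recent adaptation approaches suffer from catastrophic reward forgetting (CRF): the updated reward model overfits to the new preferences, leaving the agent unable to perform the original task.
To mitigate CRF, we propose augmenting the original reward model with a small number of parameters (low-rank matrices) responsible for modeling the preference adaptation.
Our evaluation shows that the method can efficiently and effectively adapt robot behavior to human preferences across simulated benchmark tasks and multiple real-world robotic tasks.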

Abstract (original)

Preference-based reinforcement learning (PbRL) is a suitable approach for style adaptation of pre-trained robotic behavior: adapting the robot’s policy to follow human user preferences while still being able to perform the original task. However, collecting preferences for the adaptation process in robotics is often challenging and time-consuming. In this work we explore the adaptation of pre-trained robots in the low-preference-data regime. We show that, in this regime, recent adaptation approaches suffer from catastrophic reward forgetting (CRF), where the updated reward model overfits to the new preferences, leading the agent to become unable to perform the original task. To mitigate CRF, we propose to enhance the original reward model with a small number of parameters (low-rank matrices) responsible for modeling the preference adaptation. Our evaluation shows that our method can efficiently and effectively adjust robotic behavior to human preferences across simulation benchmark tasks and multiple real-world robotic tasks.
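
The low-rank idea in the abstract can be illustrated with a minimal sketch. It assumes a single linear layer of a reward model whose pre-trained weights W stay frozen while a trainable product B·A is added on top. The class name `LowRankAdaptedLinear`, the helper `matvec`, the layer sizes, and the zero initialization of B (borrowed from common low-rank adaptation practice) are all assumptions made for illustration, not details taken from the paper.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


class LowRankAdaptedLinear:
    """Hypothetical layer: frozen weights W plus a trainable low-rank update B @ A."""

    def __init__(self, W, rank, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(W), len(W[0])
        self.W = W  # pre-trained weights, never updated during adaptation
        # A (rank x in_dim) gets a small random init; B (out_dim x rank) starts
        # at zero, so the adapted layer initially matches the original exactly.
        # This initialization scheme is an assumption, not taken from the paper.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def __call__(self, x):
        base = matvec(self.W, x)                    # original behavior
        delta = matvec(self.B, matvec(self.A, x))   # low-rank correction
        return [b + d for b, d in zip(base, delta)]

    def num_trainable(self):
        """Count only the adapter parameters (A and B)."""
        return sum(len(r) for r in self.A) + sum(len(r) for r in self.B)


# Hypothetical 8x8 layer standing in for one layer of a pre-trained reward model.
rng = random.Random(42)
W = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(8)]
x = [rng.uniform(-1, 1) for _ in range(8)]

layer = LowRankAdaptedLinear(W, rank=1)
before = layer(x)        # identical to matvec(W, x) because B is all zeros

layer.B[0][0] = 1.0      # stand-in for one preference-driven update to B
after = layer(x)         # only the first output moves; W itself is untouched

print(layer.num_trainable(), "trainable vs", len(W) * len(W[0]), "frozen")
```

In this sketch the adapter holds rank × (in_dim + out_dim) = 16 trainable values against 64 frozen ones. Because the original weights are never modified, setting B back to zero recovers the pre-trained output, which is the intuition behind using such a structure to limit forgetting.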

arXiv information

Authors	Daniel Marta, Simon Holk, Miguel Vasco, Jens Lundell, Timon Homberger, Finn Busch, Olov Andersson, Danica Kragic, Iolanda Leite
Published	2025-04-14 09:04:14+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
