AMOR: Adaptive Character Control through Multi-Objective Reinforcement Learning

Summary

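Reinforcement learning (RL) has substantially advanced the control of physics-based and robotic characters that track kinematic reference motions.
However, existing methods typically rely on a weighted sum of conflicting reward functions, and reaching the desired behavior requires extensive tuning of those weights.
Because RL is computationally expensive, this iterative process is tedious and time-consuming.
For robotics applications, the weights must also be chosen so that the policy performs well in the real world despite the unavoidable sim-to-real gap.
To address these challenges, the authors propose a multi-objective reinforcement learning framework that trains a single policy conditioned on a set of weights spanning the Pareto front of reward trade-offs.
Within this framework, weights can be selected and tuned after training, which significantly shortens iteration time.
They show how this improved workflow can be used to perform highly dynamic motions with a robot character.
They also explore how weight-conditioned policies can be used in a hierarchical setting, where a high-level policy dynamically selects the weights according to the current task.
The results indicate that the multi-objective policy encodes a diverse spectrum of behaviors and facilitates efficient adaptation to novel tasks.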

Summary (original)

Reinforcement learning (RL) has significantly advanced the control of physics-based and robotic characters that track kinematic reference motion. However, methods typically rely on a weighted sum of conflicting reward functions, requiring extensive tuning to achieve a desired behavior. Due to the computational cost of RL, this iterative process is a tedious, time-intensive task. Furthermore, for robotics applications, the weights need to be chosen such that the policy performs well in the real world, despite inevitable sim-to-real gaps. To address these challenges, we propose a multi-objective reinforcement learning framework that trains a single policy conditioned on a set of weights, spanning the Pareto front of reward trade-offs. Within this framework, weights can be selected and tuned after training, significantly speeding up iteration time. We demonstrate how this improved workflow can be used to perform highly dynamic motions with a robot character. Moreover, we explore how weight-conditioned policies can be leveraged in hierarchical settings, using a high-level policy to dynamically select weights according to the current task. We show that the multi-objective policy encodes a diverse spectrum of behaviors, facilitating efficient adaptation to novel tasks.
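The weight-conditioning idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how weights are sampled or how the policy consumes them, so the sketch assumes weights are drawn uniformly from the probability simplex, that the reward terms are combined by a plain weighted sum, and that the weights are appended to the observation. The function names and the reward terms in the usage lines are hypothetical.

```python
import random


def sample_weights(n, rng):
    """Draw n non-negative weights that sum to 1.

    Assumption: uniform sampling on the simplex (normalized exponential draws).
    """
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def scalarize(rewards, weights):
    """Combine per-objective rewards into one scalar via a weighted sum."""
    return sum(w * r for w, r in zip(weights, rewards))


def conditioned_observation(obs, weights):
    """Append the weights to the observation so one policy can see the trade-off.

    Assumption: simple concatenation; the paper may condition differently.
    """
    return list(obs) + list(weights)


# Hypothetical usage: two reward terms (e.g. tracking accuracy and smoothness).
rng = random.Random(0)
weights = sample_weights(2, rng)
policy_input = conditioned_observation([0.1, -0.3], weights)
reward = scalarize([0.8, 0.2], weights)
```

Under these assumptions, a new weight vector would be sampled per episode during training, and after training the same vector can be changed by hand (or by a high-level policy) without retraining.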

arXiv information

Authors	Lucas N. Alegre, Agon Serifi, Ruben Grandia, David Müller, Espen Knoop, Moritz Bächer
Published	2025-05-29 17:41:48+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
