Subtask-Aware Visual Reward Learning from Segmented Demonstrations

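Abstract

Reinforcement learning (RL) agents have shown promise across a wide range of robotic tasks.
However, they still depend heavily on human-engineered reward functions, which require extensive trial and error and access to target-behavior information that is often unavailable in real-world settings.
This paper introduces REDS (REward learning from Demonstration with Segmentations), a new reward learning framework that leverages action-free videos with minimal supervision.
Specifically, REDS uses video demonstrations from diverse sources that are segmented into subtasks, and treats these segments as ground-truth rewards.
A dense reward function conditioned on video segments and their corresponding subtasks is trained to align with the ground-truth reward signal by minimizing the Equivalent-Policy Invariant Comparison (EPIC) distance.
In addition, contrastive learning objectives align video representations with subtasks, so that subtasks can be inferred accurately during online interaction.
Experiments show that REDS significantly outperforms baseline methods on complex robotic manipulation tasks in Meta-World and on more challenging real-world tasks, such as furniture assembly in FurnitureBench, with minimal human intervention.
REDS also facilitates generalization to unseen tasks and robot embodiments, highlighting its potential for scalable deployment in diverse environments.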

Abstract (original)

Reinforcement Learning (RL) agents have demonstrated their potential across various robotic tasks. However, they still heavily rely on human-engineered reward functions, requiring extensive trial-and-error and access to target behavior information, often unavailable in real-world settings. This paper introduces REDS: REward learning from Demonstration with Segmentations, a novel reward learning framework that leverages action-free videos with minimal supervision. Specifically, REDS employs video demonstrations segmented into subtasks from diverse sources and treats these segments as ground-truth rewards. We train a dense reward function conditioned on video segments and their corresponding subtasks to ensure alignment with ground-truth reward signals by minimizing the Equivalent-Policy Invariant Comparison distance. Additionally, we employ contrastive learning objectives to align video representations with subtasks, ensuring precise subtask inference during online interactions. Our experiments show that REDS significantly outperforms baseline methods on complex robotic manipulation tasks in Meta-World and more challenging real-world tasks, such as furniture assembly in FurnitureBench, with minimal human intervention. Moreover, REDS facilitates generalization to unseen tasks and robot embodiments, highlighting its potential for scalable deployment in diverse environments.
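
To make the alignment objective more concrete, the following is a minimal sketch of the Pearson distance, sqrt((1 - rho) / 2), which is the final step of the Equivalent-Policy Invariant Comparison (EPIC) distance. It is not the authors' implementation: the full EPIC distance first canonicalizes both reward functions to remove potential-based shaping, and that step is omitted here. The function name `pearson_distance`, the sample values, and the use of a subtask index as the ground-truth reward are all assumptions made for illustration.

```python
import math


def pearson_distance(r_a, r_b):
    """Pearson distance sqrt((1 - rho) / 2) between two equal-length reward samples.

    Hypothetical helper: covers only the last step of EPIC, without the
    canonicalization that the full distance applies beforehand.
    """
    n = len(r_a)
    if n != len(r_b) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_a = sum(r_a) / n
    mean_b = sum(r_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(r_a, r_b))
    var_a = sum((a - mean_a) ** 2 for a in r_a)
    var_b = sum((b - mean_b) ** 2 for b in r_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sample")
    rho = cov / math.sqrt(var_a * var_b)
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    rho = max(-1.0, min(1.0, rho))
    return math.sqrt((1.0 - rho) / 2.0)


# Assumed toy data: the subtask index of each frame stands in for the
# ground-truth reward of a segmented demonstration.
subtask_reward = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
# A learned reward that is a positive affine transform of the ground truth.
learned_aligned = [3.0 * r + 0.5 for r in subtask_reward]
# A learned reward that ranks the frames in the opposite order.
learned_reversed = [-r for r in subtask_reward]

d_aligned = pearson_distance(subtask_reward, learned_aligned)
d_reversed = pearson_distance(subtask_reward, learned_reversed)
```

Because the distance depends only on correlation, a learned reward that differs from the ground truth by a positive scale and a constant offset is at distance approximately 0, while a perfectly anti-correlated reward is at the maximum distance of 1.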

arXiv information

Authors	Changyeon Kim, Minho Heo, Doohyun Lee, Jinwoo Shin, Honglak Lee, Joseph J. Lim, Kimin Lee
Published	2025-02-28 01:25:37+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
