Next-Future: Sample-Efficient Policy Learning for Robotic-Arm Tasks

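Summary

Hindsight Experience Replay (HER) is widely regarded as the state-of-the-art algorithm for sample-efficient multi-goal reinforcement learning (RL) in robotic manipulation tasks with binary rewards.
HER facilitates learning from failed attempts by replaying trajectories with redefined goals.
However, it relies on a heuristic-based replay method that lacks a principled framework.
To address this limitation, we introduce a novel replay strategy, 'Next-Future', which focuses on rewarding single-step transitions.
This approach significantly improves sample efficiency and accuracy when learning multi-goal Markov decision processes (MDPs), particularly under stringent accuracy requirements. This is a critical aspect of performing complex and precise robotic-arm tasks.
We demonstrate the efficacy of our method by highlighting how single-step learning improves value approximation within the multi-goal RL framework.
The performance of the proposed replay strategy is evaluated on eight challenging robotic manipulation tasks, using ten random seeds for training.
Our results show substantial improvements in sample efficiency on seven of the eight tasks and higher success rates on six tasks.
Furthermore, real-world experiments validate the practical feasibility of the learned policies, demonstrating the potential of 'Next-Future' for solving complex robotic-arm tasks.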

Summary (original)

Hindsight Experience Replay (HER) is widely regarded as the state-of-the-art algorithm for achieving sample-efficient multi-goal reinforcement learning (RL) in robotic manipulation tasks with binary rewards. HER facilitates learning from failed attempts by replaying trajectories with redefined goals. However, it relies on a heuristic-based replay method that lacks a principled framework. To address this limitation, we introduce a novel replay strategy, ‘Next-Future’, which focuses on rewarding single-step transitions. This approach significantly enhances sample efficiency and accuracy in learning multi-goal Markov decision processes (MDPs), particularly under stringent accuracy requirements — a critical aspect for performing complex and precise robotic-arm tasks. We demonstrate the efficacy of our method by highlighting how single-step learning enables improved value approximation within the multi-goal RL framework. The performance of the proposed replay strategy is evaluated across eight challenging robotic manipulation tasks, using ten random seeds for training. Our results indicate substantial improvements in sample efficiency for seven out of eight tasks and higher success rates in six tasks. Furthermore, real-world experiments validate the practical feasibility of the learned policies, demonstrating the potential of ‘Next-Future’ in solving complex robotic-arm tasks.
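
The goal-relabeling idea described in the abstract can be illustrated with a small sketch. The abstract does not spell out the Next-Future procedure, so the code below is not the authors' implementation: the `'next'` branch is only an assumed reading of "rewarding single-step transitions", and the function names, the dictionary layout, the tolerance `tol`, and the 0 / -1 reward convention are all hypothetical choices made for illustration. The `'future'` branch follows the commonly used HER scheme of taking a goal achieved later in the same episode.

```python
import random


def sparse_reward(achieved, goal, tol=0.05):
    """Binary reward: 0.0 if within tol of the goal, else -1.0 (assumed convention)."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel(episode, t, strategy, rng=random):
    """Return a copy of transition t whose goal is replaced in hindsight.

    episode  : list of dicts with keys 'obs', 'action', 'achieved_next', 'goal'
    strategy : 'future' picks the outcome of a random step k >= t (HER-style);
               'next' picks the outcome of step t itself (assumed single-step variant)
    """
    if strategy == "future":
        k = rng.randrange(t, len(episode))
    elif strategy == "next":
        k = t
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    relabeled = dict(episode[t])  # shallow copy; the stored episode is not modified
    relabeled["goal"] = episode[k]["achieved_next"]
    relabeled["reward"] = sparse_reward(relabeled["achieved_next"], relabeled["goal"])
    return relabeled


# Toy 1-D episode: the gripper advances 0.1 per step but the goal sits at 10.0,
# so every original transition is a failure under the binary reward.
episode = [
    {"obs": (0.1 * i,), "action": (0.1,), "achieved_next": (0.1 * (i + 1),), "goal": (10.0,)}
    for i in range(5)
]
original_rewards = [sparse_reward(tr["achieved_next"], tr["goal"]) for tr in episode]
single_step = relabel(episode, 2, "next")
future = relabel(episode, 2, "future", random.Random(0))
```

In this sketch a transition relabeled with `'next'` always receives the success reward, because its goal is exactly the outcome it produced. A transition relabeled with `'future'` succeeds only when the sampled step's outcome lies within `tol` of that transition's own outcome.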

arXiv information

Authors	Fikrican Özgür,René Zurbrügg,Suryansh Kumar
Published	2025-04-15 14:45:51+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
