RecoveryChaining: Learning Local Recovery Policies for Robust Manipulation

Summary

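Model-based planners and controllers are widely used to solve complex manipulation problems because they can efficiently optimize diverse objectives and generalize to long-horizon tasks.
However, they are limited by the fidelity of their models, which often leads to failures during deployment.
To enable a robot to recover from such failures, we propose using hierarchical reinforcement learning to learn a separate recovery policy.
The recovery policy is triggered when a failure is detected from sensory observations, and it tries to return the robot to a state from which the nominal model-based controllers can complete the task.
Our approach, called RecoveryChaining, uses a hybrid action space in which the model-based controllers are provided as additional nominal options, so the recovery policy can decide how to recover, when to switch to a nominal controller, and which controller to switch to, even with sparse rewards.
We evaluate the approach on three multi-step manipulation tasks with sparse rewards, where it learns significantly more robust recovery policies than those learned by the baselines.
Finally, we successfully transfer recovery policies learned in simulation to a physical robot, demonstrating the feasibility of sim-to-real transfer with our method.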
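
The hybrid action space described above can be pictured with a small toy example. The following Python sketch is only an illustration under stated assumptions: the 1-D state, the names `Primitive`, `Nominal`, `HybridRecoveryEnv`, `make_controller`, and `run`, the interval-based controllers, and the reward values are all hypothetical and are not taken from the paper. It shows an agent that can either take a low-level recovery motion or hand control to one of the nominal controllers, and that receives a reward only if the chosen controller then completes the task.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union

# All names below are hypothetical and chosen for illustration only;
# they are not the paper's implementation.


@dataclass(frozen=True)
class Primitive:
    """Low-level recovery action: move the 1-D state by `delta`."""
    delta: float


@dataclass(frozen=True)
class Nominal:
    """Hand control to nominal controller number `index`."""
    index: int


Action = Union[Primitive, Nominal]

# A toy nominal controller maps a start state to True (task completed) or False.
Controller = Callable[[float], bool]


class HybridRecoveryEnv:
    """Toy 1-D environment with a hybrid (primitive + nominal option) action space."""

    def __init__(self, start: float, controllers: List[Controller], max_steps: int = 20):
        self.state = start
        self.controllers = controllers
        self.max_steps = max_steps
        self.steps = 0

    def step(self, action: Action) -> Tuple[float, float, bool]:
        """Return (state, reward, done). The reward is sparse: 1.0 only on task success."""
        self.steps += 1
        if isinstance(action, Nominal):
            # Choosing a nominal option ends the recovery episode; the only
            # signal is whether that controller completes the task from here.
            success = self.controllers[action.index](self.state)
            return self.state, (1.0 if success else 0.0), True
        # Primitive actions move the state and give no reward by themselves.
        self.state += action.delta
        return self.state, 0.0, self.steps >= self.max_steps


def make_controller(lo: float, hi: float) -> Controller:
    """Toy nominal controller that succeeds only when started inside [lo, hi]."""
    return lambda s: lo <= s <= hi


def run(start: float, actions: List[Action]) -> float:
    """Roll out a fixed action sequence and return the total reward."""
    env = HybridRecoveryEnv(start, [make_controller(0.0, 1.0), make_controller(4.0, 5.0)])
    total = 0.0
    for action in actions:
        _, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total


if __name__ == "__main__":
    # Switching to a nominal controller directly from the failure state.
    print(run(2.5, [Nominal(0)]))
    # Recovering first, then switching to the second nominal controller.
    print(run(2.5, [Primitive(1.0), Primitive(1.0), Nominal(1)]))
```

In this toy setup, switching to a nominal controller straight from the failure state (2.5) earns nothing, while first moving into the second controller's start region and then switching earns the sparse success reward. A learned recovery policy would have to discover such sequences itself; the fixed action lists here only stand in for it.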

Abstract (original)

Model-based planners and controllers are commonly used to solve complex manipulation problems as they can efficiently optimize diverse objectives and generalize to long horizon tasks. However, they are limited by the fidelity of their model which oftentimes leads to failures during deployment. To enable a robot to recover from such failures, we propose to use hierarchical reinforcement learning to learn a separate recovery policy. The recovery policy is triggered when a failure is detected based on sensory observations and seeks to take the robot to a state from which it can complete the task using the nominal model-based controllers. Our approach, called RecoveryChaining, uses a hybrid action space, where the model-based controllers are provided as additional nominal options which allows the recovery policy to decide how to recover, when to switch to a nominal controller and which controller to switch to even with sparse rewards. We evaluate our approach in three multi-step manipulation tasks with sparse rewards, where it learns significantly more robust recovery policies than those learned by baselines. Finally, we successfully transfer recovery policies learned in simulation to a physical robot to demonstrate the feasibility of sim-to-real transfer with our method.

arXiv information

Authors	Shivam Vats, Devesh K. Jha, Maxim Likhachev, Oliver Kroemer, Diego Romeres
Published	2024-10-17 19:14:43+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
