The Unreasonable Effectiveness of Discrete-Time Gaussian Process Mixtures for Robot Policy Learning

Abstract

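We present Mixture of Discrete-time Gaussian Processes (MiDiGap), a novel approach to flexible policy representation and imitation learning in robot manipulation.
MiDiGap can learn from as few as five demonstrations using only camera observations, and it generalizes across a wide range of challenging tasks.
It excels at long-horizon behaviors such as making coffee, highly constrained motions such as opening doors, dynamic actions such as scooping with a spatula, and multimodal tasks such as hanging a mug.
MiDiGap learns these tasks on a CPU in less than a minute and scales linearly to large datasets.
We also develop a rich suite of tools for inference-time steering that use evidence such as collision signals and robot kinematic constraints.
This steering enables novel generalization capabilities, including obstacle avoidance and cross-embodiment policy transfer.
MiDiGap achieves state-of-the-art performance on diverse few-shot manipulation benchmarks.
On constrained RLBench tasks, it improves policy success by 76 percentage points and reduces trajectory cost by 67%.
On multimodal tasks, it improves policy success by 48 percentage points and increases sample efficiency by a factor of 20. In cross-embodiment transfer, it more than doubles policy success.
The code is publicly available at https://midigap.cs.uni-freiburg.de.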

Abstract (original)

We present Mixture of Discrete-time Gaussian Processes (MiDiGap), a novel approach for flexible policy representation and imitation learning in robot manipulation. MiDiGap enables learning from as few as five demonstrations using only camera observations and generalizes across a wide range of challenging tasks. It excels at long-horizon behaviors such as making coffee, highly constrained motions such as opening doors, dynamic actions such as scooping with a spatula, and multimodal tasks such as hanging a mug. MiDiGap learns these tasks on a CPU in less than a minute and scales linearly to large datasets. We also develop a rich suite of tools for inference-time steering using evidence such as collision signals and robot kinematic constraints. This steering enables novel generalization capabilities, including obstacle avoidance and cross-embodiment policy transfer. MiDiGap achieves state-of-the-art performance on diverse few-shot manipulation benchmarks. On constrained RLBench tasks, it improves policy success by 76 percentage points and reduces trajectory cost by 67%. On multimodal tasks, it improves policy success by 48 percentage points and increases sample efficiency by a factor of 20. In cross-embodiment transfer, it more than doubles policy success. We make the code publicly available at https://midigap.cs.uni-freiburg.de.

arXiv information

Authors	Jan Ole von Hartz, Adrian Röfer, Joschka Boedecker, Abhinav Valada
Published	2025-05-06 08:27:23+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
