THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation

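Abstract

To realize effective large-scale, real-world robotic applications, we must evaluate how well our robot policies adapt to changes in environmental conditions.
Unfortunately, most studies evaluate robot performance in environments that closely resemble, or are even identical to, the training setup.
We present THE COLOSSEUM, a novel simulation benchmark with 20 diverse manipulation tasks that enables systematic evaluation of models across 12 axes of environmental perturbation.
These perturbations include changes in the color, texture, and size of objects, table-tops, and backgrounds.
We also vary lighting, distractors, and camera pose.
Using THE COLOSSEUM, we compare 4 state-of-the-art manipulation models and find that their success rate degrades by 30 to 50% across these perturbation factors.
When multiple perturbations are applied simultaneously, the success rate degrades by at least 75%.
We find that changing the number of distractor objects, the target object color, or the lighting conditions reduces model performance the most.
To verify the ecological validity of our results, we show that our results in simulation are correlated ($\bar{R}^2 = 0.614$) with similar perturbations in real-world experiments.
We open-source the code so that others can use THE COLOSSEUM, and we also release code to 3D print the objects used to replicate the real-world perturbations.
Ultimately, we hope that THE COLOSSEUM will serve as a benchmark for identifying modeling decisions that systematically improve generalization in manipulation.
See https://robot-colosseum.github.io/ for more details.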


arXiv information

Authors	Wilbert Pumacay, Ishika Singh, Jiafei Duan, Ranjay Krishna, Jesse Thomason, Dieter Fox
Published	2024-02-13 03:25:33+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
