DataMIL: Selecting Data for Robot Imitation Learning with Datamodels

Summary

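Recently, the robotics community has been accumulating larger and more diverse datasets to train generalist robot policies.
Although these policies achieve strong average performance across a wide range of tasks, they often underperform on individual, specialized tasks and need further tuning on newly collected task-specific data.
Co-training on task-specific data together with a carefully curated subset of a large prior dataset can yield better specialized policies, but selecting that data naively can actually harm downstream performance.
To address this, the authors introduce DataMIL, a policy-driven data selection framework built on the datamodels paradigm. It reasons about data selection end to end, using the policy itself to identify which data points will most improve performance.
Unlike the standard practice of filtering data by human notions of quality (for example, semantic or visual similarity), DataMIL directly optimizes data selection for task success, so it can select data that strengthens the policy while dropping data that degrades it.
To avoid running expensive rollouts in the environment during selection, DataMIL uses a novel surrogate loss function computed on task-specific data, which makes it usable in the real world without degrading performance.
The approach is validated on a suite of more than 60 simulated and real-world manipulation tasks. Most notably, it selects data successfully from the Open X-Embodiment datasets, with consistent gains in success rate and better performance than multiple baselines.
The results underscore the importance of end-to-end, performance-aware data selection for unlocking the potential of large prior datasets in robotics.
More information: https://robin-lab.cs.utexas.edu/datamodels4imitation/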

Abstract (original)

Recently, the robotics community has amassed ever larger and more diverse datasets to train generalist robot policies. However, while these policies achieve strong mean performance across a variety of tasks, they often underperform on individual, specialized tasks and require further tuning on newly acquired task-specific data. Combining task-specific data with carefully curated subsets of large prior datasets via co-training can produce better specialized policies, but selecting data naively may actually harm downstream performance. To address this, we introduce DataMIL, a policy-driven data selection framework built on the datamodels paradigm that reasons about data selection in an end-to-end manner, using the policy itself to identify which data points will most improve performance. Unlike standard practices that filter data using human notions of quality (e.g., based on semantic or visual similarity), DataMIL directly optimizes data selection for task success, allowing us to select data that enhance the policy while dropping data that degrade it. To avoid performing expensive rollouts in the environment during selection, we use a novel surrogate loss function on task-specific data, allowing us to use DataMIL in the real world without degrading performance. We validate our approach on a suite of more than 60 simulation and real-world manipulation tasks, most notably showing successful data selection from the Open X-Embodiment datasets, demonstrating consistent gains in success rates and superior performance over multiple baselines. Our results underscore the importance of end-to-end, performance-aware data selection for unlocking the potential of large prior datasets in robotics. More information at https://robin-lab.cs.utexas.edu/datamodels4imitation/
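
The abstract does not spell out how per-datapoint usefulness is estimated. As a rough illustration of the general datamodels idea it builds on, the sketch below fits a linear model that predicts a surrogate loss from which prior datapoints were included in training, then keeps the datapoints whose inclusion is estimated to lower that loss the most. This is a toy under stated assumptions, not the DataMIL implementation: `toy_surrogate_loss` is a hypothetical stand-in for "train a policy on a subset and measure loss on held-out task-specific data", and every function name, constant, and the least-squares estimator itself are illustrative choices.

```python
import numpy as np


def sample_masks(num_subsets, num_points, keep_prob, rng):
    """Random inclusion masks: masks[i, j] = 1 if prior datapoint j is in subset i."""
    return (rng.random((num_subsets, num_points)) < keep_prob).astype(np.float64)


def toy_surrogate_loss(masks, true_effect, rng, noise=0.05):
    """Hypothetical stand-in for training on each subset and measuring a
    surrogate loss on target-task data. Here the loss is simply additive
    in the (hidden) per-datapoint effects, plus Gaussian noise."""
    return 1.0 + masks @ true_effect + noise * rng.standard_normal(masks.shape[0])


def fit_linear_datamodel(masks, losses):
    """Least-squares fit of losses ~ masks @ weights + bias.
    Returns the per-datapoint weights (the bias term is dropped)."""
    design = np.hstack([masks, np.ones((masks.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, losses, rcond=None)
    return coef[:-1]


def select_helpful(weights, k):
    """Indices of the k datapoints with the most negative estimated weights,
    i.e. those whose inclusion is predicted to reduce the loss the most."""
    return np.argsort(weights)[:k]


rng = np.random.default_rng(0)
num_points, num_subsets, k = 50, 2000, 10

# Assumed ground truth for the toy: the first k datapoints are clearly helpful,
# the rest have small positive or negative effects.
true_effect = rng.normal(0.0, 0.02, size=num_points)
true_effect[:k] = -0.2

masks = sample_masks(num_subsets, num_points, keep_prob=0.5, rng=rng)
losses = toy_surrogate_loss(masks, true_effect, rng)
weights = fit_linear_datamodel(masks, losses)
selected = select_helpful(weights, k)
```

In a real pipeline the toy loss would be replaced by actual policy training runs (or a cheaper approximation of them), which is where nearly all of the cost lies; the regression and selection steps stay this simple.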

arXiv information

Authors	Shivin Dass, Alaa Khaddaj, Logan Engstrom, Aleksander Madry, Andrew Ilyas, Roberto Martín-Martín
Published	2025-05-14 17:55:10+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
