X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real

Summary

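Human videos offer a scalable way to train robot manipulation policies, but they lack the action labels required by standard imitation learning algorithms.
Existing cross-embodiment approaches try to map human motion to robot actions, but often fail when the embodiments differ significantly.
We propose X-Sim, a real-to-sim-to-real framework that uses object motion as a dense, transferable signal for learning robot policies.
X-Sim starts by reconstructing a photorealistic simulation from an RGBD human video and tracking object trajectories to define object-centric rewards.
These rewards are used to train a reinforcement learning (RL) policy in simulation.
The learned policy is then distilled into an image-conditioned diffusion policy using synthetic rollouts rendered with varied viewpoints and lighting.
To transfer to the real world, X-Sim introduces an online domain adaptation technique that aligns real and simulated observations during deployment.
Importantly, X-Sim does not require any robot teleoperation data.
We evaluate it on 5 manipulation tasks in 2 environments and show that it (1) improves task progress by 30% on average over hand-tracking and sim-to-real baselines, (2) matches behavior cloning with 10x less data collection time, and (3) generalizes to new camera viewpoints and test-time changes.
Code and videos are available at https://portal-cornell.github.io/x-sim/.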
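The summary says X-Sim turns object trajectories tracked in the human video into object-centric rewards, but it does not give the reward formula. The snippet below is a minimal sketch under that gap, assuming the reward compares only the simulated object's position with the tracked position at the same timestep and decays exponentially with distance. The names `object_tracking_reward`, `ref_traj`, and `scale` are hypothetical, and the actual X-Sim reward may also use object orientation or other terms.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def object_tracking_reward(
    sim_obj_pos: Sequence[float],
    ref_traj: List[Vec3],
    t: int,
    scale: float = 10.0,
) -> float:
    """Hypothetical dense object-centric reward in (0, 1].

    Returns 1.0 when the simulated object is exactly on the reference
    waypoint for step t (tracked from the human video) and decays
    exponentially as the Euclidean distance to that waypoint grows.
    """
    if not ref_traj:
        raise ValueError("reference trajectory must be non-empty")
    # Clamp t so the policy keeps being rewarded for holding the final pose
    # after the reference trajectory has ended.
    target = ref_traj[min(t, len(ref_traj) - 1)]
    dist = math.dist(sim_obj_pos, target)
    return math.exp(-scale * dist)


if __name__ == "__main__":
    # Toy reference: the object slides 20 cm along x in two steps.
    ref = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)]
    print(object_tracking_reward((0.1, 0.0, 0.0), ref, 1))  # on target
    print(object_tracking_reward((0.0, 0.0, 0.0), ref, 1))  # 10 cm behind
```

Because the reward is defined on the object rather than on the human hand, the same signal can in principle train any robot embodiment in simulation, which is the motivation the summary gives for using object motion.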

Summary (original)

Human videos offer a scalable way to train robot manipulation policies, but lack the action labels needed by standard imitation learning algorithms. Existing cross-embodiment approaches try to map human motion to robot actions, but often fail when the embodiments differ significantly. We propose X-Sim, a real-to-sim-to-real framework that uses object motion as a dense and transferable signal for learning robot policies. X-Sim starts by reconstructing a photorealistic simulation from an RGBD human video and tracking object trajectories to define object-centric rewards. These rewards are used to train a reinforcement learning (RL) policy in simulation. The learned policy is then distilled into an image-conditioned diffusion policy using synthetic rollouts rendered with varied viewpoints and lighting. To transfer to the real world, X-Sim introduces an online domain adaptation technique that aligns real and simulated observations during deployment. Importantly, X-Sim does not require any robot teleoperation data. We evaluate it across 5 manipulation tasks in 2 environments and show that it: (1) improves task progress by 30% on average over hand-tracking and sim-to-real baselines, (2) matches behavior cloning with 10x less data collection time, and (3) generalizes to new camera viewpoints and test-time changes. Code and videos are available at https://portal-cornell.github.io/X-Sim/.
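The abstract does not say which objective X-Sim's online domain adaptation optimizes. One common way to align paired real and simulated observations is a contrastive (InfoNCE) loss that pulls the embedding of a real frame toward the embedding of its simulated counterpart and pushes it away from the other simulated frames in the batch. The sketch below shows that swapped-in technique, not necessarily X-Sim's method; the function names, the `temperature` default, and the assumption that each real frame is paired with a simulated render of the same state are all hypothetical.

```python
import math
from typing import List


def _normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def info_nce_alignment_loss(
    real: List[List[float]],
    sim: List[List[float]],
    temperature: float = 0.1,
) -> float:
    """Hypothetical real-to-sim alignment loss (mean InfoNCE over the batch).

    real[i] and sim[i] are assumed to embed the same scene state (a positive
    pair); every sim[j] with j != i acts as a negative for real[i].
    """
    if len(real) != len(sim) or not real:
        raise ValueError("real and sim must be non-empty and equally sized")
    r = [_normalize(x) for x in real]
    s = [_normalize(x) for x in sim]
    total = 0.0
    for i, ri in enumerate(r):
        # Cosine similarity of real[i] to every sim embedding, scaled by temperature.
        logits = [sum(a * b for a, b in zip(ri, sj)) / temperature for sj in s]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching sim frame as the correct class.
        total += log_sum_exp - logits[i]
    return total / len(r)


if __name__ == "__main__":
    real = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce_alignment_loss(real, [[1.0, 0.0], [0.0, 1.0]]))  # aligned: near 0
    print(info_nce_alignment_loss(real, [[0.0, 1.0], [1.0, 0.0]]))  # swapped: large
```

Minimizing a loss of this form during deployment would update the image encoder so that real observations land near the simulated observations the distilled policy was trained on; whether X-Sim uses this objective, a different one, or how it obtains the pairs is not stated in the abstract.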

arXiv information

Authors	Prithwish Dan, Kushal Kedia, Angela Chao, Edward Weiyi Duan, Maximus Adrian Pace, Wei-Chiu Ma, Sanjiban Choudhury
Published	2025-05-11 19:04:00+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
