CIVIL: Causal and Intuitive Visual Imitation Learning

Summary

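Today's robots learn new tasks by imitating human examples.
However, this standard approach to visual imitation learning is fundamentally limited: the robot observes what the human does, but not why the human chooses those behaviors.
Without considering the factors behind the human's decisions, robot learners often misinterpret the data and fail to perform the task when the environment changes.
We therefore propose a shift in perspective: instead of asking human teachers only to show which actions the robot should take, we also let humans indicate task-relevant features using markers and language prompts.
Our proposed algorithm, CIVIL, leverages this augmented data to filter the robot's visual observations and extract a feature representation that causally informs human actions.
CIVIL then applies these causal features to train a transformer-based policy that emulates human behavior without being confused by visual distractors.
Our simulations, real-world experiments, and user study show that robots trained with CIVIL can learn from fewer human demonstrations and outperform state-of-the-art baselines, especially in previously unseen scenarios.
See videos on the project website: https://civil2025.github.io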

Summary (original)

Today’s robots learn new tasks by imitating human examples. However, this standard approach to visual imitation learning is fundamentally limited: the robot observes what the human does, but not why the human chooses those behaviors. Without understanding the features that factor into the human’s decisions, robot learners often misinterpret the data and fail to perform the task when the environment changes. We therefore propose a shift in perspective: instead of asking human teachers just to show what actions the robot should take, we also enable humans to indicate task-relevant features using markers and language prompts. Our proposed algorithm, CIVIL, leverages this augmented data to filter the robot’s visual observations and extract a feature representation that causally informs human actions. CIVIL then applies these causal features to train a transformer-based policy that emulates human behaviors without being confused by visual distractors. Our simulations, real-world experiments, and user study demonstrate that robots trained with CIVIL can learn from fewer human demonstrations and perform better than state-of-the-art baselines, especially in previously unseen scenarios. See videos at our project website: https://civil2025.github.io
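The abstract does not say how CIVIL filters observations or trains its transformer, so the following is only a toy sketch of the general intuition: a policy restricted to features a human has marked as task-relevant is not misled when a distractor changes. Every name here (`filter_obs`, `nn_policy`, the `target_x` and `distractor_x` features, the demo data) is a hypothetical assumption, and a 1-nearest-neighbor behavior-cloning policy stands in for the paper's transformer-based policy. It is not the authors' method.

```python
from math import dist
from typing import Callable

# Hypothetical observation: named scalar features. Hypothetical action: -1.0 (left) or +1.0 (right).
Obs = dict[str, float]
Demo = tuple[Obs, float]

def filter_obs(obs: Obs, relevant: set[str]) -> list[float]:
    """Keep only the features the human marked as task-relevant (sorted for a stable order)."""
    return [obs[k] for k in sorted(relevant)]

def nn_policy(demos: list[Demo], relevant: set[str]) -> Callable[[Obs], float]:
    """1-nearest-neighbor behavior cloning on filtered observations (a stand-in for a learned policy)."""
    keyed = [(filter_obs(o, relevant), a) for o, a in demos]

    def act(obs: Obs) -> float:
        x = filter_obs(obs, relevant)
        # Copy the action of the demonstration whose filtered features are closest.
        return min(keyed, key=lambda pair: dist(pair[0], x))[1]

    return act

# Toy demos: the human's action depends only on target_x, but in these demos
# distractor_x happens to move together with the target (a spurious correlation).
demos: list[Demo] = [
    ({"target_x": 0.0, "distractor_x": 0.0}, -1.0),
    ({"target_x": 1.0, "distractor_x": 1.0}, +1.0),
]

full = nn_policy(demos, {"target_x", "distractor_x"})  # uses every feature
focused = nn_policy(demos, {"target_x"})               # uses only the human-marked feature

# Unseen scenario: the target is on the right, but the distractor stayed on the left.
test_obs = {"target_x": 0.8, "distractor_x": 0.0}
print(full(test_obs))     # -1.0: confused by the distractor
print(focused(test_obs))  # 1.0: follows the task-relevant feature
```

In this toy, the unfiltered policy matches the first demo because the distractor dominates the distance, while the filtered policy ignores it; real visual observations and learned policies are far more complex.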

arXiv information

Authors	Yinlong Dai, Robert Ramirez Sanchez, Ryan Jeronimus, Shahabedin Sagheb, Cara M. Nunez, Heramb Nemlekar, Dylan P. Losey
Published	2025-04-24 22:08:29+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
