Unified Learning from Demonstrations, Corrections, and Preferences during Physical Human-Robot Interaction

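Abstract

Humans can use physical interaction to teach robot arms.
This physical interaction takes multiple forms depending on the task, the user, and what the robot has learned so far.
State-of-the-art approaches either focus on learning from a single modality, or combine multiple interaction types by assuming that the robot has prior information about the human's intended task.
By contrast, this paper introduces an algorithmic formalism that unites learning from demonstrations, corrections, and preferences.
The approach makes no assumptions about the tasks the human wants to teach the robot.
Instead, it learns a reward model from scratch by comparing the human's inputs to nearby alternatives.
The authors first derive a loss function that trains an ensemble of reward models to match the human's demonstrations, corrections, and preferences.
The type and order of feedback are up to the human teacher, and the robot can collect this feedback passively or actively.
Constrained optimization is then applied to convert the learned reward into a desired robot trajectory.
Through simulations and a user study, the authors demonstrate that the proposed approach learns manipulation tasks from physical human interaction more accurately than existing baselines, particularly when the robot is faced with new or unexpected objectives.
Videos of the user study are available at https://youtu.be/FSUJsTYvEKU.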

Abstract (original)

Humans can leverage physical interaction to teach robot arms. This physical interaction takes multiple forms depending on the task, the user, and what the robot has learned so far. State-of-the-art approaches focus on learning from a single modality, or combine multiple interaction types by assuming that the robot has prior information about the human’s intended task. By contrast, in this paper we introduce an algorithmic formalism that unites learning from demonstrations, corrections, and preferences. Our approach makes no assumptions about the tasks the human wants to teach the robot; instead, we learn a reward model from scratch by comparing the human’s inputs to nearby alternatives. We first derive a loss function that trains an ensemble of reward models to match the human’s demonstrations, corrections, and preferences. The type and order of feedback is up to the human teacher: we enable the robot to collect this feedback passively or actively. We then apply constrained optimization to convert our learned reward into a desired robot trajectory. Through simulations and a user study we demonstrate that our proposed approach more accurately learns manipulation tasks from physical human interaction than existing baselines, particularly when the robot is faced with new or unexpected objectives. Videos of our user study are available at: https://youtu.be/FSUJsTYvEKU
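
The idea of learning a reward model "by comparing the human's inputs to nearby alternatives" can be illustrated with a small sketch. This is not the authors' loss function, which the abstract does not spell out. It assumes a linear reward over two hypothetical trajectory features (mean height and path length), generates nearby alternatives by adding Gaussian noise to the human's waypoints, and trains each ensemble member with a softmax (Boltzmann) likelihood that the human's input is chosen over those alternatives. All function names and parameters below are illustrative assumptions.

```python
import math
import random

def features(traj):
    # Hypothetical hand-picked features: mean height and total path length.
    mean_height = sum(z for _, z in traj) / len(traj)
    length = sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))
    return [mean_height, length]

def perturb(traj, rng, scale=0.2):
    # A "nearby alternative" (assumption): Gaussian noise on interior waypoints.
    middle = [(x + rng.gauss(0, scale), z + rng.gauss(0, scale))
              for x, z in traj[1:-1]]
    return [traj[0]] + middle + [traj[-1]]

def reward(w, phi):
    # Linear reward model (assumption): R(traj) = w . features(traj).
    return sum(wi * pi for wi, pi in zip(w, phi))

def train_member(human, rng, n_alt=20, steps=200, lr=0.5):
    # Index 0 holds the human's input; the rest are sampled alternatives.
    phis = [features(human)]
    phis += [features(perturb(human, rng)) for _ in range(n_alt)]
    w = [rng.gauss(0, 0.1) for _ in phis[0]]
    for _ in range(steps):
        scores = [reward(w, p) for p in phis]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Gradient of -log P(human chosen) for a linear reward:
        # expected features under the softmax minus the human's features.
        for k in range(len(w)):
            expected = sum(p * phi[k] for p, phi in zip(probs, phis))
            w[k] -= lr * (expected - phis[0][k])
    return w

def train_ensemble(human, n_members=5, seed=0):
    # Each member sees different alternatives and a different initialization.
    return [train_member(human, random.Random(seed + i))
            for i in range(n_members)]

def ensemble_reward(ensemble, traj):
    phi = features(traj)
    return sum(reward(w, phi) for w in ensemble) / len(ensemble)

# Toy demonstration: a straight, low path from x=0 to x=1.
human = [(0.1 * i, 0.0) for i in range(11)]
ensemble = train_ensemble(human)

# A higher, longer detour between the same endpoints.
detour = [(0.1 * i, 0.5 if 0 < i < 10 else 0.0) for i in range(11)]
print(ensemble_reward(ensemble, human) > ensemble_reward(ensemble, detour))
```

In this toy setting the demonstration is the shortest path between its endpoints, so each member learns to penalize path length. Disagreement between ensemble members could serve as a rough uncertainty signal, for example when deciding whether to query the human actively, though how the paper uses its ensemble is not stated in the abstract.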

arXiv information

Authors	Shaunak A. Mehta, Dylan P. Losey
Published	2024-01-09 19:42:30+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
