Learning from Physical Human Feedback: An Object-Centric One-Shot Adaptation Method

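Abstract

For robots to be deployed effectively in new environments and tasks, they must be able to understand the feedback that humans express during interventions. This feedback can correct undesirable behavior or indicate additional preferences. Existing methods either require repeated interactions or assume that reward features are known in advance, which makes them data-inefficient and hard to transfer to new tasks. We relax these assumptions by describing human tasks as object-centric sub-tasks and interpreting physical interventions in relation to specific objects. Our method, Object Preference Adaptation (OPA), consists of two key stages: 1) pre-training a base policy that produces a wide variety of behaviors, and 2) updating it online according to human feedback. The key to fast and simple adaptation is to keep the general interaction dynamics between agents and objects fixed and update only the object-specific preferences. Our adaptation takes place online, requires only a single human intervention (one-shot), and produces new behaviors not seen during training. Trained on cheap synthetic data rather than expensive human demonstrations, our policy adapts correctly to human perturbations in realistic tasks on a physical 7-DOF robot. Videos, code, and supplementary material are provided.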

Abstract (original)

For robots to be effectively deployed in novel environments and tasks, they must be able to understand the feedback expressed by humans during intervention. This can either correct undesirable behavior or indicate additional preferences. Existing methods either require repeated episodes of interactions or assume prior known reward features, which is data-inefficient and can hardly transfer to new tasks. We relax these assumptions by describing human tasks in terms of object-centric sub-tasks and interpreting physical interventions in relation to specific objects. Our method, Object Preference Adaptation (OPA), is composed of two key stages: 1) pre-training a base policy to produce a wide variety of behaviors, and 2) online-updating according to human feedback. The key to our fast, yet simple adaptation is that general interaction dynamics between agents and objects are fixed, and only object-specific preferences are updated. Our adaptation occurs online, requires only one human intervention (one-shot), and produces new behaviors never seen during training. Trained on cheap synthetic data instead of expensive human demonstrations, our policy correctly adapts to human perturbations on realistic tasks on a physical 7DOF robot. Videos, code, and supplementary material are provided.
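
The core idea in the abstract, keeping the agent-object interaction rule fixed and updating only an object-specific preference from a single correction, can be illustrated with a toy 2D sketch. This is not the OPA implementation: the abstract does not specify the policy architecture, so the hand-written interaction rule, the scalar preference `w`, and the names `policy` and `adapt_preference` are all assumptions made for illustration. In the paper the base policy is pre-trained rather than hand-coded.

```python
import math

def unit(v):
    """Return v scaled to unit length (the zero vector stays zero)."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n) if n > 1e-9 else (0.0, 0.0)

def policy(pos, goal, obj, w):
    """Fixed interaction rule (hypothetical): head toward the goal, plus a
    distance-weighted pull toward the object (w > 0) or push away (w < 0)."""
    to_goal = unit((goal[0] - pos[0], goal[1] - pos[1]))
    offset = (obj[0] - pos[0], obj[1] - pos[1])
    influence = math.exp(-math.hypot(offset[0], offset[1]))
    to_obj = unit(offset)
    return (to_goal[0] + w * influence * to_obj[0],
            to_goal[1] + w * influence * to_obj[1])

def adapt_preference(w, pos, goal, obj, corrected, lr=0.5, steps=200):
    """Fit only the object preference w to one corrected action by
    gradient descent on the squared action error."""
    # The toy policy is linear in w, so d(action)/dw is a simple difference.
    a1 = policy(pos, goal, obj, 1.0)
    a0 = policy(pos, goal, obj, 0.0)
    dadw = (a1[0] - a0[0], a1[1] - a0[1])
    for _ in range(steps):
        a = policy(pos, goal, obj, w)
        err = (a[0] - corrected[0], a[1] - corrected[1])
        w -= lr * 2.0 * (err[0] * dadw[0] + err[1] * dadw[1])
    return w

# One simulated intervention: the human steers as if the object should be avoided.
pos, goal, obj = (0.0, 0.0), (2.0, 0.0), (1.0, 0.5)
corrected = policy(pos, goal, obj, -1.5)  # stand-in for a physical push
w_adapted = adapt_preference(0.0, pos, goal, obj, corrected)

# The adapted preference carries over to a state that was never corrected.
new_action = policy((0.5, 1.0), goal, obj, w_adapted)
```

Because the interaction rule itself never changes, the single correction only has to pin down one number per object, which is why one intervention is enough in this toy setting and why the updated preference applies at positions the human never touched.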

arXiv information

Authors	Alvin Shek, Bo Ying Su, Rui Chen, Changliu Liu
Published	2023-06-02 09:37:20+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
