Contrastive Learning from Exploratory Actions: Leveraging Natural Interactions for Preference Elicitation

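Abstract

People have a variety of preferences for how robots behave. To understand and reason about these preferences, robots aim to learn a reward function that describes how well robot behaviors align with a user's preferences. Good representations of a robot's behavior can significantly reduce the time and effort a user needs to teach the robot their preferences. Features learned from raw data lack semantic meaning, and features learned from user data require users to go through a tedious labeling process. Our key insight is that users tasked with customizing a robot are intrinsically motivated to produce labels through exploratory search. To harness this novel data source of exploratory actions, we propose contrastive learning from exploratory actions (CLEA) to learn trajectory features that align with the features users care about. We learned CLEA features from exploratory actions that users performed in an open-ended signal design activity (N=25) with a Kuri robot, and evaluated them in a second user study with a different set of users (N=42). CLEA features outperformed self-supervised features at eliciting user preferences on four metrics: completeness, simplicity, minimality, and explainability.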

Abstract (original)

People have a variety of preferences for how robots behave. To understand and reason about these preferences, robots aim to learn a reward function that describes how aligned robot behaviors are with a user's preferences. Good representations of a robot's behavior can significantly reduce the time and effort required for a user to teach the robot their preferences. Specifying these representations (what 'features' of the robot's behavior matter to users) remains a difficult problem: features learned from raw data lack semantic meaning, and features learned from user data require users to engage in tedious labeling processes. Our key insight is that users tasked with customizing a robot are intrinsically motivated to produce labels through exploratory search; they explore behaviors that they find interesting and ignore behaviors that are irrelevant. To harness this novel data source of exploratory actions, we propose contrastive learning from exploratory actions (CLEA) to learn trajectory features that are aligned with features that users care about. We learned CLEA features from exploratory actions users performed in an open-ended signal design activity (N=25) with a Kuri robot, and evaluated CLEA features through a second user study with a different set of users (N=42). CLEA features outperformed self-supervised features when eliciting user preferences over four metrics: completeness, simplicity, minimality, and explainability.
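The abstract does not give CLEA's actual loss or network, so the following is only a generic illustration of the contrastive idea it describes: behaviors a user explored are pulled together in feature space, and behaviors the user ignored are pushed away. It uses a standard triplet margin loss as a stand-in; the function names, the 2-D feature vectors, and the margin value are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical illustration only: a standard triplet margin loss, used here as
# a stand-in for "contrastive learning". This is NOT the paper's CLEA objective.

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor
    than the positive is; positive otherwise."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Made-up 2-D features for three robot behaviors (assumed values).
explored_a = [0.9, 0.1]   # behavior the user explored
explored_b = [0.8, 0.2]   # another explored behavior
ignored = [0.0, 1.0]      # behavior the user ignored

# Explored behaviors sit close together and far from the ignored one.
well_separated = triplet_loss(explored_a, explored_b, ignored)

# Swapping the roles penalizes an embedding that groups the wrong behaviors.
poorly_separated = triplet_loss(explored_a, ignored, explored_b)
```

In this toy setup the first loss is zero because the two explored behaviors are already much closer to each other than to the ignored one, while the swapped triplet yields a large loss. A learned feature extractor would be trained to reduce such losses over many triplets drawn from users' exploratory actions.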

arXiv information

Authors	Nathaniel Dennler, Stefanos Nikolaidis, Maja Matarić
Published	2025-01-02 17:26:01+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
