StROL: Stabilized and Robust Online Learning from Humans

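Summary

Today's robots can learn a human's reward function online, during the current interaction.
This real-time learning requires learning rules that are fast but approximate.
When the human's behavior is noisy or suboptimal, these approximations can make the robot's learning unstable.
This paper therefore seeks to improve the robustness and convergence properties of gradient descent learning rules used to infer the human's reward parameters.
The robot's learning algorithm is modeled as a dynamical system over the human preference parameters, with the human's true (but unknown) preferences as the equilibrium point.
This makes it possible to apply Lyapunov stability analysis and derive the conditions under which the robot's learning dynamics converge.
The proposed algorithm (StROL) uses these stability conditions offline to modify the original learning dynamics: it introduces a corrective term that expands the basins of attraction around likely human rewards.
In practice, the modified learning rule can correctly infer what the human is trying to convey, even when the human is noisy, biased, and suboptimal.
Across simulations and a user study, StROL yields more accurate estimates and lower regret than state-of-the-art approaches to online reward learning.
See videos here: https://youtu.be/uDGpkvJnY8g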
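
To make the dynamical-system view concrete, here is a minimal toy sketch in Python. It is not the StROL algorithm and does not include its corrective term. It assumes, purely for illustration, that each human action is a (possibly noisy) observation of the true parameters `theta_star`, and that the robot takes a gradient step on the squared error between its estimate and that action. The names `learning_step`, `lyapunov`, and `simulate` are hypothetical. Under these assumptions the true parameters are the equilibrium of the update, and the candidate Lyapunov function V = ||theta - theta_star||^2 shrinks by a factor of (1 - alpha)^2 per noiseless step, so the update converges when 0 < alpha < 2.

```python
import random


def learning_step(theta, human_action, alpha):
    """One gradient-descent step on 0.5 * ||human_action - theta||^2."""
    return [t + alpha * (a - t) for t, a in zip(theta, human_action)]


def lyapunov(theta, theta_star):
    """Candidate Lyapunov function V = ||theta - theta_star||^2."""
    return sum((t - s) ** 2 for t, s in zip(theta, theta_star))


def simulate(theta0, theta_star, alpha, steps, noise_std=0.0, seed=0):
    """Run the toy learning dynamics and record V after every step.

    Assumption (not from the paper): the human action equals theta_star
    plus zero-mean Gaussian noise with standard deviation noise_std.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    values = [lyapunov(theta, theta_star)]
    for _ in range(steps):
        human_action = [s + rng.gauss(0.0, noise_std) for s in theta_star]
        theta = learning_step(theta, human_action, alpha)
        values.append(lyapunov(theta, theta_star))
    return theta, values


theta_star = [1.0, -0.5]  # true (unknown to the robot) preference parameters

# Stable step size: V decreases at every step in the noiseless case.
theta_final, values = simulate([0.0, 0.0], theta_star, alpha=0.3, steps=30)

# Step size outside (0, 2): the same rule moves away from the equilibrium.
_, diverging = simulate([0.0, 0.0], theta_star, alpha=2.5, steps=10)
```

Setting `noise_std` above zero in `simulate` shows how a noisy human perturbs the estimate around the equilibrium in this toy model. The summary says unstable learning under such behavior is the problem StROL targets.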

Summary (original)

Today’s robots can learn the human’s reward function online, during the current interaction. This real-time learning requires fast but approximate learning rules; when the human’s behavior is noisy or suboptimal, today’s approximations can result in unstable robot learning. Accordingly, in this paper we seek to enhance the robustness and convergence properties of gradient descent learning rules when inferring the human’s reward parameters. We model the robot’s learning algorithm as a dynamical system over the human preference parameters, where the human’s true (but unknown) preferences are the equilibrium point. This enables us to perform Lyapunov stability analysis to derive the conditions under which the robot’s learning dynamics converge. Our proposed algorithm (StROL) takes advantage of these stability conditions offline to modify the original learning dynamics: we introduce a corrective term that expands the basins of attraction around likely human rewards. In practice, our modified learning rule can correctly infer what the human is trying to convey, even when the human is noisy, biased, and suboptimal. Across simulations and a user study we find that StROL results in a more accurate estimate and less regret than state-of-the-art approaches for online reward learning. See videos here: https://youtu.be/uDGpkvJnY8g

arXiv information

Authors	Shaunak A. Mehta, Forrest Meng, Andrea Bajcsy, Dylan P. Losey
Published	2023-08-19 00:43:36+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
