Safe MPC Alignment with Human Directional Feedback

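Summary

In safety-critical robot planning and control, it can be difficult to specify safety constraints by hand or to learn them from demonstrations.
This article proposes a certifiable alignment method that lets a robot learn the safety constraint in its model predictive control (MPC) policy from online human directional feedback.
To our knowledge, it is the first method to learn safety constraints from human feedback.
The method builds on an empirical observation: human directional feedback, when available, tends to guide the robot toward safer regions.
Only the direction of the human feedback is needed to update the learning hypothesis space.
The method is certifiable: it either provides an upper bound on the total number of human corrections needed when learning succeeds, or declares hypothesis misspecification, meaning that the true implicit safety constraint cannot be found within the specified hypothesis space.
We evaluated the method with numerical examples and with user studies in two simulation games.
We also implemented and tested it on a real-world Franka robot arm performing mobile water-pouring tasks.
The results demonstrate the method's efficacy and efficiency, showing that a robot can learn safety constraints from tens of human directional corrections.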
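The summary does not spell out the update rule, so the following is only a toy sketch of the general idea of shrinking a hypothesis space with directional feedback. It is not the authors' algorithm. Everything in it is an assumption made for illustration: the hypothesis space is a finite set of unit vectors `theta` in the plane, the constraint is modeled as `g_theta(x) = theta . x`, and a human correction `direction` is read as "moving this way lowers `g_theta`", i.e. `direction . theta < 0`. The names `make_candidates`, `update`, and `learn` are hypothetical. The sketch also does not compute the upper bound on the number of corrections mentioned above; it only shows the two outcomes (a shrunken hypothesis set, or a declared misspecification).

```python
import math


def make_candidates(n):
    """Hypothetical discretized hypothesis space: n unit vectors in R^2."""
    return [
        (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def update(candidates, direction):
    """Keep only the candidates consistent with one directional correction.

    Assumption: a correction is consistent with theta when moving along
    `direction` decreases g_theta(x) = theta . x, i.e. direction . theta < 0.
    """
    return [
        t for t in candidates
        if direction[0] * t[0] + direction[1] * t[1] < 0.0
    ]


def learn(candidates, corrections):
    """Apply corrections one by one.

    Returns (remaining_candidates, corrections_used). If no candidate is
    consistent with the feedback, returns (None, corrections_used) to signal
    that the hypothesis space is misspecified.
    """
    used = 0
    for direction in corrections:
        used += 1
        candidates = update(candidates, direction)
        if not candidates:
            return None, used
    return candidates, used


if __name__ == "__main__":
    space = make_candidates(8)
    true_theta = space[0]  # (1.0, 0.0), the constraint the human has in mind

    # Two corrections that both point away from true_theta.
    remaining, used = learn(space, [(-1.0, 0.5), (-1.0, -0.5)])
    print(len(remaining), true_theta in remaining, used)

    # Contradictory corrections leave no consistent hypothesis.
    print(learn(space, [(-1.0, 0.5), (1.0, -0.5)]))
```

In this toy setting each correction removes every candidate on the wrong side of a line through the origin, so consistent feedback narrows the set around the true parameter while contradictory feedback empties it.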

Summary (original)

In safety-critical robot planning or control, manually specifying safety constraints or learning them from demonstrations can be challenging. In this article, we propose a certifiable alignment method for a robot to learn a safety constraint in its model predictive control (MPC) policy with human online directional feedback. To our knowledge, it is the first method to learn safety constraints from human feedback. The proposed method is based on an empirical observation: human directional feedback, when available, tends to guide the robot toward safer regions. The method only requires the direction of human feedback to update the learning hypothesis space. It is certifiable, providing an upper bound on the total number of human feedback in the case of successful learning, or declaring the hypothesis misspecification, i.e., the true implicit safety constraint cannot be found within the specified hypothesis space. We evaluated the proposed method using numerical examples and user studies in two simulation games. Additionally, we implemented and tested the proposed method on a real-world Franka robot arm performing mobile water-pouring tasks. The results demonstrate the efficacy and efficiency of our method, showing that it enables a robot to successfully learn safety constraints with a small handful (tens) of human directional corrections.

arXiv information

Authors	Zhixian Xie, Wenlong Zhang, Yi Ren, Zhaoran Wang, George J. Pappas, Wanxin Jin
Published	2025-01-08 01:16:25+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
