SAFE-GIL: SAFEty Guided Imitation Learning

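Summary

Behavior cloning is a popular approach to imitation learning, in which a robot observes an expert supervisor and learns a control policy.
However, behavior cloning suffers from the "compounding error" problem: once the policy deviates from the expert demonstrations, its errors accumulate and can lead to catastrophic system failures, which limits its use in safety-critical applications.
On-policy data aggregation methods can address this issue, but at the cost of rolling out the imitation policy and retraining it repeatedly, which can be tedious and computationally prohibitive.
We propose SAFE-GIL, an off-policy behavior cloning method that guides the expert via adversarial disturbances during data collection.
The algorithm abstracts the imitation error as an adversarial disturbance in the system dynamics and injects it during data collection, exposing the expert to safety-critical states so that their corrective actions can be collected.
Our method biases training to replicate the expert's behavior more closely in safety-critical states while allowing more variance in less critical states.
We compare our method with several behavior cloning techniques and with DAgger on autonomous navigation and autonomous taxiing tasks, and show higher task success and safety, especially in low-data regimes where errors are more likely, at the cost of a slight drop in performance.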
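To make the data-collection idea concrete, here is a minimal sketch under strongly simplifying assumptions. Everything in it is hypothetical and not taken from the paper: a 1-D corridor with walls at |x| = 1, a single-integrator robot, a hand-written proportional expert, and a closed-form worst-case disturbance that stands in for however the method actually computes its disturbance. The names HALF_WIDTH, expert_policy, adversarial_disturbance and collect_demonstrations are illustrative only. The sketch contrasts ordinary demonstration collection (d_max = 0) with disturbance-guided collection, where the disturbance perturbs the state but the recorded label is always the expert's own corrective action.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy system (not from the paper): a 1-D single integrator
# x_{t+1} = x_t + DT * (u + d), the lateral offset of a robot in a corridor.
HALF_WIDTH = 1.0  # walls at x = -1 and x = +1
DT = 0.1          # integration step
U_MAX = 1.0       # expert actuation limit


def expert_policy(x: float, gain: float = 1.0) -> float:
    """Hypothetical expert: steer back toward the centre, with saturation."""
    return max(-U_MAX, min(U_MAX, -gain * x))


def safety_margin(x: float) -> float:
    """Distance to the nearest wall; positive means the state is safe."""
    return HALF_WIDTH - abs(x)


def adversarial_disturbance(x: float, d_max: float) -> float:
    """Worst-case bounded disturbance for this toy system only.

    With x_dot = u + d and margin HALF_WIDTH - |x|, pushing away from the
    centre (d = d_max * sign(x)) shrinks the margin fastest. This closed form
    is a stand-in; the paper's own disturbance computation is not reproduced.
    """
    return d_max if x >= 0.0 else -d_max


def collect_demonstrations(
    initial_states: Sequence[float], steps: int, d_max: float
) -> List[Tuple[float, float]]:
    """Roll out the expert while injecting the disturbance into the dynamics.

    Each sample pairs a visited state with the expert's (corrective) action.
    d_max = 0 recovers ordinary demonstration collection for plain cloning.
    """
    dataset: List[Tuple[float, float]] = []
    for x0 in initial_states:
        x = x0
        for _ in range(steps):
            u = expert_policy(x)
            dataset.append((x, u))  # the label is the expert action, not u + d
            d = adversarial_disturbance(x, d_max)
            x = x + DT * (u + d)    # the disturbance only perturbs the state
    return dataset


def mean_abs_offset(dataset: List[Tuple[float, float]]) -> float:
    """Average distance from the corridor centre over the collected states."""
    return sum(abs(x) for x, _ in dataset) / len(dataset)


starts = [-0.2, 0.0, 0.2]
nominal = collect_demonstrations(starts, steps=50, d_max=0.0)
guided = collect_demonstrations(starts, steps=50, d_max=0.5)
print(f"nominal mean |x| = {mean_abs_offset(nominal):.3f}")
print(f"guided  mean |x| = {mean_abs_offset(guided):.3f}")
```

In this toy, the nominal rollouts converge to the corridor centre, while the disturbed rollouts settle near |x| = d_max / gain = 0.5, halfway to the walls. The guided dataset therefore contains many more expert corrections close to the unsafe region, which is the kind of coverage the method aims for. Whether a policy trained on such data achieves the safety gains reported above is the paper's claim; this sketch only illustrates the collection step.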

Summary (original)

Behavior Cloning is a popular approach to Imitation Learning, in which a robot observes an expert supervisor and learns a control policy. However, behavior cloning suffers from the ‘compounding error’ problem – the policy errors compound as it deviates from the expert demonstrations and might lead to catastrophic system failures, limiting its use in safety-critical applications. On-policy data aggregation methods are able to address this issue at the cost of rolling out and repeated training of the imitation policy, which can be tedious and computationally prohibitive. We propose SAFE-GIL, an off-policy behavior cloning method that guides the expert via adversarial disturbance during data collection. The algorithm abstracts the imitation error as an adversarial disturbance in the system dynamics, injects it during data collection to expose the expert to safety critical states, and collects corrective actions. Our method biases training to more closely replicate expert behavior in safety-critical states and allows more variance in less critical states. We compare our method with several behavior cloning techniques and DAgger on autonomous navigation and autonomous taxiing tasks and show higher task success and safety, especially in low data regimes where the likelihood of error is higher, at a slight drop in the performance.

arXiv information

Authors	Yusuf Umut Ciftci, Zeyuan Feng, Somil Bansal
Published	2024-04-08 07:25:25+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
