SAFE: Multitask Failure Detection for Vision-Language-Action Models

Summary

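Vision-language-action models (VLAs) show promising robot behavior across a diverse set of manipulation tasks, but their success rates are limited when they are deployed out of the box on novel tasks.
For these policies to interact safely with their environments, they need a failure detector that raises a timely alert so that the robot can stop, backtrack, or ask for help.
Existing failure detectors, however, are trained and tested on only one or a few specific tasks, whereas VLAs need a detector that generalizes and also detects failures in unseen tasks and novel environments.
This paper introduces the multitask failure detection problem and proposes SAFE, a failure detector for generalist robot policies such as VLAs.
The authors analyze the VLA feature space and find that VLAs carry sufficient high-level knowledge about task success and failure, and that this knowledge is generic across tasks.
Building on this insight, SAFE is designed to learn from the VLA's internal features and to predict a single scalar indicating the likelihood of task failure.
SAFE is trained on both successful and failed rollouts and is evaluated on unseen tasks.
SAFE is compatible with different policy architectures.
It is tested extensively on OpenVLA, $\pi_0$, and $\pi_0$-FAST in both simulated and real-world environments.
Compared against diverse baselines, SAFE achieves state-of-the-art failure detection performance and, using conformal prediction, the best trade-off between accuracy and detection time.
More qualitative results can be found at https://vla-safe.github.io/.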

Abstract (original)

While vision-language-action models (VLAs) have shown promising robotic behaviors across a diverse set of manipulation tasks, they achieve limited success rates when deployed on novel tasks out-of-the-box. To allow these policies to safely interact with their environments, we need a failure detector that gives a timely alert such that the robot can stop, backtrack, or ask for help. However, existing failure detectors are trained and tested only on one or a few specific tasks, while VLAs require the detector to generalize and detect failures also in unseen tasks and novel environments. In this paper, we introduce the multitask failure detection problem and propose SAFE, a failure detector for generalist robot policies such as VLAs. We analyze the VLA feature space and find that VLAs have sufficient high-level knowledge about task success and failure, which is generic across different tasks. Based on this insight, we design SAFE to learn from VLA internal features and predict a single scalar indicating the likelihood of task failure. SAFE is trained on both successful and failed rollouts, and is evaluated on unseen tasks. SAFE is compatible with different policy architectures. We test it on OpenVLA, $\pi_0$, and $\pi_0$-FAST in both simulated and real-world environments extensively. We compare SAFE with diverse baselines and show that SAFE achieves state-of-the-art failure detection performance and the best trade-off between accuracy and detection time using conformal prediction. More qualitative results can be found at https://vla-safe.github.io/.
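The abstract mentions using conformal prediction to trade accuracy against detection time. As a rough illustration only, the sketch below calibrates an alert threshold on scalar failure scores with plain split conformal prediction and then finds the first timestep at which a rollout would trigger an alert. This is not necessarily the procedure used in the paper: the function names, the choice of per-rollout maximum scores from successful rollouts as calibration data, and all numbers are assumptions made for the example.

```python
import math
from typing import Optional, Sequence


def conformal_threshold(calib_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal upper threshold from calibration scores.

    Assumption: each calibration score is the maximum failure score
    observed over one successful rollout (hypothetical setup).
    """
    n = len(calib_scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    # 1-indexed rank of the conformal quantile: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration rollouts to support this alpha.
        return math.inf
    return sorted(calib_scores)[k - 1]


def first_alert_step(scores: Sequence[float], threshold: float) -> Optional[int]:
    """Return the first timestep whose score exceeds the threshold, else None."""
    for t, score in enumerate(scores):
        if score > threshold:
            return t
    return None


# Hypothetical calibration data and a hypothetical test rollout.
calib_max_scores = [0.10, 0.22, 0.15, 0.30, 0.18, 0.25, 0.12, 0.28, 0.20]
tau = conformal_threshold(calib_max_scores, alpha=0.2)
rollout_scores = [0.05, 0.11, 0.19, 0.33, 0.61]
alert = first_alert_step(rollout_scores, tau)
```

A smaller `alpha` raises the threshold, which should reduce false alarms on successful rollouts at the cost of later alerts on failing ones; that is the accuracy versus detection-time trade-off the abstract refers to.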

arXiv information

Authors	Qiao Gu, Yuanliang Ju, Shengxiang Sun, Igor Gilitschenski, Haruki Nishimura, Masha Itkina, Florian Shkurti
Published	2025-06-11 16:59:13+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
