Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics

Summary

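In recent robotics, vision-language-action (VLA) models have emerged as a transformative approach, enabling robots to perform complex tasks by integrating visual and linguistic inputs within an end-to-end learning framework.
While VLA models provide important capabilities, they also introduce new attack surfaces, which makes them vulnerable to adversarial attacks.
Because these vulnerabilities remain largely unexplored, this paper systematically quantifies the robustness of VLA-based robotic systems.
Recognizing the specific demands of robotic execution, our attack objectives target the spatial and functional characteristics inherent to robotic systems.
In particular, we introduce an untargeted, position-aware attack objective that leverages spatial foundations to destabilize robot actions, and a targeted attack objective that manipulates the robot's trajectory.
In addition, we design an adversarial patch generation approach that places a small, colorful patch within the camera's field of view, effectively executing the attack in both digital and physical environments.
Our evaluation shows a marked drop in task success rates, of up to 100% across a suite of simulated robotic tasks, exposing critical security gaps in current VLA architectures.
By uncovering these vulnerabilities and proposing practical evaluation metrics, this work advances both the understanding and the strengthening of safety in VLA-based robotic systems, and underscores the need to develop robust defense strategies before deployment in the physical world.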

Abstract (original)

Recently in robotics, Vision-Language-Action (VLA) models have emerged as a transformative approach, enabling robots to execute complex tasks by integrating visual and linguistic inputs within an end-to-end learning framework. While VLA models offer significant capabilities, they also introduce new attack surfaces, making them vulnerable to adversarial attacks. With these vulnerabilities largely unexplored, this paper systematically quantifies the robustness of VLA-based robotic systems. Recognizing the unique demands of robotic execution, our attack objectives target the inherent spatial and functional characteristics of robotic systems. In particular, we introduce an untargeted position-aware attack objective that leverages spatial foundations to destabilize robotic actions, and a targeted attack objective that manipulates the robotic trajectory. Additionally, we design an adversarial patch generation approach that places a small, colorful patch within the camera’s view, effectively executing the attack in both digital and physical environments. Our evaluation reveals a marked degradation in task success rates, with up to a 100% reduction across a suite of simulated robotic tasks, highlighting critical security gaps in current VLA architectures. By unveiling these vulnerabilities and proposing actionable evaluation metrics, this work advances both the understanding and enhancement of safety for VLA-based robotic systems, underscoring the necessity for developing robust defense strategies prior to physical-world deployments.
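The abstract describes two attack objectives, both realized through a patch placed in the camera view. The following is a minimal sketch of that general idea under stated assumptions, not the authors' method or code: it uses a toy linear policy as a stand-in for a VLA model, assumes a 7-dimensional continuous action whose first three entries are end-effector position, and fixes a 3x3 patch in the image corner. All names (`policy`, `apply_patch`, `optimize_patch`, `POS_DIMS`, and so on) are hypothetical. The targeted variant runs projected gradient descent to pull the action toward a chosen target; the untargeted, position-aware variant runs projected gradient ascent to push only the assumed position dimensions away from the clean prediction.

```python
import random

H, W = 8, 8                      # toy grayscale image size (hypothetical)
ACTION_DIM = 7                   # assumed layout: [x, y, z, roll, pitch, yaw, gripper]
POS_DIMS = range(3)              # assumption: dims 0-2 are end-effector position
PATCH_IDX = [r * W + c for r in range(3) for c in range(3)]  # fixed 3x3 patch, top-left

rng = random.Random(0)
# Toy linear stand-in for a VLA policy: action = M @ flatten(image)
M = [[rng.uniform(-0.5, 0.5) for _ in range(H * W)] for _ in range(ACTION_DIM)]


def policy(x):
    """Map a flattened image to a continuous action vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def apply_patch(image, patch):
    """Return a copy of the image with the patch pixels overwritten."""
    x = list(image)
    for k, idx in enumerate(PATCH_IDX):
        x[idx] = patch[k]
    return x


def loss(image, patch, weights, ref):
    """Weighted squared distance between the patched action and a reference action."""
    a = policy(apply_patch(image, patch))
    return sum(w * (ai - ri) ** 2 for w, ai, ri in zip(weights, a, ref))


def optimize_patch(image, patch0, weights, ref, sign, steps=200):
    """Projected gradient on the patch pixels only.

    sign = -1: descent (targeted, pull the action toward ref).
    sign = +1: ascent (untargeted, push the action away from ref).
    """
    patch = list(patch0)
    frob2 = sum(M[i][j] ** 2 for i in range(ACTION_DIM) for j in PATCH_IDX)
    # Conservative step: 1 / (2 * max weight * ||M_patch||_F^2) bounds the loss curvature
    step = 1.0 / (2.0 * max(weights) * frob2)
    for _ in range(steps):
        a = policy(apply_patch(image, patch))
        resid = [w * (ai - ri) for w, ai, ri in zip(weights, a, ref)]
        grad = [2.0 * sum(resid[i] * M[i][j] for i in range(ACTION_DIM)) for j in PATCH_IDX]
        # Gradient step, then project back onto the valid pixel range [0, 1]
        patch = [min(1.0, max(0.0, p + sign * step * g)) for p, g in zip(patch, grad)]
    return patch


image = [rng.random() for _ in range(H * W)]
clean_action = policy(image)
patch0 = [rng.random() for _ in PATCH_IDX]

# Targeted: pull the whole action toward a hypothetical target action
target = [1.0] * ACTION_DIM
all_w = [1.0] * ACTION_DIM
p_t = optimize_patch(image, patch0, all_w, target, sign=-1)

# Untargeted, position-aware: push only the position dims away from the clean action
pos_w = [1.0 if i in POS_DIMS else 0.0 for i in range(ACTION_DIM)]
p_u = optimize_patch(image, patch0, pos_w, clean_action, sign=+1)

print("targeted loss:", loss(image, patch0, all_w, target), "->", loss(image, p_t, all_w, target))
print("position deviation:", loss(image, patch0, pos_w, clean_action), "->",
      loss(image, p_u, pos_w, clean_action))
```

Clipping to [0, 1] keeps the patch a valid image, and the conservative step size keeps each projected descent step from increasing this quadratic loss. An attack on a real VLA model would instead backpropagate through the network, and the paper's exact loss definitions may differ from these illustrative ones.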

arXiv information

Authors	Taowen Wang, Dongfang Liu, James Chenhao Liang, Wenhao Yang, Qifan Wang, Cheng Han, Jiebo Luo, Ruixiang Tang
Published	2024-11-22 03:16:40+00:00
arXiv page	arxiv_id(pdf)

Source, service used

arxiv.jp, Google
