Adversarial Attacks on Robotic Vision Language Action Models

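Summary

Vision-language-action models (VLAs) for end-to-end control are reshaping robotics by fusing multimodal sensory inputs at the billion-parameter scale.
VLAs derive their capabilities mainly from their architectures, which are often built on frontier large language models (LLMs).
LLMs, however, are known to be susceptible to adversarial misuse, and given the significant physical risks inherent to robotics, it remains an open question how far VLAs inherit these vulnerabilities.
Motivated by these concerns, this work initiates the study of adversarial attacks on VLA-controlled robots.
The main algorithmic contribution is the adaptation and application of LLM jailbreaking attacks to obtain complete control authority over VLAs.
The authors find that textual attacks, applied once at the beginning of a rollout, facilitate full reachability of the action space of commonly used VLAs and often persist over longer horizons.
This differs significantly from the LLM jailbreaking literature, because attacks in the real world do not have to be semantically linked to notions of harm.
All code is available at https://github.com/eliotjones1/robogcg.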

Abstract (original)

The emergence of vision-language-action models (VLAs) for end-to-end control is reshaping the field of robotics by enabling the fusion of multimodal sensory inputs at the billion-parameter scale. The capabilities of VLAs stem primarily from their architectures, which are often based on frontier large language models (LLMs). However, LLMs are known to be susceptible to adversarial misuse, and given the significant physical risks inherent to robotics, questions remain regarding the extent to which VLAs inherit these vulnerabilities. Motivated by these concerns, in this work we initiate the study of adversarial attacks on VLA-controlled robots. Our main algorithmic contribution is the adaptation and application of LLM jailbreaking attacks to obtain complete control authority over VLAs. We find that textual attacks, which are applied once at the beginning of a rollout, facilitate full reachability of the action space of commonly used VLAs and often persist over longer horizons. This differs significantly from LLM jailbreaking literature, as attacks in the real world do not have to be semantically linked to notions of harm. We make all code available at https://github.com/eliotjones1/robogcg .

arXiv information

Authors	Eliot Krzysztof Jones, Alexander Robey, Andy Zou, Zachary Ravichandran, George J. Pappas, Hamed Hassani, Matt Fredrikson, J. Zico Kolter
Published	2025-06-03 19:43:58+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
