Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems

Summary

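Building on the unprecedented capabilities of large language models for command understanding and of multimodal vision-language transformers for zero-shot recognition, vision-language navigation (VLN) has emerged as an effective way to address several fundamental challenges toward a natural-language interface for robot navigation. However, such vision-language models are inherently vulnerable because the underlying embedding space lacks semantic meaning. Using a recently developed gradient-based optimization procedure, we show that an image can be modified imperceptibly so that its representation in a vision-language model matches that of a completely different image or of unrelated text. Building on this, we develop algorithms that adversarially modify a minimal number of images so that, for commands requiring multiple landmarks, the robot follows a route of the attacker's choice. Using a recently proposed VLN system, we demonstrate experimentally that, for a given navigation command, a robot can be made to follow a drastically different route. We also develop an efficient algorithm that reliably detects such malicious modifications, based on the observation that adversarially modified images are far more sensitive to added Gaussian noise than the original images.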
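
The abstract does not spell out the optimization procedure. As a rough illustration only, the representation-matching idea can be sketched as projected gradient descent (PGD), a standard stand-in that is not necessarily the authors' method, against a hypothetical linear encoder `embed` (a placeholder for a real vision-language image encoder). The perturbation `delta` is kept within an L-infinity budget `eps` so the change stays small, while the image's embedding is pulled toward `target`, which here is the embedding of an unrelated random image but could equally be a text embedding. All names, sizes, and step settings are assumptions.

```python
import random


def embed(W, image):
    """Hypothetical stand-in for a vision encoder: a fixed linear map."""
    return [sum(w * p for w, p in zip(row, image)) for row in W]


def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_representation(W, image, target, eps=0.05, step=0.1, iters=200):
    """PGD on ||embed(W, image + delta) - target||^2
    subject to |delta_i| <= eps and 0 <= image_i + delta_i <= 1."""
    n = len(image)
    delta = [0.0] * n
    for _ in range(iters):
        adv = [p + d for p, d in zip(image, delta)]
        residual = [e - t for e, t in zip(embed(W, adv), target)]
        # Gradient of the squared error w.r.t. delta is 2 * W^T @ residual.
        grad = [2.0 * sum(W[k][i] * residual[k] for k in range(len(W)))
                for i in range(n)]
        for i in range(n):
            lo = max(-eps, -image[i])      # budget and pixel lower bound
            hi = min(eps, 1.0 - image[i])  # budget and pixel upper bound
            # Gradient step, then projection onto the feasible box.
            delta[i] = min(hi, max(lo, delta[i] - step * grad[i]))
    return [p + d for p, d in zip(image, delta)]


rng = random.Random(0)
n_pixels, dim = 64, 4
W = [[rng.gauss(0.0, n_pixels ** -0.5) for _ in range(n_pixels)]
     for _ in range(dim)]
image = [rng.random() for _ in range(n_pixels)]
# Embedding of an unrelated random "image" serves as the attack target.
target = embed(W, [rng.random() for _ in range(n_pixels)])

adv = match_representation(W, image, target)
print("distance to target before:", sq_dist(embed(W, image), target))
print("distance to target after: ", sq_dist(embed(W, adv), target))
print("max pixel change:", max(abs(a - p) for a, p in zip(adv, image)))
```

With a real model, the hand-written gradient would be replaced by automatic differentiation through the encoder; the projection step (clipping `delta` to the budget and to the valid pixel range) would stay the same.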

Summary (original)

Building on the unprecedented capabilities of large language models for command understanding and zero-shot recognition of multi-modal vision-language transformers, visual language navigation (VLN) has emerged as an effective way to address multiple fundamental challenges toward a natural language interface to robot navigation. However, such vision-language models are inherently vulnerable due to the lack of semantic meaning of the underlying embedding space. Using a recently developed gradient-based optimization procedure, we demonstrate that images can be modified imperceptibly to match the representation of totally different images and unrelated texts for a vision-language model. Building on this, we develop algorithms that can adversarially modify a minimal number of images so that the robot will follow a route of choice for commands that require a number of landmarks. We demonstrate this experimentally using a recently proposed VLN system: for a given navigation command, a robot can be made to follow drastically different routes. We also develop an efficient algorithm to detect such malicious modifications reliably based on the fact that the adversarially modified images have much higher sensitivity to added Gaussian noise than the original images.
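
The detection idea can be sketched as follows, under stated assumptions: add i.i.d. Gaussian noise to an image several times, measure how far its embedding moves (here, mean cosine distance), and flag images whose shift is far larger than that of known-clean images. The abstract does not give the exact statistic or threshold, so the encoder `toy_encode`, the noise level `sigma=0.02`, the trial count, and the "10 times the most sensitive clean image" threshold rule are all hypothetical choices, not values from the paper. The toy encoder is a steep per-pixel tanh, and the `fragile` input is hand-placed in its high-gradient region to mimic the higher noise sensitivity that the abstract attributes to adversarially modified images.

```python
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def noise_sensitivity(encode, image, sigma=0.02, trials=20, seed=0):
    """Mean cosine distance between encode(image) and the embeddings of
    `trials` copies of the image with i.i.d. N(0, sigma^2) noise per pixel."""
    rng = random.Random(seed)
    base = encode(image)
    total = 0.0
    for _ in range(trials):
        noisy = [p + rng.gauss(0.0, sigma) for p in image]
        total += cosine_distance(base, encode(noisy))
    return total / trials


def calibrate_threshold(encode, clean_images, factor=10.0, **kwargs):
    """Hypothetical rule: `factor` times the largest score on clean images."""
    return factor * max(noise_sensitivity(encode, img, **kwargs)
                        for img in clean_images)


def looks_adversarial(encode, image, threshold, **kwargs):
    """Flag an image whose noise sensitivity exceeds the threshold."""
    return noise_sensitivity(encode, image, **kwargs) > threshold


def toy_encode(image):
    # Hypothetical stand-in for an image encoder: a steep per-pixel tanh that
    # is flat near 0 and 1 and very sensitive near 0.5.
    return [math.tanh(10.0 * (p - 0.5)) for p in image]


rng = random.Random(1)


def clean_image(n=64):
    # Toy "clean" image: every pixel lies in a flat region of toy_encode.
    return [rng.uniform(0.05, 0.15) if rng.random() < 0.5
            else rng.uniform(0.85, 0.95) for _ in range(n)]


threshold = calibrate_threshold(toy_encode, [clean_image() for _ in range(5)])
held_out = clean_image()
# Hand-made stand-in for a perturbed image sitting in a high-gradient region.
fragile = [0.52 if i % 2 else 0.48 for i in range(64)]

print("held-out clean flagged:", looks_adversarial(toy_encode, held_out, threshold))
print("fragile input flagged: ", looks_adversarial(toy_encode, fragile, threshold))
```

In practice the threshold would be calibrated on clean images from the deployment environment, and `encode` would be the VLN system's actual image encoder.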

arXiv information

Authors	Chashi Mahiul Islam, Shaeke Salman, Montasir Shams, Xiuwen Liu, Piyush Kumar
Published	2024-07-10 06:32:58+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google
