TransFool: An Adversarial Attack against Neural Machine Translation Models

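Summary

Deep neural networks have been found to be vulnerable to small input perturbations, known as adversarial attacks.
This paper investigates the vulnerability of neural machine translation (NMT) models to adversarial attacks and proposes a new attack algorithm called TransFool.
To fool NMT models, TransFool builds on a multi-term optimization problem and a gradient projection step.
By integrating the embedding representation of a language model, it generates adversarial examples that are fluent in the source language and maintain a high level of semantic similarity with the clean samples.
Experimental results show that, across different translation tasks and NMT architectures, the white-box attack can severely degrade translation quality while the semantic similarity between the original and adversarial sentences stays high.
Moreover, TransFool is shown to be transferable to unknown target models.
Finally, based on automatic and human evaluations, TransFool improves on existing attacks in success rate, semantic similarity, and fluency in both white-box and black-box settings.
TransFool therefore allows the vulnerability of NMT models to be characterized more precisely, and it highlights the need to design strong defense mechanisms and more robust NMT systems for real-world applications.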
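
As a rough illustration of what a gradient projection step over discrete tokens can look like, the sketch below moves each token embedding against a gradient and then snaps it back to the nearest entry of the embedding table. This is not the authors' implementation: the function names, the toy embedding table, the Euclidean distance, and the step size are all assumptions made for illustration, and the gradients are passed in directly rather than computed from a multi-term objective.

```python
import math


def nearest_token(vec, embedding_table):
    """Return the id of the embedding row closest to vec (squared Euclidean distance)."""
    best_id, best_dist = 0, math.inf
    for token_id, emb in enumerate(embedding_table):
        dist = sum((a - b) ** 2 for a, b in zip(vec, emb))
        if dist < best_dist:
            best_id, best_dist = token_id, dist
    return best_id


def projected_gradient_step(token_ids, grads, embedding_table, step_size):
    """Take one gradient step in embedding space, then project back to valid tokens.

    token_ids: current token ids of the sentence (hypothetical toy input)
    grads:     one gradient vector per token, assumed to come from some loss
    """
    new_ids = []
    for token_id, grad in zip(token_ids, grads):
        emb = embedding_table[token_id]
        # Continuous update: move against the gradient.
        moved = [e - step_size * g for e, g in zip(emb, grad)]
        # Projection: snap the moved vector to the nearest real token embedding.
        new_ids.append(nearest_token(moved, embedding_table))
    return new_ids


# Toy 2-D embedding table with three tokens (illustrative values only).
table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_grads = [[-1.0, 0.0], [0.0, 0.0]]

large_step = projected_gradient_step([0, 1], toy_grads, table, step_size=0.8)
small_step = projected_gradient_step([0, 1], toy_grads, table, step_size=0.3)
```

In this toy setup a large enough step changes the first token, while a small step projects back to the original token, which is why projection-based attacks typically need to iterate or accumulate updates before a token actually changes.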

Summary (original)

Deep neural networks have been shown to be vulnerable to small perturbations of their inputs, known as adversarial attacks. In this paper, we investigate the vulnerability of Neural Machine Translation (NMT) models to adversarial attacks and propose a new attack algorithm called TransFool. To fool NMT models, TransFool builds on a multi-term optimization problem and a gradient projection step. By integrating the embedding representation of a language model, we generate fluent adversarial examples in the source language that maintain a high level of semantic similarity with the clean samples. Experimental results demonstrate that, for different translation tasks and NMT architectures, our white-box attack can severely degrade the translation quality while the semantic similarity between the original and the adversarial sentences stays high. Moreover, we show that TransFool is transferable to unknown target models. Finally, based on automatic and human evaluations, TransFool leads to improvement in terms of success rate, semantic similarity, and fluency compared to the existing attacks both in white-box and black-box settings. Thus, TransFool permits us to better characterize the vulnerability of NMT models and outlines the necessity to design strong defense mechanisms and more robust NMT systems for real-life applications.

arXiv information

Authors	Sahar Sadrizadeh, Ljiljana Dolamic, Pascal Frossard
Published	2023-06-16 13:24:15+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
