NatLogAttack: A Framework for Attacking Natural Language Inference Models with Natural Logic

Summary

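Reasoning has been a central topic in artificial intelligence from the beginning.
Recent progress on distributed representations and neural networks continues to improve state-of-the-art performance on natural language inference (NLI).
However, it remains an open question whether the models perform real reasoning to reach their conclusions or rely on spurious correlations.
Adversarial attacks have proven to be an important tool for exposing the Achilles' heel of victim models.
This study explores the fundamental problem of developing attack models based on logic formalism.
We propose NatLogAttack to perform systematic attacks centred on natural logic, a classical logic formalism that traces back to Aristotle's syllogisms and has been closely developed for natural language inference.
The proposed framework performs both label-preserving and label-flipping attacks.
Compared with existing attack models, NatLogAttack generates better adversarial examples with fewer visits to the victim models.
The victim models are found to be more vulnerable under the label-flipping setting.
NatLogAttack provides a tool to probe the capacity of existing and future NLI models from a key viewpoint. We hope more logic-based attacks will be explored to understand the desired properties of reasoning.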

Summary (original)

Reasoning has been a central topic in artificial intelligence from the beginning. The recent progress made on distributed representation and neural networks continues to improve the state-of-the-art performance of natural language inference. However, it remains an open question whether the models perform real reasoning to reach their conclusions or rely on spurious correlations. Adversarial attacks have proven to be an important tool to help evaluate the Achilles’ heel of the victim models. In this study, we explore the fundamental problem of developing attack models based on logic formalism. We propose NatLogAttack to perform systematic attacks centring around natural logic, a classical logic formalism that is traceable back to Aristotle’s syllogism and has been closely developed for natural language inference. The proposed framework renders both label-preserving and label-flipping attacks. We show that compared to the existing attack models, NatLogAttack generates better adversarial examples with fewer visits to the victim models. The victim models are found to be more vulnerable under the label-flipping setting. NatLogAttack provides a tool to probe the existing and future NLI models’ capacity from a key viewpoint and we hope more logic-based attacks will be further explored for understanding the desired property of reasoning.

arXiv information

Authors	Zi’ou Zheng, Xiaodan Zhu
Published	2023-07-06 08:32:14+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
