Adversarial Robustification via Text-to-Image Diffusion Models

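Summary

Adversarial robustness has conventionally been regarded as a property that is hard to encode in neural networks and that requires large amounts of training data.
In the recent paradigm of adopting off-the-shelf models, however, access to the training data is often infeasible or impractical, even though most such models were not originally trained with adversarial robustness in mind.
This paper develops a scalable, model-agnostic solution that achieves adversarial robustness without using any data.
The underlying intuition is that recent text-to-image diffusion models can be viewed as "adaptable" denoisers that can be optimized to specify a target task.
Building on this, the authors propose (a) to build a denoise-and-classify pipeline that offers provable guarantees against adversarial attacks, and (b) to leverage a few synthetic reference images generated by the text-to-image model, which enable novel adaptation schemes.
The experiments show that this data-free scheme, applied to the pre-trained CLIP, can improve the (provable) adversarial robustness of the diverse zero-shot classifiers derived from it while maintaining their accuracy, significantly surpassing prior approaches that use the full training data.
Beyond CLIP, the authors also demonstrate that the framework can easily be applied to robustify other visual classifiers efficiently.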
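The summary does not say how the denoise-and-classify pipeline obtains its provable guarantees. One common route is randomized smoothing: add Gaussian noise to the input, denoise it, classify the result, and take a majority vote over many noise draws. The sketch below illustrates that idea only; it assumes randomized smoothing is the certification mechanism, and `toy_denoiser`, `toy_classifier`, and `smoothed_predict` are hypothetical stand-ins, not the paper's diffusion denoiser, CLIP, or code. It also uses a simple Hoeffding lower bound on the vote fraction where a tighter binomial bound could be used.

```python
import math
import random
from statistics import NormalDist, fmean


def toy_denoiser(pixels):
    # Hypothetical stand-in for an adapted diffusion denoiser:
    # it only clips values back into the valid pixel range [0, 1].
    return [min(1.0, max(0.0, p)) for p in pixels]


def toy_classifier(pixels):
    # Hypothetical stand-in for a zero-shot classifier:
    # class 1 if the image is bright on average, else class 0.
    return int(fmean(pixels) > 0.5)


def _vote_counts(pixels, sigma, n, num_classes, rng):
    # Classify n independently noised-then-denoised copies of the input.
    counts = [0] * num_classes
    for _ in range(n):
        noisy = [p + rng.gauss(0.0, sigma) for p in pixels]
        counts[toy_classifier(toy_denoiser(noisy))] += 1
    return counts


def smoothed_predict(pixels, sigma=0.25, n0=100, n=1000, alpha=0.001,
                     num_classes=2, seed=0):
    """Return (label, certified L2 radius), or (None, 0.0) to abstain."""
    rng = random.Random(seed)
    # Stage 1: guess the top class from a small batch of samples.
    guess_counts = _vote_counts(pixels, sigma, n0, num_classes, rng)
    top = max(range(num_classes), key=guess_counts.__getitem__)
    # Stage 2: estimate the top-class probability from fresh samples.
    counts = _vote_counts(pixels, sigma, n, num_classes, rng)
    # One-sided Hoeffding bound: holds with probability at least 1 - alpha.
    p_lower = counts[top] / n - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0
    # Randomized-smoothing radius: sigma * Phi^{-1}(p_lower).
    return top, sigma * NormalDist().inv_cdf(p_lower)


if __name__ == "__main__":
    bright_image = [0.9] * 64  # a flat 8x8 "image"
    print(smoothed_predict(bright_image))
```

Under these assumptions, a returned radius means that no perturbation with L2 norm below it changes the smoothed prediction, with confidence at least 1 - alpha. The role of the denoiser is to make the noisy inputs usable by a classifier that was never trained on Gaussian noise.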

Abstract (original)

Adversarial robustness has been conventionally believed as a challenging property to encode for neural networks, requiring plenty of training data. In the recent paradigm of adopting off-the-shelf models, however, access to their training data is often infeasible or not practical, while most of such models are not originally trained concerning adversarial robustness. In this paper, we develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data. Our intuition is to view recent text-to-image diffusion models as ‘adaptable’ denoisers that can be optimized to specify target tasks. Based on this, we propose: (a) to initiate a denoise-and-classify pipeline that offers provable guarantees against adversarial attacks, and (b) to leverage a few synthetic reference images generated from the text-to-image model that enables novel adaptation schemes. Our experiments show that our data-free scheme applied to the pre-trained CLIP could improve the (provable) adversarial robustness of its diverse zero-shot classification derivatives (while maintaining their accuracy), significantly surpassing prior approaches that utilize the full training data. Not only for CLIP, we also demonstrate that our framework is easily applicable for robustifying other visual classifiers efficiently.

arXiv information

Authors	Daewon Choi, Jongheon Jeong, Huiwon Jang, Jinwoo Shin
Published	2024-07-26 10:49:14+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
