Black-Box Dissector: Towards Erasing-based Hard-Label Model Stealing Attack

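Summary

Previous studies have verified that the functionality of a black-box model can be stolen when its full probability outputs are available.
Under the more practical hard-label setting, however, we observe that existing methods suffer catastrophic performance degradation.
We argue that this happens because hard labels lack the rich information carried by probability predictions and because training on hard labels causes overfitting.
To address this, we propose a novel hard-label model stealing method termed \emph{black-box dissector}, which consists of two erasing-based modules.
One is a CAM-driven erasing strategy designed to increase the information capacity hidden in the hard labels returned by the victim model.
The other is a random-erasing-based self-knowledge distillation module that uses soft labels from the substitute model to mitigate overfitting.
Extensive experiments on four widely used datasets consistently show that our method outperforms state-of-the-art methods, with an improvement of up to $8.27\%$.
We also validate the effectiveness and practical potential of our method against real-world APIs and defense methods.
Furthermore, our method benefits other downstream tasks, \emph{i.e.}, transfer adversarial attacks.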

Summary (original)

Previous studies have verified that the functionality of black-box models can be stolen with full probability outputs. However, under the more practical hard-label setting, we observe that existing methods suffer from catastrophic performance degradation. We argue this is due to the lack of rich information in the probability prediction and the overfitting caused by hard labels. To this end, we propose a novel hard-label model stealing method termed \emph{black-box dissector}, which consists of two erasing-based modules. One is a CAM-driven erasing strategy that is designed to increase the information capacity hidden in hard labels from the victim model. The other is a random-erasing-based self-knowledge distillation module that utilizes soft labels from the substitute model to mitigate overfitting. Extensive experiments on four widely-used datasets consistently demonstrate that our method outperforms state-of-the-art methods, with an improvement of at most $8.27\%$. We also validate the effectiveness and practical potential of our method on real-world APIs and defense methods. Furthermore, our method promotes other downstream tasks, \emph{i.e.}, transfer adversarial attacks.
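
The abstract only names the random-erasing-based self-knowledge distillation module, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes the soft label is the average of the substitute model's outputs over several randomly erased copies of an image. The names `random_erase`, `averaged_soft_label`, and `toy_substitute`, the grayscale list-of-rows image format, and all parameter defaults are hypothetical and chosen for illustration only.

```python
import random
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel values (assumed format)


def random_erase(image: Image, rng: random.Random,
                 max_frac: float = 0.5, fill: float = 0.0) -> Image:
    """Return a copy of `image` with one random rectangle overwritten by `fill`."""
    height, width = len(image), len(image[0])
    # Rectangle sides are at least 1 pixel and at most max_frac of each dimension.
    erase_h = rng.randint(1, max(1, int(height * max_frac)))
    erase_w = rng.randint(1, max(1, int(width * max_frac)))
    top = rng.randint(0, height - erase_h)
    left = rng.randint(0, width - erase_w)
    erased = [row[:] for row in image]  # copy so the input is left untouched
    for r in range(top, top + erase_h):
        for c in range(left, left + erase_w):
            erased[r][c] = fill
    return erased


def averaged_soft_label(image: Image,
                        substitute: Callable[[Image], List[float]],
                        n_views: int = 8, seed: int = 0) -> List[float]:
    """Average the substitute model's class probabilities over erased views."""
    rng = random.Random(seed)
    outputs = [substitute(random_erase(image, rng)) for _ in range(n_views)]
    n_classes = len(outputs[0])
    return [sum(out[k] for out in outputs) / n_views for k in range(n_classes)]


def toy_substitute(image: Image) -> List[float]:
    """Hypothetical two-class stand-in model: probability from mean brightness."""
    pixels = [p for row in image for p in row]
    bright = sum(pixels) / len(pixels)  # in [0, 1] when pixels are in [0, 1]
    return [bright, 1.0 - bright]


if __name__ == "__main__":
    white = [[1.0] * 8 for _ in range(8)]
    print(averaged_soft_label(white, toy_substitute))
```

In a real pipeline the toy model would be replaced by the substitute network's softmax output, and the averaged vector would serve as a soft training target alongside the victim's hard label. How the two targets are combined is not stated in the abstract and is left open here.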

arXiv information

Authors	Yixu Wang, Jie Li, Hong Liu, Yan Wang, Yongjian Wu, Feiyue Huang, Rongrong Ji
Published	2022-09-26 15:31:11+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
