Model Privacy: A Unified Framework to Understand Model Stealing Attacks and Defenses

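Abstract

The use of machine learning (ML) is becoming increasingly widespread across many domains, which underscores the importance of understanding it and ensuring its safety.
One pressing concern is the vulnerability of ML applications to model stealing attacks.
In these attacks, an adversary attempts to recover a learned model through limited query-response interactions, such as those offered by cloud-based services or on-chip artificial intelligence interfaces.
The existing literature proposes a variety of attack and defense strategies, but these often lack a theoretical foundation and standardized evaluation criteria.
In response, this work presents a framework called "Model Privacy", which provides a foundation for comprehensively analyzing model stealing attacks and defenses.
The authors establish a rigorous formulation of the threat model and objectives, propose methods to quantify the goodness of attack and defense strategies, and analyze the fundamental tradeoffs between utility and privacy in ML models.
The resulting theory offers valuable insights into enhancing the security of ML models, in particular highlighting the importance of the attack-specific structure of perturbations for effective defenses.
The application of model privacy is demonstrated from the defender's perspective across various learning scenarios.
Extensive experiments corroborate the insights and the effectiveness of the defense mechanisms developed under the proposed framework.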

Abstract (original)

The use of machine learning (ML) has become increasingly prevalent in various domains, highlighting the importance of understanding and ensuring its safety. One pressing concern is the vulnerability of ML applications to model stealing attacks. These attacks involve adversaries attempting to recover a learned model through limited query-response interactions, such as those found in cloud-based services or on-chip artificial intelligence interfaces. While existing literature proposes various attack and defense strategies, these often lack a theoretical foundation and standardized evaluation criteria. In response, this work presents a framework called “Model Privacy”, providing a foundation for comprehensively analyzing model stealing attacks and defenses. We establish a rigorous formulation for the threat model and objectives, propose methods to quantify the goodness of attack and defense strategies, and analyze the fundamental tradeoffs between utility and privacy in ML models. Our developed theory offers valuable insights into enhancing the security of ML models, especially highlighting the importance of the attack-specific structure of perturbations for effective defenses. We demonstrate the application of model privacy from the defender’s perspective through various learning scenarios. Extensive experiments corroborate the insights and the effectiveness of defense mechanisms developed under the proposed framework.
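
The query-response setting described in the abstract can be made concrete with a toy example. The sketch below is not the paper's method: it assumes a linear model as the defender's secret, an attacker who fits least squares to the observed query-response pairs, and a defender who adds i.i.d. Gaussian noise to each response. All function names (`query_responses`, `steal_linear_model`, `run_experiment`) and parameters are hypothetical. The utility loss is taken here to be the mean squared deviation of the returned responses from the true ones, and the attacker's error is the squared distance between the recovered and the true weights. The abstract stresses that the attack-specific structure of the perturbation matters; this sketch uses unstructured noise only as a baseline and does not attempt to model that.

```python
import numpy as np


def query_responses(true_w, queries, noise_scale, rng):
    """Defender: answer queries with the true linear model plus i.i.d. Gaussian noise.

    The i.i.d. perturbation is an illustrative assumption, not the paper's defense.
    """
    clean = queries @ true_w
    return clean + noise_scale * rng.standard_normal(clean.shape)


def steal_linear_model(queries, responses):
    """Attacker: fit a linear model to the observed query-response pairs."""
    w_hat, *_ = np.linalg.lstsq(queries, responses, rcond=None)
    return w_hat


def run_experiment(noise_scale, n_queries=50, dim=5, seed=0):
    """Return (utility_loss, attacker_error) for one perturbation level."""
    rng = np.random.default_rng(seed)
    true_w = rng.standard_normal(dim)                 # defender's secret model
    queries = rng.standard_normal((n_queries, dim))   # attacker's chosen queries
    responses = query_responses(true_w, queries, noise_scale, rng)
    w_hat = steal_linear_model(queries, responses)
    # Utility loss: how far the returned answers are from the true ones.
    utility_loss = float(np.mean((responses - queries @ true_w) ** 2))
    # Attacker error: how far the stolen weights are from the true weights.
    attacker_error = float(np.sum((w_hat - true_w) ** 2))
    return utility_loss, attacker_error


if __name__ == "__main__":
    for scale in (0.0, 0.5, 1.0):
        u, p = run_experiment(scale)
        print(f"noise_scale={scale}: utility_loss={u:.3f}, attacker_error={p:.3f}")
```

Without perturbation the attacker recovers the weights up to numerical precision; adding noise raises the attacker's error but also degrades the answers that legitimate users receive, which is the utility-privacy tradeoff the abstract refers to.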

arXiv information

Authors	Ganghua Wang, Yuhong Yang, Jie Ding
Published	2025-02-21 16:29:11+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
