Memory-Efficient Differentially Private Training with Gradient Random Projection

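Summary

Differential privacy (DP) protects sensitive data during neural network training, but standard methods such as DP-Adam incur high memory overhead from per-sample gradient clipping, which limits their scalability.
We introduce DP-GRAPE (Gradient RAndom ProjEction), a DP training method that significantly reduces memory usage while maintaining utility on par with first-order DP approaches.
Rather than applying DP directly to GaLore, DP-GRAPE introduces three key modifications: (1) gradients are privatized after projection, (2) random Gaussian matrices replace SVD-based subspaces, and (3) projection is applied during backpropagation.
These contributions eliminate the need for costly SVD computations, enable substantial memory savings, and lead to improved utility.
Despite operating in lower-dimensional subspaces, our theoretical analysis shows that DP-GRAPE achieves a privacy-utility trade-off comparable to DP-SGD.
Our extensive empirical experiments show that DP-GRAPE can reduce the memory footprint of DP training without sacrificing accuracy or training time.
In particular, compared with DP-Adam, DP-GRAPE reduces memory usage by over 63% when pre-training Vision Transformers and by over 70% when fine-tuning RoBERTa-Large, while achieving similar performance.
We further demonstrate that DP-GRAPE scales to fine-tuning large models such as OPT with up to 6.7 billion parameters.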

Summary (original)

Differential privacy (DP) protects sensitive data during neural network training, but standard methods like DP-Adam suffer from high memory overhead due to per-sample gradient clipping, limiting scalability. We introduce DP-GRAPE (Gradient RAndom ProjEction), a DP training method that significantly reduces memory usage while maintaining utility on par with first-order DP approaches. Rather than directly applying DP to GaLore, DP-GRAPE introduces three key modifications: (1) gradients are privatized after projection, (2) random Gaussian matrices replace SVD-based subspaces, and (3) projection is applied during backpropagation. These contributions eliminate the need for costly SVD computations, enable substantial memory savings, and lead to improved utility. Despite operating in lower-dimensional subspaces, our theoretical analysis shows that DP-GRAPE achieves a privacy-utility trade-off comparable to DP-SGD. Our extensive empirical experiments show that DP-GRAPE can reduce the memory footprint of DP training without sacrificing accuracy or training time. In particular, DP-GRAPE reduces memory usage by over 63% when pre-training Vision Transformers and over 70% when fine-tuning RoBERTa-Large as compared to DP-Adam, while achieving similar performance. We further demonstrate that DP-GRAPE scales to fine-tuning large models such as OPT with up to 6.7 billion parameters.
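The first two modifications named in the abstract (privatizing after projection, and using a random Gaussian matrix instead of an SVD-based subspace) can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration rather than the authors' implementation: gradients are treated as flat vectors, the projection matrix is scaled by 1/sqrt(r), the noise standard deviation is `noise_multiplier * clip_norm`, and the helper names (`gaussian_projection`, `project`, `clip`, `private_projected_gradient`, `lift`) are hypothetical. The third modification, projecting during backpropagation, is not modeled.

```python
import math
import random

def gaussian_projection(d, r, rng):
    """Return a d x r matrix with i.i.d. N(0, 1/r) entries (scaling is an assumption)."""
    return [[rng.gauss(0.0, 1.0) / math.sqrt(r) for _ in range(r)] for _ in range(d)]

def project(vec, P):
    """Map a length-d vector to r dimensions by computing P^T vec."""
    r = len(P[0])
    return [sum(vec[i] * P[i][j] for i in range(len(vec))) for j in range(r)]

def clip(vec, clip_norm):
    """Rescale vec so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    return [x * scale for x in vec]

def private_projected_gradient(per_sample_grads, P, clip_norm, noise_multiplier, rng):
    """Project each per-sample gradient, clip it, sum, add Gaussian noise, then average."""
    r = len(P[0])
    total = [0.0] * r
    for g in per_sample_grads:
        low = clip(project(g, P), clip_norm)  # clipping happens after projection
        total = [t + x for t, x in zip(total, low)]
    sigma = noise_multiplier * clip_norm  # noise scale in the r-dimensional space
    batch = len(per_sample_grads)
    return [(t + rng.gauss(0.0, sigma)) / batch for t in total]

def lift(low, P):
    """Map an r-dimensional vector back to d dimensions by computing P low."""
    return [sum(P[i][j] * low[j] for j in range(len(low))) for i in range(len(P))]

# Toy usage: 4 per-sample gradients of dimension 64, projected to 8 dimensions.
rng = random.Random(0)
d, r, batch = 64, 8, 4
grads = [[rng.gauss(0.0, 5.0) for _ in range(d)] for _ in range(batch)]
P = gaussian_projection(d, r, rng)
noisy = private_projected_gradient(grads, P, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
update = lift(noisy, P)  # full-dimensional direction for a parameter update
```

In this sketch only r numbers per sample are clipped and accumulated instead of d, which is the intuition behind the memory savings described above. A practical version would use a tensor library and an optimizer such as Adam; where exactly the optimizer state lives is not specified by the abstract.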

arXiv information

Authors	Alex Mulrooney, Devansh Gupta, James Flemings, Huanyu Zhang, Murali Annavaram, Meisam Razaviyayn, Xinwei Zhang
Published	2025-06-18 16:05:09+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
