Optimistic Verifiable Training by Controlling Hardware Nondeterminism

Summary

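The growing compute demands of AI systems have led to the emergence of services that train models on behalf of clients who lack the necessary resources.
However, ensuring that training is carried out correctly, and guarding against potential training-time attacks such as data poisoning, is challenging.
Existing work on verifiable training largely falls into two classes: proof-based systems, which are hard to scale because they require cryptographic techniques, and "optimistic" methods, which rely on a trusted third-party auditor who replicates the training process.
A key challenge for the latter is that hardware nondeterminism between GPU types during training prevents the auditor from replicating the training process exactly, so such schemes are not robust.
We propose a method that controls for this nondeterminism by combining three techniques: training at a higher precision than the target model, rounding after intermediate computation steps, and storing rounding decisions based on an adaptive thresholding procedure.
Across three different NVIDIA GPUs (A40, Titan XP, RTX 2080 Ti), we achieve exact training replication at FP32 precision for both full training and fine-tuning of ResNet-50 (23M) and GPT-2 (117M) models.
Our verifiable training scheme significantly reduces storage and time costs compared with proof-based systems.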
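
The interplay between accumulation order and rounding described above can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `round_to_fp32` and `needs_logging`, the fixed threshold, and the choice of FP64 as the higher precision are assumptions made purely for illustration (the summary mentions an adaptive threshold, whose details are not given here).

```python
import struct


def round_to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Floating-point addition is not associative, so two devices that
# accumulate the same terms in a different order can disagree in FP64.
terms = [0.1, 0.2, 0.3]
trainer_sum = (terms[0] + terms[1]) + terms[2]  # one accumulation order
auditor_sum = terms[0] + (terms[1] + terms[2])  # a different order

orders_differ_in_fp64 = trainer_sum != auditor_sum
# Rounding the higher-precision results down to FP32 usually hides the gap.
agree_after_rounding = round_to_fp32(trainer_sum) == round_to_fp32(auditor_sum)

# Rounding alone is not enough: two values that straddle an FP32 rounding
# midpoint round to different FP32 numbers even though they are very close.
midpoint = 1.0 + 2.0**-24  # halfway between 1.0 and the next FP32 value
x_lo = midpoint - 2.0**-50
x_hi = midpoint + 2.0**-50
straddle_disagrees = round_to_fp32(x_lo) != round_to_fp32(x_hi)


def needs_logging(x: float, threshold: float) -> bool:
    """Hypothetical check: would a perturbation of size `threshold`
    change the FP32 rounding of x? If so, the trainer would record its
    rounding decision so that an auditor can reuse it."""
    return round_to_fp32(x - threshold) != round_to_fp32(x + threshold)


print(orders_differ_in_fp64, agree_after_rounding, straddle_disagrees)
print(needs_logging(x_lo, 2.0**-40), needs_logging(trainer_sum, 2.0**-40))
```

In this sketch only values that fall close to a rounding boundary are flagged, which suggests why logging rounding decisions selectively can need far less storage than recording every intermediate value.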

Summary (original)

The increasing compute demands of AI systems has led to the emergence of services that train models on behalf of clients lacking necessary resources. However, ensuring correctness of training and guarding against potential training-time attacks, such as data poisoning, poses challenges. Existing works on verifiable training largely fall into two classes: proof-based systems, which struggle to scale due to requiring cryptographic techniques, and ‘optimistic’ methods that consider a trusted third-party auditor who replicates the training process. A key challenge with the latter is that hardware nondeterminism between GPU types during training prevents an auditor from replicating the training process exactly, and such schemes are therefore non-robust. We propose a method that combines training in a higher precision than the target model, rounding after intermediate computation steps, and storing rounding decisions based on an adaptive thresholding procedure, to successfully control for nondeterminism. Across three different NVIDIA GPUs (A40, Titan XP, RTX 2080 Ti), we achieve exact training replication at FP32 precision for both full-training and fine-tuning of ResNet-50 (23M) and GPT-2 (117M) models. Our verifiable training scheme significantly decreases the storage and time costs compared to proof-based systems.

arXiv information

Authors	Megha Srivastava, Simran Arora, Dan Boneh
Published	2024-03-14 17:44:35+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
