Zero redundancy distributed learning with differential privacy

Abstract

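Deep learning with large models has achieved great success across a wide range of domains.
However, training these models with billions of parameters is very challenging in terms of training speed, memory cost, and communication efficiency, especially under the privacy-preserving regime of differential privacy (DP).
On the one hand, DP optimization achieves efficiency comparable to standard non-private optimization on a single GPU, but on multiple GPUs, existing DP distributed learning (such as pipeline parallelism) suffers from significantly worse efficiency.
On the other hand, the Zero Redundancy Optimizer (ZeRO) is a state-of-the-art solution for standard distributed learning and shows excellent training efficiency on large models, but making it work compatibly with DP is technically complicated.
In this work, we develop a new systematic solution, DP-ZeRO, that (I) scales up the trainable DP model size, e.g. to GPT-100B, (II) achieves the same computation and communication efficiency as standard ZeRO, and (III) enables mixed-precision DP training.
Our DP-ZeRO, like standard ZeRO, has the potential to train models of arbitrary size and is evaluated on the world's largest DP models in terms of the number of trainable parameters.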

Abstract (original)

Deep learning using large models have achieved great success in a wide range of domains. However, training these models on billions of parameters is very challenging in terms of the training speed, memory cost, and communication efficiency, especially under the privacy-preserving regime with differential privacy (DP). On the one hand, DP optimization has comparable efficiency to the standard non-private optimization on a single GPU, but on multiple GPUs, existing DP distributed learning (such as pipeline parallel) has suffered from significantly worse efficiency. On the other hand, the Zero Redundancy Optimizer (ZeRO) is a state-of-the-art solution to the standard distributed learning, exhibiting excellent training efficiency on large models, but to work compatibly with DP is technically complicated. In this work, we develop a new systematic solution, DP-ZeRO, (I) to scale up the trainable DP model size, e.g. to GPT-100B, (II) to obtain the same computation and communication efficiency as the standard ZeRO, and (III) to enable mixed-precision DP training. Our DP-ZeRO, like the standard ZeRO, has the potential to train models with arbitrary size and is evaluated on the world’s largest DP models in terms of the number of trainable parameters.

arXiv information

Authors	Zhiqi Bu, Justin Chiu, Ruixuan Liu, Sheng Zha, George Karypis
Published	2023-11-20 14:58:56+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
