Error Feedback Can Accurately Compress Preconditioners

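Summary

Leveraging second-order information at the scale of deep networks is one of the main approaches to improving the performance of current optimizers for deep learning.
However, existing approaches to accurate full-matrix preconditioning, such as Full-Matrix Adagrad (GGT) and Matrix-Free Approximate Curvature (M-FAC), incur massive storage costs even on medium-scale models, because they must store a sliding window of gradients whose memory requirements are multiplicative in the model dimension.
This paper addresses the problem with an efficient, simple-to-implement error-feedback technique that can compress preconditioners by up to two orders of magnitude in practice without loss of convergence.
Specifically, the approach compresses the gradient information by sparsification or low-rank compression before it is fed into the preconditioner, and feeds the compression error back into future iterations.
Extensive experiments on deep neural networks for vision show that the approach can compress full-matrix preconditioners by up to two orders of magnitude without affecting accuracy, effectively removing the memory overhead of full-matrix preconditioning in implementations of full-matrix Adagrad (GGT) and natural gradient (M-FAC).
The code is available at https://github.com/IST-DASLab/EFCP.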
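
To make the storage argument concrete, the following is a rough back-of-the-envelope sketch of why a sliding window of gradients is costly and what sparsifying each stored gradient could save. The function names, the 4-byte value and index sizes, and the example numbers (model dimension, window length, entries kept) are all illustrative assumptions, not figures from the paper.

```python
def dense_window_bytes(d, m, bytes_per_value=4):
    """Bytes needed to store a window of m dense gradients of dimension d."""
    return d * m * bytes_per_value


def sparse_window_bytes(k, m, bytes_per_value=4, bytes_per_index=4):
    """Bytes for m sparsified gradients that each keep k entries.

    Assumes each kept entry is stored as a (value, index) pair.
    """
    return k * m * (bytes_per_value + bytes_per_index)


# Hypothetical setting: 10M parameters, window of 100 gradients,
# 100k entries (1% of the coordinates) kept per gradient.
d, m, k = 10_000_000, 100, 100_000

dense = dense_window_bytes(d, m)    # 4,000,000,000 bytes (4 GB)
sparse = sparse_window_bytes(k, m)  # 80,000,000 bytes (80 MB)
ratio = dense // sparse             # 50
```

The dense cost is the product of the window length and the model dimension, which is the multiplicative dependence the abstract refers to. Under these assumed numbers, storing indices alongside values halves the naive 100x saving to 50x.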
私たちのコードは https://github.com/IST-DASLab/EFCP で入手できます。

Summary (original)

Leveraging second-order information at the scale of deep networks is one of the main lines of approach for improving the performance of current optimizers for deep learning. Yet, existing approaches for accurate full-matrix preconditioning, such as Full-Matrix Adagrad (GGT) or Matrix-Free Approximate Curvature (M-FAC), suffer from massive storage costs when applied even to medium-scale models, as they must store a sliding window of gradients, whose memory requirements are multiplicative in the model dimension. In this paper, we address this issue via an efficient and simple-to-implement error-feedback technique that can be applied to compress preconditioners by up to two orders of magnitude in practice, without loss of convergence. Specifically, our approach compresses the gradient information via sparsification or low-rank compression before it is fed into the preconditioner, feeding the compression error back into future iterations. Extensive experiments on deep neural networks for vision show that this approach can compress full-matrix preconditioners by up to two orders of magnitude without impact on accuracy, effectively removing the memory overhead of full-matrix preconditioning for implementations of full-matrix Adagrad (GGT) and natural gradient (M-FAC). Our code is available at https://github.com/IST-DASLab/EFCP.
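
The error-feedback step described in the abstract can be sketched roughly as follows. This is a minimal pure-Python illustration that assumes top-k magnitude sparsification as the compressor; the `top_k` helper and the `ErrorFeedbackCompressor` class are hypothetical names introduced here, not the API of the EFCP repository, and the preconditioner itself (GGT or M-FAC) is not modeled.

```python
def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out


class ErrorFeedbackCompressor:
    """Compresses gradients and carries the compression error forward."""

    def __init__(self, dim, k):
        self.k = k
        self.error = [0.0] * dim  # residual left over from earlier steps

    def compress(self, grad):
        # Add the accumulated error to the fresh gradient.
        acc = [e + g for e, g in zip(self.error, grad)]
        # Compress the corrected gradient (assumed compressor: top-k).
        compressed = top_k(acc, self.k)
        # Whatever the compressor dropped becomes the new error.
        self.error = [a - c for a, c in zip(acc, compressed)]
        # In the setting of the abstract, this sparse vector is what
        # would be handed to the preconditioner.
        return compressed


ef = ErrorFeedbackCompressor(dim=4, k=1)
step1 = ef.compress([1.0, 0.5, 0.0, 0.0])   # [1.0, 0.0, 0.0, 0.0]
step2 = ef.compress([0.0, 0.75, 0.0, 0.0])  # [0.0, 1.25, 0.0, 0.0]
```

In the second step, the 0.5 dropped in the first step is added back to the new gradient, so the information is delayed and not lost.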

arXiv information

Authors	Ionut-Vlad Modoranu, Aleksei Kalinov, Eldar Kurtic, Dan Alistarh
Published	2023-06-09 17:58:47+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
