Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging

Abstract

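Driven by the ever-growing volume and decentralized nature of data, together with the increasing size of modern models, distributed deep learning (DDL) has become established as the preferred paradigm for training.
However, frequently synchronizing DL models that contain millions to billions of parameters creates a communication bottleneck that severely hinders scalability.
Worse, DDL algorithms typically rely on overly simplistic, periodic, and rigid synchronization schedules, which waste valuable bandwidth and make them impractical in bandwidth-constrained federated settings.
To address these shortcomings, we propose Federated Dynamic Averaging (FDA), a communication-efficient DDL strategy that dynamically triggers synchronization based on the value of the model variance.
Through extensive experiments across a wide range of learning tasks, we demonstrate that FDA reduces communication cost by orders of magnitude compared with both traditional and state-of-the-art communication-efficient algorithms.
Remarkably, in contrast to the trade-offs typically encountered in the field, FDA achieves this without sacrificing convergence speed.
We further show that FDA maintains robust performance across diverse data-heterogeneity settings.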

Abstract (original)

Driven by the ever-growing volume and decentralized nature of data, coupled with the escalating size of modern models, distributed deep learning (DDL) has been entrenched as the preferred paradigm for training. However, frequent synchronization of DL models, encompassing millions to many billions of parameters, creates a communication bottleneck, severely hindering scalability. Worse yet, DDL algorithms typically waste valuable bandwidth, and make themselves less practical in bandwidth-constrained federated settings, by relying on overly simplistic, periodic, and rigid synchronization schedules. To address these shortcomings, we propose Federated Dynamic Averaging (FDA), a communication-efficient DDL strategy that dynamically triggers synchronization based on the value of the model variance. Through extensive experiments across a wide range of learning tasks we demonstrate that FDA reduces communication cost by orders of magnitude, compared to both traditional and cutting-edge communication-efficient algorithms. Remarkably, FDA achieves this without sacrificing convergence speed – in stark contrast to the trade-offs encountered in the field. Additionally, we show that FDA maintains robust performance across diverse data heterogeneity settings.
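The core idea in the abstract is to synchronize only when the workers' models have drifted far enough apart, rather than on a fixed schedule. The small NumPy sketch below illustrates this idea under explicitly stated assumptions. Everything in it is hypothetical and chosen for illustration: the variance definition (mean squared distance of each worker's model from the average model), the threshold `theta`, the toy quadratic objectives, and all function names. Unlike FDA, this sketch computes the variance exactly, which in a real system would require shipping full models. How FDA monitors the variance cheaply is described in the paper and is not reproduced here.

```python
import numpy as np


def model_variance(models: np.ndarray) -> float:
    """Mean squared L2 distance of each worker's model from the average model.

    `models` has shape (K, d): one flattened parameter vector per worker.
    (Hypothetical definition used for this sketch.)
    """
    mean = models.mean(axis=0)
    return float(np.mean(np.sum((models - mean) ** 2, axis=1)))


def variance_from_drifts(drifts: np.ndarray) -> float:
    """Same quantity from drifts Delta_k = w_k - w_sync (w_sync: last synced model).

    Uses the identity Var = mean_k ||Delta_k||^2 - ||mean_k Delta_k||^2.
    The first term needs only one scalar per worker; the second still needs the
    average drift, which a practical protocol would have to approximate.
    """
    sq_norms = np.sum(drifts ** 2, axis=1)
    mean_drift = drifts.mean(axis=0)
    return float(sq_norms.mean() - mean_drift @ mean_drift)


def train(num_steps, trigger, K=8, d=10, lr=0.05, hetero=0.5, noise=0.1, seed=0):
    """Toy local SGD on per-worker quadratics 0.5*||w - c_k||^2 (hypothetical setup).

    `trigger(step, models)` decides when to average. Returns (mean model, #syncs).
    """
    rng = np.random.default_rng(seed)
    # Shared optimum plus per-worker offsets; `hetero` controls data heterogeneity.
    targets = rng.standard_normal(d) + hetero * rng.standard_normal((K, d))
    models = np.zeros((K, d))
    syncs = 0
    for step in range(1, num_steps + 1):
        grads = models - targets + noise * rng.standard_normal((K, d))
        models = models - lr * grads  # one noisy local SGD step on every worker
        if trigger(step, models):
            models[:] = models.mean(axis=0)  # average and broadcast to all workers
            syncs += 1
    return models.mean(axis=0), syncs


def variance_trigger(theta):
    # Dynamic rule: synchronize only when the model variance exceeds theta.
    return lambda step, models: model_variance(models) > theta


def periodic_trigger(tau):
    # Baseline rule: synchronize every tau steps regardless of divergence.
    return lambda step, models: step % tau == 0


if __name__ == "__main__":
    for name, trig in [("periodic, tau=5", periodic_trigger(5)),
                       ("variance, theta=0.5", variance_trigger(0.5))]:
        _, n = train(500, trig)
        print(f"{name}: {n} synchronizations")
```

In this toy, the number of synchronizations under the variance rule depends on `theta` and `hetero`. Larger heterogeneity pulls workers apart faster and therefore tends to trigger averaging more often. These toy counts are not meant to reflect the paper's experimental results.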

arXiv information

Authors	Michail Theologitis, Georgios Frangias, Georgios Anestis, Vasilis Samoladas, Antonios Deligiannakis
Published	2024-05-31 16:34:11+00:00
arXiv page	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google
