Boosting Asynchronous Decentralized Learning with Model Fragmentation

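Summary

Decentralized learning (DL) is an emerging technique that lets nodes on the web jointly train a machine learning model without sharing raw data. A key challenge in DL is handling stragglers, that is, nodes whose computation or communication is slower than that of other nodes. DivShare is a new asynchronous DL algorithm that converges quickly even when some nodes communicate slowly. It does so by fragmenting each node's model into subsets of parameters and, instead of exchanging full models sequentially, sending each subset to a random sample of other nodes in parallel with computation. Transferring smaller fragments uses the collective bandwidth more efficiently and lets nodes with slow network links quickly contribute at least part of their model parameters. By proving the convergence of DivShare theoretically, we provide, to the best of our knowledge, the first formal proof of convergence for a DL algorithm that accounts for the effects of asynchronous communication with delays. We evaluated DivShare experimentally against two state-of-the-art DL baselines, AD-PSGD and Swift, on two standard datasets, CIFAR-10 and MovieLens. With communication stragglers present, DivShare lowers time-to-accuracy by up to 3.9x compared to AD-PSGD on the CIFAR-10 dataset. Compared to the baselines, DivShare also achieves up to 19.4% better accuracy on CIFAR-10 and 9.5% lower test loss on MovieLens.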

Abstract (original)

Decentralized learning (DL) is an emerging technique that allows nodes on the web to collaboratively train machine learning models without sharing raw data. Dealing with stragglers, i.e., nodes with slower compute or communication than others, is a key challenge in DL. We present DivShare, a novel asynchronous DL algorithm that achieves fast model convergence in the presence of communication stragglers. DivShare achieves this by having nodes fragment their models into parameter subsets and send, in parallel to computation, each subset to a random sample of other nodes instead of sequentially exchanging full models. The transfer of smaller fragments allows more efficient usage of the collective bandwidth and enables nodes with slow network links to quickly contribute with at least some of their model parameters. By theoretically proving the convergence of DivShare, we provide, to the best of our knowledge, the first formal proof of convergence for a DL algorithm that accounts for the effects of asynchronous communication with delays. We experimentally evaluate DivShare against two state-of-the-art DL baselines, AD-PSGD and Swift, and with two standard datasets, CIFAR-10 and MovieLens. We find that DivShare with communication stragglers lowers time-to-accuracy by up to 3.9x compared to AD-PSGD on the CIFAR-10 dataset. Compared to baselines, DivShare also achieves up to 19.4% better accuracy and 9.5% lower test loss on the CIFAR-10 and MovieLens datasets, respectively.
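
The fragmentation idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`fragment_indices`, `plan_sends`, `aggregate`), the random index partition, the `fanout` parameter, and the per-parameter averaging rule are all assumptions made here for illustration. The sketch only shows the general shape of splitting a model into parameter subsets, choosing random recipients for each subset, and merging whatever fragments have arrived.

```python
import random

def fragment_indices(num_params, num_fragments, rng):
    """Randomly partition parameter indices into disjoint fragments (assumed scheme)."""
    indices = list(range(num_params))
    rng.shuffle(indices)
    # Striding over the shuffled list yields disjoint, roughly equal subsets.
    return [sorted(indices[i::num_fragments]) for i in range(num_fragments)]

def plan_sends(sender, num_nodes, fragments, fanout, rng):
    """For each fragment, pick `fanout` distinct recipients other than the sender."""
    others = [n for n in range(num_nodes) if n != sender]
    return [(rng.sample(others, fanout), frag) for frag in fragments]

def aggregate(local_model, received):
    """Average each parameter over the local value and any received values.

    `received` is a list of (indices, values) pairs; this averaging rule is an
    assumption of the sketch, not necessarily the rule used by DivShare.
    """
    sums = list(local_model)
    counts = [1] * len(local_model)
    for indices, values in received:
        for i, v in zip(indices, values):
            sums[i] += v
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]

rng = random.Random(0)
model = [float(i) for i in range(8)]  # toy flat parameter vector
fragments = fragment_indices(len(model), num_fragments=4, rng=rng)
sends = plan_sends(sender=0, num_nodes=5, fragments=fragments, fanout=2, rng=rng)
```

In this toy setup a slow node could finish sending one small fragment long before it would finish sending a full model, which is the intuition the abstract gives for why fragmentation helps with communication stragglers.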

arXiv information

Authors	Sayan Biswas, Anne-Marie Kermarrec, Alexis Marouani, Rafael Pires, Rishi Sharma, Martijn de Vos
Published	2025-02-03 18:24:33+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
