Data-Parallel Neural Network Training via Nonlinearly Preconditioned Trust-Region Method

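Summary

Parallel training methods are becoming increasingly relevant to machine learning (ML) as models and datasets continue to grow.
We propose a variant of the Additively Preconditioned Trust-Region Strategy (APTS) for training deep neural networks (DNNs).
The proposed APTS method uses a data-parallel approach to construct the nonlinear preconditioner employed in the nonlinear optimization strategy.
Unlike the commonly used Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (Adam), both of which are variants of gradient descent (GD), the APTS method implicitly adjusts the step size in each iteration, which removes the need for costly hyperparameter tuning.
We demonstrate the performance of the proposed APTS variant on the MNIST and CIFAR-10 datasets.
The results indicate that the proposed APTS variant achieves validation accuracy comparable to SGD and Adam while allowing parallel training and eliminating the need for expensive hyperparameter tuning.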

Abstract (original)

Parallel training methods are increasingly relevant in machine learning (ML) due to the continuing growth in model and dataset sizes. We propose a variant of the Additively Preconditioned Trust-Region Strategy (APTS) for training deep neural networks (DNNs). The proposed APTS method utilizes a data-parallel approach to construct a nonlinear preconditioner employed in the nonlinear optimization strategy. In contrast to the common employment of Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (Adam), which are both variants of gradient descent (GD) algorithms, the APTS method implicitly adjusts the step sizes in each iteration, thereby removing the need for costly hyperparameter tuning. We demonstrate the performance of the proposed APTS variant using the MNIST and CIFAR-10 datasets. The results obtained indicate that the APTS variant proposed here achieves comparable validation accuracy to SGD and Adam, all while allowing for parallel training and obviating the need for expensive hyperparameter tuning.
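To illustrate the claim that a trust-region method adjusts its step size implicitly, here is a minimal sketch of a generic textbook trust-region loop with Cauchy-point steps. It is not the APTS method from the paper: it has no nonlinear preconditioner and no data parallelism, and the function name `trust_region_cauchy`, the thresholds 0.25 and 0.75, and the quadratic test problem are all assumptions chosen for illustration.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def trust_region_cauchy(f, grad, hess_vec, x, radius=1.0, max_radius=100.0,
                        eta=0.1, tol=1e-8, max_iters=500):
    """Minimise f with a basic trust-region loop using Cauchy-point steps."""
    for _ in range(max_iters):
        g = grad(x)
        g_norm = math.sqrt(dot(g, g))
        if g_norm < tol:
            break
        # Cauchy point: minimise the quadratic model along -g inside the region.
        curvature = dot(g, hess_vec(x, g))
        tau = 1.0
        if curvature > 0.0:
            tau = min(1.0, g_norm ** 3 / (radius * curvature))
        s = [-tau * radius / g_norm * gi for gi in g]
        # Compare the decrease predicted by the model with the actual decrease.
        predicted = -(dot(g, s) + 0.5 * dot(s, hess_vec(x, s)))
        x_trial = [xi + si for xi, si in zip(x, s)]
        rho = (f(x) - f(x_trial)) / predicted
        # The radius acts as the step-size control and is adapted automatically.
        if rho < 0.25:
            radius *= 0.25
        elif rho > 0.75 and tau == 1.0:
            radius = min(2.0 * radius, max_radius)
        # Accept the trial point only if the model was a good enough predictor.
        if rho > eta:
            x = x_trial
    return x, radius


# Hypothetical test problem: an ill-conditioned convex quadratic.
def f(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad(x):
    return [x[0], 10.0 * x[1]]


def hess_vec(x, v):
    return [v[0], 10.0 * v[1]]


x_min, final_radius = trust_region_cauchy(f, grad, hess_vec, [3.0, 1.0])
print(f(x_min), final_radius)
```

No learning rate is passed in: the ratio `rho` of actual to predicted decrease determines whether the radius shrinks or grows, which is the kind of mechanism the abstract contrasts with the hand-tuned step sizes of SGD and Adam.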

arXiv information

Authors	Samuel A. Cruz Alegría, Ken Trotti, Alena Kopaničáková, Rolf Krause
Published	2025-02-07 18:11:33+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
