Multiscale Stochastic Gradient Descent: Efficiently Training Convolutional Neural Networks

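Abstract

Stochastic gradient descent (SGD) is the foundation of modern deep learning optimization, but it becomes increasingly inefficient when training convolutional neural networks (CNNs) on high-resolution data.
This paper introduces Multiscale Stochastic Gradient Descent (Multiscale-SGD), a new optimization approach that uses a coarse-to-fine training strategy to estimate the gradient at a fraction of the cost, improving the computational efficiency of SGD-type methods while preserving model accuracy.
We derive theoretical criteria under which Multiscale-SGD is effective, and show that although standard convolutions can be used, they can be suboptimal for noisy data.
This leads us to introduce a new class of learnable, scale-independent Mesh-Free Convolutions (MFCs), which ensure consistent gradient behavior across resolutions and are therefore well suited to multiscale training.
Through extensive empirical validation, we demonstrate that in practice (i) the Multiscale-SGD approach can be used to train a range of architectures on a variety of tasks, and (ii) when the noise is not significant, standard convolutions benefit from the multiscale training framework.
Our results establish a new paradigm for the efficient training of deep networks, enabling practical scalability in high-resolution and multiscale learning tasks.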

Abstract (original)

Stochastic Gradient Descent (SGD) is the foundation of modern deep learning optimization but becomes increasingly inefficient when training convolutional neural networks (CNNs) on high-resolution data. This paper introduces Multiscale Stochastic Gradient Descent (Multiscale-SGD), a novel optimization approach that exploits coarse-to-fine training strategies to estimate the gradient at a fraction of the cost, improving the computational efficiency of SGD type methods while preserving model accuracy. We derive theoretical criteria for Multiscale-SGD to be effective, and show that while standard convolutions can be used, they can be suboptimal for noisy data. This leads us to introduce a new class of learnable, scale-independent Mesh-Free Convolutions (MFCs) that ensure consistent gradient behavior across resolutions, making them well-suited for multiscale training. Through extensive empirical validation, we demonstrate that in practice, (i) our Multiscale-SGD approach can be used to train various architectures for a variety of tasks, and (ii) when the noise is not significant, standard convolutions benefit from our multiscale training framework. Our results establish a new paradigm for the efficient training of deep networks, enabling practical scalability in high-resolution and multiscale learning tasks.
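
The abstract does not spell out the gradient estimator itself. As a minimal sketch of what a coarse-to-fine gradient estimate could look like, the Python below assumes a two-level telescoping decomposition: the coarse-resolution gradient is averaged over many samples, and a fine-minus-coarse correction is averaged over only a few. The toy one-parameter model, the pairwise-averaging restriction, and the names `coarsen`, `grad`, `mean_grad`, and `multiscale_grad` are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math
import random

def coarsen(signal):
    """Halve the resolution by averaging adjacent pairs (an assumed restriction operator)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]

def grad(w, x, y):
    """Derivative with respect to w of mean((w * x_i - y_i)^2), a toy one-parameter model."""
    return sum(2 * (w * xi - yi) * xi for xi, yi in zip(x, y)) / len(x)

def mean_grad(w, batch, level):
    """Average gradient over a batch after coarsening every sample `level` times."""
    total = 0.0
    for x, y in batch:
        for _ in range(level):
            x, y = coarsen(x), coarsen(y)
        total += grad(w, x, y)
    return total / len(batch)

def multiscale_grad(w, coarse_batch, fine_batch):
    """Two-level estimate: coarse gradient on many samples plus a
    fine-minus-coarse correction on few samples."""
    correction = mean_grad(w, fine_batch, 0) - mean_grad(w, fine_batch, 1)
    return mean_grad(w, coarse_batch, 1) + correction

# Hypothetical smooth 1D "images": sine waves with random amplitude and phase,
# with targets generated by the true parameter w = 3.
random.seed(0)
data = []
for _ in range(64):
    amp, phase = random.uniform(0.5, 1.5), random.uniform(0.0, math.pi)
    x = [amp * math.sin(2 * math.pi * i / 32 + phase) for i in range(32)]
    data.append((x, [3.0 * xi for xi in x]))

# Full-resolution work is done on 8 samples only; the other 56 are seen at half resolution.
estimate = multiscale_grad(0.5, data, data[:8])
reference = mean_grad(0.5, data, 0)
print(estimate, reference)
```

In this sketch the saving comes from evaluating most samples at half resolution. The estimate is only useful when coarse and fine gradients stay close, which is the kind of cross-resolution consistency the abstract says standard convolutions can lose on noisy data.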

arXiv information

Authors	Niloufar Zakariaei,Shadab Ahamed,Eldad Haber,Moshe Eliasof
Published	2025-03-12 16:05:08+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
