Averaged Adam accelerates stochastic optimization in the training of deep neural network approximations for partial differential equation and optimal control problems

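Abstract

Deep learning methods, which usually consist of a class of deep neural networks (DNNs) trained by a stochastic gradient descent (SGD) optimization method, are now widespread in data-driven learning problems and in scientific computing tasks such as optimal control (OC) and partial differential equation (PDE) problems.
In practically relevant learning tasks, the considered class of DNNs is often trained not with the plain standard SGD method but with more sophisticated adaptive and accelerated variants of it, such as the popular Adam optimizer.
Inspired by the classical Polyak-Ruppert averaging approach, this work applies averaged variants of the Adam optimizer to train DNNs to approximately solve exemplary scientific computing problems in the form of PDE and OC problems.
The averaged variants of Adam are tested on a series of learning problems: physics-informed neural network (PINN), deep backward stochastic differential equation (deep BSDE), and deep Kolmogorov approximations for PDEs (such as the heat, Black-Scholes, Burgers, and Allen-Cahn PDEs), DNN approximations for OC problems, and DNN approximations for image classification problems (ResNet for CIFAR-10).
In each numerical example, the averaged variants of Adam outperform the standard Adam and standard SGD optimizers, particularly for the scientific machine learning problems.
The Python source code for the numerical experiments associated with this work is available on GitHub at https://github.com/deeplearningmethods/averaged-adam.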

Abstract (original)

Deep learning methods – usually consisting of a class of deep neural networks (DNNs) trained by a stochastic gradient descent (SGD) optimization method – are nowadays omnipresent in data-driven learning problems as well as in scientific computing tasks such as optimal control (OC) and partial differential equation (PDE) problems. In practically relevant learning tasks, often not the plain-vanilla standard SGD optimization method is employed to train the considered class of DNNs but instead more sophisticated adaptive and accelerated variants of the standard SGD method such as the popular Adam optimizer are used. Inspired by the classical Polyak-Ruppert averaging approach, in this work we apply averaged variants of the Adam optimizer to train DNNs to approximately solve exemplary scientific computing problems in the form of PDEs and OC problems. We test the averaged variants of Adam in a series of learning problems including physics-informed neural network (PINN), deep backward stochastic differential equation (deep BSDE), and deep Kolmogorov approximations for PDEs (such as heat, Black-Scholes, Burgers, and Allen-Cahn PDEs), including DNN approximations for OC problems, and including DNN approximations for image classification problems (ResNet for CIFAR-10). In each of the numerical examples the employed averaged variants of Adam outperform the standard Adam and the standard SGD optimizers, particularly, in the situation of the scientific machine learning problems. The Python source codes for the numerical experiments associated to this work can be found on GitHub at https://github.com/deeplearningmethods/averaged-adam.
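The averaging idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify which averaging scheme the authors use, so the code below is an assumption: it runs a textbook Adam update in plain Python and keeps a running arithmetic mean of the iterates after a burn-in phase, in the spirit of Polyak-Ruppert averaging. The function name `averaged_adam`, the `burn_in` parameter, and the noisy quadratic test problem are all hypothetical and are not taken from the paper or its repository.

```python
import math
import random


def averaged_adam(grad, theta0, steps=2000, burn_in=500, lr=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam and also return the arithmetic mean of the iterates after burn_in.

    Hypothetical illustration only; not the authors' implementation.
    """
    theta = list(theta0)
    m = [0.0] * len(theta)   # first-moment (momentum) estimate
    v = [0.0] * len(theta)   # second-moment estimate
    avg = list(theta)        # running average of the iterates
    count = 0                # number of iterates included in the average
    for t in range(1, steps + 1):
        g = grad(theta)
        for i in range(len(theta)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)   # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
        if t > burn_in:
            # Incremental update of the mean of all post-burn-in iterates.
            count += 1
            for i in range(len(theta)):
                avg[i] += (theta[i] - avg[i]) / count
    return theta, avg


# Assumed toy problem: minimize 0.5 * ||theta - target||^2 from noisy gradients.
random.seed(0)
target = [1.0, -2.0]


def noisy_grad(theta):
    return [th - tg + random.gauss(0.0, 1.0) for th, tg in zip(theta, target)]


last, avg = averaged_adam(noisy_grad, [0.0, 0.0])
print("last iterate:    ", last)
print("averaged iterate:", avg)
```

With noisy gradients the last Adam iterate keeps fluctuating around the minimizer, while the averaged iterate smooths out part of that fluctuation. Whether this helps on a given PDE or OC training problem depends on the setting, and the experiments reported in the paper are the place to check that.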

arXiv information

Authors	Steffen Dereich, Arnulf Jentzen, Adrian Riekert
Published	2025-01-10 16:15:25+00:00
arXiv page	arxiv_id (pdf)

Source and services used

arxiv.jp, Google
