Transformers are Deep Optimizers: Provable In-Context Learning for Deep Model Training

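Summary

We investigate the ability of transformers to simulate the training process of deep models through in-context learning (ICL).
Our main contribution is a positive example of using a transformer to train a deep neural network by gradient descent implicitly, via ICL.
Specifically, we give an explicit construction of a $(2N+4)L$-layer transformer that can simulate $L$ gradient descent steps of an $N$-layer ReLU network through ICL.
We also provide theoretical guarantees for approximation within any given error and for the convergence of ICL gradient descent.
We further extend the analysis to the more practical setting of Softmax-based transformers.
We validate these findings on synthetic datasets for 3-layer, 4-layer, and 6-layer neural networks.
The results show that ICL performance matches that of direct training.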
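For reference, the process the constructed transformer is said to simulate is plain full-batch gradient descent on an $N$-layer ReLU network. The sketch below writes that process out explicitly in NumPy. It rests on assumptions that are not taken from the paper: squared loss, a scalar output, a linear output layer, and the hypothetical helper names `init_params`, `forward`, `loss`, `gd_step`, and `transformer_depth` (the last one only evaluates the $(2N+4)L$ layer count stated above). It is not the transformer construction itself.

```python
import numpy as np


def transformer_depth(n_relu_layers: int, n_gd_steps: int) -> int:
    """Layer count (2N + 4) * L from the abstract: N ReLU layers, L GD steps."""
    return (2 * n_relu_layers + 4) * n_gd_steps


def init_params(widths, rng):
    """One (W, b) pair per layer; widths = [d_in, h_1, ..., h_{N-1}, 1] gives N layers."""
    return [
        (rng.normal(scale=1.0 / np.sqrt(m), size=(m, k)), np.zeros(k))
        for m, k in zip(widths[:-1], widths[1:])
    ]


def forward(params, X):
    """Return the input and every layer's output; ReLU on hidden layers, linear output."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(np.maximum(z, 0.0) if i < len(params) - 1 else z)
    return acts


def loss(params, X, y):
    """Squared loss (1 / 2n) * sum_i (f(x_i) - y_i)^2 for a scalar-output network."""
    pred = forward(params, X)[-1][:, 0]
    return float(np.mean((pred - y) ** 2) / 2)


def gd_step(params, X, y, lr):
    """One full-batch gradient descent step, with gradients computed by backpropagation."""
    acts = forward(params, X)
    delta = (acts[-1] - y[:, None]) / X.shape[0]  # dLoss/d(output pre-activation)
    new_params = []
    for i in reversed(range(len(params))):
        W, b = params[i]
        grad_W = acts[i].T @ delta
        grad_b = delta.sum(axis=0)
        if i > 0:
            # Back through W, then through the ReLU that produced acts[i] (slope 1 where > 0).
            delta = (delta @ W.T) * (acts[i] > 0)
        new_params.append((W - lr * grad_W, b - lr * grad_b))
    return new_params[::-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 4))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
    params = init_params([4, 16, 16, 1], rng)  # N = 3 layers
    print("loss before:", loss(params, X, y))
    for _ in range(200):  # L = 200 explicit GD steps
        params = gd_step(params, X, y, lr=0.05)
    print("loss after: ", loss(params, X, y))
    print("depth (2N+4)L:", transformer_depth(3, 200))  # 2000
```

In this reading, a 3-layer ReLU network trained for 200 steps would correspond to a $(2 \cdot 3 + 4) \cdot 200 = 2000$-layer transformer under the stated construction; the paper's ICL result concerns reproducing these explicit updates implicitly.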

Summary (original)

We investigate the transformer’s capability for in-context learning (ICL) to simulate the training process of deep models. Our key contribution is providing a positive example of using a transformer to train a deep neural network by gradient descent in an implicit fashion via ICL. Specifically, we provide an explicit construction of a $(2N+4)L$-layer transformer capable of simulating $L$ gradient descent steps of an $N$-layer ReLU network through ICL. We also give the theoretical guarantees for the approximation within any given error and the convergence of the ICL gradient descent. Additionally, we extend our analysis to the more practical setting using Softmax-based transformers. We validate our findings on synthetic datasets for 3-layer, 4-layer, and 6-layer neural networks. The results show that ICL performance matches that of direct training.
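The idea of a transformer performing gradient descent "in an implicit fashion via ICL" is easiest to see in a much simpler, well-known toy case than the one the paper treats: linear regression, a single step from zero initialization, and one linear (softmax-free) self-attention head whose weights are set by hand. The sketch below checks numerically that the head's output at the query token equals the one-step gradient descent prediction. It is illustrative only: it is not the paper's $(2N+4)L$-layer construction for ReLU networks or its Softmax extension, and all function names are hypothetical.

```python
import numpy as np


def gd_one_step_prediction(X, y, x_query, lr):
    """Predict at x_query after one explicit GD step from w0 = 0 on (1/2n)||Xw - y||^2."""
    n = X.shape[0]
    w0 = np.zeros(X.shape[1])
    grad = X.T @ (X @ w0 - y) / n
    w1 = w0 - lr * grad
    return float(w1 @ x_query)


def linear_attention_prediction(X, y, x_query, lr):
    """Same prediction read off one hand-weighted linear (no-softmax) self-attention head."""
    n, d = X.shape
    tokens = np.hstack([X, y[:, None]])             # context tokens e_i = [x_i; y_i]
    query_token = np.concatenate([x_query, [0.0]])  # query token [x_q; 0], label unknown
    P_x = np.eye(d + 1)[:, :d]  # W_Q = W_K: keep the x-part of a token
    p_y = np.eye(d + 1)[:, d]   # W_V: keep the y-part of a token
    keys = tokens @ P_x         # (n, d), equals X
    query = query_token @ P_x   # (d,), equals x_query
    values = tokens @ p_y       # (n,), equals y
    scores = keys @ query       # unnormalized attention scores <x_i, x_q>
    return float(lr / n * scores @ values)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))
    y = X @ np.array([1.0, -2.0, 0.5])
    x_q = rng.normal(size=3)
    print(gd_one_step_prediction(X, y, x_q, lr=0.1))
    print(linear_attention_prediction(X, y, x_q, lr=0.1))  # same value up to rounding
```

What makes this toy case work is dropping the softmax: with unnormalized scores $\langle x_i, x_q \rangle$, the attention output is linear in the labels $y_i$, exactly like the gradient of the squared loss at $w_0 = 0$, so both sides equal $\frac{\eta}{n}\sum_i y_i \langle x_i, x_q \rangle$.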

arXiv information

Authors	Weimin Wu, Maojiang Su, Jerry Yao-Chieh Hu, Zhao Song, Han Liu
Published	2024-11-25 16:32:11+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
