Dynamic Analysis and an Eigen Initializer for Recurrent Neural Networks

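Abstract

In recurrent neural networks, learning long-term dependencies is the main difficulty, owing to the vanishing and exploding gradient problem.
Many researchers have worked on this problem and proposed numerous algorithms.
Although these algorithms have been highly successful, understanding how information decays remains an open problem.
This paper studies the dynamics of the hidden state in recurrent neural networks.
We propose a new perspective for analyzing the hidden state space, based on the eigendecomposition of the weight matrix.
We begin the analysis with a linear state space model and explain the role that activation functions play in preserving information.
We then give an explanation of long-term dependency based on this eigen-analysis.
We also point out that eigenvalues behave differently in regression tasks and classification tasks.
Based on observations of well-trained recurrent neural networks, we propose a new initialization method for recurrent neural networks that consistently improves performance.
It can be applied to vanilla RNNs, LSTMs, and GRUs.
We test it on a range of datasets, including Tomita Grammars, pixel-by-pixel MNIST, and a machine translation dataset (Multi30k).
In several tasks, it outperforms the Xavier and Kaiming initializers as well as RNN-specific initializers such as IRNN and sp-RNN.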

Abstract (original)

In recurrent neural networks, learning long-term dependency is the main difficulty due to the vanishing and exploding gradient problem. Many researchers are dedicated to solving this issue and they proposed many algorithms. Although these algorithms have achieved great success, understanding how the information decays remains an open problem. In this paper, we study the dynamics of the hidden state in recurrent neural networks. We propose a new perspective to analyze the hidden state space based on an eigen decomposition of the weight matrix. We start the analysis by linear state space model and explain the function of preserving information in activation functions. We provide an explanation for long-term dependency based on the eigen analysis. We also point out the different behavior of eigenvalues for regression tasks and classification tasks. From the observations on well-trained recurrent neural networks, we proposed a new initialization method for recurrent neural networks, which improves consistently performance. It can be applied to vanilla-RNN, LSTM, and GRU. We test on many datasets, such as Tomita Grammars, pixel-by-pixel MNIST datasets, and machine translation datasets (Multi30k). It outperforms the Xavier initializer and kaiming initializer as well as other RNN-only initializers like IRNN and sp-RNN in several tasks.
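
The eigenvalue view of the hidden state can be illustrated with a toy linear recurrence. What follows is a minimal sketch, not the authors' analysis or their initializer: it assumes an input-free linear recurrence h_t = W h_{t-1} and a hypothetical 2x2 scaled-rotation weight matrix, whose eigenvalues have modulus equal to the chosen radius. All function names are illustrative. For this matrix, the norm of the hidden state after t steps is the radius raised to the power t, so a modulus below 1 makes the state vanish, a modulus above 1 makes it grow, and a modulus of exactly 1 preserves it.

```python
import cmath
import math


def eigenvalues_2x2(w):
    """Eigenvalues of a 2x2 matrix, computed from its trace and determinant."""
    (a, b), (c, d) = w
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0


def step(w, h):
    """One step of the input-free linear recurrence h_t = W h_{t-1}."""
    return [
        w[0][0] * h[0] + w[0][1] * h[1],
        w[1][0] * h[0] + w[1][1] * h[1],
    ]


def norm_after(w, h0, steps):
    """Euclidean norm of the hidden state after `steps` applications of W."""
    h = list(h0)
    for _ in range(steps):
        h = step(w, h)
    return math.hypot(h[0], h[1])


def scaled_rotation(radius, angle):
    """Hypothetical example matrix with eigenvalues radius * exp(+/- i * angle)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[radius * c, -radius * s], [radius * s, radius * c]]


if __name__ == "__main__":
    h0 = [1.0, 0.0]
    for radius in (0.9, 1.0, 1.1):
        w = scaled_rotation(radius, 0.3)
        moduli = [abs(lam) for lam in eigenvalues_2x2(w)]
        # Print the eigenvalue moduli next to the state norm after 50 steps.
        print(radius, moduli, norm_after(w, h0, 50))
```

The scaled rotation is chosen only because its eigenvalues are known in closed form; a trained recurrent weight matrix would generally have eigenvalues of mixed moduli, and a nonlinear activation changes the picture further.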

arXiv information

Authors	Ran Dou, Jose Principe
Published	2023-07-28 17:14:58+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
