The Mathematical Relationship Between Layer Normalization and Dynamic Activation Functions

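Summary

A recent paper proposes Dynamic Tanh (DyT) as a drop-in replacement for layer normalization (LN).
The method is empirically well motivated and attractive from a practical standpoint, but it lacks a theoretical foundation.
This work sheds light on the mathematical relationship between layer normalization and dynamic activation functions.
In particular, it derives DyT from LN and shows that doing so requires a well-defined approximation.
Dropping that approximation yields an alternative activation function, which the author calls the Dynamic Inverse Square Root Unit (DyISRU).
DyISRU is the exact counterpart of layer normalization, and numerical results show that it resembles LN more closely than DyT does.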

Summary (original)

A recent paper proposes Dynamic Tanh (DyT) as a drop-in replacement for layer normalization (LN). Although the method is empirically well-motivated and appealing from a practical point of view, it lacks a theoretical foundation. In this work, we shed light on the mathematical relationship between layer normalization and dynamic activation functions. In particular, we derive DyT from LN and show that a well-defined approximation is needed to do so. By dropping said approximation, an alternative activation function is obtained, which we call Dynamic Inverse Square Root Unit (DyISRU). DyISRU is the exact counterpart of layer normalization, and we demonstrate numerically that it indeed resembles LN more accurately than DyT does.
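The claim that LN has an exact inverse-square-root counterpart, while a tanh form needs an approximation, can be checked on a toy case. The sketch below is not the paper's derivation or its DyISRU parameterization, which the abstract does not spell out. It assumes LN without affine parameters, varies a single element x while the remaining elements are held fixed and sum to zero, and compares the LN output for x against two curves: the closed form sqrt(n-1) * x / sqrt(x^2 + b), and a tanh matched to the same slope at zero and the same saturation level. The function names, the sample values in `others`, and the way the tanh is matched are all assumptions made for illustration.

```python
import math


def layer_norm_element(x, others):
    """LN output (no affine, eps = 0) for element x, given the other elements."""
    values = [x] + list(others)
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return (x - mean) / math.sqrt(var)


def isru_params(others):
    """Amplitude and offset b of the closed form; valid when `others` sum to zero."""
    n = len(others) + 1
    q = sum(v * v for v in others)
    return math.sqrt(n - 1), n * q / (n - 1)


def isru_closed_form(x, others):
    """Inverse-square-root shape: amp * x / sqrt(x^2 + b)."""
    amp, b = isru_params(others)
    return amp * x / math.sqrt(x * x + b)


def tanh_matched(x, others):
    """tanh with the same slope at 0 and the same saturation level (an assumption)."""
    amp, b = isru_params(others)
    return amp * math.tanh(x / math.sqrt(b))


# Hypothetical fixed context: four values that sum to zero.
others = [-1.5, -0.5, 0.5, 1.5]
xs = [0.5 * i for i in range(-20, 21)]

err_isru = max(abs(layer_norm_element(x, others) - isru_closed_form(x, others)) for x in xs)
err_tanh = max(abs(layer_norm_element(x, others) - tanh_matched(x, others)) for x in xs)

print("max |LN - inverse-square-root form|:", err_isru)
print("max |LN - matched tanh|:", err_tanh)
```

In this toy setting the inverse-square-root curve agrees with LN up to floating-point rounding, while the matched tanh deviates visibly at moderate |x|. That is consistent with the abstract's statement, though it says nothing about behaviour with learned parameters or in a trained network.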

arXiv information

Author	Felix Stollenwerk
Published	2025-03-31 12:10:24+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
