Greedy Ordering of Layer Weight Matrices in Transformers Improves Translation

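Summary

Prior work has sought to understand the internal structure and function of Transformer-based encoder-decoder architectures at the level of multi-head attention and feed-forward sublayers.
Interpretations have focused on the encoder and decoder, together with the possible combinations of self-attention, cross-attention, and feed-forward sublayers.
However, without examining the lower-level structures, one gains only a limited understanding of the motivation behind sublayer reordering.
Can we look beneath the sublayer abstraction and permute layer weight matrices to improve translation quality?
We propose AEIUOrder, which greedily reorders the layer weight matrices in the encoder by their well-trainedness, as measured by Heavy-Tailed Self-Regularization (HT-SR) metrics, and orders the decoder matrices correspondingly.
Our results suggest that greedily reordering layer weight matrices to maximize total well-trainedness helps the model learn representations and generate translations more effectively.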
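
The abstract does not say which HT-SR metric AEIUOrder uses or how the greedy step is defined, so the following is only a rough sketch of the general idea, not the author's method. It assumes that well-trainedness is scored by a power-law exponent fitted to the tail of the eigenvalue spectrum of W^T W (a Hill-type estimate), and that a lower exponent counts as more well-trained. The names `hill_alpha`, `greedy_order`, and `tail_fraction` are hypothetical and chosen for illustration.

```python
import numpy as np


def hill_alpha(weight, tail_fraction=0.5):
    """Hill-type estimate of the power-law exponent of the spectrum of W^T W.

    Assumption: this stands in for an HT-SR metric; the source does not
    specify the exact metric or how the tail cutoff is chosen.
    """
    # Eigenvalues of W^T W are the squared singular values of W.
    singular_values = np.linalg.svd(np.asarray(weight), compute_uv=False)
    eigs = np.sort(singular_values ** 2)[::-1]  # descending order
    n = len(eigs)
    k = min(max(2, int(n * tail_fraction)), n)  # number of tail eigenvalues
    tail = eigs[:k]
    x_min = tail[-1]
    log_ratios = np.log(tail[:-1] / x_min)
    total = log_ratios.sum()
    if total <= 0:
        raise ValueError("degenerate spectrum: cannot fit a tail exponent")
    # Maximum-likelihood (Hill) estimate for a continuous power-law tail.
    return 1.0 + (k - 1) / total


def greedy_order(matrices, tail_fraction=0.5):
    """Return (order, scores): indices sorted by ascending alpha.

    Assumption: lower alpha is treated as "more well-trained". With fixed
    per-matrix scores, repeatedly picking the best remaining matrix is the
    same as sorting by score.
    """
    scores = [hill_alpha(m, tail_fraction) for m in matrices]
    order = sorted(range(len(matrices)), key=lambda i: scores[i])
    return order, scores
```

Under these assumptions, `order` could then be used to permute the encoder matrices, with the same permutation applied to the matching decoder matrices; how the paper actually pairs encoder and decoder matrices is not stated in the abstract.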

Abstract (original)

Prior work has attempted to understand the internal structures and functionalities of Transformer-based encoder-decoder architectures on the level of multi-head attention and feed-forward sublayers. Interpretations have focused on the encoder and decoder, along with the combinatorial possibilities of the self-attention, cross-attention, and feed-forward sublayers. However, without examining the low-level structures, one gains limited understanding of the motivation behind sublayer reordering. Could we dive into the sublayer abstraction and permute layer weight matrices to improve the quality of translation? We propose AEIUOrder to greedily reorder layer weight matrices in the encoder by their well-trainedness, as measured by Heavy-Tailed Self-Regularization (HT-SR) metrics, and order the decoder matrices correspondingly. Our results suggest that greedily reordering layer weight matrices to maximize Total well-trainedness facilitates the model to learn representations and generate translations more effectively.

arXiv information

Author	Elicia Ye
Published	2023-03-17 00:08:53+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
