A Meta-Learning Perspective on Transformers for Causal Language Modeling

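Summary

The Transformer architecture has become prominent in the development of large causal language models.
However, the mechanisms that explain its capabilities are not well understood.
Focusing on the training process, we establish a meta-learning view of the Transformer architecture when it is trained for the causal language modeling task, by describing an inner optimization process that may occur within the Transformer.
Furthermore, from within this inner optimization, we identify and theoretically analyze a special characteristic of the norms of the token representations learned by Transformer-based causal language models.
Our analysis is supported by experiments conducted on pre-trained large language models and real-world data.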

Abstract (original)

The Transformer architecture has become prominent in developing large causal language models. However, mechanisms to explain its capabilities are not well understood. Focused on the training process, here we establish a meta-learning view of the Transformer architecture when trained for the causal language modeling task, by explicating an inner optimization process that may happen within the Transformer. Further, from within the inner optimization, we discover and theoretically analyze a special characteristic of the norms of learned token representations within Transformer-based causal language models. Our analysis is supported by experiments conducted on pre-trained large language models and real-world data.

arXiv information

Authors	Xinbo Wu, Lav R. Varshney
Published	2023-10-09 17:27:36+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
