Reparameterized LLM Training via Orthogonal Equivalence Transformation

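Summary

While large language models (LLMs) are driving rapid progress in artificial intelligence, training these large models effectively and reliably remains one of the field's most significant challenges.
To address this challenge, we propose POET, a novel reparameterized training algorithm that uses an orthogonal equivalence transformation to optimize neurons.
Specifically, POET reparameterizes each neuron with two learnable orthogonal matrices and a fixed random weight matrix.
Because it provably preserves the spectral properties of weight matrices, POET can stably optimize the objective function with improved generalization.
We further develop efficient approximations that make POET flexible and scalable for training large-scale neural networks.
Extensive experiments validate the effectiveness and scalability of POET in training LLMs.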

Summary (original)

While large language models (LLMs) are driving the rapid advancement of artificial intelligence, effectively and reliably training these large models remains one of the field’s most significant challenges. To address this challenge, we propose POET, a novel reParameterized training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. Specifically, POET reparameterizes each neuron with two learnable orthogonal matrices and a fixed random weight matrix. Because of its provable preservation of spectral properties of weight matrices, POET can stably optimize the objective function with improved generalization. We further develop efficient approximations that make POET flexible and scalable for training large-scale neural networks. Extensive experiments validate the effectiveness and scalability of POET in training LLMs.
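
The abstract does not say how the two orthogonal matrices and the fixed weight matrix are combined. As a minimal sketch, assume the effective weight is the product `R @ W0 @ P`, where `W0` is the fixed random matrix and `R`, `P` are orthogonal. All names here (`random_orthogonal`, `reparameterized_weight`, `R`, `W0`, `P`) are hypothetical, and the orthogonal matrices are simply sampled rather than learned. Under that assumption, the NumPy snippet checks the spectral-preservation claim numerically: the singular values of the effective weight equal those of `W0`.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs so the result does not depend on QR sign conventions.
    return q * np.sign(np.diag(r))

def reparameterized_weight(R, W0, P):
    """Assumed form (not confirmed by the abstract): W = R @ W0 @ P."""
    return R @ W0 @ P

rng = np.random.default_rng(0)
d_out, d_in = 6, 4

W0 = rng.standard_normal((d_out, d_in))  # fixed random weight matrix
R = random_orthogonal(d_out, rng)        # stands in for a learned orthogonal matrix
P = random_orthogonal(d_in, rng)         # stands in for a learned orthogonal matrix

W = reparameterized_weight(R, W0, P)

# Orthogonal factors on either side leave the singular values unchanged.
sv_fixed = np.linalg.svd(W0, compute_uv=False)
sv_effective = np.linalg.svd(W, compute_uv=False)
spectrum_preserved = np.allclose(sv_fixed, sv_effective)
print("singular values preserved:", spectrum_preserved)
```

In an actual training setup `R` and `P` would be updated while staying orthogonal; how that constraint is maintained is not described in the abstract and is left out of this sketch.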

arXiv information

Authors	Zeju Qiu, Simon Buchholz, Tim Z. Xiao, Maximilian Dax, Bernhard Schölkopf, Weiyang Liu
Published	2025-06-09 17:59:34+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
