WaveletGPT: Wavelets Meet Large Language Models

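Abstract

Large language models (LLMs) have brought a new wave of advances in artificial intelligence that affects every scientific field and discipline.
They are trained on a simple objective: predicting the next token given the preceding context.
We live in a world where most of the data around us, such as text, audio, and music, has a multi-scale structure.
This paper injects traditional signal processing ideas, namely wavelets, into LLMs during pre-training to exploit that structure.
Without adding any extra parameters to a GPT-style LLM architecture, we reach the same pre-training performance almost twice as fast on text, raw audio, and symbolic music.
This is achieved by imposing a structure on the intermediate embeddings.
When trained for the same number of training steps, we obtain significant performance gains, comparable to those from pre-training a larger neural architecture.
Our architecture gives every next-token prediction access to intermediate embeddings at different temporal resolutions in every Transformer decoder block.
We hope this work paves the way for incorporating multi-rate signal processing ideas into conventional LLM pre-training.
We also show that model performance can be pushed further by improving internal structure rather than only pursuing scale.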
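
The abstract does not spell out how the structure is imposed on the intermediate embeddings, so the following is only an illustrative sketch under stated assumptions, not the paper's implementation. It assumes a causal moving average (the low-pass, approximation part of a Haar-style wavelet) applied along the token axis, with a different window length for each embedding dimension so that each position sees the past at several temporal resolutions. The function name `causal_multiscale_average`, the `max_window` parameter, and the linear window schedule are all hypothetical. The operation has no trainable parameters, which matches the "no extra parameters" claim in spirit.

```python
import numpy as np


def causal_multiscale_average(x, max_window=8):
    """Causally smooth embeddings with a per-dimension window length.

    x: array of shape (T, D), one D-dimensional embedding per token.
    Returns an array of shape (T, D). Output at position t depends only
    on inputs at positions <= t, so next-token prediction stays causal.

    Assumption (not from the source): window lengths grow linearly
    from 1 (no smoothing) to max_window across embedding dimensions.
    """
    x = np.asarray(x, dtype=float)
    T, D = x.shape
    windows = np.linspace(1, max_window, D).round().astype(int)

    # Prefix sums with a leading zero row: the sum of x[a:b] is csum[b] - csum[a].
    csum = np.vstack([np.zeros((1, D)), np.cumsum(x, axis=0)])

    out = np.empty_like(x)
    for d in range(D):
        w = windows[d]
        for t in range(T):
            start = max(0, t - w + 1)  # truncate the window at the sequence start
            out[t, d] = (csum[t + 1, d] - csum[start, d]) / (t + 1 - start)
    return out


if __name__ == "__main__":
    emb = np.arange(12, dtype=float).reshape(6, 2)
    print(causal_multiscale_average(emb, max_window=3))
```

In this sketch the first embedding dimension is left untouched (window 1) while later dimensions are averaged over progressively longer histories, so a single embedding vector mixes fine and coarse views of the sequence.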

Abstract (original)

Large Language Models (LLMs) have ushered in a new wave of artificial intelligence advancements impacting every scientific field and discipline. They are trained on a simple objective: to predict the next token given the previous context. We live in a world where most of the data around us, e.g., text, audio, and music, has a multi-scale structure associated with it. This paper infuses LLMs with traditional signal processing ideas, namely wavelets, during pre-training to take advantage of the structure. Without adding any extra parameters to a GPT-style LLM architecture, we achieve the same pre-training performance almost twice as fast in text, raw audio, and symbolic music. This is achieved by imposing a structure on intermediate embeddings. When trained for the same number of training steps, we achieve significant gains in performance, which is comparable to pre-training a larger neural architecture. Our architecture allows every next token prediction access to intermediate embeddings at different temporal resolutions in every Transformer decoder block. This work will hopefully pave the way for incorporating multi-rate signal processing ideas into traditional LLM pre-training. Further, we showcase pushing model performance by improving internal structure instead of just going after scale.

arXiv information

Author	Prateek Verma
Published	2024-12-05 18:35:26+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
