Wave Network: An Ultra-Small Language Model

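Summary

We propose an innovative token representation and update method in a new ultra-small language model, the Wave Network.
Specifically, we represent each token with a complex vector that encodes both the global and the local semantics of the input text.
A complex vector consists of two components: a magnitude vector representing the global semantics of the input text, and a phase vector capturing the relationship between each individual token and the global semantics.
In experiments on the AG News text classification task, when complex vectors are generated from randomly initialized token embeddings, our single-layer Wave Network achieves 90.91% accuracy with wave interference and 91.66% with wave modulation.
This outperforms a single Transformer layer using BERT pre-trained embeddings by 19.23% and 19.98%, respectively, and approaches the accuracy of the pre-trained and fine-tuned BERT base model (94.64%).
In addition, compared with BERT base, the Wave Network reduces video memory usage and training time by 77.34% and 85.62%, respectively, during wave modulation.
In summary, we used a small language model with 2.4 million parameters to achieve accuracy comparable to a 100-million-parameter BERT model in text classification.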
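
To make the magnitude/phase idea concrete, here is a minimal sketch in plain Python. The abstract does not give formulas, so everything below is an assumption made for illustration: the magnitude of each embedding dimension is taken as the L2 norm of that dimension over all tokens, the phase is the angle whose cosine is the token value divided by that magnitude, "wave interference" is modeled as complex addition, and "wave modulation" as complex multiplication. The function names `to_complex`, `interfere`, and `modulate` are hypothetical.

```python
import cmath
import math


def to_complex(embeddings):
    """Turn real token embeddings (n_tokens x n_dims) into complex vectors.

    Assumed definitions (not stated in the abstract):
      magnitude[k] = L2 norm of dimension k over all tokens (shared globally)
      phase[j][k]  = angle whose cosine is embeddings[j][k] / magnitude[k]
    """
    n_dims = len(embeddings[0])
    magnitude = [
        math.sqrt(sum(token[k] ** 2 for token in embeddings))
        for k in range(n_dims)
    ]
    complex_tokens = []
    for token in embeddings:
        row = []
        for k, w in enumerate(token):
            g = magnitude[k]
            ratio = w / g if g > 0.0 else 0.0
            ratio = max(-1.0, min(1.0, ratio))  # guard against rounding drift
            phase = math.atan2(math.sqrt(1.0 - ratio * ratio), ratio)
            row.append(cmath.rect(g, phase))  # g * exp(i * phase)
        complex_tokens.append(row)
    return magnitude, complex_tokens


def interfere(z1, z2):
    """Assumed wave interference: element-wise complex addition."""
    return [a + b for a, b in zip(z1, z2)]


def modulate(z1, z2):
    """Assumed wave modulation: element-wise complex multiplication."""
    return [a * b for a, b in zip(z1, z2)]


# Two toy tokens with two embedding dimensions each.
magnitude, tokens = to_complex([[3.0, 1.0], [4.0, -1.0]])
interference = interfere(tokens[0], tokens[1])
modulation = modulate(tokens[0], tokens[1])
```

Under these assumptions every token shares the same magnitude per dimension (the global part), the real part of each complex entry recovers the original embedding value, and the phase alone distinguishes one token from another (the local part). Multiplication multiplies magnitudes and adds phases, while addition lets aligned phases reinforce and opposed phases cancel.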

Summary (original)

We propose an innovative token representation and update method in a new ultra-small language model: the Wave network. Specifically, we use a complex vector to represent each token, encoding both global and local semantics of the input text. A complex vector consists of two components: a magnitude vector representing the global semantics of the input text, and a phase vector capturing the relationships between individual tokens and global semantics. Experiments on the AG News text classification task demonstrate that, when generating complex vectors from randomly initialized token embeddings, our single-layer Wave Network achieves 90.91% accuracy with wave interference and 91.66% with wave modulation – outperforming a single Transformer layer using BERT pre-trained embeddings by 19.23% and 19.98%, respectively, and approaching the accuracy of the pre-trained and fine-tuned BERT base model (94.64%). Additionally, compared to BERT base, the Wave Network reduces video memory usage and training time by 77.34% and 85.62% during wave modulation. In summary, we used a 2.4-million-parameter small language model to achieve accuracy comparable to a 100-million-parameter BERT model in text classification.
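
The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 19.23% and 19.98% gains are absolute percentage points rather than relative improvements; the abstract does not say which. The implied baseline accuracy and the other derived values are computed here for illustration and are not figures reported in the source.

```python
# Figures quoted in the abstract (accuracies in percent).
wave_interference_acc = 90.91
wave_modulation_acc = 91.66
gain_interference = 19.23
gain_modulation = 19.98
bert_base_acc = 94.64

# Assumption: the gains are absolute percentage points, so subtracting them
# gives the implied accuracy of the single Transformer layer baseline.
baseline_from_interference = round(wave_interference_acc - gain_interference, 2)
baseline_from_modulation = round(wave_modulation_acc - gain_modulation, 2)

# Remaining gap between wave modulation and the fine-tuned BERT base model.
gap_to_bert = round(bert_base_acc - wave_modulation_acc, 2)

# Parameter count of the Wave Network relative to the BERT model.
param_ratio = round(2.4e6 / 100e6, 3)

print(baseline_from_interference, baseline_from_modulation, gap_to_bert, param_ratio)
```

Both subtractions give the same implied baseline of 71.68, which is consistent with the percentage-point reading. The gap to BERT base is 2.98 points, at roughly 2.4% of the parameter count.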

arXiv information

Authors	Xin Zhang, Victor S. Sheng
Published	2024-11-07 12:38:41+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
