Millions of States: Designing a Scalable MoE Architecture with RWKV-7 Meta-learner

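Summary

State-based sequence models such as RWKV-7 offer a compelling alternative to Transformer architectures: they achieve linear complexity, demonstrate greater expressive power in short-context scenarios, and enable state tracking beyond the \(\text{TC}^0\) complexity class.
However, RWKV-7 lacks mechanisms for token-parameter interaction and native scalability, which limits its adaptability and growth without retraining.
This paper proposes \textbf{Meta-State}, a novel extension to RWKV-7 that replaces attention mechanisms with a fully state-driven approach and integrates token-parameter interactions through a \textbf{Self-State Encoder} (SSE) mechanism.
The SSE repurposes a portion of the RWKV-7 Weighted Key-Value (WKV) state as transformation weights, encoding token-parameter interactions in a linear, state-driven manner. It introduces no new trainable matrices or softmax operations and preserves the autoregressive property of token processing.
Meta-State supports progressive model scaling by expanding the WKV state and the parameter tokens, reusing existing parameters without retraining.
The approach bridges the gap between state-based modeling, token-parameter interactions, and scalable architectures, offering a flexible framework for efficient, adaptable sequence modeling with linear complexity and constant memory usage.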

Abstract (original)

State-based sequence models like RWKV-7 offer a compelling alternative to Transformer architectures, achieving linear complexity while demonstrating greater expressive power in short-context scenarios and enabling state tracking beyond the \(\text{TC}^0\) complexity class. However, RWKV-7 lacks mechanisms for token-parameter interactions and native scalability, limiting its adaptability and growth without retraining. In this paper, we propose \textbf{Meta-State}, a novel extension to RWKV-7 that replaces attention mechanisms with a fully state-driven approach, integrating token-parameter interactions through a \textbf{Self-State Encoder} (SSE) mechanism. The SSE repurposes a portion of the RWKV-7 Weighted Key-Value (WKV) state as transformation weights to encode token-parameter interactions in a linear, state-driven manner without introducing new trainable matrices or softmax operations, while preserving the autoregressive property of token processing. Meta-State supports progressive model scaling by expanding the WKV state and parameter tokens, reusing existing parameters without retraining. Our approach bridges the gap between state-based modeling, token-parameter interactions, and scalable architectures, offering a flexible framework for efficient and adaptable sequence modeling with linear complexity and constant memory usage.
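
The abstract describes the Self-State Encoder only in words, so the following is a minimal toy sketch of the general idea rather than the authors' method. It assumes a simplified decayed outer-product state update (not the actual RWKV-7 recurrence), uses each token as both key and value, and treats the first `n_rows` rows of the state as the reused transformation weights. The names `wkv_step`, `self_state_encode`, `run`, `decay`, and `n_rows` are all hypothetical.

```python
# Toy illustration only. The state update below is a simplified decayed
# outer-product rule, not the actual RWKV-7 recurrence, and
# self_state_encode is a hypothetical stand-in for the paper's SSE.

def wkv_step(state, k, v, decay):
    """Return decay * state + outer(k, v); the shape never changes."""
    return [[decay * state[i][j] + k[i] * v[j] for j in range(len(v))]
            for i in range(len(k))]


def self_state_encode(state, x, n_rows):
    """Reuse the first n_rows rows of the state as a weight matrix.

    Computes weights @ x with plain multiply-adds: no softmax and no
    trainable matrix beyond the state itself.
    """
    weights = state[:n_rows]
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def run(tokens, dim, decay=0.9, n_rows=2):
    """Process tokens left to right, one fixed-size state throughout."""
    state = [[0.0] * dim for _ in range(dim)]
    outputs = []
    for x in tokens:
        # Encode with the state built from earlier tokens only, so the
        # output at step t cannot depend on tokens after t.
        outputs.append(self_state_encode(state, x, n_rows))
        # Assumption: the token serves as both key and value.
        state = wkv_step(state, x, x, decay)
    return state, outputs


tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
final_state, outputs = run(tokens, dim=3)
print(outputs)
```

Under these assumptions, the state stays a fixed `dim` x `dim` matrix however long the sequence grows, and each token costs the same amount of work, which is the sense in which the abstract speaks of constant memory and linear complexity.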

arXiv information

Authors	Liu Xiao, Li Zhiyuan, Lin Yueyu
Published	2025-04-11 04:14:32+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
