PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training

Summary

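As large language models evolve, traditional Transformer models become computationally demanding on long sequences, because their computation grows quadratically with sequence length.
Mamba has emerged as a groundbreaking architecture in generative AI, handling long sequences remarkably well while reducing computational and memory complexity.
Nevertheless, Mamba's existing training framework handles variable-length sequence inputs inefficiently.
Training on one sequence at a time leaves the GPU underutilized, while padding variable-length sequences to the maximum length for batched processing incurs considerable memory and computational overhead.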
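To make the padding overhead concrete, here is a minimal Python sketch comparing the token slots that pad-to-max batching allocates with the slots a packed layout needs. The sequence lengths are illustrative, and the helpers `padding_overhead` and `pack` are hypothetical, as is the `offsets` boundary convention (cumulative lengths `[0, l1, l1 + l2, ...]`); none of this is PackMamba's documented data format.

```python
from itertools import accumulate


def padding_overhead(lengths):
    """Compare token slots for pad-to-max batching against packing.

    Padding allocates len(lengths) * max(lengths) slots; packing
    concatenates the sequences and allocates exactly sum(lengths).
    Returns (padded_slots, packed_slots, fraction of padded slots wasted).
    """
    padded_slots = len(lengths) * max(lengths)
    packed_slots = sum(lengths)
    return padded_slots, packed_slots, (padded_slots - packed_slots) / padded_slots


def pack(sequences):
    """Concatenate token sequences into one row and record boundary offsets.

    Hypothetical layout: sequence i occupies packed[offsets[i]:offsets[i + 1]].
    """
    packed = [tok for seq in sequences for tok in seq]
    offsets = [0, *accumulate(len(seq) for seq in sequences)]
    return packed, offsets


# Illustrative lengths for a batch of four sequences.
print(padding_overhead([512, 128, 2048, 64]))
print(pack([[1, 2, 3], [4, 5], [6]]))
```

With these lengths, padding allocates 8192 slots for 2752 real tokens, so about two thirds of the batch is padding. Packing removes that waste, but only if every downstream operator respects the boundaries recorded in `offsets`.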
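To address this problem, we analyze the performance of Mamba's bottleneck operators under various tensor shapes and propose PackMamba, a high-throughput Mamba that processes variable-length sequences efficiently.
Delving into state-space models (SSMs), we modify the parallel operators so that no information passes between individual sequences, while maintaining high performance.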
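One way to keep packed sequences from exchanging information, sketched here under simplifying assumptions, is a segmented scan: treat each step of a diagonal linear recurrence h_t = a_t * h_{t-1} + b_t as an affine map, compose the maps with an associative operator, and set a_t = 0 at every sequence start so the state resets at each boundary. The scalar state, the reset-by-zeroing trick, and the helper names `combine`, `scan_packed`, and `scan_reference` are assumptions for illustration; this is not the paper's GPU kernel. In Mamba terms, a_t loosely stands in for the discretized decay and b_t for the input contribution.

```python
import math


def combine(left, right):
    """Associative operator for the recurrence h_t = a_t * h_{t-1} + b_t.

    Each element (a, b) stands for the affine map h -> a * h + b.
    The result applies `left` first, then `right`.
    """
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def scan_packed(a, b, offsets):
    """Parallel-style inclusive scan over one packed row (zero initial state).

    Assumption: zeroing a_t at each sequence start discards the incoming
    state, so no information crosses a sequence boundary.
    """
    starts = set(offsets[:-1])
    elems = [(0.0 if t in starts else a[t], b[t]) for t in range(len(a))]
    # Hillis-Steele scan: ceil(log2(n)) rounds; updates within a round are
    # independent of each other, which is what a GPU could run in parallel.
    step = 1
    while step < len(elems):
        elems = [combine(elems[t - step], elems[t]) if t >= step else elems[t]
                 for t in range(len(elems))]
        step *= 2
    # With h_{-1} = 0, the b-component of each prefix composition is h_t.
    return [h for _, h in elems]


def scan_reference(a, b, offsets):
    """Sequential recurrence run separately on each sequence."""
    out = []
    for start, end in zip(offsets[:-1], offsets[1:]):
        h = 0.0  # fresh state for every sequence
        for t in range(start, end):
            h = a[t] * h + b[t]
            out.append(h)
    return out


# Two sequences packed into one row: lengths 3 and 2.
a = [0.5, 0.5, 0.5, 0.9, 0.9]
b = [1.0, 2.0, 3.0, 4.0, 5.0]
offsets = [0, 3, 5]
packed = scan_packed(a, b, offsets)
assert all(math.isclose(x, y) for x, y in zip(packed, scan_reference(a, b, offsets)))
print(packed)
```

The packed scan matches running each sequence on its own. Dropping the reset (keeping a[t] at sequence starts) would let the first sequence's final state leak into the second: the fourth output would become 0.9 * 4.25 + 4.0 = 7.825 instead of 4.0.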
Experimental results on an NVIDIA A100 GPU show throughput exceeding that of the baseline single-sequence processing scheme: a 3.06x speedup on the 1.4B model and 2.62x on the 2.8B model.

Summary (original)

With the evolution of large language models, traditional Transformer models become computationally demanding for lengthy sequences due to the quadratic growth in computation with respect to sequence length. Mamba, emerging as a groundbreaking architecture in the field of generative AI, demonstrates remarkable proficiency in handling long sequences with reduced computational and memory complexity. Nevertheless, the existing training framework of Mamba is inefficient with variable-length sequence inputs: either single-sequence training results in low GPU utilization, or padding variable-length sequences to a maximum length for batched processing incurs considerable memory and computational overhead. To address this problem, we analyze the performance of bottleneck operators in Mamba under diverse tensor shapes and propose PackMamba, a high-throughput Mamba that efficiently handles variable-length sequences. Diving deep into state-space models (SSMs), we modify the parallel operators to avoid passing information between individual sequences while maintaining high performance. Experimental results on an NVIDIA A100 GPU demonstrate throughput exceeding the baseline single-sequence processing scheme: a 3.06x speedup on the 1.4B model and 2.62x on the 2.8B model.

arXiv information

Authors	Haoran Xu, Ziqian Liu, Rong Fu, Zhongling Su, Zerui Wang, Zheng Cai, Zhilin Pei, Xingcheng Zhang
Published	2024-08-07 16:13:43+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
