BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference

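Summary

Large-scale foundation models have demonstrated excellent performance on language and vision tasks.
However, the many dense matrix-vector operations in these large networks pose significant computational challenges during inference.
To address these challenges, we introduce the Block-Level Adaptive STructured (BLAST) matrix, designed to learn and exploit the efficient structures prevalent in the weight matrices of linear layers in deep learning models.
Compared with existing structured matrices, the BLAST matrix offers considerably more flexibility, because it can represent various types of structure that are either learned from data or computed from existing weight matrices.
We demonstrate the efficiency of using the BLAST matrix to compress models for both language and vision tasks: (i) for medium-sized models such as ViT and GPT-2, training with BLAST weights improves performance while reducing complexity by 70% and 40%, respectively;
(ii) for large foundation models such as Llama-7B and DiT-XL, the BLAST matrix achieves 2x compression while showing the smallest performance degradation among all structured matrices tested.
Our code is available at https://github.com/changwoolee/BLAST.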

Summary (original)

Large-scale foundation models have demonstrated exceptional performance in language and vision tasks. However, the numerous dense matrix-vector operations involved in these large networks pose significant computational challenges during inference. To address these challenges, we introduce the Block-Level Adaptive STructured (BLAST) matrix, designed to learn and leverage efficient structures prevalent in the weight matrices of linear layers within deep learning models. Compared to existing structured matrices, the BLAST matrix offers substantial flexibility, as it can represent various types of structures that are either learned from data or computed from pre-existing weight matrices. We demonstrate the efficiency of using the BLAST matrix for compressing both language and vision tasks, showing that (i) for medium-sized models such as ViT and GPT-2, training with BLAST weights boosts performance while reducing complexity by 70% and 40%, respectively; and (ii) for large foundation models such as Llama-7B and DiT-XL, the BLAST matrix achieves a 2x compression while exhibiting the lowest performance degradation among all tested structured matrices. Our code is available at https://github.com/changwoolee/BLAST.
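
The abstract does not spell out what a BLAST matrix looks like. As a rough illustration only, the sketch below assumes one block-structured parameterization in which block (i, j) of the weight matrix is U_i diag(s_ij) V_j^T, with U_i shared along a block row and V_j shared along a block column. Every name here (`blast_matvec`, `blast_to_dense`, `U`, `S`, `V`, `b`, `p`, `q`, `r`) is hypothetical and is not taken from the authors' repository; the code only shows why such a structure can cut both storage and matrix-vector cost.

```python
import random


def matvec(M, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Transpose a list-of-rows matrix."""
    return [list(col) for col in zip(*M)]


def blast_matvec(U, S, V, x):
    """Compute y = A x without forming A.

    ASSUMED form: block (i, j) of A is U[i] @ diag(S[i][j]) @ V[j]^T.
    U[i] is p x r, V[j] is q x r, S[i][j] is a length-r vector.
    """
    b = len(U)
    q = len(V[0])
    r = len(V[0][0])
    # Step 1: project each input chunk once, z_j = V_j^T x_j (length r).
    z = [matvec(transpose(V[j]), x[j * q:(j + 1) * q]) for j in range(b)]
    y = []
    for i in range(b):
        # Step 2: mix the projections with the per-block diagonal weights.
        w = [sum(S[i][j][k] * z[j][k] for j in range(b)) for k in range(r)]
        # Step 3: expand with the left factor shared by block row i.
        y.extend(matvec(U[i], w))
    return y


def blast_to_dense(U, S, V):
    """Materialize the full matrix from the assumed factors (for checking only)."""
    b, p, q, r = len(U), len(U[0]), len(V[0]), len(V[0][0])
    A = [[0.0] * (b * q) for _ in range(b * p)]
    for i in range(b):
        for j in range(b):
            for a in range(p):
                for c in range(q):
                    A[i * p + a][j * q + c] = sum(
                        U[i][a][k] * S[i][j][k] * V[j][c][k] for k in range(r)
                    )
    return A


def rand_matrix(rows, cols):
    """Random matrix with entries in [-1, 1]."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
b, p, q, r = 2, 4, 4, 2  # toy sizes chosen for illustration
U = [rand_matrix(p, r) for _ in range(b)]
V = [rand_matrix(q, r) for _ in range(b)]
S = [[[random.uniform(-1, 1) for _ in range(r)] for _ in range(b)] for _ in range(b)]
x = [random.uniform(-1, 1) for _ in range(b * q)]

y_fast = blast_matvec(U, S, V, x)
y_dense = matvec(blast_to_dense(U, S, V), x)

# Stored numbers: dense matrix versus the assumed factors.
dense_params = (b * p) * (b * q)
blast_params = b * p * r + b * q * r + b * b * r

print("max abs difference:", max(abs(a - c) for a, c in zip(y_fast, y_dense)))
print("dense params:", dense_params, "structured params:", blast_params)
```

With these toy sizes the structured form stores 40 numbers instead of 64, and the saving grows as `r` shrinks relative to the block size. The 70%, 40%, and 2x figures in the abstract come from the authors' experiments, not from this sketch.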

arXiv information

Authors	Changwoo Lee, Soo Min Kwon, Qing Qu, Hun-Seok Kim
Published	2024-10-28 17:56:18+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
