Efficient $1$-bit tensor approximations

Summary

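We present a spatially efficient decomposition of matrices and arbitrary-order tensors as linear combinations of tensor products of $\{-1, 1\}$-valued vectors.
For any matrix $A \in \mathbb{R}^{m \times n}$, $$A - R_w = S_w C_w T_w^\top = \sum_{j=1}^w c_j \cdot \mathbf{s}_j \mathbf{t}_j^\top$$ is a {\it $w$-width signed cut decomposition of $A$}.
Here $C_w = \operatorname{diag}(\mathbf{c}_w)$ for some $\mathbf{c}_w \in \mathbb{R}^w$, and $S_w$, $T_w$, and the vectors $\mathbf{s}_j$, $\mathbf{t}_j$ are $\{-1, 1\}$-valued.
Storing $(S_w, T_w, C_w)$ takes $w \cdot (m + n)$ packed bits plus only $w$ floating-point numbers.
As a function of $w$, $\|R_w\|_F$ decays exponentially when the decomposition is applied to \textit{f32} matrices with i.i.d. $\mathcal N (0, 1)$ entries.
When $w$ is chosen so that $(S_w, T_w, C_w)$ has the same memory footprint as a \textit{f16} or \textit{bf16} matrix, the relative error is comparable.
Our algorithm produces efficient signed cut decompositions in $20$ lines of pseudocode.
It is a simple modification of an algorithm from the celebrated 1999 paper [1] of Frieze and Kannan.
As a first application, we approximate the weight matrices of the open \textit{Mistral-7B-v0.1} Large Language Model at a $50\%$ spatial compression.
Remarkably, all $226$ remainder matrices have a relative error $<6\%$, and the expanded model closely matches \textit{Mistral-7B-v0.1} on the {\it Huggingface} leaderboard [2].
Benchmark performance degrades slowly as the spatial compression is reduced from $50\%$ to $25\%$.
We optimize our open-source \textit{rust} implementation [3] with \textit{simd} instructions on \textit{avx2} and \textit{avx512} architectures.
We also extend the algorithm from matrices to tensors of arbitrary order and use it to compress a picture of the first author's cat Angus.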
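
To make the storage claim concrete, here is a small arithmetic sketch of the footprint comparison. It assumes each coefficient $c_j$ is stored as a 32-bit float and that the dense baseline uses 16 bits per entry; the function names and the `coeff_bits` parameter are illustrative and do not come from the paper or its implementation.

```python
def signed_cut_bits(m, n, w, coeff_bits=32):
    """Bits needed for (S_w, T_w, C_w): w*(m+n) sign bits plus w coefficients.

    coeff_bits=32 is an assumption; the abstract only says "w floating point numbers".
    """
    return w * (m + n) + w * coeff_bits


def dense_bits(m, n, bits_per_entry=16):
    """Bits needed for a dense m-by-n matrix, e.g. 16 per entry for f16 or bf16."""
    return m * n * bits_per_entry


def max_width_for_budget(m, n, bits_per_entry=16, coeff_bits=32):
    """Largest width w whose signed cut storage fits in the dense matrix's budget."""
    return dense_bits(m, n, bits_per_entry) // (m + n + coeff_bits)


if __name__ == "__main__":
    m = n = 4096
    w = max_width_for_budget(m, n)
    print("width:", w)
    print("signed cut bits:", signed_cut_bits(m, n, w))
    print("dense f16 bits: ", dense_bits(m, n))
```

Under these assumptions, a $4096 \times 4096$ matrix admits a width of $w = 32640$ within the footprint of its 16-bit dense form. Passing `bits_per_entry=8` or `bits_per_entry=4` gives the widths that fit in half or a quarter of that budget.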

Summary (original)

We present a spatially efficient decomposition of matrices and arbitrary-order tensors as linear combinations of tensor products of $\{-1, 1\}$-valued vectors. For any matrix $A \in \mathbb{R}^{m \times n}$, $$A - R_w = S_w C_w T_w^\top = \sum_{j=1}^w c_j \cdot \mathbf{s}_j \mathbf{t}_j^\top$$ is a {\it $w$-width signed cut decomposition of $A$}. Here $C_w = \operatorname{diag}(\mathbf{c}_w)$ for some $\mathbf{c}_w \in \mathbb{R}^w$, and $S_w, T_w$, and the vectors $\mathbf{s}_j, \mathbf{t}_j$ are $\{-1, 1\}$-valued. To store $(S_w, T_w, C_w)$, we may pack $w \cdot (m + n)$ bits, and require only $w$ floating point numbers. As a function of $w$, $\|R_w\|_F$ exhibits exponential decay when applied to \textit{f32} matrices with i.i.d. $\mathcal N (0, 1)$ entries. Choosing $w$ so that $(S_w, T_w, C_w)$ has the same memory footprint as a \textit{f16} or \textit{bf16} matrix, the relative error is comparable. Our algorithm yields efficient signed cut decompositions in $20$ lines of pseudocode. It reflects a simple modification from a celebrated 1999 paper [1] of Frieze and Kannan. As a first application, we approximate the weight matrices in the open \textit{Mistral-7B-v0.1} Large Language Model to a $50\%$ spatial compression. Remarkably, all $226$ remainder matrices have a relative error $<6\%$ and the expanded model closely matches \textit{Mistral-7B-v0.1} on the {\it huggingface} leaderboard [2]. Benchmark performance degrades slowly as we reduce the spatial compression from $50\%$ to $25\%$. We optimize our open source \textit{rust} implementation [3] with \textit{simd} instructions on \textit{avx2} and \textit{avx512} architectures. We also extend our algorithm from matrices to tensors of arbitrary order and use it to compress a picture of the first author's cat Angus.
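
The abstract does not spell out the 20-line algorithm, so the following is only a rough sketch of one plausible greedy scheme for building a signed cut decomposition, not the authors' method. It assumes that each step picks sign vectors by alternating $\mathbf{t} = \operatorname{sign}(R^\top \mathbf{s})$ and $\mathbf{s} = \operatorname{sign}(R\,\mathbf{t})$ on the current remainder $R$, then uses the least-squares coefficient $c = \mathbf{s}^\top R\,\mathbf{t} / (mn)$. All function names and the `iters` and `seed` parameters are hypothetical.

```python
import random


def sign(x):
    # Map zero to +1 so every entry stays in {-1, 1}.
    return 1 if x >= 0 else -1


def frobenius(M):
    return sum(x * x for row in M for x in row) ** 0.5


def one_signed_cut(R, rng, iters=10):
    """Heuristically find s, t in {-1,1} with large s^T R t, plus the best scalar c."""
    m, n = len(R), len(R[0])
    s = [rng.choice((-1, 1)) for _ in range(m)]
    t = [1] * n
    for _ in range(iters):
        # Alternate: fix s and choose t, then fix t and choose s.
        t = [sign(sum(s[i] * R[i][j] for i in range(m))) for j in range(n)]
        s = [sign(sum(R[i][j] * t[j] for j in range(n))) for i in range(m)]
    # Least-squares coefficient for the rank-one sign matrix s t^T.
    c = sum(s[i] * R[i][j] * t[j] for i in range(m) for j in range(n)) / (m * n)
    return c, s, t


def signed_cut_decomposition(A, w, seed=0):
    """Return w terms (c_j, s_j, t_j) and the remainder R_w = A - sum_j c_j s_j t_j^T."""
    rng = random.Random(seed)
    R = [row[:] for row in A]  # remainder, starts as a copy of A
    terms = []
    for _ in range(w):
        c, s, t = one_signed_cut(R, rng)
        for i in range(len(R)):
            for j in range(len(R[0])):
                R[i][j] -= c * s[i] * t[j]
        terms.append((c, s, t))
    return terms, R


if __name__ == "__main__":
    rng = random.Random(1)
    A = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(8)]
    terms, R = signed_cut_decomposition(A, w=12)
    print("relative error:", frobenius(R) / frobenius(A))
```

With this choice of $c$, each step changes $\|R\|_F^2$ by $-(\mathbf{s}^\top R\,\mathbf{t})^2 / (mn)$, so the remainder norm never increases. The sketch says nothing about how fast it decays; the rates quoted in the abstract belong to the authors' algorithm.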

arXiv information

Authors	Alex W. Neal Riasanovsky, Sarah El Kazdadi
Published	2024-10-02 17:56:32+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
