Addition is almost all you need: Compressing neural networks with double binary factorization

Summary

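Binary quantization approaches, which replace weight matrices with binary matrices and substitute costly multiplications with cheaper additions, offer a computationally efficient way to address the growing compute and storage requirements of large language models (LLMs).
However, the severe quantization constraint ($\pm1$) can lead to significant accuracy degradation.
This paper proposes Double Binary Factorization (DBF), a novel method that factorizes dense weight matrices into products of two binary (sign) matrices, each accompanied by scaling vectors.
DBF preserves the efficiency advantages of binary representations while achieving compression rates that are competitive with or superior to state-of-the-art methods.
Specifically, in the 1-bit-per-weight range, DBF is better than existing binarization approaches.
In the 2-bit-per-weight range, DBF is competitive with the best quantization methods, such as QuIP\# and QTIP.
Unlike most existing compression techniques, which offer only a limited choice of compression levels, DBF allows fine-grained control over the compression ratio by adjusting the intermediate dimension of the factorization.
Building on this advantage, the authors further introduce an algorithm for estimating non-uniform layer-wise compression ratios for DBF, based on previously developed channel pruning criteria.
Code is available at https://github.com/usamec/double_binary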
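
To make the link between the intermediate dimension and the compression ratio concrete, here is a minimal sketch of the storage arithmetic. It assumes a weight matrix of shape `n_out x n_in` is replaced by two sign factors of shapes `n_out x k` and `k x n_in` at one bit per entry, and it ignores the scaling vectors, which add a comparatively small overhead. The function names and this accounting are illustrative assumptions, not taken from the paper or its repository.

```python
import math


def dbf_bits_per_weight(n_out: int, n_in: int, k: int) -> float:
    """Average bits per original weight when an (n_out x n_in) matrix is
    stored as two sign factors of shapes (n_out x k) and (k x n_in).

    Assumption: one bit per sign entry, scaling vectors not counted.
    """
    sign_bits = k * (n_out + n_in)
    return sign_bits / (n_out * n_in)


def middle_dim_for_budget(n_out: int, n_in: int, target_bits: float) -> int:
    """Largest intermediate dimension k whose sign factors stay within
    target_bits per original weight (same assumptions as above)."""
    return math.floor(target_bits * n_out * n_in / (n_out + n_in))


if __name__ == "__main__":
    # For a square 4096 x 4096 layer, each step of k changes the budget by
    # 8192 / 4096**2 bits per weight, so the ratio can be tuned finely.
    for k in (2048, 3072, 4096):
        print(k, dbf_bits_per_weight(4096, 4096, k))
```

Under these assumptions, a square layer reaches roughly 1 bit per weight when `k` is half the layer width and roughly 2 bits per weight when `k` equals the layer width, and any value in between is available by choosing `k` accordingly.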

Abstract (original)

Binary quantization approaches, which replace weight matrices with binary matrices and substitute costly multiplications with cheaper additions, offer a computationally efficient approach to address the increasing computational and storage requirements of Large Language Models (LLMs). However, the severe quantization constraint ($\pm1$) can lead to significant accuracy degradation. In this paper, we propose Double Binary Factorization (DBF), a novel method that factorizes dense weight matrices into products of two binary (sign) matrices, each accompanied by scaling vectors. DBF preserves the efficiency advantages of binary representations while achieving compression rates that are competitive with or superior to state-of-the-art methods. Specifically, in a 1-bit per weight range, DBF is better than existing binarization approaches. In a 2-bit per weight range, DBF is competitive with the best quantization methods like QuIP\# and QTIP. Unlike most existing compression techniques, which offer limited compression level choices, DBF allows fine-grained control over compression ratios by adjusting the factorization’s intermediate dimension. Based on this advantage, we further introduce an algorithm for estimating non-uniform layer-wise compression ratios for DBF, based on previously developed channel pruning criteria. Code available at: https://github.com/usamec/double_binary
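
The idea of replacing multiplications with additions can be illustrated with a small pure-Python sketch of a factorized matrix-vector product. The abstract only states that each sign matrix is accompanied by scaling vectors; the specific arrangement used here, `y = a * (A @ (m * (B @ (b * x))))` with `a`, `m`, `b` as elementwise scaling vectors, is an assumption made for illustration, as are all function and variable names.

```python
def sign_matvec(S, v):
    """Multiply a matrix with entries in {+1, -1} by a vector using only
    additions and subtractions (no multiplications)."""
    out = []
    for row in S:
        acc = 0.0
        for s, x in zip(row, v):
            acc = acc + x if s > 0 else acc - x
        out.append(acc)
    return out


def scale(s, v):
    """Elementwise scaling: the only multiplications in the product."""
    return [si * vi for si, vi in zip(s, v)]


def dbf_matvec(a, A, m, B, b, x):
    """Hypothetical factorized product y = a * (A @ (m * (B @ (b * x)))).

    A has shape (n_out x k), B has shape (k x n_in); both hold only +1/-1.
    a, m, b are scaling vectors of lengths n_out, k, n_in (assumed layout).
    """
    h = sign_matvec(B, scale(b, x))
    h = scale(m, h)
    return scale(a, sign_matvec(A, h))


if __name__ == "__main__":
    A = [[1, -1, 1], [-1, 1, 1]]      # n_out = 2, k = 3
    B = [[1, 1], [1, -1], [-1, 1]]    # k = 3, n_in = 2
    a, m, b = [2.0, 3.0], [0.5, 1.0, 2.0], [1.0, 0.5]
    print(dbf_matvec(a, A, m, B, b, [1.0, 2.0]))
```

In this layout the elementwise scalings cost `n_in + k + n_out` multiplications per input vector, compared with `n_out * n_in` for a dense product; everything else is addition or subtraction.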

arXiv information

Authors	Vladimír Boža, Vladimír Macko
Published	2025-06-17 16:42:33+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
