QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization

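Summary

Post-training quantization (PTQ) has recently attracted considerable attention as a way to produce efficient neural networks without lengthy retraining.
Despite its low cost, existing PTQ methods tend to fail in extremely low-bit settings.
This study is the first to confirm that properly incorporating activation quantization into PTQ reconstruction improves final accuracy.
To explain why, the authors establish a theoretical framework showing that the flatness of the optimized low-bit model on both calibration and test data is crucial.
Based on this conclusion, they propose a simple yet effective approach called QDROP, which randomly drops the quantization of activations during PTQ.
Extensive experiments on a range of tasks, including computer vision (image classification, object detection) and natural language processing (text classification, question answering), demonstrate its superiority.
With QDROP, the limit of PTQ is pushed to 2-bit activations for the first time, and accuracy improves by up to 51.49%.
Without bells and whistles, QDROP establishes a new state of the art for PTQ.
The code is available at https://github.com/wimh966/QDrop and has been integrated into MQBench (https://github.com/ModelTC/MQBench).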

Abstract (original)

Recently, post-training quantization (PTQ) has driven much attention to produce efficient neural networks without long-time retraining. Despite its low cost, current PTQ works tend to fail under the extremely low-bit setting. In this study, we pioneeringly confirm that properly incorporating activation quantization into the PTQ reconstruction benefits the final accuracy. To deeply understand the inherent reason, a theoretical framework is established, indicating that the flatness of the optimized low-bit model on calibration and test data is crucial. Based on the conclusion, a simple yet effective approach dubbed as QDROP is proposed, which randomly drops the quantization of activations during PTQ. Extensive experiments on various tasks including computer vision (image classification, object detection) and natural language processing (text classification and question answering) prove its superiority. With QDROP, the limit of PTQ is pushed to the 2-bit activation for the first time and the accuracy boost can be up to 51.49%. Without bells and whistles, QDROP establishes a new state of the art for PTQ. Our code is available at https://github.com/wimh966/QDrop and has been integrated into MQBench (https://github.com/ModelTC/MQBench)
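The core idea of randomly dropping activation quantization can be illustrated with a minimal NumPy sketch. The abstract does not specify the details, so everything concrete here is an assumption: the uniform min-max quantizer, the element-wise granularity of the drop, the default `drop_prob` of 0.5, and the function names `fake_quantize` and `qdrop_activation` are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def fake_quantize(x, num_bits=2):
    """Quantize then dequantize x on a uniform min-max grid.

    Assumption: a simple per-tensor uniform quantizer; the source
    does not say which quantizer is used.
    """
    levels = 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    if hi == lo:
        # Constant tensor: nothing to quantize.
        return x.copy()
    scale = (hi - lo) / levels
    q = np.round((x - lo) / scale)  # integer grid index in [0, levels]
    return q * scale + lo


def qdrop_activation(x, num_bits=2, drop_prob=0.5, rng=None, training=True):
    """Randomly drop activation quantization (illustrative sketch).

    During calibration (training=True), each element keeps its
    full-precision value with probability drop_prob (assumed
    element-wise) and is quantized otherwise. At inference time
    every element is quantized.
    """
    x_q = fake_quantize(x, num_bits)
    if not training:
        return x_q
    rng = np.random.default_rng() if rng is None else rng
    keep_full_precision = rng.random(x.shape) < drop_prob
    return np.where(keep_full_precision, x, x_q)


if __name__ == "__main__":
    acts = np.linspace(-1.0, 1.0, 8)
    mixed = qdrop_activation(acts, rng=np.random.default_rng(0))
    print(mixed)  # a mix of full-precision and 2-bit quantized values
```

In a PTQ reconstruction loop, a function like this would stand in for the activation quantizer while the weights' rounding is being optimized on calibration data, and the plain quantizer (`training=False`) would be used afterwards.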

arXiv information

Authors	Xiuying Wei, Ruihao Gong, Yuhang Li, Xianglong Liu, Fengwei Yu
Published	2023-02-21 11:24:41+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
