Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning

Summary

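Token compression speeds up the training and inference of Vision Transformers (ViTs) by reducing the number of redundant tokens, for example by pruning inattentive tokens or merging similar ones.
However, when these methods are applied to downstream tasks, performance drops significantly if the compression degree differs between training and inference, which limits the use of token compression with off-the-shelf trained models.
This paper proposes a model arithmetic framework that decouples the compression degrees of the two stages.
Beforehand, a fast, parameter-efficient self-distillation stage is run on the pre-trained model to obtain a small plugin called Token Compensator (ToCom), which describes the gap between models at different compression degrees.
At inference time, ToCom can be inserted directly into any off-the-shelf downstream model whose training and inference compression degrees are mismatched, yielding performance gains across the board without further training.
Experiments on more than 20 downstream tasks demonstrate the effectiveness of the framework.
On CIFAR100, fine-grained visual classification, and VTAB-1k, ToCom improves the average performance of DeiT-B by up to 2.3%, 1.5%, and 2.0%, respectively.
Code: https://github.com/JieShibo/ToCom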
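
To make "pruning inattentive tokens" concrete, here is a minimal sketch of one common form of token compression: keeping only the patch tokens that receive the most attention from the CLS token. The function name `prune_inattentive_tokens`, the array shapes, and the toy values are assumptions made for illustration; this is not taken from the paper or its repository.

```python
import numpy as np

def prune_inattentive_tokens(tokens, cls_attention, keep):
    """Keep the CLS token plus the `keep` patch tokens with the highest CLS attention.

    tokens:        (1 + N, D) array, row 0 is the CLS token.
    cls_attention: (N,) attention that the CLS token pays to each patch token.
    """
    # Indices of the `keep` most-attended patch tokens, restored to original order.
    top = np.sort(np.argsort(cls_attention)[::-1][:keep])
    # Shift by 1 because row 0 of `tokens` is the CLS token.
    return np.concatenate([tokens[:1], tokens[1 + top]], axis=0)

# Toy input: CLS + 4 patch tokens with 2 features each (hypothetical values).
tokens = np.arange(10, dtype=float).reshape(5, 2)
cls_attention = np.array([0.1, 0.4, 0.05, 0.45])
pruned = prune_inattentive_tokens(tokens, cls_attention, keep=2)
```

The `keep` argument plays the role of the compression degree here: a model fine-tuned with one value of `keep` and run with another is the mismatched setting the summary describes.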

Summary (original)

Token compression expedites the training and inference of Vision Transformers (ViTs) by reducing the number of the redundant tokens, e.g., pruning inattentive tokens or merging similar tokens. However, when applied to downstream tasks, these approaches suffer from significant performance drop when the compression degrees are mismatched between training and inference stages, which limits the application of token compression on off-the-shelf trained models. In this paper, we propose a model arithmetic framework to decouple the compression degrees between the two stages. In advance, we additionally perform a fast parameter-efficient self-distillation stage on the pre-trained models to obtain a small plugin, called Token Compensator (ToCom), which describes the gap between models across different compression degrees. During inference, ToCom can be directly inserted into any downstream off-the-shelf models with any mismatched training and inference compression degrees to acquire universal performance improvements without further training. Experiments on over 20 downstream tasks demonstrate the effectiveness of our framework. On CIFAR100, fine-grained visual classification, and VTAB-1k, ToCom can yield up to a maximum improvement of 2.3%, 1.5%, and 2.0% in the average performance of DeiT-B, respectively. Code: https://github.com/JieShibo/ToCom
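
The model arithmetic idea in the abstract can be pictured as adding a precomputed offset to the weights of an off-the-shelf model. The sketch below is an illustration only. It assumes the plugin is stored as additive per-layer weight deltas keyed by a (training degree, inference degree) pair, which the abstract does not specify; `apply_compensator`, `plugin`, the layer names, and the degree values are all hypothetical and are not the API of the ToCom repository.

```python
import numpy as np

def apply_compensator(weights, plugin, train_degree, infer_degree):
    """Return a copy of `weights` shifted by the delta stored for the given degree pair.

    weights: dict mapping layer name -> weight array of a downstream model.
    plugin:  dict mapping (train_degree, infer_degree) -> dict of per-layer deltas.
             (Assumed storage format, for illustration only.)
    """
    if train_degree == infer_degree:
        # No mismatch between the two stages, so nothing to compensate.
        return {name: w.copy() for name, w in weights.items()}
    deltas = plugin[(train_degree, infer_degree)]
    # Layers without a stored delta are left unchanged.
    return {name: w + deltas.get(name, 0.0) for name, w in weights.items()}

# Toy downstream model and plugin (hypothetical layer names and values).
downstream = {"blocks.0.qkv": np.ones((2, 2)), "head": np.zeros((2, 2))}
plugin = {(0, 8): {"blocks.0.qkv": np.full((2, 2), 0.5)}}
compensated = apply_compensator(downstream, plugin, train_degree=0, infer_degree=8)
```

Because the deltas are learned once on the pre-trained model, the same `plugin` could in principle be reused across downstream models, which is what makes the approach free of further training.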

arXiv information

Authors	Shibo Jie, Yehui Tang, Jianyuan Guo, Zhi-Hong Deng, Kai Han, Yunhe Wang
Published	2024-08-13 10:36:43+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
