Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation

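Summary

Existing parameter-efficient fine-tuning (PEFT) methods have been highly successful at adapting vision transformers (ViTs) by improving parameter efficiency.
However, how to improve inference efficiency during adaptation remains underexplored.
This limits the broader application of pre-trained ViT models, especially when the model is computationally expensive.
This paper proposes Dynamic Tuning (DyT), a new approach that improves both parameter efficiency and inference efficiency for ViT adaptation.
Specifically, in addition to lightweight adapter modules, it proposes a token dispatcher that distinguishes informative tokens from less important ones. The less important tokens dynamically skip the original block, which reduces redundant computation during inference.
In addition, the authors explore multiple design variants to find the best practice for DyT.
Finally, inspired by the mixture-of-experts (MoE) mechanism, they introduce an enhanced adapter that further boosts adaptation performance.
DyT is validated across a variety of tasks, including image and video recognition and semantic segmentation.
For instance, on the VTAB-1K benchmark, DyT achieves better performance than existing PEFT methods while using only 71% of their FLOPs.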
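
The token-skipping idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the sigmoid scoring rule, the 0.5 threshold, and the names `dispatch`, `heavy_block`, `adapter`, and `dynamic_layer` are all assumptions made for illustration. It shows only the general pattern the abstract describes: every token passes through a cheap adapter, while only tokens selected by a dispatcher pass through the expensive original block.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dispatch(tokens, w_dispatch, threshold=0.5):
    """Return one boolean per token: True means 'run the original block'.

    Scoring with a dot product plus sigmoid is an assumption of this sketch.
    """
    scores = [sum(w * v for w, v in zip(w_dispatch, t)) for t in tokens]
    return [sigmoid(s) > threshold for s in scores]


def heavy_block(token):
    # Hypothetical stand-in for the expensive pre-trained block.
    return [math.tanh(2.0 * v) for v in token]


def adapter(token, scale=0.1):
    # Hypothetical stand-in for a lightweight adapter module.
    return [scale * v for v in token]


def dynamic_layer(tokens, w_dispatch):
    """Apply the adapter to all tokens and the heavy block to selected ones."""
    mask = dispatch(tokens, w_dispatch)
    outputs = []
    for token, keep in zip(tokens, mask):
        update = adapter(token)
        if keep:
            # Only informative tokens pay for the original block.
            update = [u + b for u, b in zip(update, heavy_block(token))]
        # Residual connection around the whole layer.
        outputs.append([t + u for t, u in zip(token, update)])
    return outputs, mask


random.seed(0)
tokens = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
w_dispatch = [1.0, -0.5, 0.25, 0.75]  # illustrative weights, not learned
outputs, mask = dynamic_layer(tokens, w_dispatch)
activation_rate = sum(mask) / len(mask)
print(f"tokens through heavy block: {sum(mask)}/{len(mask)}")
```

In this sketch the fraction of tokens that reach `heavy_block` (`activation_rate`) is what determines how much computation is saved. How that fraction relates to the 71% FLOPs figure reported for VTAB-1K is not stated in the abstract.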

Abstract (original)

Existing parameter-efficient fine-tuning (PEFT) methods have achieved significant success on vision transformers (ViTs) adaptation by improving parameter efficiency. However, the exploration of enhancing inference efficiency during adaptation remains underexplored. This limits the broader application of pre-trained ViT models, especially when the model is computationally extensive. In this paper, we propose Dynamic Tuning (DyT), a novel approach to improve both parameter and inference efficiency for ViT adaptation. Specifically, besides using the lightweight adapter modules, we propose a token dispatcher to distinguish informative tokens from less important ones, allowing the latter to dynamically skip the original block, thereby reducing the redundant computation during inference. Additionally, we explore multiple design variants to find the best practice of DyT. Finally, inspired by the mixture-of-experts (MoE) mechanism, we introduce an enhanced adapter to further boost the adaptation performance. We validate DyT across various tasks, including image/video recognition and semantic segmentation. For instance, DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.

arXiv information

Authors	Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You
Published	2024-10-16 14:18:53+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
