Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning

Summary

Title: Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning

Summary:
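– Demand for tuning foundation vision transformers (FViTs) on downstream tasks is growing. However, fully unleashing their potential under data-limited scenarios (few-shot tuning) remains difficult because FViTs are data-hungry.
– To address this challenge, we observe that pretrained FViTs have already learned highly representative features from large-scale pretraining data, and that these features are fully preserved during widely used parameter-efficient tuning. We therefore hypothesize that leveraging these learned features to augment the tuning data can boost the effectiveness of few-shot FViT tuning.
– To this end, we propose a framework called Hint-based Data Augmentation (Hint-Aug), which aims to boost few-shot FViT tuning by augmenting the over-fitted parts of tuning samples with the features learned by pretrained FViTs.
– Hint-Aug integrates two key enablers: (1) an Attentive Over-fitting Detector (AOD), which detects over-confident patches of the FViT to potentially alleviate over-fitting on the few-shot tuning data; and (2) a Confusion-based Feature Infusion (CFI) module, which infuses easy-to-confuse features from the pretrained FViT into the over-confident patches detected by AOD in order to enhance feature diversity during tuning.
– Extensive experiments and ablation studies on five datasets and three parameter-efficient tuning techniques consistently validate Hint-Aug's effectiveness, with 0.04% to 32.91% higher accuracy than the state-of-the-art (SOTA) data augmentation method under various low-shot settings. For example, on the Pet dataset, Hint-Aug achieves 2.22% higher accuracy than SOTA data augmentation methods while using 50% less training data.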
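As a rough illustration of the AOD idea above, the following Python sketch flags the k patches that receive the most [CLS] attention, averaged over heads, as candidate "over-confident" patches. This is an assumption for illustration only: the summary does not specify the paper's actual detection criterion, and the input layout and the function names `mean_cls_attention` and `detect_overconfident_patches` are hypothetical.

```python
from typing import List


def mean_cls_attention(cls_attn: List[List[float]]) -> List[float]:
    """Average [CLS]-to-token attention over heads.

    cls_attn[h] is head h's attention row for the [CLS] token (hypothetical
    input layout): index 0 is [CLS] itself, indices 1..N are image patches.
    Returns one score per patch (length N).
    """
    num_heads = len(cls_attn)
    num_patches = len(cls_attn[0]) - 1
    scores = [0.0] * num_patches
    for row in cls_attn:
        for j in range(num_patches):
            scores[j] += row[j + 1] / num_heads
    return scores


def detect_overconfident_patches(cls_attn: List[List[float]], k: int) -> List[int]:
    """Return 0-based indices of the k patches with the highest mean attention.

    Assumption: high [CLS] attention is used here as a stand-in proxy for
    over-confidence; it is not claimed to be the paper's AOD criterion.
    """
    scores = mean_cls_attention(cls_attn)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy example: 2 heads, [CLS] + 4 patches.
cls_attn = [
    [0.1, 0.5, 0.1, 0.2, 0.1],
    [0.2, 0.3, 0.1, 0.3, 0.1],
]
print(detect_overconfident_patches(cls_attn, k=2))  # → [0, 2]
```

Patch 0 averages 0.4 and patch 2 averages 0.25, so they are the two flagged patches in this toy input.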

Summary (original)

Despite the growing demand for tuning foundation vision transformers (FViTs) on downstream tasks, fully unleashing FViTs’ potential under data-limited scenarios (e.g., few-shot tuning) remains a challenge due to FViTs’ data-hungry nature. Common data augmentation techniques fall short in this context due to the limited features contained in the few-shot tuning data. To tackle this challenge, we first identify an opportunity for FViTs in few-shot tuning: pretrained FViTs themselves have already learned highly representative features from large-scale pretraining data, which are fully preserved during widely used parameter-efficient tuning. We thus hypothesize that leveraging those learned features to augment the tuning data can boost the effectiveness of few-shot FViT tuning. To this end, we propose a framework called Hint-based Data Augmentation (Hint-Aug), which aims to boost FViT in few-shot tuning by augmenting the over-fitted parts of tuning samples with the learned features of pretrained FViTs. Specifically, Hint-Aug integrates two key enablers: (1) an Attentive Over-fitting Detector (AOD) to detect over-confident patches of foundation ViTs for potentially alleviating their over-fitting on the few-shot tuning data and (2) a Confusion-based Feature Infusion (CFI) module to infuse easy-to-confuse features from the pretrained FViTs with the over-confident patches detected by the above AOD in order to enhance the feature diversity during tuning. Extensive experiments and ablation studies on five datasets and three parameter-efficient tuning techniques consistently validate Hint-Aug’s effectiveness: 0.04% ~ 32.91% higher accuracy over the state-of-the-art (SOTA) data augmentation method under various low-shot settings. For example, on the Pet dataset, Hint-Aug achieves a 2.22% higher accuracy with 50% less training data over SOTA data augmentation methods.
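The CFI step described in the abstract can be pictured, very loosely, as perturbing only the flagged patches so that the model leans toward the class it already finds easiest to confuse. The sketch below rests on several assumptions not stated in the source: a hypothetical linear softmax classifier over flattened patch features stands in for the FViT, the "easy-to-confuse" class is taken to be the highest-probability wrong class, and a few signed-gradient (FGSM-style) steps serve as the infusion mechanism. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def class_probs(W: List[Vector], b: Vector, x: Vector) -> Vector:
    """Probabilities of a toy linear classifier: softmax(W x + b)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bc for row, bc in zip(W, b)]
    return softmax(logits)


def infuse_confusing_features(W, b, patches, true_class, selected, eps=0.1, steps=3):
    """Perturb only the selected patches toward the most confusable class.

    Returns (augmented patches, index of the confusing class).
    """
    d = len(patches[0])
    x = [v for p in patches for v in p]  # flatten patches into one vector
    probs = class_probs(W, b, x)
    # "Easy-to-confuse" class (assumption): most probable class other than the true one.
    confuse = max((c for c in range(len(b)) if c != true_class), key=lambda c: probs[c])
    # Feature indices belonging to the selected patches; all others stay untouched.
    mask = [i for j in selected for i in range(j * d, (j + 1) * d)]
    for _ in range(steps):
        probs = class_probs(W, b, x)
        # Gradient of cross-entropy w.r.t. the logits for target `confuse`: p - onehot.
        err = [p - (1.0 if c == confuse else 0.0) for c, p in enumerate(probs)]
        for i in mask:
            # Chain rule through the linear layer: dL/dx_i = sum_c W[c][i] * err[c].
            g = sum(W[c][i] * err[c] for c in range(len(b)))
            # Signed descent step that lowers the loss toward the confusing class.
            if g > 0:
                x[i] -= eps
            elif g < 0:
                x[i] += eps
    return [x[j * d:(j + 1) * d] for j in range(len(patches))], confuse


# Toy usage: 2 patches of dimension 2, 3 classes, only patch 0 flagged.
patches = [[1.0, 0.0], [0.0, 1.0]]
W = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.5, 0.0], [0.5, 0.0, 0.0, 0.0]]
b = [0.0, 0.0, 0.0]
aug, confuse = infuse_confusing_features(W, b, patches, true_class=0, selected=[0])
print(confuse)  # → 2
print(aug[1])   # → [0.0, 1.0]
```

Restricting the update to the flagged patches is the point of the sketch: the rest of the sample keeps its original content, so the added feature diversity is concentrated where the model was judged over-confident.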

arXiv information

Authors	Zhongzhi Yu, Shang Wu, Yonggan Fu, Shunyao Zhang, Yingyan Lin
Published	2023-04-25 02:22:01+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
