Improving Visual Prompt Tuning by Gaussian Neighborhood Minimization for Long-Tailed Visual Recognition

Abstract (original)

Long-tail learning has garnered widespread attention and achieved significant progress in recent times. However, even with pre-trained prior knowledge, models still exhibit weaker generalization performance on tail classes. The promising Sharpness-Aware Minimization (SAM) can effectively improve the generalization capability of models by seeking out flat minima in the loss landscape, but this comes at the cost of doubling the computational time, because the update rule of SAM requires two consecutive (non-parallelizable) forward and backward passes at each step. To address this issue, we propose a novel method called Random SAM prompt tuning (RSAM-PT) to improve model generalization, requiring only a one-step gradient computation at each step. Specifically, we search for the gradient descent direction within a random neighborhood of the parameters during each gradient update. To amplify the impact of tail-class samples and avoid overfitting, we employ a deferred re-weighting scheme to increase the significance of tail-class samples. The proposed RSAM-PT significantly improves classification accuracy on long-tailed data, particularly for tail classes. RSAM-PT achieves state-of-the-art performance of 90.3%, 76.5%, and 50.1% on the benchmark datasets CIFAR100-LT (IF 100), iNaturalist 2018, and Places-LT, respectively. The source code is temporarily available at https://github.com/Keke921/GNM-PT.
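The abstract contrasts SAM, which needs two sequential gradient evaluations per step, with a single gradient evaluated at a random point in a neighborhood of the parameters, combined with deferred re-weighting of tail classes. The NumPy sketch below illustrates these ideas on a toy quadratic loss; it is not the authors' implementation. The Gaussian sampling scale `sigma`, the helper names (`CountingGrad`, `sam_step`, `gaussian_neighborhood_step`, `drw_weights`), and the LDAM-DRW-style "effective number" weights controlled by `beta` and `defer_epoch` are illustrative assumptions; the paper's actual neighborhood distribution, prompt-tuning setup, and weighting schedule may differ.

```python
import numpy as np

# Toy anisotropic quadratic loss L(w) = 0.5 * w^T A w (illustration only).
A = np.diag([10.0, 1.0])

class CountingGrad:
    """Gradient of the toy loss that counts how often it is evaluated."""
    def __init__(self, A):
        self.A = A
        self.calls = 0

    def __call__(self, w):
        self.calls += 1
        return self.A @ w

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    # SAM: move to the first-order worst-case point inside an L2 ball of
    # radius rho, then descend with the gradient taken there.
    # This needs two sequential gradient evaluations per step.
    g = grad_fn(w)
    e = rho * g / (np.linalg.norm(g) + 1e-12)
    return w - lr * grad_fn(w + e)

def gaussian_neighborhood_step(w, grad_fn, rng, lr=0.05, sigma=0.05):
    # Hypothetical one-gradient variant: sample a point from an isotropic
    # Gaussian neighborhood of w and descend with the gradient evaluated there.
    eps = rng.normal(0.0, sigma, size=w.shape)
    return w - lr * grad_fn(w + eps)

def drw_weights(class_counts, epoch, defer_epoch, beta=0.9999):
    # Deferred re-weighting in the style of LDAM-DRW (Cao et al., 2019):
    # uniform weights before defer_epoch, class-balanced weights afterwards.
    counts = np.asarray(class_counts, dtype=float)
    if epoch < defer_epoch:
        return np.ones_like(counts)
    w = (1.0 - beta) / (1.0 - np.power(beta, counts))
    return w * len(counts) / w.sum()  # weights sum to the number of classes

# Usage: run both optimizers for 100 steps from the same starting point.
rng = np.random.default_rng(0)
w_sam, w_gnm = np.array([1.0, 1.0]), np.array([1.0, 1.0])
g_sam, g_gnm = CountingGrad(A), CountingGrad(A)
for _ in range(100):
    w_sam = sam_step(w_sam, g_sam)
    w_gnm = gaussian_neighborhood_step(w_gnm, g_gnm, rng)
print(g_sam.calls, g_gnm.calls)  # 200 100
print(drw_weights([500, 50, 5], epoch=200, defer_epoch=160))
```

The gradient-call counters make the cost argument concrete: the SAM step evaluates the gradient twice per update, while the random-neighborhood step evaluates it once. After `defer_epoch`, the rarest class receives the largest weight.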

arXiv information

Authors	Mengke Li, Ye Liu, Yang Lu, Yiqun Zhang, Yiu-ming Cheung, Hui Huang
Published	2024-10-28 13:58:17+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
