FullLoRA-AT: Efficiently Boosting the Robustness of Pretrained Vision Transformers

Summary

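In recent years, Vision Transformer (ViT) models have gradually become mainstream across computer vision tasks, and their robustness has drawn increasing attention. Existing large models, however, tend to prioritize performance during training and may neglect robustness. This paper sets out a new challenge: exploring how to use a small number of additional parameters for adversarial fine-tuning so as to quickly and effectively raise the adversarial robustness of a standardly trained model. To address it, the authors develop a new LNLoRA module that places a learnable layer normalization before the conventional LoRA module. They further propose the FullLoRA-AT framework, which integrates learnable LNLoRA modules into all key components of ViT-based models. Extensive experiments on CIFAR-10, CIFAR-100, and Imagenette demonstrate the advantages of the proposed framework: FullLoRA-AT reaches robustness comparable to full fine-tuning while requiring only about 5% of the learnable parameters. This also addresses concerns about the extra model storage and long training time that adversarial fine-tuning incurs.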

Abstract (original)

In recent years, the Vision Transformer (ViT) model has gradually become mainstream in various computer vision tasks, and the robustness of the model has received increasing attention. However, existing large models tend to prioritize performance during training, potentially neglecting the robustness, which may lead to serious security concerns. In this paper, we establish a new challenge: exploring how to use a small number of additional parameters for adversarial finetuning to quickly and effectively enhance the adversarial robustness of a standardly trained model. To address this challenge, we develop the novel LNLoRA module, incorporating a learnable layer normalization before the conventional LoRA module, which helps mitigate magnitude differences in parameters between the adversarial and standard training paradigms. Furthermore, we propose the FullLoRA-AT framework by integrating the learnable LNLoRA modules into all key components of ViT-based models while keeping the pretrained model frozen, which can significantly improve the model robustness via adversarial finetuning in a parameter-efficient manner. Extensive experiments on CIFAR-10, CIFAR-100, and Imagenette demonstrate the superiority of our proposed FullLoRA-AT framework. It achieves comparable robustness with full finetuning while only requiring about 5% of the learnable parameters. This also effectively addresses concerns regarding extra model storage space and enormous training time caused by adversarial finetuning.
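The abstract describes LNLoRA only in words: a learnable layer normalization placed before a conventional LoRA branch, attached to a frozen pretrained layer. The following is a minimal, dependency-free sketch of that idea for a single linear layer, not the authors' implementation. The class name `LNLoRALinear`, the helper functions, the `scale` factor, and the initialization (random `A`, zero `B`, as is common for LoRA) are assumptions made for illustration; a real model would use a tensor library and apply such a module to each ViT component.

```python
import math
import random


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize a vector to zero mean and unit variance, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]


def matvec(m, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in m]


class LNLoRALinear:
    """Frozen linear layer plus a trainable LayerNorm -> low-rank branch.

    Hypothetical sketch: names, init, and scale are illustrative assumptions.
    """

    def __init__(self, weight, rank, scale=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained weight, never updated
        self.gamma = [1.0] * d_in     # trainable layer-norm scale
        self.beta = [0.0] * d_in      # trainable layer-norm shift
        # Common LoRA init (assumed here): A random, B zero,
        # so the added branch contributes nothing before finetuning.
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = scale

    def forward(self, x):
        base = matvec(self.weight, x)                 # frozen path
        h = layer_norm(x, self.gamma, self.beta)      # LN before LoRA
        delta = matvec(self.B, matvec(self.A, h))     # low-rank update
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        """Count parameters that finetuning would update."""
        return (len(self.gamma) + len(self.beta)
                + sum(len(row) for row in self.A)
                + sum(len(row) for row in self.B))
```

With `B` initialized to zero, `forward(x)` matches the frozen layer's output exactly, so adversarial finetuning would start from the pretrained behavior. For an input width `d_in`, output width `d_out`, and rank `r`, this sketch trains `2 * d_in + r * (d_in + d_out)` values while the `d_out * d_in` pretrained weights stay fixed.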

arXiv information

Authors	Zheng Yuan, Jie Zhang, Shiguang Shan
Published	2024-01-03 14:08:39+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
