Preserving Locality in Vision Transformers for Class Incremental Learning

Summary

Title: Preserving Locality in Vision Transformers for Class Incremental Learning

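Summary:
– In real-world applications, a trained classification model must be able to learn new classes.
– Vision Transformers (ViT) have recently achieved remarkable performance in Class Incremental Learning (CIL).
– Previous work has focused mainly on block design and model expansion for ViTs.
– This paper, however, finds that when a ViT is trained incrementally, its attention layers gradually lose focus on local features.
– Because low-level local information is crucial to the transferability of representations, the work proposes preserving locality in the attention layers.
– The paper devises a Locality-Preserved Attention (LPA) layer that emphasizes the importance of local information and encourages the model to retain more of it as training proceeds.
– Specifically, local information is incorporated directly into the vanilla attention, and the initial gradients of the vanilla attention are controlled by weighting it with a small initial value.
– Extensive experiments show that the representations facilitated by LPA capture more low-level general information, which transfers more easily to follow-up tasks.
– The improved model achieves consistent performance gains on CIFAR100 and ImageNet100.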
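
The mechanism described in the bullets (local information added directly to vanilla attention, with the vanilla term weighted by a small initial value) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the summary does not give the exact formula, so the Gaussian-style distance prior, the 1-D token layout, the renormalisation step, and the names `local_prior_weights`, `locality_preserved_weights`, `alpha`, and `sharpness` are all assumptions made for this example. In a real model `alpha` would presumably be a learnable parameter initialised to a small value; here it is a plain float.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def vanilla_attention_weights(queries, keys):
    """Scaled dot-product attention weights, one softmax row per query."""
    dim = len(queries[0])
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys])
        for q in queries
    ]


def local_prior_weights(num_tokens, sharpness=1.0):
    """Hypothetical locality prior: each token favours nearby positions.

    Assumes tokens lie on a 1-D line; a ViT would use 2-D patch coordinates.
    """
    return [
        softmax([-sharpness * (i - j) ** 2 for j in range(num_tokens)])
        for i in range(num_tokens)
    ]


def locality_preserved_weights(queries, keys, alpha=0.01, sharpness=1.0):
    """Assumed mixing rule: local prior + alpha * vanilla attention, renormalised.

    A small alpha means the vanilla (content-based) term contributes little
    at the start, so the locality prior dominates early on.
    """
    vanilla = vanilla_attention_weights(queries, keys)
    local = local_prior_weights(len(queries), sharpness)
    mixed = []
    for v_row, l_row in zip(vanilla, local):
        row = [l + alpha * v for v, l in zip(v_row, l_row)]
        total = sum(row)
        mixed.append([w / total for w in row])
    return mixed


# Toy usage: four 2-D tokens attending to each other (self-attention).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
weights = locality_preserved_weights(tokens, tokens, alpha=0.01)
```

With a small `alpha`, every row of `weights` stays concentrated on the token's own neighbourhood; raising `alpha` lets content-based attention take over, which is the trade-off the small initial value is meant to control.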

Abstract (original)

Learning new classes without forgetting is crucial for classification models in real-world applications. Vision Transformers (ViT) have recently achieved remarkable performance in Class Incremental Learning (CIL). Previous works mainly focus on block design and model expansion for ViTs. However, in this paper, we find that when the ViT is incrementally trained, the attention layers gradually lose concentration on local features. We call this phenomenon Locality Degradation in ViTs for CIL. Since low-level local information is crucial to the transferability of the representation, it is beneficial to preserve the locality in attention layers. In this paper, we encourage the model to preserve more local information as training goes on and devise a Locality-Preserved Attention (LPA) layer to emphasize the importance of local features. Specifically, we incorporate the local information directly into the vanilla attention and control the initial gradients of the vanilla attention by weighting it with a small initial value. Extensive experiments show that the representations facilitated by LPA capture more low-level general information, which is easier to transfer to follow-up tasks. The improved model achieves consistently better performance on CIFAR100 and ImageNet100.

arXiv information

Authors	Bowen Zheng, Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan
Published	2023-04-14 07:42:21+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
