Token-Label Alignment for Vision Transformers

Summary

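Data mixing strategies such as CutMix have been shown to substantially improve the performance of convolutional neural networks (CNNs).
They mix two images into a single training input and assign it a label mixed in the same ratio.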
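The mixing step described above can be illustrated with a minimal sketch. It assumes single-channel images stored as nested lists, one-hot labels, and a fixed paste box; the function name `cutmix` and the `box` parameter are hypothetical names chosen for this illustration, and real CutMix samples the box at random.

```python
def cutmix(image_a, image_b, label_a, label_b, box):
    """Paste a rectangle of image_b into image_a and mix the labels.

    box = (top, left, height, width) is a fixed box for illustration only.
    Returns the mixed image, the mixed label, and lam, the fraction of
    pixels that still come from image_a.
    """
    top, left, height, width = box
    rows, cols = len(image_a), len(image_a[0])
    bottom = min(top + height, rows)  # clip the box to the image
    right = min(left + width, cols)

    mixed = [row[:] for row in image_a]  # copy so image_a is not modified
    for r in range(top, bottom):
        for c in range(left, right):
            mixed[r][c] = image_b[r][c]

    pasted = max(0, bottom - top) * max(0, right - left)
    lam = 1.0 - pasted / (rows * cols)
    # The label is mixed in the same ratio as the pixels.
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed, mixed_label, lam


image_a = [[0.0] * 4 for _ in range(4)]  # 4x4 image of class 0
image_b = [[1.0] * 4 for _ in range(4)]  # 4x4 image of class 1
mixed, mixed_label, lam = cutmix(
    image_a, image_b, [1.0, 0.0], [0.0, 1.0], box=(0, 0, 2, 2)
)
print(lam, mixed_label)  # 0.75 [0.75, 0.25]
```

Here a 2x2 patch replaces a quarter of the 4x4 image, so the label assigns 0.75 to the first class and 0.25 to the second.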
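Although these strategies are also effective for vision transformers (ViTs), we identify a token fluctuation phenomenon that limits their potential.
We empirically observe that the contributions of input tokens fluctuate during forward propagation, which can produce a different mixing ratio in the output tokens.
The training target computed by the original data mixing strategy can therefore be inaccurate, making training less effective.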
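To address this, we propose token-label alignment (TL-Align), a method that traces the correspondence between transformed tokens and the original tokens in order to maintain a label for each token.
We reuse the attention computed at each layer to align tokens and labels efficiently, so the additional training cost is negligible.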
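The idea of carrying a label along with each token can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes a single attention head, row-normalized attention weights, and mean pooling of the final token labels, and it ignores residual connections, MLP blocks, and the class token. The names `align_labels` and `pooled_label` are hypothetical.

```python
def align_labels(attention, token_labels):
    """Mix per-token labels with the weights that mixed the tokens.

    attention[i][j] is the weight of input token j in output token i
    (each row is assumed to sum to 1).
    """
    num_classes = len(token_labels[0])
    return [
        [
            sum(w * label[k] for w, label in zip(row, token_labels))
            for k in range(num_classes)
        ]
        for row in attention
    ]


def pooled_label(token_labels):
    """Average the token labels into one training target."""
    n = len(token_labels)
    num_classes = len(token_labels[0])
    return [sum(label[k] for label in token_labels) / n for k in range(num_classes)]


# Four tokens: three patches from image A (class 0), one from image B (class 1).
labels = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Toy attention for one layer: every output token puts half of its weight
# on the single token that came from image B.
layers = [[[0.25, 0.25, 0.0, 0.5] for _ in range(4)]]

aligned = labels
for attention in layers:
    aligned = align_labels(attention, aligned)

print(pooled_label(labels))   # [0.75, 0.25]  area-based target
print(pooled_label(aligned))  # [0.5, 0.5]    target after alignment
```

In this toy case the area-based target gives image B a weight of 0.25, while the attention-propagated labels give it 0.5, which is the kind of mismatch in mixing ratio that the summary describes. Because the attention weights are already computed in the forward pass, reusing them for the labels adds only one small matrix product per layer.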
Extensive experiments demonstrate that our method improves ViT performance on image classification, semantic segmentation, object detection, and transfer learning tasks.
Code is available at https://github.com/Euphoria16/TL-Align.

Abstract (original)

Data mixing strategies (e.g., CutMix) have shown the ability to greatly improve the performance of convolutional neural networks (CNNs). They mix two images as inputs for training and assign them with a mixed label with the same ratio. While they are shown effective for vision transformers (ViTs), we identify a token fluctuation phenomenon that has suppressed the potential of data mixing strategies. We empirically observe that the contributions of input tokens fluctuate as forward propagating, which might induce a different mixing ratio in the output tokens. The training target computed by the original data mixing strategy can thus be inaccurate, resulting in less effective training. To address this, we propose a token-label alignment (TL-Align) method to trace the correspondence between transformed tokens and the original tokens to maintain a label for each token. We reuse the computed attention at each layer for efficient token-label alignment, introducing only negligible additional training costs. Extensive experiments demonstrate that our method improves the performance of ViTs on image classification, semantic segmentation, objective detection, and transfer learning tasks. Code is available at: https://github.com/Euphoria16/TL-Align.

arXiv information

Authors	Han Xiao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
Published	2022-10-12 17:54:32+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
