ViT-Calibrator: Decision Stream Calibration for Vision Transformer

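Summary

Title: Decision Stream Calibration for Vision Transformers: ViT-Calibrator
Summary:
– Transformers perform strongly and are used in a wide range of vision tasks, but existing approaches focus on optimizing the internal model architecture design, which incurs a heavy burden of trial and error.
– This paper proposes a new paradigm, "Decision Stream Calibration," that aims to improve the performance of general Vision Transformers.
– It does so by focusing on the information propagation mechanism in the learning procedure, exploring the correlation between different tokens and the relevance coefficients of multiple dimensions.
– Further analysis revealed the following:
1. The final decision is associated with the tokens of foreground targets; token features of foreground targets are passed to the next layer as much as possible, while useless token features from background regions are gradually eliminated.
2. Each category is associated only with specific sparse dimensions within the tokens.
– Based on these findings, the authors designed ViT-Calibrator, a two-stage calibration scheme consisting of a token propagation calibration stage and a dimension propagation calibration stage.
– Experiments on widely used datasets show that the proposed method achieves promising results. The source code is provided in the supplementary material.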
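
The two findings above can be illustrated with a toy NumPy sketch. This is not the authors' implementation, since the abstract gives no algorithmic detail. The helper names `calibrate_tokens` and `calibrate_dimensions`, the per-token foreground score, the `strength` parameter, and the top-k dimension mask are all assumptions made for illustration only.

```python
import numpy as np

def calibrate_tokens(tokens, fg_score, strength=0.5):
    """Down-weight tokens with a low foreground score (hypothetical helper).

    tokens:   (num_tokens, dim) token features
    fg_score: (num_tokens,) scores in [0, 1]; higher means more foreground
    strength: 0 leaves tokens unchanged, 1 scales each token by its score
    """
    tokens = np.asarray(tokens, dtype=float)
    fg_score = np.clip(np.asarray(fg_score, dtype=float), 0.0, 1.0)
    # Foreground tokens (score near 1) keep a weight near 1;
    # background tokens are attenuated in proportion to `strength`.
    weights = 1.0 - strength * (1.0 - fg_score)
    return tokens * weights[:, None]

def calibrate_dimensions(tokens, relevance, k):
    """Keep only the k most relevant feature dimensions (hypothetical helper).

    relevance: (dim,) assumed per-dimension relevance for one category
    """
    tokens = np.asarray(tokens, dtype=float)
    relevance = np.asarray(relevance, dtype=float)
    # Indices of the k largest relevance values.
    keep = np.argsort(relevance)[::-1][:k]
    mask = np.zeros(tokens.shape[1])
    mask[keep] = 1.0
    return tokens * mask

# Toy input: 3 tokens with 4 feature dimensions each (made-up values).
tokens = np.ones((3, 4))
fg_score = np.array([1.0, 0.5, 0.0])
relevance = np.array([0.9, 0.1, 0.8, 0.05])

out = calibrate_dimensions(
    calibrate_tokens(tokens, fg_score, strength=1.0), relevance, k=2
)
```

With `strength=1.0` the pure-background token is zeroed out entirely, and only dimensions 0 and 2 (the two with the highest assumed relevance) survive. How the paper actually estimates token correlations and dimension relevance is not stated in the abstract.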

Abstract (original)

A surge of interest has emerged in utilizing Transformers in diverse vision tasks owing to its formidable performance. However, existing approaches primarily focus on optimizing internal model architecture designs that often entail significant trial and error with high burdens. In this work, we propose a new paradigm dubbed Decision Stream Calibration that boosts the performance of general Vision Transformers. To achieve this, we shed light on the information propagation mechanism in the learning procedure by exploring the correlation between different tokens and the relevance coefficient of multiple dimensions. Upon further analysis, it was discovered that 1) the final decision is associated with tokens of foreground targets, while token features of foreground target will be transmitted into the next layer as much as possible, and the useless token features of background area will be eliminated gradually in the forward propagation. 2) Each category is solely associated with specific sparse dimensions in the tokens. Based on the discoveries mentioned above, we designed a two-stage calibration scheme, namely ViT-Calibrator, including token propagation calibration stage and dimension propagation calibration stage. Extensive experiments on commonly used datasets show that the proposed approach can achieve promising results. The source codes are given in the supplements.

arXiv information

Authors	Lin Chen,Zhijie Jia,Tian Qiu,Lechao Cheng,Jie Lei,Zunlei Feng,Mingli Song
Published	2023-04-10 02:40:24+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, OpenAI
