Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency

Summary

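Visual domain adaptation (DA) seeks to transfer trained models to unseen, unlabeled domains across distribution shift, but approaches typically focus on adapting convolutional neural network architectures initialized with supervised ImageNet representations.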
In this work, we shift focus to adapting a modern architecture for object recognition (the increasingly popular Vision Transformer, ViT) and modern pretraining based on self-supervised learning (SSL).
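Inspired by the design of recent SSL approaches that learn from partial image inputs generated via masking or cropping (either by learning to predict the missing pixels, or by learning representational invariances to such augmentations), we propose PACMAC, a simple two-stage adaptation algorithm for self-supervised ViTs.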
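To make the "predict the missing pixels" style of SSL concrete, here is a minimal sketch of MAE-style random patch masking: a ViT splits an image into patches, a large fraction is hidden uniformly at random, and the model is trained to reconstruct the hidden patches from the visible ones. This illustrates the general idea only; the function name `random_patch_mask` and the 0.75 default mask ratio are assumptions for illustration, not details taken from PACMAC.

```python
import random
from typing import List, Optional, Tuple


def random_patch_mask(
    num_patches: int, mask_ratio: float = 0.75, seed: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Split patch indices into (visible, masked) lists, MAE-style.

    A fraction `mask_ratio` of the patches is hidden uniformly at random;
    an SSL model would then be trained to reconstruct the hidden patches
    from the visible ones.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


# Example: a 224x224 image cut into 16x16 patches has 14 * 14 = 196 patches.
visible, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

With a 0.75 ratio, only a quarter of the patches reach the encoder, which is what makes reconstruction a demanding pretext task.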
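PACMAC first performs in-domain SSL on pooled source and target data to learn task-discriminative features, and then probes the model's predictive consistency across a set of partial target inputs, generated via a novel attention-conditioned masking strategy, to identify reliable candidates for self-training.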
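The summary does not spell out how the attention-conditioned masks are built. The following is a minimal sketch under one plausible assumption: per-patch attention scores (for example, [CLS]-token attention from the ViT's last layer) rank the patches, and the ranked patches are dealt round-robin into disjoint partial views so that each view keeps some of the most-attended, likely object-bearing patches. The function `attention_conditioned_views` is hypothetical and is not claimed to be PACMAC's exact procedure.

```python
from typing import List, Sequence


def attention_conditioned_views(
    attention: Sequence[float], num_views: int
) -> List[List[int]]:
    """Hypothetical attention-conditioned masking.

    Returns `num_views` disjoint lists of *visible* patch indices. Patches are
    ranked by attention score (highest first) and dealt out round-robin, so
    every partial view keeps a share of the most-attended patches instead of
    being masked purely at random.
    """
    if num_views < 1:
        raise ValueError("num_views must be >= 1")
    ranked = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    views: List[List[int]] = [[] for _ in range(num_views)]
    for rank, patch in enumerate(ranked):
        views[rank % num_views].append(patch)
    return [sorted(view) for view in views]


# Toy attention scores for 5 patches; patches 1 and 3 are the most attended.
print(attention_conditioned_views([0.05, 0.40, 0.10, 0.30, 0.15], num_views=2))
# [[0, 1, 4], [2, 3]]
```

In the toy call, the two most-attended patches (1 and 3) end up in different views, so neither partial input loses all of the salient content.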
Our simple approach leads to consistent performance gains over competing methods that use ViTs and self-supervised initializations on standard object recognition benchmarks.
Code available at https://github.com/virajprabhu/PACMAC

Summary (original)

Visual domain adaptation (DA) seeks to transfer trained models to unseen, unlabeled domains across distribution shift, but approaches typically focus on adapting convolutional neural network architectures initialized with supervised ImageNet representations. In this work, we shift focus to adapting modern architectures for object recognition — the increasingly popular Vision Transformer (ViT) — and modern pretraining based on self-supervised learning (SSL). Inspired by the design of recent SSL approaches based on learning from partial image inputs generated via masking or cropping — either by learning to predict the missing pixels, or learning representational invariances to such augmentations — we propose PACMAC, a simple two-stage adaptation algorithm for self-supervised ViTs. PACMAC first performs in-domain SSL on pooled source and target data to learn task-discriminative features, and then probes the model’s predictive consistency across a set of partial target inputs generated via a novel attention-conditioned masking strategy, to identify reliable candidates for self-training. Our simple approach leads to consistent performance gains over competing methods that use ViTs and self-supervised initializations on standard object recognition benchmarks. Code available at https://github.com/virajprabhu/PACMAC
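The last step of the abstract, probing predictive consistency to choose self-training candidates, can be sketched as a simple filter. This assumes, for illustration only, that a target sample counts as reliable when at least a fraction `min_agreement` of its masked-view predictions match the prediction on the full image, and that the full-image prediction then becomes its pseudo-label. The function `select_for_self_training` and the sample ids are hypothetical, and the paper's actual selection rule may differ.

```python
from typing import Dict, List, Sequence, Tuple


def select_for_self_training(
    predictions: Dict[str, Tuple[int, Sequence[int]]],
    min_agreement: float = 1.0,
) -> List[Tuple[str, int]]:
    """Keep target samples whose masked-view predictions are consistent.

    `predictions` maps a sample id to (prediction on the full image,
    predictions on its masked views). A sample is kept when at least
    `min_agreement` of its views agree with the full-image prediction,
    which is then returned as the sample's pseudo-label.
    """
    selected: List[Tuple[str, int]] = []
    for sample_id, (full_pred, view_preds) in predictions.items():
        if not view_preds:
            continue  # no partial views, so consistency cannot be probed
        agreement = sum(p == full_pred for p in view_preds) / len(view_preds)
        if agreement >= min_agreement:
            selected.append((sample_id, full_pred))
    return selected


preds = {
    "img_a": (3, [3, 3, 3]),  # all views agree
    "img_b": (1, [1, 2, 1]),  # 2 of 3 views agree
    "img_c": (0, []),         # no views
}
print(select_for_self_training(preds))                     # [('img_a', 3)]
print(select_for_self_training(preds, min_agreement=0.6))  # [('img_a', 3), ('img_b', 1)]
```

Requiring unanimity keeps only `img_a`; relaxing the threshold to 0.6 also admits `img_b`, trading pseudo-label precision for coverage.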

arXiv information

Authors	Viraj Prabhu, Sriram Yenamandra, Aaditya Singh, Judy Hoffman
Published	2022-06-16 14:46:10+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
