Self-Distilled Vision Transformer for Domain Generalization

Abstract

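Several domain generalization (DG) methods have been proposed recently and show encouraging performance, but almost all of them build on convolutional neural networks (CNNs).
Little to no progress has been made on studying the DG performance of vision transformers (ViTs), which are challenging the supremacy of CNNs on standard benchmarks that are often built on the i.i.d. assumption.
This makes the real-world deployment of ViTs doubtful.
In this paper, we explore ViTs as a way to address the DG problem.
Like CNNs, ViTs struggle in out-of-distribution scenarios, and the main culprit is overfitting to the source domains.
Inspired by the modular architecture of ViTs, we propose a simple DG approach for ViTs, which we call self-distillation for ViTs.
It reduces overfitting to the source domains by curating non-zero-entropy supervisory signals for the intermediate transformer blocks, which eases the learning of the input-output mapping.
It also introduces no new parameters and can be plugged seamlessly into the modular composition of different ViTs.
We empirically demonstrate notable performance gains with different DG baselines and various ViT backbones on five challenging datasets.
We also report favorable performance against recent state-of-the-art DG methods.
The code and pre-trained models are publicly available at https://github.com/maryam089/SDViT.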
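
The self-distillation idea in the abstract can be illustrated with a small sketch. The abstract does not specify the loss, so everything below is an assumption made for illustration: the per-block class tokens (`block_tokens`), the random choice of one intermediate block, the temperature-softened KL divergence, and the weight `lam` are all hypothetical. It is not the authors' implementation, which is in the linked repository. The one point taken from the abstract is that the intermediate block is scored with the same classifier as the final block, so no new parameters appear.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_distillation_loss(block_tokens, classifier_w, classifier_b, labels,
                           rng, temperature=3.0, lam=0.5):
    """Cross-entropy on the final block plus a soft-label term for one
    randomly chosen intermediate block (illustrative assumption).

    block_tokens: list of (batch, dim) arrays, one per transformer block;
                  hypothetical stand-ins for per-block class tokens.
    """
    eps = 1e-12
    # Logits of the final block, computed with the (shared) classifier.
    final_logits = block_tokens[-1] @ classifier_w + classifier_b

    # Pick one intermediate block; the final block itself is excluded.
    idx = int(rng.integers(0, len(block_tokens) - 1))
    # Reuse the same classifier weights: no new parameters are introduced.
    inter_logits = block_tokens[idx] @ classifier_w + classifier_b

    # Standard cross-entropy against the hard labels for the final block.
    probs = softmax(final_logits)
    ce = -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))

    # Softened final-block predictions act as non-zero-entropy targets
    # for the intermediate block (KL divergence is an assumed choice).
    teacher = softmax(final_logits, temperature)
    student = softmax(inter_logits, temperature)
    kl = np.mean(np.sum(teacher * (np.log(teacher + eps) - np.log(student + eps)),
                        axis=-1))
    return ce + lam * kl, idx

# Usage with random stand-in data: 6 blocks, batch of 4, dim 8, 3 classes.
rng = np.random.default_rng(0)
tokens = [rng.normal(size=(4, 8)) for _ in range(6)]
w, b = rng.normal(size=(8, 3)), np.zeros(3)
labels = np.array([0, 1, 2, 1])
loss, chosen_block = self_distillation_loss(tokens, w, b, labels, rng)
```

This NumPy version only computes the loss value. In a real training loop the same quantity would be written in an autodiff framework, and whether gradients flow through the soft targets is a further design choice that the abstract does not settle.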

Abstract (original)

In recent past, several domain generalization (DG) methods have been proposed, showing encouraging performance, however, almost all of them build on convolutional neural networks (CNNs). There is little to no progress on studying the DG performance of vision transformers (ViTs), which are challenging the supremacy of CNNs on standard benchmarks, often built on i.i.d assumption. This renders the real-world deployment of ViTs doubtful. In this paper, we attempt to explore ViTs towards addressing the DG problem. Similar to CNNs, ViTs also struggle in out-of-distribution scenarios and the main culprit is overfitting to source domains. Inspired by the modular architecture of ViTs, we propose a simple DG approach for ViTs, coined as self-distillation for ViTs. It reduces the overfitting to source domains by easing the learning of input-output mapping problem through curating non-zero entropy supervisory signals for intermediate transformer blocks. Further, it does not introduce any new parameters and can be seamlessly plugged into the modular composition of different ViTs. We empirically demonstrate notable performance gains with different DG baselines and various ViT backbones in five challenging datasets. Moreover, we report favorable performance against recent state-of-the-art DG methods. Our code along with pre-trained models are publicly available at: https://github.com/maryam089/SDViT

arXiv information

Authors	Maryam Sultana, Muzammal Naseer, Muhammad Haris Khan, Salman Khan, Fahad Shahbaz Khan
Published	2022-07-25 17:57:05+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
