Self-Distilled Vision Transformer for Domain Generalization

Abstract

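Several domain generalization (DG) methods have been proposed recently and show encouraging performance, but almost all of them are built on convolutional neural networks (CNNs).
The DG performance of vision transformers (ViTs) has received little to no study, even though ViTs are challenging the supremacy of CNNs on standard benchmarks, which are often built on the i.i.d. assumption.
This casts doubt on the real-world deployment of ViTs.
In this paper, we explore ViTs for addressing the DG problem.
Like CNNs, ViTs struggle in out-of-distribution scenarios, and the main culprit is overfitting to the source domains.
Inspired by the modular architecture of ViTs, we propose a simple DG approach for ViTs, termed self-distillation for ViTs.
It reduces overfitting to the source domains by easing the learning of the input-output mapping problem through curating non-zero entropy supervisory signals for the intermediate transformer blocks.
Furthermore, it introduces no new parameters and can be plugged seamlessly into the modular composition of different ViTs.
We empirically demonstrate notable performance gains with different DG baselines and various ViT backbones on five challenging datasets.
Moreover, we report favorable performance against recent state-of-the-art DG methods.
Our code and pre-trained models are publicly available at: https://github.com/maryam089/SDViT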

Abstract (original)

In recent past, several domain generalization (DG) methods have been proposed, showing encouraging performance, however, almost all of them build on convolutional neural networks (CNNs). There is little to no progress on studying the DG performance of vision transformers (ViTs), which are challenging the supremacy of CNNs on standard benchmarks, often built on i.i.d assumption. This renders the real-world deployment of ViTs doubtful. In this paper, we attempt to explore ViTs towards addressing the DG problem. Similar to CNNs, ViTs also struggle in out-of-distribution scenarios and the main culprit is overfitting to source domains. Inspired by the modular architecture of ViTs, we propose a simple DG approach for ViTs, coined as self-distillation for ViTs. It reduces the overfitting to source domains by easing the learning of input-output mapping problem through curating non-zero entropy supervisory signals for intermediate transformer blocks. Further, it does not introduce any new parameters and can be seamlessly plugged into the modular composition of different ViTs. We empirically demonstrate notable performance gains with different DG baselines and various ViT backbones in five challenging datasets. Moreover, we report favorable performance against recent state-of-the-art DG methods. Our code along with pre-trained models are publicly available at: https://github.com/maryam089/SDViT
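
The abstract describes the method only at a high level, so the following is a minimal sketch of one way such a self-distillation loss could look, not the authors' implementation. It assumes that every transformer block's output can be passed through the same classifier head (which would be consistent with the claim that no new parameters are introduced), that one intermediate block is picked at random per step, and that the final block's temperature-softened prediction serves as the non-zero entropy target. The names `self_distillation_loss`, `temperature`, and `weight`, and their default values, are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; temperature is an assumed hyperparameter."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; a one-hot label has entropy 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def kl_divergence(target, pred):
    """KL(target || pred) for two discrete distributions."""
    return sum(t * math.log(t / p) for t, p in zip(target, pred) if t > 0.0)


def cross_entropy(logits, label):
    """Standard cross-entropy against a hard (zero-entropy) label."""
    return -math.log(softmax(logits)[label])


def self_distillation_loss(block_logits, label, temperature=3.0, weight=0.1, rng=random):
    """Hypothetical loss: supervise a random intermediate block with soft targets.

    block_logits holds one logit vector per transformer block, all assumed to
    come from the same shared classifier head.
    """
    final_logits = block_logits[-1]
    # Assumption of this sketch: choose one intermediate block at random.
    student_logits = rng.choice(block_logits[:-1])
    # The softened final prediction is a non-zero entropy supervisory signal.
    soft_target = softmax(final_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    distill = kl_divergence(soft_target, soft_student)
    return cross_entropy(final_logits, label) + weight * distill


if __name__ == "__main__":
    # Toy logits for a 3-block model and 3 classes (made-up numbers).
    blocks = [[0.2, 0.1, 0.0], [1.0, 0.5, 0.1], [2.0, 1.0, 0.1]]
    print("entropy of hard label:", entropy([1.0, 0.0, 0.0]))
    print("entropy of soft target:", entropy(softmax(blocks[-1], 3.0)))
    print("total loss:", self_distillation_loss(blocks, label=0))
```

The point of the sketch is the contrast between the two kinds of target: the hard label carries zero entropy, while the softened final-block prediction spreads probability over several classes, which is one reading of the "non-zero entropy supervisory signals" mentioned in the abstract. In an actual training loop the soft target would typically be treated as a constant so that gradients flow only through the intermediate block; that detail is also an assumption here.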

arXiv information

Authors	Maryam Sultana, Muzammal Naseer, Muhammad Haris Khan, Salman Khan, Fahad Shahbaz Khan
Published	2022-08-12 14:40:58+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
