Stochastic Vision Transformers with Wasserstein Distance-Aware Attention

Summary

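Self-supervised learning is one of the most promising approaches for acquiring knowledge from limited labeled data.
Despite substantial progress in recent years, self-supervised models remain challenging for practitioners because they do not readily provide insight into model confidence and uncertainty.
Addressing this is not easy, mainly because of the complexity of implementing techniques that exploit the latent representations learned during pre-training without relying on explicit labels.
Motivated by this, we introduce a new stochastic vision transformer that integrates uncertainty and distance awareness into self-supervised learning (SSL) pipelines.
Instead of conventional deterministic vector embeddings, our stochastic vision transformer encodes image patches into elliptical Gaussian distributional embeddings.
Notably, the attention matrices for these stochastic representational embeddings are computed with Wasserstein distance-based attention, which exploits the distributional nature of the embeddings.
In addition, we propose a Wasserstein distance-based regularization term for both pre-training and fine-tuning, which builds distance awareness into the latent representations.
We run extensive experiments across a range of tasks: in-distribution generalization, out-of-distribution detection, dataset corruption, semi-supervised settings, and transfer learning to other datasets and tasks.
Our proposed method achieves superior accuracy and calibration, outperforming the self-supervised baseline across a wide range of experiments on a variety of datasets.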
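Attention built on Wasserstein distances between Gaussian embeddings relies on a well-known closed form for the 2-Wasserstein distance between two Gaussians. That formula is given below for reference. The diagonal-covariance special case is included because it is a common and cheap choice. The summary does not state which covariance structure the paper actually uses, so treat the diagonal form as an assumption.

```latex
% Squared 2-Wasserstein distance between N(mu_1, Sigma_1) and N(mu_2, Sigma_2)
W_2^2\big(\mathcal{N}(\mu_1,\Sigma_1),\,\mathcal{N}(\mu_2,\Sigma_2)\big)
  = \lVert \mu_1 - \mu_2 \rVert_2^2
  + \operatorname{tr}\!\Big(\Sigma_1 + \Sigma_2
  - 2\big(\Sigma_2^{1/2}\,\Sigma_1\,\Sigma_2^{1/2}\big)^{1/2}\Big)

% Diagonal case (assumed here): \Sigma_i = \operatorname{diag}(\sigma_i^2)
W_2^2 = \lVert \mu_1 - \mu_2 \rVert_2^2 + \lVert \sigma_1 - \sigma_2 \rVert_2^2
```

In the diagonal case the trace term reduces to a sum of squared differences of standard deviations. As a result, two embeddings with identical means but different spreads are still at a nonzero distance.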

Summary (original)

Self-supervised learning is one of the most promising approaches to acquiring knowledge from limited labeled data. Despite the substantial advancements made in recent years, self-supervised models have posed a challenge to practitioners, as they do not readily provide insight into the model’s confidence and uncertainty. Tackling this issue is no simple feat, primarily due to the complexity involved in implementing techniques that can make use of the latent representations learned during pre-training without relying on explicit labels. Motivated by this, we introduce a new stochastic vision transformer that integrates uncertainty and distance awareness into self-supervised learning (SSL) pipelines. Instead of the conventional deterministic vector embedding, our novel stochastic vision transformer encodes image patches into elliptical Gaussian distributional embeddings. Notably, the attention matrices of these stochastic representational embeddings are computed using Wasserstein distance-based attention, effectively capitalizing on the distributional nature of these embeddings. Additionally, we propose a regularization term based on Wasserstein distance for both pre-training and fine-tuning processes, thereby incorporating distance awareness into latent representations. We perform extensive experiments across different tasks such as in-distribution generalization, out-of-distribution detection, dataset corruption, semi-supervised settings, and transfer learning to other datasets and tasks. Our proposed method achieves superior accuracy and calibration, surpassing the self-supervised baseline in a wide range of experiments on a variety of datasets.
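The abstract says attention between Gaussian patch embeddings is computed from a Wasserstein distance, but it does not give the exact formula. The pure-Python code below is a minimal sketch built on several assumptions:

- Each token is a diagonal Gaussian stored as a (mean, std) pair.
- The attention score is the negative squared 2-Wasserstein distance divided by a hypothetical `temperature` parameter.
- Value Gaussians are combined as independent random variables, so variances are weighted by squared attention weights.

The names `w2_squared` and `wasserstein_attention` are illustrative. This is not the authors' implementation.

```python
import math

# Hypothetical token format: (mean, std), two equal-length lists of floats
# describing a diagonal Gaussian. The paper's parameterization may differ.

def w2_squared(p, q):
    """Squared 2-Wasserstein distance between two diagonal Gaussians:
    ||mu_p - mu_q||^2 + ||sigma_p - sigma_q||^2."""
    (mu_p, sd_p), (mu_q, sd_q) = p, q
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_p, mu_q))
    spread_term = sum((a - b) ** 2 for a, b in zip(sd_p, sd_q))
    return mean_term + spread_term

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def wasserstein_attention(queries, keys, values, temperature=1.0):
    """Assumed form: scores are -W2^2 / temperature, normalized per query.

    Returns the attention matrix and, per query, the mean and variance of the
    weighted combination of value Gaussians, treating values as independent
    (so each variance is scaled by the squared attention weight).
    """
    attn = [
        softmax([-w2_squared(q, k) / temperature for k in keys])
        for q in queries
    ]
    dim = len(values[0][0])
    outputs = []
    for row in attn:
        mean = [sum(w * v[0][d] for w, v in zip(row, values)) for d in range(dim)]
        var = [sum(w * w * v[1][d] ** 2 for w, v in zip(row, values)) for d in range(dim)]
        outputs.append((mean, var))
    return attn, outputs

# Usage: three 2-D tokens attending to each other (self-attention).
tokens = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([3.0, 4.0], [1.0, 1.0]),
    ([0.1, 0.0], [1.0, 2.0]),  # mean close to token 0, but a wider spread
]
attn, out = wasserstein_attention(tokens, tokens, tokens)
for row in attn:
    print([round(w, 3) for w in row])
```

Under this assumed form, a query attends most strongly to keys whose means and spreads are both close to its own. Dot-product attention over means alone would ignore the difference in spread between the first and third tokens.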

arXiv information

Authors	Franciskus Xaverius Erick, Mina Rezaei, Johanna Paula Müller, Bernhard Kainz
Published	2023-11-30 15:53:37+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
