Bayesian Cross-Modal Alignment Learning for Few-Shot Out-of-Distribution Generalization

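Abstract

Recent advances in large pre-trained models have shown promising results in few-shot learning.
However, their generalization ability on two-dimensional out-of-distribution (OoD) data, i.e., under correlation shift and diversity shift, has not been thoroughly investigated.
Studies have shown that, even with a substantial amount of training data, few methods outperform standard empirical risk minimization (ERM) in OoD generalization.
This few-shot OoD generalization dilemma has emerged as a challenging direction in research on deep neural network generalization, where performance suffers both from overfitting to the few-shot examples and from OoD generalization error.
In this paper, we leverage a broader source of supervision and explore a novel Bayesian cross-modal image-text alignment learning method (Bayes-CAL) to address this issue.
Specifically, the model is designed so that only the text representations are fine-tuned, via a Bayesian modelling approach with a gradient orthogonalization loss and an invariant risk minimization (IRM) loss.
The Bayesian approach is introduced to avoid overfitting to the base classes observed during training and to improve generalization to broader unseen classes.
The dedicated loss is introduced to achieve better image-text alignment by disentangling the causal and non-causal parts of the image features.
Numerical experiments show that Bayes-CAL achieves state-of-the-art OoD generalization performance under two-dimensional distribution shifts.
Moreover, compared with CLIP-like models, Bayes-CAL yields more stable generalization performance on unseen classes.
The code is available at https://github.com/LinLLLL/BayesCAL.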
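
To make the two loss terms named in the abstract more concrete, the following is a minimal NumPy sketch of how an IRM-style penalty and a gradient-orthogonality penalty can be combined with an ordinary risk. It is an illustration under stated assumptions, not the Bayes-CAL implementation: the IRM term follows the common IRMv1 form (squared gradient of the risk with respect to a dummy scale fixed at 1) for a binary classifier, the orthogonality term is taken to be the squared cosine similarity between two gradient vectors, and all function names and weights (`irm_penalty`, `orthogonality_penalty`, `total_loss`, `lam_irm`, `lam_orth`) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bce_risk(logits, labels):
    # Mean binary cross-entropy, written as log(1 + e^z) - y*z for stability.
    return np.mean(np.logaddexp(0.0, logits) - labels * logits)


def irm_penalty(logits, labels):
    # IRMv1-style penalty for one environment (assumed form, not the paper's):
    # square of d/dw bce_risk(w * logits, labels) evaluated at w = 1, which
    # equals mean((sigmoid(logits) - labels) * logits).
    grad_w = np.mean((sigmoid(logits) - labels) * logits)
    return grad_w ** 2


def orthogonality_penalty(g_a, g_b, eps=1e-12):
    # Squared cosine similarity between two gradient vectors (assumed form):
    # 0 when the gradients are orthogonal, close to 1 when they are parallel.
    cos = np.dot(g_a, g_b) / (np.linalg.norm(g_a) * np.linalg.norm(g_b) + eps)
    return cos ** 2


def total_loss(env_batches, g_a, g_b, lam_irm=1.0, lam_orth=1.0):
    # Average risk and IRM penalty over environments, plus the
    # orthogonality term; lam_irm and lam_orth are hypothetical weights.
    risks = [bce_risk(z, y) for z, y in env_batches]
    penalties = [irm_penalty(z, y) for z, y in env_batches]
    return (np.mean(risks)
            + lam_irm * np.mean(penalties)
            + lam_orth * orthogonality_penalty(g_a, g_b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two toy "environments" of (logits, labels) pairs.
    envs = [(rng.normal(size=8), rng.integers(0, 2, size=8).astype(float))
            for _ in range(2)]
    g_causal = rng.normal(size=4)    # stand-in gradient vectors
    g_spurious = rng.normal(size=4)
    print(total_loss(envs, g_causal, g_spurious))
```

In a real setup the logits would come from image-text similarity scores and the gradient vectors from automatic differentiation; both are replaced by random stand-ins here.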

Abstract (original)

Recent advances in large pre-trained models showed promising results in few-shot learning. However, their generalization ability on two-dimensional Out-of-Distribution (OoD) data, i.e., correlation shift and diversity shift, has not been thoroughly investigated. Researches have shown that even with a significant amount of training data, few methods can achieve better performance than the standard empirical risk minimization method (ERM) in OoD generalization. This few-shot OoD generalization dilemma emerges as a challenging direction in deep neural network generalization research, where the performance suffers from overfitting on few-shot examples and OoD generalization errors. In this paper, leveraging a broader supervision source, we explore a novel Bayesian cross-modal image-text alignment learning method (Bayes-CAL) to address this issue. Specifically, the model is designed as only text representations are fine-tuned via a Bayesian modelling approach with gradient orthogonalization loss and invariant risk minimization (IRM) loss. The Bayesian approach is essentially introduced to avoid overfitting the base classes observed during training and improve generalization to broader unseen classes. The dedicated loss is introduced to achieve better image-text alignment by disentangling the causal and non-casual parts of image features. Numerical experiments demonstrate that Bayes-CAL achieved state-of-the-art OoD generalization performances on two-dimensional distribution shifts. Moreover, compared with CLIP-like models, Bayes-CAL yields more stable generalization performances on unseen classes. Our code is available at https://github.com/LinLLLL/BayesCAL.

arXiv information

Authors	Lin Zhu, Xinbing Wang, Chenghu Zhou, Nanyang Ye
Published	2025-04-22 10:59:00+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
