Leveraging Text-to-Image Generation for Handling Spurious Correlation

Abstract

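Deep neural networks trained with Empirical Risk Minimization (ERM) perform well when training and test data come from the same domain, but they often fail to generalize to out-of-distribution samples.
In image classification, these models may rely on spurious correlations that frequently exist between labels and irrelevant image features, so their predictions become unreliable when those features are absent.
We propose a technique that uses text-to-image (T2I) diffusion models to generate training samples that address the spurious correlation problem.
First, we use a textual inversion mechanism to compute the token that best describes the visual features of the causal components of the samples.
Next, leveraging a language segmentation method and a diffusion model, we generate new samples by combining the causal component with elements from other classes.
We also carefully prune the generated samples based on the ERM model's prediction probabilities and attribution scores, to ensure they have the composition our objective requires.
Finally, we retrain the ERM model on the augmented dataset.
This process reduces the model's reliance on spurious correlations, because the model learns from carefully constructed samples in which the correlation does not hold.
Our experiments show that, across different benchmarks, our technique achieves better worst-group accuracy than existing state-of-the-art methods.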
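
The abstract reports worst-group accuracy without defining it. As commonly used in the spurious-correlation literature, it is the lowest accuracy over groups formed by pairing each class label with each spurious attribute. The sketch below illustrates that definition only; the function name, the toy labels, and the "background" attribute are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def worst_group_accuracy(labels, spurious_attrs, predictions):
    """Return (worst accuracy, per-group accuracies).

    A group is a (label, spurious attribute) pair. This is an
    illustrative helper, not code from the paper.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, a, p in zip(labels, spurious_attrs, predictions):
        group = (y, a)
        total[group] += 1
        correct[group] += int(p == y)
    per_group = {g: correct[g] / total[g] for g in total}
    return min(per_group.values()), per_group


# Made-up toy data: label 0 mostly appears on "land", label 1 mostly on "water".
labels = [0, 0, 0, 0, 1, 1, 1, 1]
backgrounds = ["land", "land", "land", "water", "water", "water", "water", "land"]
# A classifier that predicts from the background alone.
predictions = [0, 0, 0, 1, 1, 1, 1, 0]

worst, per_group = worst_group_accuracy(labels, backgrounds, predictions)
average = sum(int(p == y) for p, y in zip(predictions, labels)) / len(labels)
print(f"average accuracy: {average:.2f}, worst-group accuracy: {worst:.2f}")
```

In this toy case the classifier is right on 6 of 8 samples overall but wrong on every sample in the two minority groups, so the average hides exactly the failure that worst-group accuracy exposes.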

Abstract (original)

Deep neural networks trained with Empirical Risk Minimization (ERM) perform well when both training and test data come from the same domain, but they often fail to generalize to out-of-distribution samples. In image classification, these models may rely on spurious correlations that often exist between labels and irrelevant features of images, making predictions unreliable when those features do not exist. We propose a technique to generate training samples with text-to-image (T2I) diffusion models for addressing the spurious correlation problem. First, we compute the best describing token for the visual features pertaining to the causal components of samples by a textual inversion mechanism. Then, leveraging a language segmentation method and a diffusion model, we generate new samples by combining the causal component with the elements from other classes. We also meticulously prune the generated samples based on the prediction probabilities and attribution scores of the ERM model to ensure their correct composition for our objective. Finally, we retrain the ERM model on our augmented dataset. This process reduces the model’s reliance on spurious correlations by learning from carefully crafted samples in which this correlation does not exist. Our experiments show that across different benchmarks, our technique achieves better worst-group accuracy than the existing state-of-the-art methods.

arXiv information

Authors	Aryan Yazdan Parast, Basim Azam, Naveed Akhtar
Published	2025-03-21 15:28:22+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
