Stable Vision Concept Transformers for Medical Diagnosis

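Summary

Transparency is a central concern in medicine, and it has pushed researchers toward explainable AI (XAI).
Among XAI methods, Concept Bottleneck Models (CBMs) have recently drawn considerable attention: they add a concept layer that extracts concept features, constraining the model's latent space to high-level concepts that humans can understand.
However, existing methods determine the prediction from concept features alone and so overlook the intrinsic feature embeddings in medical images.
To close this utility gap between the original model and concept-based models, this paper proposes the Vision Concept Transformer (VCT).
CBMs have also been found to hurt model performance and to give unstable explanations under input perturbations, which limits their use in medicine.
To address this faithfulness problem, the paper further proposes the Stable Vision Concept Transformer (SVCT), built on VCT, which uses a vision transformer (ViT) as its backbone and incorporates a concept layer.
SVCT fuses concept features with image features to strengthen decision-making, and it ensures faithfulness by integrating Denoised Diffusion Smoothing.
Comprehensive experiments on four medical datasets show that VCT and SVCT maintain accuracy relative to baselines while remaining interpretable.
Even under perturbations, SVCT consistently provides faithful explanations, meeting the needs of the medical field.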

Abstract (original)

Transparency is a paramount concern in the medical field, prompting researchers to delve into the realm of explainable AI (XAI). Among these XAI methods, Concept Bottleneck Models (CBMs) aim to restrict the model’s latent space to human-understandable high-level concepts by generating a conceptual layer for extracting conceptual features, which has drawn much attention recently. However, existing methods rely solely on concept features to determine the model’s predictions, which overlook the intrinsic feature embeddings within medical images. To address this utility gap between the original models and concept-based models, we propose Vision Concept Transformer (VCT). Furthermore, despite their benefits, CBMs have been found to negatively impact model performance and fail to provide stable explanations when faced with input perturbations, which limits their application in the medical field. To address this faithfulness issue, this paper further proposes the Stable Vision Concept Transformer (SVCT) based on VCT, which leverages the vision transformer (ViT) as its backbone and incorporates a conceptual layer. SVCT employs conceptual features to enhance decision-making capabilities by fusing them with image features and ensures model faithfulness through the integration of Denoised Diffusion Smoothing. Comprehensive experiments on four medical datasets demonstrate that our VCT and SVCT maintain accuracy while remaining interpretable compared to baselines. Furthermore, even when subjected to perturbations, our SVCT model consistently provides faithful explanations, thus meeting the needs of the medical field.
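The abstract describes two mechanisms: fusing concept features with image features before the decision, and smoothing to keep explanations stable under perturbation. The sketch below is a toy illustration of those two ideas only, not the authors' implementation. Every name in it (`concept_scores`, `smoothed_concept_scores`, `predict`, the vectors and weights) is hypothetical, fusion is assumed to be plain concatenation, and the smoothing is ordinary Gaussian noise averaging applied to a feature vector; the denoising step of Denoised Diffusion Smoothing and the ViT backbone are not reproduced.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def concept_scores(image_feat, concept_vecs):
    """Score each concept by projecting the image feature onto its vector."""
    return [dot(image_feat, c) for c in concept_vecs]


def smoothed_concept_scores(image_feat, concept_vecs,
                            sigma=0.1, n_samples=200, seed=0):
    """Average concept scores over Gaussian-perturbed copies of the feature.

    Assumption: plain noise averaging stands in for the paper's smoothing;
    no diffusion denoiser is applied here.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(concept_vecs)
    for _ in range(n_samples):
        noisy = [x + rng.gauss(0.0, sigma) for x in image_feat]
        for i, score in enumerate(concept_scores(noisy, concept_vecs)):
            totals[i] += score
    return [t / n_samples for t in totals]


def predict(image_feat, concept_vecs, weights, bias, smooth=False):
    """Classify from image features fused (concatenated) with concept scores.

    Returns the predicted class index and the concept scores, which serve
    as the explanation for the prediction.
    """
    if smooth:
        scores = smoothed_concept_scores(image_feat, concept_vecs)
    else:
        scores = concept_scores(image_feat, concept_vecs)
    fused = list(image_feat) + scores  # assumed fusion: concatenation
    logits = [dot(w, fused) + b for w, b in zip(weights, bias)]
    label = max(range(len(logits)), key=logits.__getitem__)
    return label, scores


# Toy data: a 3-dimensional image feature, 2 concepts, 2 classes.
image_feat = [1.0, 0.0, 0.5]
concept_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
weights = [[0.0, 0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0]]
bias = [0.0, 0.0]

label, scores = predict(image_feat, concept_vecs, weights, bias)
smooth_label, smooth_scores = predict(image_feat, concept_vecs,
                                      weights, bias, smooth=True)
```

In this toy setup the classifier sees both the raw feature and the concept scores, so a prediction does not depend on concepts alone, and the smoothed scores stay close to the unsmoothed ones because the noise averages out.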

arXiv information

Authors	Lijie Hu, Songning Lai, Yuan Hua, Shu Yang, Jingfeng Zhang, Di Wang
Published	2025-06-05 17:43:27+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
