Evaluating Visual Explanations of Attention Maps for Transformer-based Medical Imaging

Abstract

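Vision Transformers (ViTs) have recently shown superior performance on medical imaging problems, but they face explainability issues similar to those of earlier architectures such as convolutional neural networks.
Recent research suggests that attention maps, which are part of the decision-making process of ViTs, can potentially address the explainability issue by identifying the regions that influence predictions, especially in models pretrained with self-supervised learning.
In this work, we compare the visual explanations of attention maps with other commonly used methods for medical imaging problems.
To do so, we use four distinct medical imaging datasets that involve the identification of (1) colonic polyps, (2) breast tumors, (3) esophageal inflammation, and (4) bone fractures and hardware implants.
Through large-scale experiments on these datasets using various supervised and self-supervised pretrained ViTs, we find that attention maps show promise under certain conditions and generally surpass GradCAM in explainability, but are outperformed by transformer-specific interpretability methods.
Our findings indicate that the efficacy of attention maps as an interpretability method is context-dependent and may be limited, because they do not consistently provide the comprehensive insights required for robust medical decision-making.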

Abstract (original)

Although Vision Transformers (ViTs) have recently demonstrated superior performance in medical imaging problems, they face explainability issues similar to previous architectures such as convolutional neural networks. Recent research efforts suggest that attention maps, which are part of the decision-making process of ViTs, can potentially address the explainability issue by identifying regions influencing predictions, especially in models pretrained with self-supervised learning. In this work, we compare the visual explanations of attention maps to other commonly used methods for medical imaging problems. To do so, we employ four distinct medical imaging datasets that involve the identification of (1) colonic polyps, (2) breast tumors, (3) esophageal inflammation, and (4) bone fractures and hardware implants. Through large-scale experiments on the aforementioned datasets using various supervised and self-supervised pretrained ViTs, we find that although attention maps show promise under certain conditions and generally surpass GradCAM in explainability, they are outperformed by transformer-specific interpretability methods. Our findings indicate that the efficacy of attention maps as a method of interpretability is context-dependent and may be limited as they do not consistently provide the comprehensive insights required for robust medical decision-making.

arXiv information

Authors	Minjae Chung, Jong Bum Won, Ganghyun Kim, Yujin Kim, Utku Ozbulak
Published	2025-03-12 16:52:52+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
