Sources of performance variability in deep learning-based polyp detection

Abstract

Validation metrics are a key prerequisite for the reliable tracking of scientific progress and for deciding on the potential clinical translation of methods. While recent initiatives aim to develop comprehensive theoretical frameworks for understanding metric-related pitfalls in image analysis problems, there is a lack of experimental evidence on the concrete effects of common and rare pitfalls on specific applications. We address this gap in the literature in the context of colon cancer screening. Our contribution is twofold. Firstly, we present the winning solution of the Endoscopy computer vision challenge (EndoCV) on colon cancer detection, conducted in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2022. Secondly, we demonstrate the sensitivity of commonly used metrics to a range of hyperparameters as well as the consequences of poor metric choices. Based on comprehensive validation studies performed with patient data from six clinical centers, we found all commonly applied object detection metrics to be subject to high inter-center variability. Furthermore, our results clearly demonstrate that the adaptation of standard hyperparameters used in the computer vision community does not generally lead to the clinically most plausible results. Finally, we present localization criteria that correspond well to clinical relevance. Our work could be a first step towards reconsidering common validation strategies in automatic colon cancer screening applications.
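
To illustrate how a single validation hyperparameter can change a detection result, here is a minimal Python sketch. It is not the authors' evaluation code: the abstract does not say which hyperparameters or localization criteria were studied, so the intersection-over-union (IoU) threshold, the greedy matching rule, and the box coordinates below are all illustrative assumptions.

```python
from typing import List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); this layout is an assumption.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of a box; degenerate boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(preds: List[Box], gts: List[Box], iou_threshold: float) -> int:
    """Greedy one-to-one matching of predictions to ground-truth boxes.

    Predictions are assumed to be sorted by descending confidence.
    A prediction is a true positive if its best unmatched ground-truth
    box overlaps it with IoU >= iou_threshold.
    """
    matched = set()
    true_positives = 0
    for pred in preds:
        best_index, best_iou = -1, 0.0
        for index, gt in enumerate(gts):
            if index in matched:
                continue
            overlap = iou(pred, gt)
            if overlap > best_iou:
                best_index, best_iou = index, overlap
        if best_index >= 0 and best_iou >= iou_threshold:
            matched.add(best_index)
            true_positives += 1
    return true_positives


# Hypothetical polyp annotation and a prediction shifted by 10 pixels.
ground_truth = [(10.0, 10.0, 50.0, 50.0)]
prediction = [(20.0, 20.0, 60.0, 60.0)]

for threshold in (0.3, 0.5):
    hits = count_true_positives(prediction, ground_truth, threshold)
    print(f"IoU threshold {threshold}: {hits} true positive(s)")
```

In this made-up case the two boxes overlap with an IoU of 900/2300 (about 0.39), so the same prediction counts as a hit at a threshold of 0.3 and as a miss at 0.5. Every downstream metric built on true-positive counts inherits that dependence on the threshold.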

arXiv information

Authors	Thuy Nuong Tran, Tim Adler, Amine Yamlahi, Evangelia Christodoulou, Patrick Godau, Annika Reinke, Minu Dietlinde Tizabi, Peter Sauer, Tillmann Persicke, Jörg Gerhard Albert, Lena Maier-Hein
Published	2022-11-17 17:44:39+00:00
arXiv page	arxiv_id(pdf)
