Metrics reloaded: Pitfalls and recommendations for image analysis validation

Abstract

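Evidence is mounting that flaws in the validation of machine learning (ML) algorithms are an underestimated global problem.
In automatic biomedical image analysis in particular, the chosen performance metrics often fail to reflect the domain interest, so they neither measure scientific progress adequately nor help ML techniques move into practice.
To address this, our large international expert consortium created Metrics Reloaded, a comprehensive framework that guides researchers in the problem-aware selection of metrics.
Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology.
The framework was developed in a multi-stage Delphi process and builds on the novel concept of a problem fingerprint: a structured representation of the given problem that captures every aspect relevant to metric selection, from the domain interest to the properties of the target structure(s), data set, and algorithm output.
Based on the problem fingerprint, users are guided through choosing and applying appropriate validation metrics and are made aware of potential pitfalls.
Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at the image, object, or pixel level: image-level classification, object detection, semantic segmentation, and instance segmentation.
To improve the user experience, we implemented the framework in the Metrics Reloaded online tool, which also serves as an access point for exploring the weaknesses, strengths, and specific recommendations for the most common validation metrics.
The framework's broad applicability across domains is demonstrated by instantiating it for a range of biological and medical image analysis use cases.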
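
The problem fingerprint is described only at a high level here, so the following is a minimal sketch of how such a structured representation might look in Python, not the framework's actual schema. The class name `ProblemFingerprint`, its field names, the example values, and the `LEVELS` mapping from task to classification level are all assumptions made for illustration; only the four task categories and the four groups of aspects (domain interest, target structure, data set, algorithm output) come from the abstract.

```python
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    """The four problem categories named in the abstract."""
    IMAGE_LEVEL_CLASSIFICATION = "image-level classification"
    OBJECT_DETECTION = "object detection"
    SEMANTIC_SEGMENTATION = "semantic segmentation"
    INSTANCE_SEGMENTATION = "instance segmentation"


# Assumed reading: the level(s) at which each task acts as a classification problem.
LEVELS = {
    Task.IMAGE_LEVEL_CLASSIFICATION: ("image",),
    Task.OBJECT_DETECTION: ("object",),
    Task.SEMANTIC_SEGMENTATION: ("pixel",),
    Task.INSTANCE_SEGMENTATION: ("object", "pixel"),
}


@dataclass
class ProblemFingerprint:
    """Hypothetical container; field names are illustrative, not the framework's."""
    task: Task
    domain_interest: str
    target_structure: dict
    data_set: dict
    algorithm_output: dict

    def classification_levels(self):
        # Look up the level(s) for this fingerprint's task category.
        return LEVELS[self.task]


# Example values below are invented purely to show the shape of the record.
fingerprint = ProblemFingerprint(
    task=Task.INSTANCE_SEGMENTATION,
    domain_interest="count and outline individual cells",
    target_structure={"small_objects": True},
    data_set={"class_imbalance": True},
    algorithm_output={"provides_confidence_scores": False},
)
print(fingerprint.classification_levels())
```

Keeping the fingerprint as plain data separates the description of the problem from any later metric choice, which is the ordering the abstract describes: capture the problem first, then select metrics from it.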

Abstract (original)

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. Particularly in automatic biomedical image analysis, chosen performance metrics often do not reflect the domain interest, thus failing to adequately measure scientific progress and hindering translation of ML techniques into practice. To overcome this, our large international expert consortium created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint – a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), data set and algorithm output. Based on the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as a classification task at image, object or pixel level, namely image-level classification, object detection, semantic segmentation, and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool, which also provides a point of access to explore weaknesses, strengths and specific recommendations for the most common validation metrics. The broad applicability of our framework across domains is demonstrated by an instantiation for various biological and medical image analysis use cases.
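
The claim that a chosen metric may not reflect the domain interest can be made concrete with a small, self-contained sketch. The example below is not taken from the paper: the data are synthetic and the scenario (a rare positive class that the domain cares about) is an assumption. It shows a trivial classifier that never predicts the positive class scoring high on accuracy while detecting none of the cases of interest.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred):
    """Fraction of reference positives (label 1) that were predicted positive."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else float("nan")


# Synthetic, imbalanced data: 5 positive images out of 100.
y_true = [1] * 5 + [0] * 95
# A trivial classifier that always predicts the negative class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
sens = sensitivity(y_true, y_pred)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f}")
```

If the domain interest is finding the rare positives, accuracy alone would rank this classifier well even though it is useless for that purpose, which is the kind of mismatch a problem-aware metric selection is meant to catch.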

arXiv information

Authors	Lena Maier-Hein,Annika Reinke,Patrick Godau,Minu D. Tizabi,Florian Büttner,Evangelia Christodoulou,Ben Glocker,Fabian Isensee,Jens Kleesiek,Michal Kozubek,Mauricio Reyes,Michael A. Riegler,Manuel Wiesenfarth,Emre Kavur,Carole H. Sudre,Michael Baumgartner,Matthias Eisenmann,Doreen Heckmann-Nötzel,A. Tim Rädsch,Laura Acion,Michela Antonelli,Tal Arbel,Spyridon Bakas,Arriel Benis,Matthew Blaschko,M. Jorge Cardoso,Veronika Cheplygina,Beth A. Cimini,Gary S. Collins,Keyvan Farahani,Luciana Ferrer,Adrian Galdran,Bram van Ginneken,Robert Haase,Daniel A. Hashimoto,Michael M. Hoffman,Merel Huisman,Pierre Jannin,Charles E. Kahn,Dagmar Kainmueller,Bernhard Kainz,Alexandros Karargyris,Alan Karthikesalingam,Hannes Kenngott,Florian Kofler,Annette Kopp-Schneider,Anna Kreshuk,Tahsin Kurc,Bennett A. Landman,Geert Litjens,Amin Madani,Klaus Maier-Hein,Anne L. Martel,Peter Mattson,Erik Meijering,Bjoern Menze,Karel G. M. Moons,Henning Müller,Brennan Nichyporuk,Felix Nickel,Jens Petersen,Nasir Rajpoot,Nicola Rieke,Julio Saez-Rodriguez,Clara I. Sánchez,Shravya Shetty,Maarten van Smeden,Ronald M. Summers,Abdel A. Taha,Aleksei Tiulpin,Sotirios A. Tsaftaris,Ben Van Calster,Gaël Varoquaux,Paul F. Jäger
Published	2023-02-10 10:03:35+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
