Embracing Massive Medical Data

Summary

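As massive medical data become available, with growing numbers of scans, expanding classes, and varying sources, the prevalent training paradigm, in which AI is trained with multiple passes over a fixed, finite dataset, faces significant challenges. First, training AI all at once on such massive data is impractical because new scans, sources, and classes keep arriving. Second, training AI continuously on new scans, sources, and classes can lead to catastrophic forgetting, in which the AI forgets old data as it learns new data, and vice versa. To address these two challenges, we propose an online learning method for training AI on massive medical data. Instead of repeatedly training AI on randomly selected data samples, the method identifies the samples that matter most to the current AI model, based on their data uniqueness and prediction uncertainty, and trains the AI on those selected samples. Compared with the prevalent training paradigm, the method improves data efficiency by enabling training on continual data streams, and it mitigates catastrophic forgetting by selectively training AI on significant data samples that might otherwise be forgotten, outperforming by 15% in Dice score for multi-organ and tumor segmentation. The code is available at https://github.com/MrGiovanni/OnlineLearning.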
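The selection step described above can be illustrated with a minimal sketch. The abstract does not say how data uniqueness or prediction uncertainty are computed, so everything here is an assumption made for illustration: uncertainty is taken as the Shannon entropy of a predicted class distribution, uniqueness as the Euclidean distance to the nearest embedding already kept in memory, and the two are mixed with a hypothetical weight `alpha`. The function names are hypothetical and are not taken from the authors' repository.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy of a class-probability vector (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uniqueness(embedding, memory):
    """Distance from `embedding` to its nearest neighbour in `memory`.

    Assumption: with an empty memory every sample is equally novel, so we
    return 0.0 and let the ranking fall back on uncertainty alone.
    """
    if not memory:
        return 0.0
    return min(math.dist(embedding, stored) for stored in memory)


def select_samples(stream, memory, k, alpha=0.5):
    """Return the ids of the k highest-scoring samples in `stream`.

    stream: list of (sample_id, embedding, class_probabilities) tuples.
    memory: list of embeddings of samples already selected earlier.
    alpha:  hypothetical weight between uniqueness and uncertainty.
    """
    scored = []
    for sample_id, embedding, probs in stream:
        score = (alpha * uniqueness(embedding, memory)
                 + (1 - alpha) * prediction_entropy(probs))
        scored.append((score, sample_id))
    # Highest score first; keep only the ids of the top k.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sample_id for _, sample_id in scored[:k]]


if __name__ == "__main__":
    memory = [(0.0, 0.0)]
    stream = [
        ("a", (0.0, 0.0), (0.98, 0.02)),  # already seen, confident
        ("b", (5.0, 5.0), (0.50, 0.50)),  # far from memory, uncertain
        ("c", (0.1, 0.0), (0.90, 0.10)),  # near memory, mildly uncertain
    ]
    print(select_samples(stream, memory, k=2))
```

In this toy stream, sample "b" ranks first because it is both far from what is stored and maximally uncertain, while the near-duplicate "a" is skipped. A real pipeline would derive embeddings and probabilities from the segmentation model itself.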

Summary (original)

As massive medical data become available with an increasing number of scans, expanding classes, and varying sources, prevalent training paradigms — where AI is trained with multiple passes over fixed, finite datasets — face significant challenges. First, training AI all at once on such massive data is impractical as new scans/sources/classes continuously arrive. Second, training AI continuously on new scans/sources/classes can lead to catastrophic forgetting, where AI forgets old data as it learns new data, and vice versa. To address these two challenges, we propose an online learning method that enables training AI from massive medical data. Instead of repeatedly training AI on randomly selected data samples, our method identifies the most significant samples for the current AI model based on their data uniqueness and prediction uncertainty, then trains the AI on these selective data samples. Compared with prevalent training paradigms, our method not only improves data efficiency by enabling training on continual data streams, but also mitigates catastrophic forgetting by selectively training AI on significant data samples that might otherwise be forgotten, outperforming by 15% in Dice score for multi-organ and tumor segmentation. The code is available at https://github.com/MrGiovanni/OnlineLearning
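For readers unfamiliar with the metric quoted above, the Dice score between a predicted mask and a ground-truth mask is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for flat binary masks. Returning 1.0 when both masks are empty is a common convention assumed here, not something the abstract specifies, and the function name is illustrative.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count voxels labelled 1 in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    pred = [1, 1, 0, 0]
    target = [1, 0, 0, 0]
    print(dice_score(pred, target))
```

Here the masks share one foreground voxel out of three labelled in total, so the score is 2/3.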

arXiv information

Authors	Yu-Cheng Chou, Zongwei Zhou, Alan Yuille
Published	2024-07-05 17:50:30+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
