FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis

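Summary

Foundation models are becoming increasingly effective in the medical domain, offering models pre-trained on large datasets that can be readily adapted to downstream tasks.
Despite this progress, fetal ultrasound images remain a challenging domain for foundation models because of their inherent complexity: such models often require substantial additional training and are limited by the scarcity of paired multimodal data.
To overcome these challenges, we introduce FetalCLIP, a vision-language foundation model capable of generating universal representations of fetal ultrasound images.
FetalCLIP was pre-trained with a multimodal learning approach on a diverse dataset of 210,035 fetal ultrasound images paired with text.
This is the largest paired dataset of its kind used for foundation model development to date.
This training approach lets FetalCLIP effectively learn the intricate anatomical features present in fetal ultrasound images, yielding robust representations that can be used for a variety of downstream applications.
In extensive benchmarking across a range of fetal ultrasound applications, including classification, gestational age estimation, congenital heart defect (CHD) detection, and fetal structure segmentation, FetalCLIP outperformed all baselines while showing remarkable generalizability and strong performance even with limited labeled data.
We plan to release the FetalCLIP model publicly for the benefit of the broader scientific community.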

Summary (original)

Foundation models are becoming increasingly effective in the medical domain, offering pre-trained models on large datasets that can be readily adapted for downstream tasks. Despite progress, fetal ultrasound images remain a challenging domain for foundation models due to their inherent complexity, often requiring substantial additional training and facing limitations due to the scarcity of paired multimodal data. To overcome these challenges, here we introduce FetalCLIP, a vision-language foundation model capable of generating universal representation of fetal ultrasound images. FetalCLIP was pre-trained using a multimodal learning approach on a diverse dataset of 210,035 fetal ultrasound images paired with text. This represents the largest paired dataset of its kind used for foundation model development to date. This unique training approach allows FetalCLIP to effectively learn the intricate anatomical features present in fetal ultrasound images, resulting in robust representations that can be used for a variety of downstream applications. In extensive benchmarking across a range of key fetal ultrasound applications, including classification, gestational age estimation, congenital heart defect (CHD) detection, and fetal structure segmentation, FetalCLIP outperformed all baselines while demonstrating remarkable generalizability and strong performance even with limited labeled data. We plan to release the FetalCLIP model publicly for the benefit of the broader scientific community.

arXiv information

Authors	Fadillah Maani, Numan Saeed, Tausifa Saleem, Zaid Farooq, Hussain Alasmawi, Werner Diehl, Ameera Mohammad, Gareth Waring, Saudabi Valappi, Leanne Bricker, Mohammad Yaqub
Published	2025-02-20 18:30:34+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
