A Multimodal Approach Combining Structural and Cross-domain Textual Guidance for Weakly Supervised OCT Segmentation

Summary

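Accurate segmentation of optical coherence tomography (OCT) images is essential for diagnosing and monitoring retinal diseases.
However, pixel-level annotation is labor-intensive, which limits how far supervised learning can scale to large datasets.
Weakly supervised semantic segmentation (WSSS) offers a promising alternative by learning from image-level labels.
This study proposes a new WSSS approach that integrates structural guidance with text-driven strategies to generate high-quality pseudo labels, significantly improving segmentation performance.
On the visual side, the method uses two processing modules that exchange raw image features and structural features from OCT images, guiding the model toward regions where lesions are likely to occur.
On the textual side, it uses large-scale models pretrained on cross-domain sources to implement label-informed textual guidance and synthetic descriptive integration, through two textual processing modules that combine local semantic features with consistent synthetic descriptions.
By fusing these visual and textual components in a multimodal framework, the approach improves lesion localization accuracy.
Experiments on three OCT datasets show that the method achieves state-of-the-art performance, highlighting its potential to improve diagnostic accuracy and efficiency in medical imaging.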
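WSSS methods commonly derive pseudo labels from a classifier trained only on image-level labels, typically through class activation maps (CAMs). This summary does not describe the paper's own pseudo-label pipeline, so the following is only a minimal sketch of the generic CAM-thresholding step, assuming a K-channel feature map, per-class classifier weights, and a hypothetical foreground threshold of 0.5. The function names are illustrative and do not come from the paper.

```python
from typing import Dict, List

Grid = List[List[float]]


def class_activation_map(features: List[Grid], weights: List[float]) -> Grid:
    """CAM for one class: weighted sum of the K feature maps, followed by ReLU."""
    h, w = len(features[0]), len(features[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wk in zip(features, weights):
        for y in range(h):
            for x in range(w):
                cam[y][x] += wk * fmap[y][x]
    return [[max(v, 0.0) for v in row] for row in cam]


def normalize(cam: Grid) -> Grid:
    """Scale a CAM to [0, 1] by its peak value (an all-zero map stays all zeros)."""
    peak = max(max(row) for row in cam)
    if peak == 0.0:
        return [[0.0 for _ in row] for row in cam]
    return [[v / peak for v in row] for row in cam]


def pseudo_label(cams: Dict[int, Grid], threshold: float = 0.5) -> List[List[int]]:
    """Per-pixel label from the normalized CAMs of the classes in the image-level
    label; 0 (background) wherever no class reaches the threshold."""
    norm = {c: normalize(m) for c, m in cams.items()}
    first = next(iter(norm.values()))
    h, w = len(first), len(first[0])
    labels = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_cls, best_score = 0, threshold
            for c, m in norm.items():
                if m[y][x] >= best_score:
                    best_cls, best_score = c, m[y][x]
            labels[y][x] = best_cls
    return labels


# Hypothetical 2x2 example: two feature channels, one lesion class (id 1).
features = [[[0.0, 1.0], [2.0, 4.0]], [[3.0, 3.0], [3.0, 3.0]]]
cam = class_activation_map(features, weights=[1.0, 0.0])
print(pseudo_label({1: cam}))  # [[0, 0], [1, 1]]
```

Thresholding a peak-normalized CAM keeps only the most discriminative regions as foreground, which is why WSSS methods such as the one summarized here add extra guidance to refine these coarse pseudo labels.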

Summary (original)

Accurate segmentation of Optical Coherence Tomography (OCT) images is crucial for diagnosing and monitoring retinal diseases. However, the labor-intensive nature of pixel-level annotation limits the scalability of supervised learning with large datasets. Weakly Supervised Semantic Segmentation (WSSS) provides a promising alternative by leveraging image-level labels. In this study, we propose a novel WSSS approach that integrates structural guidance with text-driven strategies to generate high-quality pseudo labels, significantly improving segmentation performance. In terms of visual information, our method employs two processing modules that exchange raw image features and structural features from OCT images, guiding the model to identify where lesions are likely to occur. In terms of textual information, we utilize large-scale pretrained models from cross-domain sources to implement label-informed textual guidance and synthetic descriptive integration with two textual processing modules that combine local semantic features with consistent synthetic descriptions. By fusing these visual and textual components within a multimodal framework, our approach enhances lesion localization accuracy. Experimental results on three OCT datasets demonstrate that our method achieves state-of-the-art performance, highlighting its potential to improve diagnostic accuracy and efficiency in medical imaging.
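The textual side relies on embeddings from large pretrained models. As a hedged illustration only: one common way to turn a text embedding into a spatial cue is to score each local image embedding by cosine similarity to a class-description embedding, then blend that map with a CAM. The vectors below stand in for encoder outputs (for example from a CLIP-style vision-language model, which is an assumption), and the linear blend weighted by `alpha` is a hypothetical fusion rule, not the fusion used in the paper.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def text_similarity_map(patches: List[List[Sequence[float]]],
                        text_emb: Sequence[float]) -> Grid:
    """Similarity of each local (patch) embedding to the text embedding, clipped at 0."""
    return [[max(cosine(p, text_emb), 0.0) for p in row] for row in patches]


def fuse(cam: Grid, sim: Grid, alpha: float = 0.5) -> Grid:
    """Hypothetical fusion: per-pixel convex combination of a [0, 1] CAM and the text map."""
    return [[alpha * c + (1.0 - alpha) * s for c, s in zip(crow, srow)]
            for crow, srow in zip(cam, sim)]


# Hypothetical 1x2 grid of 2-D patch embeddings and one description embedding.
patches = [[[1.0, 0.0], [0.0, 1.0]]]
sim = text_similarity_map(patches, text_emb=[1.0, 0.0])
print(sim)                      # [[1.0, 0.0]]
print(fuse([[0.0, 1.0]], sim))  # [[0.5, 0.5]]
```

Clipping negative similarities to zero keeps the text map on the same [0, 1] scale as a normalized CAM, so the two cues can be combined directly.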

arXiv information

Authors	Jiaqi Yang, Nitish Mehta, Xiaoling Hu, Chao Chen, Chia-Ling Tsai
Published	2024-11-19 16:20:27+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
