Evaluation of Vision Transformers for Multimodal Image Classification: A Case Study on Brain, Lung, and Kidney Tumors

Abstract

Neural networks have become the standard technique for medical diagnostics, especially in cancer detection and classification. This work evaluates the performance of Vision Transformer architectures, including Swin Transformer and MaxViT, on several datasets of magnetic resonance imaging (MRI) and computed tomography (CT) scans. We used three training sets of images with brain, lung, and kidney tumors. Each dataset has its own classification labels, ranging from brain gliomas and meningiomas to benign and malignant lung conditions and kidney anomalies such as cysts and cancers. The aim is to analyze how the networks behave on each dataset and what is gained by combining different image modalities and tumor classes. We designed several experiments in which the models were fine-tuned on the individual datasets and on the combined dataset. The results show that the Swin Transformer provided high accuracy, achieving up to 99% on average for the individual datasets and 99.4% for the combined dataset. This research highlights the adaptability of Transformer-based models to various image modalities and features. However, challenges persist, including limited annotated data and interpretability issues. Future work will expand this study by incorporating other image modalities and enhancing diagnostic capabilities. Integrating these models across diverse datasets could mark a significant advance in precision medicine, paving the way for more efficient and comprehensive healthcare solutions.
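Fine-tuning on a combined dataset requires a single label space that covers the classes of all three source datasets. The abstract does not say how the labels were merged, so the following is only a minimal sketch of one plausible approach: assigning a global class index to each (dataset, class) pair. The class lists contain only the examples named in the abstract and are assumptions, not the datasets' full label sets; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Illustrative classes only (those mentioned in the abstract).
# The real datasets may define more or different labels.
DATASET_CLASSES: Dict[str, List[str]] = {
    "brain": ["glioma", "meningioma"],
    "lung": ["benign", "malignant"],
    "kidney": ["cyst", "cancer"],
}


def build_combined_label_map(
    dataset_classes: Dict[str, List[str]],
) -> Dict[Tuple[str, str], int]:
    """Assign one global class index to every (dataset, class) pair.

    Indices follow the insertion order of the input dictionary.
    """
    label_map: Dict[Tuple[str, str], int] = {}
    for dataset, classes in dataset_classes.items():
        for cls in classes:
            label_map[(dataset, cls)] = len(label_map)
    return label_map


def to_combined_labels(
    samples: List[Tuple[str, str]],
    label_map: Dict[Tuple[str, str], int],
) -> List[int]:
    """Convert (dataset, class) annotations to global class indices."""
    return [label_map[(dataset, cls)] for dataset, cls in samples]


label_map = build_combined_label_map(DATASET_CLASSES)
num_classes = len(label_map)  # size of the classifier head for the combined run
```

Keying the map on the dataset name as well as the class name keeps classes from different organs distinct even if two datasets happen to reuse a label such as "benign".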

arXiv information

Authors	Óscar A. Martín, Javier Sánchez
Published	2025-06-16 15:10:45+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
