Integrating Vision and Location with Transformers: A Multimodal Deep Learning Framework for Medical Wound Analysis

Abstract

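Accurately recognizing acute and hard-to-heal wounds is a necessary step in wound diagnosis.
An efficient classification model helps wound specialists identify wound types at lower financial and time cost and supports the choice of an optimal treatment.
Traditional machine learning models struggle with feature selection and tend to be cumbersome when accurate recognition is required.
Recently, deep learning (DL) has emerged as a powerful tool for wound diagnosis.
Although DL looks promising for wound type recognition, there is still considerable room to improve model efficiency and accuracy.
In this study, a DL-based multimodal classifier was developed that uses wound images and their corresponding body locations to classify wounds into multiple classes, including diabetic, pressure, surgical, and venous ulcers.
A body map was also created to provide the location data, which helps wound specialists label wound locations more effectively.
The model uses a Vision Transformer to extract hierarchical features from input images, a Discrete Wavelet Transform (DWT) layer to capture low- and high-frequency components, and a Transformer to extract spatial features.
The number of neurons and the weight vector were optimized with three swarm-based optimization techniques: Monster Gorilla Toner (MGTO), Improved Gray Wolf Optimization (IGWO), and the Fox Optimization Algorithm.
The evaluation results indicate that optimizing the weight vector with these algorithms can increase diagnostic accuracy, making it an effective approach for wound detection.
For classification with the original body map, the proposed model achieved an accuracy of 0.8123 using image data alone and 0.8007 using image data combined with wound location.
Combined with the optimization algorithms, the model's accuracy ranged from 0.7801 to 0.8342.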
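
The abstract does not say which wavelet the DWT layer uses or how it is wired into the network. As a rough illustration of what "low- and high-frequency components" means, the sketch below applies a single-level 2D Haar transform to a small grayscale image in plain Python. The Haar wavelet, the function name `haar_dwt2`, and the sub-band names are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(image: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Single-level 2D Haar transform (illustrative, not the paper's layer).

    Returns (approx, row_diff, col_diff, diag). `approx` is the low-frequency
    sub-band; the other three hold high-frequency detail. Each output has
    half the height and half the width of the input.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must both be even")

    approx, row_diff, col_diff, diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            # One 2x2 block of pixels:  p q
            #                           s t
            p, q = image[r][c], image[r][c + 1]
            s, t = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((p + q + s + t) / 2.0)  # local average (low frequency)
            r_row.append((p + q - s - t) / 2.0)  # change from top row to bottom row
            c_row.append((p - q + s - t) / 2.0)  # change from left column to right column
            d_row.append((p - q - s + t) / 2.0)  # diagonal change
        approx.append(a_row)
        row_diff.append(r_row)
        col_diff.append(c_row)
        diag.append(d_row)
    return approx, row_diff, col_diff, diag


approx, row_diff, col_diff, diag = haar_dwt2([[1.0, 2.0], [3.0, 4.0]])
print(approx, row_diff, col_diff, diag)  # [[5.0]] [[-2.0]] [[-1.0]] [[0.0]]
```

With the 1/2 scaling used here the transform is orthonormal, so the sum of squared pixel values equals the sum of squared coefficients (30 in the example). A perfectly flat image puts all of its energy in `approx` and leaves the three detail sub-bands at zero.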

Abstract (original)

Effective recognition of acute and difficult-to-heal wounds is a necessary step in wound diagnosis. An efficient classification model can help wound specialists classify wound types with less financial and time costs and also help in deciding on the optimal treatment method. Traditional machine learning models suffer from feature selection and are usually cumbersome models for accurate recognition. Recently, deep learning (DL) has emerged as a powerful tool in wound diagnosis. Although DL seems promising for wound type recognition, there is still a large scope for improving the efficiency and accuracy of the model. In this study, a DL-based multimodal classifier was developed using wound images and their corresponding locations to classify them into multiple classes, including diabetic, pressure, surgical, and venous ulcers. A body map was also created to provide location data, which can help wound specialists label wound locations more effectively. The model uses a Vision Transformer to extract hierarchical features from input images, a Discrete Wavelet Transform (DWT) layer to capture low and high frequency components, and a Transformer to extract spatial features. The number of neurons and weight vector optimization were performed using three swarm-based optimization techniques (Monster Gorilla Toner (MGTO), Improved Gray Wolf Optimization (IGWO), and Fox Optimization Algorithm). The evaluation results show that weight vector optimization using optimization algorithms can increase diagnostic accuracy and make it a very effective approach for wound detection. In the classification using the original body map, the proposed model was able to achieve an accuracy of 0.8123 using image data and an accuracy of 0.8007 using a combination of image data and wound location. Also, the accuracy of the model in combination with the optimization models varied from 0.7801 to 0.8342.
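
To make the idea of swarm-based weight vector optimization more concrete, here is a minimal sketch of the plain Grey Wolf Optimizer in standard-library Python. It is not the IGWO variant named in the abstract, and it is not the authors' implementation. The objective `toy_loss`, its target vector, the search bounds, and all parameter defaults are hypothetical stand-ins for whatever validation error the real model would report for a candidate weight vector.

```python
import random
from typing import Callable, List, Tuple


def grey_wolf_optimize(
    objective: Callable[[List[float]], float],
    dim: int,
    bounds: Tuple[float, float] = (0.0, 1.0),
    wolves: int = 20,
    iterations: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` over a box with a basic Grey Wolf Optimizer."""
    if wolves < 3:
        raise ValueError("need at least three wolves to pick three leaders")
    rng = random.Random(seed)
    lo, hi = bounds
    pack = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(wolves)]
    # The three best (score, position) pairs seen so far: alpha, beta, delta.
    leaders = sorted((objective(w), list(w)) for w in pack)[:3]

    for step in range(iterations):
        a = 2.0 * (1.0 - step / iterations)  # shrinks from 2 towards 0
        for i, wolf in enumerate(pack):
            new_pos = []
            for d in range(dim):
                pulls = []
                for _, leader in leaders:
                    big_a = 2.0 * a * rng.random() - a  # step scale in [-a, a]
                    big_c = 2.0 * rng.random()          # leader emphasis in [0, 2]
                    dist = abs(big_c * leader[d] - wolf[d])
                    pulls.append(leader[d] - big_a * dist)
                value = sum(pulls) / len(pulls)         # average of the three pulls
                new_pos.append(min(hi, max(lo, value)))  # keep inside the box
            pack[i] = new_pos
        # Keep the best three positions found so far as the next leaders.
        candidates = leaders + [(objective(w), list(w)) for w in pack]
        leaders = sorted(candidates)[:3]

    best_score, best_pos = leaders[0]
    return best_pos, best_score


def toy_loss(weights: List[float]) -> float:
    """Hypothetical objective: squared distance to an arbitrary target."""
    target = [0.3, 0.7]
    return sum((w - t) ** 2 for w, t in zip(weights, target))


best_weights, best_loss = grey_wolf_optimize(toy_loss, dim=2)
print(best_weights, best_loss)
```

In a real pipeline the objective would train or evaluate the classifier for each candidate vector, which makes every call expensive, so the pack size and iteration count would need to be chosen with that cost in mind.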

arXiv information

Authors	Ramin Mousa, Hadis Taherinia, Khabiba Abdiyeva, Amir Ali Bengari, Mohammadmahdi Vahediahmar
Published	2025-04-14 17:39:18+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
