LaPA: Latent Prompt Assist Model For Medical Visual Question Answering


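Medical visual question answering (Med-VQA) aims to automate the prediction of correct answers for medical images and questions, helping physicians reduce repetitive tasks and lighten their workload.
This paper proposes the Latent Prompt Assist model (LaPA) for medical visual question answering.
Experimental results on three publicly available Med-VQA datasets show that LaPA outperforms the state-of-the-art model ARL, with improvements of 1.83%, 0.63%, and 1.80% on VQA-RAD, SLAKE, and VQA-2019, respectively.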

The code is publicly available.


Medical visual question answering (Med-VQA) aims to automate the prediction of correct answers for medical images and questions, thereby assisting physicians in reducing repetitive tasks and alleviating their workload. Existing approaches primarily focus on pre-training models on additional, comprehensive datasets and then fine-tuning them to improve performance on downstream tasks. However, there is also significant value in exploring existing models to extract clinically relevant information. In this paper, we propose the Latent Prompt Assist model (LaPA) for medical visual question answering. First, we design a latent prompt generation module that generates the latent prompt under the constraint of the target answer. Next, we propose a multi-modal fusion block with a latent prompt fusion module, which uses the latent prompt to extract clinically relevant information from uni-modal and multi-modal features. We also introduce a prior knowledge fusion module that integrates the relationship between diseases and organs with the clinically relevant information. Finally, we combine the integrated information with image-language cross-modal information to predict the final answers. Experimental results on three publicly available Med-VQA datasets demonstrate that LaPA outperforms the state-of-the-art model ARL, achieving improvements of 1.83%, 0.63%, and 1.80% on VQA-RAD, SLAKE, and VQA-2019, respectively. The code is publicly available.
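The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how each module works, so the use of cross-attention, the averaging of the three attended views, and the additive fusion steps are all assumptions made for illustration. Every name here (`cross_attend`, `mean_pool`, `lapa_like_forward`, `rand_feats`) is hypothetical, and the latent prompt is treated as a given set of vectors rather than being generated under an answer constraint.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, features):
    """Replace each query with a softmax-weighted average of `features`."""
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, f)) / math.sqrt(dim) for f in features]
        weights = softmax(scores)
        attended.append([sum(w * f[j] for w, f in zip(weights, features)) for j in range(dim)])
    return attended


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def lapa_like_forward(latent_prompt, image_feats, text_feats, prior_feats):
    """Hypothetical forward pass mirroring the order of steps in the abstract."""
    # Crude stand-in for multi-modal features: image and text tokens side by side.
    joint = image_feats + text_feats
    # Latent prompt fusion (assumed): prompt tokens query uni-modal and multi-modal features.
    views = [cross_attend(latent_prompt, f) for f in (image_feats, text_feats, joint)]
    clinical = [mean_pool(list(token_views)) for token_views in zip(*views)]
    # Prior knowledge fusion (assumed): residual attention over organ-disease embeddings.
    prior = cross_attend(clinical, prior_feats)
    integrated = [[c + p for c, p in zip(c_row, p_row)] for c_row, p_row in zip(clinical, prior)]
    # Final combination (assumed additive) with pooled image-language information.
    return [x + y for x, y in zip(mean_pool(integrated), mean_pool(joint))]


def rand_feats(n, dim=4):
    """Random placeholder features; a real model would use encoder outputs."""
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


random.seed(0)
fused = lapa_like_forward(rand_feats(2), rand_feats(5), rand_feats(3), rand_feats(6))
print(len(fused))  # one fused vector of length 4
```

In a real system the fused vector would feed an answer classifier, and the placeholder features would come from trained image and language encoders.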


Authors: Tiancheng Gu, Kaicheng Yang, Dongnan Liu, Weidong Cai
Published: 2024-04-19 17:51:52+00:00
