CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification

Summary

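Two main challenges limit the adoption of deep learning-based solutions in medical workflows: the limited availability of annotated data and the lack of interpretability of such systems.
Concept Bottleneck Models (CBMs) address the latter by constraining the final disease prediction to a set of predefined, human-interpretable concepts.
However, the interpretability gained through these concept-based explanations comes with a higher annotation burden.
Moreover, whenever a new concept has to be added, the whole system must be retrained.
Inspired by the remarkable performance of Large Vision-Language Models (LVLMs) in few-shot settings, the authors propose CBVLM, a simple yet effective methodology that tackles both of these challenges.
First, for each concept, the LVLM is prompted to answer whether that concept is present in the input image.
Then, the LVLM is asked to classify the image based on the preceding concept predictions.
In both stages, a retrieval module selects the best examples for in-context learning.
Grounding the final diagnosis on the predicted concepts ensures explainability, and leveraging the few-shot capabilities of LVLMs drastically lowers the annotation cost.
The approach is validated with extensive experiments across four medical datasets and twelve LVLMs (both generic and medical), showing that CBVLM consistently outperforms CBMs and task-specific supervised methods without requiring any training and using just a few annotated examples.
More information is available on the project page: https://cristianopatricio.github.io/CBVLM/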
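
The two prompting stages and the retrieval step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `lvlm` callable, the prompt wording, the dictionary layout of annotated examples, and the use of cosine similarity over precomputed image embeddings for retrieval are all hypothetical choices that the abstract does not specify.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical LVLM interface (an assumption): text prompt + images in, text out.
LVLM = Callable[[str, List[object]], str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_emb: Sequence[float], pool: List[dict], k: int) -> List[dict]:
    """Pick the k annotated examples closest to the query image.
    Similarity-based selection is an assumption made for this sketch."""
    ranked = sorted(pool, key=lambda ex: cosine(query_emb, ex["embedding"]), reverse=True)
    return ranked[:k]


def yes_no(flag: bool) -> str:
    return "yes" if flag else "no"


def summarize(concepts: Dict[str, bool]) -> str:
    return ", ".join(f"{name}: {yes_no(v)}" for name, v in concepts.items())


def predict_concepts(lvlm: LVLM, image: object, demos: List[dict],
                     concepts: List[str]) -> Dict[str, bool]:
    """Stage 1: one yes/no question per concept, with in-context demonstrations."""
    preds = {}
    for concept in concepts:
        lines = [f"Example {i + 1}: is '{concept}' present? {yes_no(d['concepts'][concept])}"
                 for i, d in enumerate(demos)]
        lines.append(f"Query: is '{concept}' present? Answer yes or no.")
        images = [d["image"] for d in demos] + [image]
        reply = lvlm("\n".join(lines), images)
        preds[concept] = reply.strip().lower().startswith("yes")
    return preds


def classify(lvlm: LVLM, image: object, demos: List[dict],
             concept_preds: Dict[str, bool], labels: List[str]) -> Optional[str]:
    """Stage 2: classify the image given the stage-1 concept predictions."""
    lines = [f"Example {i + 1}: concepts [{summarize(d['concepts'])}] -> {d['label']}"
             for i, d in enumerate(demos)]
    lines.append(f"Query: concepts [{summarize(concept_preds)}]. "
                 f"Choose one of: {', '.join(labels)}.")
    images = [d["image"] for d in demos] + [image]
    reply = lvlm("\n".join(lines), images).lower()
    # Return the first known label mentioned in the reply, or None if none matches.
    return next((label for label in labels if label.lower() in reply), None)


def cbvlm_sketch(lvlm: LVLM, image: object, image_emb: Sequence[float], pool: List[dict],
                 concepts: List[str], labels: List[str],
                 k: int = 2) -> Tuple[Dict[str, bool], Optional[str]]:
    """Retrieval, then concept prediction, then concept-grounded classification."""
    demos = retrieve(image_emb, pool, k)
    concept_preds = predict_concepts(lvlm, image, demos, concepts)
    return concept_preds, classify(lvlm, image, demos, concept_preds, labels)
```

Because the concept list is only iterated over when building prompts, adding a new concept in this sketch means extending `concepts` and annotating it on the few demonstration examples, with no retraining step. The returned concept dictionary doubles as the explanation for the predicted label.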

Abstract (original)

The main challenges limiting the adoption of deep learning-based solutions in medical workflows are the availability of annotated data and the lack of interpretability of such systems. Concept Bottleneck Models (CBMs) tackle the latter by constraining the final disease prediction on a set of predefined and human-interpretable concepts. However, the increased interpretability achieved through these concept-based explanations implies a higher annotation burden. Moreover, if a new concept needs to be added, the whole system needs to be retrained. Inspired by the remarkable performance shown by Large Vision-Language Models (LVLMs) in few-shot settings, we propose a simple, yet effective, methodology, CBVLM, which tackles both of the aforementioned challenges. First, for each concept, we prompt the LVLM to answer if the concept is present in the input image. Then, we ask the LVLM to classify the image based on the previous concept predictions. Moreover, in both stages, we incorporate a retrieval module responsible for selecting the best examples for in-context learning. By grounding the final diagnosis on the predicted concepts, we ensure explainability, and by leveraging the few-shot capabilities of LVLMs, we drastically lower the annotation cost. We validate our approach with extensive experiments across four medical datasets and twelve LVLMs (both generic and medical) and show that CBVLM consistently outperforms CBMs and task-specific supervised methods without requiring any training and using just a few annotated examples. More information on our project page: https://cristianopatricio.github.io/CBVLM/.

arXiv information

Authors	Cristiano Patrício, Isabel Rio-Torto, Jaime S. Cardoso, Luís F. Teixeira, João C. Neves
Published	2025-01-21 16:38:04+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
