Full Conformal Adaptation of Medical Vision-Language Models

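Summary

Vision-language models (VLMs) pre-trained at large scale have shown unprecedented transferability and are being progressively integrated into medical image analysis.
Their discriminative potential has been widely explored, but their reliability remains overlooked.
This work investigates their behavior under the increasingly popular split conformal prediction (SCP) framework, which uses a labeled calibration set to provide a theoretical guarantee on the error level of the output sets.
However, the zero-shot performance of VLMs is inherently limited, and common practice relies on few-shot transfer learning pipelines, which are incompatible with the rigid exchangeability assumptions of SCP.
To alleviate this issue, the authors propose full conformal adaptation, a novel setting for jointly adapting and conformalizing pre-trained foundation models, which operates transductively over each test data point using a few-shot adaptation set.
They complement this framework with SS-Text, a novel training-free linear probe solver for VLMs that reduces the computational cost of such a transductive approach.
Comprehensive experiments are reported using 3 different modality-specialized medical VLMs and 9 adaptation tasks.
The framework requires exactly the same data as SCP and provides consistent relative improvements of up to 27% in set efficiency while maintaining the same coverage guarantees.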

Abstract (original)

Vision-language models (VLMs) pre-trained at large scale have shown unprecedented transferability capabilities and are being progressively integrated into medical image analysis. Although its discriminative potential has been widely explored, its reliability aspect remains overlooked. This work investigates their behavior under the increasingly popular split conformal prediction (SCP) framework, which theoretically guarantees a given error level on output sets by leveraging a labeled calibration set. However, the zero-shot performance of VLMs is inherently limited, and common practice involves few-shot transfer learning pipelines, which cannot absorb the rigid exchangeability assumptions of SCP. To alleviate this issue, we propose full conformal adaptation, a novel setting for jointly adapting and conformalizing pre-trained foundation models, which operates transductively over each test data point using a few-shot adaptation set. Moreover, we complement this framework with SS-Text, a novel training-free linear probe solver for VLMs that alleviates the computational cost of such a transductive approach. We provide comprehensive experiments using 3 different modality-specialized medical VLMs and 9 adaptation tasks. Our framework requires exactly the same data as SCP, and provides consistent relative improvements of up to 27% on set efficiency while maintaining the same coverage guarantees.
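
For readers unfamiliar with split conformal prediction, the following is a minimal sketch of the standard SCP recipe that the abstract refers to. It is not the authors' full conformal adaptation method or SS-Text. It assumes softmax probabilities are already available from some classifier, and it uses one common non-conformity score (one minus the probability of the true class). The function names `conformal_quantile` and `split_conformal_sets` and the toy arrays are hypothetical and chosen only for illustration.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    # Rank of the order statistic used by split conformal prediction.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: fall back to full sets.
        return np.inf
    return np.sort(scores)[k - 1]


def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Return a boolean (n_test, n_classes) mask of the predicted label sets."""
    # Non-conformity score (an assumption of this sketch):
    # one minus the probability assigned to the true class.
    cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    q = conformal_quantile(cal_scores, alpha)
    # Keep every class whose score does not exceed the calibrated threshold.
    return (1.0 - test_probs) <= q


if __name__ == "__main__":
    # Toy calibration set: 4 labeled samples, 3 classes (illustrative values).
    cal_probs = np.array([
        [0.9, 0.05, 0.05],
        [0.1, 0.8, 0.1],
        [0.2, 0.1, 0.7],
        [0.6, 0.3, 0.1],
    ])
    cal_labels = np.array([0, 1, 2, 0])
    test_probs = np.array([[0.8, 0.15, 0.05]])
    print(split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.5))
```

The coverage guarantee of this recipe relies on the calibration and test scores being exchangeable. As the abstract notes, adapting the model on the same few-shot data that is later used for calibration breaks that assumption, which is the gap the proposed setting targets.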

arXiv information

Authors	Julio Silva-Rodríguez, Leo Fillioux, Paul-Henry Cournède, Maria Vakalopoulou, Stergios Christodoulidis, Ismail Ben Ayed, Jose Dolz
Published	2025-06-06 13:32:00+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
