DILA: Dictionary Label Attention for Mechanistic Interpretability in High-dimensional Multi-label Medical Coding Prediction

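Summary

Predicting high-dimensional or extreme multi-label outputs, as in medical coding, requires both accuracy and interpretability.
Existing work often relies on local interpretability methods, which cannot give a comprehensive explanation of the overall mechanism behind each label prediction within a multi-label set.
The authors propose a mechanistic interpretability module called DIctionary Label Attention (DILA) that disentangles uninterpretable dense embeddings into a sparse embedding space, where each nonzero element (a dictionary feature) represents a globally learned medical concept.
Through human evaluation, they show that the sparse embeddings are at least 50 percent more human-understandable than their dense counterparts.
An automated dictionary feature identification pipeline, built on large language models (LLMs), uncovers thousands of learned medical concepts by examining and summarizing the highest-activating tokens for each dictionary feature.
The relationships between dictionary features and medical codes are represented by a sparse, interpretable matrix, which improves the mechanistic and global understanding of the model's predictions while maintaining competitive performance and scalability without extensive human annotation.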

Abstract (original)

Predicting high-dimensional or extreme multilabels, such as in medical coding, requires both accuracy and interpretability. Existing works often rely on local interpretability methods, failing to provide comprehensive explanations of the overall mechanism behind each label prediction within a multilabel set. We propose a mechanistic interpretability module called DIctionary Label Attention (DILA) that disentangles uninterpretable dense embeddings into a sparse embedding space, where each nonzero element (a dictionary feature) represents a globally learned medical concept. Through human evaluations, we show that our sparse embeddings are more human understandable than its dense counterparts by at least 50 percent. Our automated dictionary feature identification pipeline, leveraging large language models (LLMs), uncovers thousands of learned medical concepts by examining and summarizing the highest activating tokens for each dictionary feature. We represent the relationships between dictionary features and medical codes through a sparse interpretable matrix, enhancing the mechanistic and global understanding of the model’s predictions while maintaining competitive performance and scalability without extensive human annotation.
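
The data flow described in the abstract (dense embeddings mapped to sparse dictionary features, then related to medical codes through a sparse matrix) can be illustrated with a toy NumPy sketch. Everything here is an assumption made for illustration: the sizes, the randomly initialised weights, the names `encode_sparse` and `label_attention`, and the use of top-k truncation to force sparsity. The abstract does not say how sparsity is obtained or how the matrix enters the attention computation, so this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical): dense width, number of dictionary features, number of codes.
d_dense, n_features, n_codes = 8, 32, 5

# Randomly initialised encoder weights; a real model would learn these.
W_enc = rng.normal(size=(d_dense, n_features))
b_enc = np.zeros(n_features)

# Sparse feature-to-code matrix: most entries are exactly zero (assumed ~10% density).
M = rng.normal(size=(n_features, n_codes)) * (rng.random((n_features, n_codes)) < 0.1)


def encode_sparse(x, k=4):
    """Map dense token embeddings (n_tokens, d_dense) to non-negative sparse
    dictionary activations (n_tokens, n_features), keeping at most k per token.
    Top-k truncation is an illustrative stand-in for a learned sparsity penalty."""
    acts = np.maximum(x @ W_enc + b_enc, 0.0)            # ReLU activations
    drop = np.argsort(acts, axis=-1)[..., :-k]           # indices of all but the k largest
    sparse = acts.copy()
    np.put_along_axis(sparse, drop, 0.0, axis=-1)        # zero everything outside the top k
    return sparse


def label_attention(features, feature_to_code):
    """Per-code attention over tokens: softmax over the token axis of
    (sparse features @ sparse feature-to-code matrix)."""
    scores = features @ feature_to_code                  # (n_tokens, n_codes)
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=0, keepdims=True)


tokens = rng.normal(size=(6, d_dense))                   # six hypothetical token embeddings
features = encode_sparse(tokens)
attention = label_attention(features, M)

# For one code, the nonzero rows of M name the dictionary features tied to it.
code = 0
print("features linked to code", code, ":", np.flatnonzero(M[:, code]))
```

In this sketch the global explanation for a code is read directly from the nonzero entries of its column in `M`, while the per-token attention weights show where those features fire in a particular document.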

arXiv information

Authors	John Wu, David Wu, Jimeng Sun
Published	2024-09-16 17:45:40+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google