Towards Scalable and Cross-Lingual Specialist Language Models for Oncology

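Summary

Clinical oncology generates vast amounts of unstructured data that often contain inconsistencies, missing information, and ambiguities, which makes it difficult to extract reliable insights for data-driven decision-making.
General-purpose large language models (LLMs) struggle with these challenges because they lack domain-specific reasoning, including specialized clinical terminology, context-dependent interpretation, and multi-modal data integration.
We address these issues with an oncology-specialized, efficient, and adaptable NLP framework that combines instruction tuning, retrieval-augmented generation (RAG), and graph-based knowledge integration.
Our lightweight models prove effective on oncology-specific tasks such as named entity recognition (e.g., identifying cancer diagnoses), entity linking (e.g., linking entities to standardized ontologies), TNM staging, document classification (e.g., cancer subtype classification from pathology reports), and treatment response prediction.
Our framework emphasizes adaptability and resource efficiency.
We include a minimal set of German instructions, collected at the University Hospital Zurich (USZ), to test whether small amounts of non-English data can effectively transfer knowledge across languages.
This approach mirrors our motivation for lightweight models, which balance strong performance with reduced computational cost and are therefore suitable for resource-limited healthcare settings.
We validated our models on oncology datasets, demonstrating strong results in named entity recognition, relation extraction, and document classification.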

Abstract (original)

Clinical oncology generates vast, unstructured data that often contain inconsistencies, missing information, and ambiguities, making it difficult to extract reliable insights for data-driven decision-making. General-purpose large language models (LLMs) struggle with these challenges due to their lack of domain-specific reasoning, including specialized clinical terminology, context-dependent interpretations, and multi-modal data integration. We address these issues with an oncology-specialized, efficient, and adaptable NLP framework that combines instruction tuning, retrieval-augmented generation (RAG), and graph-based knowledge integration. Our lightweight models prove effective at oncology-specific tasks, such as named entity recognition (e.g., identifying cancer diagnoses), entity linking (e.g., linking entities to standardized ontologies), TNM staging, document classification (e.g., cancer subtype classification from pathology reports), and treatment response prediction. Our framework emphasizes adaptability and resource efficiency. We include minimal German instructions, collected at the University Hospital Zurich (USZ), to test whether small amounts of non-English language data can effectively transfer knowledge across languages. This approach mirrors our motivation for lightweight models, which balance strong performance with reduced computational costs, making them suitable for resource-limited healthcare settings. We validated our models on oncology datasets, demonstrating strong results in named entity recognition, relation extraction, and document classification.
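
The abstract names instruction tuning and retrieval-augmented generation (RAG) but gives no implementation details. The following is only a minimal sketch of how a retrieved snippet might be combined with an instruction before being passed to a model; it is not the authors' pipeline. Everything in it is an assumption made for illustration: the `Snippet` type, the token-overlap retriever, the prompt layout, and the example texts in `CORPUS` and `REPORT` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """A hypothetical knowledge-base entry (not from the paper)."""
    source: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def retrieve(query: str, corpus: list[Snippet], k: int = 1) -> list[Snippet]:
    """Toy retriever: rank snippets by how many tokens they share with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda s: len(query_tokens & tokenize(s.text)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(instruction: str, report: str, corpus: list[Snippet], k: int = 1) -> str:
    """Assemble an instruction-style prompt that includes the top-k retrieved snippets."""
    context = "\n".join(f"[{s.source}] {s.text}" for s in retrieve(report, corpus, k))
    return "\n\n".join([
        f"### Instruction\n{instruction}",
        f"### Context\n{context}",
        f"### Input\n{report}",
        "### Response\n",
    ])


# Illustrative data only; these texts are placeholders, not clinical records.
CORPUS = [
    Snippet("linking-note", "Entity linking maps a mention to an identifier in a standardized ontology."),
    Snippet("staging-note", "TNM staging describes tumor size (T), lymph node involvement (N), and metastasis (M)."),
]
REPORT = "Pathology report: tumor 2.5 cm, no lymph node involvement, no distant metastasis."

if __name__ == "__main__":
    print(build_prompt("Assign a TNM stage to the report.", REPORT, CORPUS))
```

A real system would presumably replace the token-overlap ranking with a learned retriever and send the assembled prompt to an instruction-tuned model; the sketch only shows where the retrieved context would sit relative to the instruction and the input.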

arXiv information

Authors	Morteza Rohanian, Tarun Mehra, Nicola Miglino, Farhad Nooralahzadeh, Michael Krauthammer, Andreas Wicki
Published	2025-03-11 11:34:57+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
