Enhancing LLMs with Smart Preprocessing for EHR Analysis

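Summary

Large language models (LLMs) have shown remarkable proficiency in natural language processing.
However, their use in sensitive domains such as healthcare, particularly for processing electronic health records (EHRs), is constrained by limited computational resources and privacy concerns.
This paper introduces a compact LLM framework optimized for local deployment in environments with strict privacy requirements and limited access to high-performance GPUs.
The approach uses simple yet powerful preprocessing techniques, including regular expressions (regex) and Retrieval-Augmented Generation (RAG), to extract and highlight key information from clinical notes.
Pre-filtering long, unstructured text improves the performance of smaller LLMs on EHR-related tasks.
The framework is evaluated under zero-shot and few-shot learning paradigms on both a private dataset and a publicly available one (MIMIC-IV), with additional comparisons against LLMs fine-tuned on MIMIC-IV.
Experimental results show that the preprocessing strategy substantially boosts the performance of smaller LLMs, making them well suited to privacy-sensitive, resource-constrained applications.
This study offers valuable insights into optimizing LLM performance for local, secure, and efficient healthcare applications.
It provides practical guidance for real-world deployment of LLMs while addressing challenges of privacy, computational feasibility, and clinical applicability.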

Abstract (original)

Large Language Models (LLMs) have demonstrated remarkable proficiency in natural language processing; however, their application in sensitive domains such as healthcare, especially in processing Electronic Health Records (EHRs), is constrained by limited computational resources and privacy concerns. This paper introduces a compact LLM framework optimized for local deployment in environments with stringent privacy requirements and restricted access to high-performance GPUs. Our approach leverages simple yet powerful preprocessing techniques, including regular expressions (regex) and Retrieval-Augmented Generation (RAG), to extract and highlight critical information from clinical notes. By pre-filtering long, unstructured text, we enhance the performance of smaller LLMs on EHR-related tasks. Our framework is evaluated using zero-shot and few-shot learning paradigms on both private and publicly available datasets (MIMIC-IV), with additional comparisons against fine-tuned LLMs on MIMIC-IV. Experimental results demonstrate that our preprocessing strategy significantly supercharges the performance of smaller LLMs, making them well-suited for privacy-sensitive and resource-constrained applications. This study offers valuable insights into optimizing LLM performance for local, secure, and efficient healthcare applications. It provides practical guidance for real-world deployment for LLMs while tackling challenges related to privacy, computational feasibility, and clinical applicability.
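
To make the regex pre-filtering idea concrete, here is a minimal Python sketch, assuming a naive sentence splitter and a few illustrative patterns. The abstract does not list the expressions or prompt format the authors used, so `KEY_PATTERNS`, `split_sentences`, `prefilter`, and `build_prompt` are hypothetical names chosen for illustration, and the RAG step is not shown.

```python
import re

# Hypothetical patterns for illustration only; the abstract does not
# specify which expressions the authors actually use.
KEY_PATTERNS = [
    re.compile(r"\bdiagnos(?:is|ed|es)\b", re.IGNORECASE),
    re.compile(r"\bstage\s+[IV]+\b", re.IGNORECASE),
    re.compile(r"\b\d+(?:\.\d+)?\s*mg\b", re.IGNORECASE),
]


def split_sentences(note):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", note.strip())
    return [p for p in parts if p]


def prefilter(note, patterns=KEY_PATTERNS):
    """Keep sentences matching at least one pattern, in original order."""
    return [
        s for s in split_sentences(note)
        if any(p.search(s) for p in patterns)
    ]


def build_prompt(note, question):
    """Assemble a shorter prompt from the filtered sentences (assumed format)."""
    kept = prefilter(note)
    if kept:
        context = "\n".join(f"- {s}" for s in kept)
    else:
        context = "(no matching sentences)"
    return f"Relevant excerpts:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    example_note = (
        "Patient seen in clinic today. "
        "Diagnosed with stage II carcinoma of the larynx. "
        "Family is supportive. "
        "Started metformin 500 mg daily."
    )
    print(build_prompt(example_note, "What is the diagnosis?"))
```

The point of the sketch is only that a smaller model receives the few sentences likely to matter rather than the full note. A real clinical pipeline would need patterns validated against the target note style.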

arXiv information

Authors	Yixiang Qu, Yifan Dai, Shilin Yu, Pradham Tanikella, Travis Schrank, Trevor Hackman, Didong Li, Di Wu
Published	2025-04-24 13:07:21+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
