Development of a Large-scale Dataset of Chest Computed Tomography Reports in Japanese and a High-performance Finding Classification Model

Summary

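Background: Recent advances in large language models have highlighted the need for high-quality multilingual medical datasets.
Japan leads the world in CT scanner deployment and utilization, but the lack of large-scale Japanese radiology datasets has hindered the development of specialized language models for medical imaging analysis.
Objective: To develop a comprehensive Japanese CT report dataset through machine translation and to establish a specialized language model for structured finding classification.
A further aim is to create a rigorously validated evaluation dataset through expert radiologist review.
Methods: We translated the CT-RATE dataset (24,283 CT reports from 21,304 patients) into Japanese using GPT-4o mini.
The training dataset consisted of 22,778 machine-translated reports, and the validation dataset included 150 radiologist-revised reports.
We developed CT-BERT-JPN, based on the 'tohoku-nlp/bert-base-japanese-v3' architecture, to extract 18 structured findings from Japanese radiology reports.
Results: Translation metrics showed strong performance, with BLEU scores of 0.731 and 0.690, and ROUGE scores ranging from 0.770 to 0.876 for the Findings sections and from 0.748 to 0.857 for the Impression sections.
CT-BERT-JPN outperformed GPT-4o in 11 of 18 conditions, including lymphadenopathy (+14.2%), interlobular septal thickening (+10.9%), and atelectasis (+7.4%).
The model maintained F1 scores above 0.95 in 14 of 18 conditions and achieved perfect scores in four conditions.
Conclusions: Our study establishes a robust Japanese CT report dataset and demonstrates the effectiveness of a specialized language model for structured finding classification.
The hybrid approach of machine translation and expert validation enables the creation of large-scale medical datasets while maintaining high quality.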

Abstract (original)

Background: Recent advances in large language models highlight the need for high-quality multilingual medical datasets. While Japan leads globally in CT scanner deployment and utilization, the lack of large-scale Japanese radiology datasets has hindered the development of specialized language models for medical imaging analysis. Objective: To develop a comprehensive Japanese CT report dataset through machine translation and establish a specialized language model for structured finding classification. Additionally, to create a rigorously validated evaluation dataset through expert radiologist review. Methods: We translated the CT-RATE dataset (24,283 CT reports from 21,304 patients) into Japanese using GPT-4o mini. The training dataset consisted of 22,778 machine-translated reports, while the validation dataset included 150 radiologist-revised reports. We developed CT-BERT-JPN based on ‘tohoku-nlp/bert-base-japanese-v3’ architecture for extracting 18 structured findings from Japanese radiology reports. Results: Translation metrics showed strong performance with BLEU scores of 0.731 and 0.690, and ROUGE scores ranging from 0.770 to 0.876 for Findings and from 0.748 to 0.857 for Impression sections. CT-BERT-JPN demonstrated superior performance compared to GPT-4o in 11 out of 18 conditions, including lymphadenopathy (+14.2%), interlobular septal thickening (+10.9%), and atelectasis (+7.4%). The model maintained F1 scores exceeding 0.95 in 14 out of 18 conditions and achieved perfect scores in four conditions. Conclusions: Our study establishes a robust Japanese CT report dataset and demonstrates the effectiveness of a specialized language model for structured finding classification. The hybrid approach of machine translation and expert validation enables the creation of large-scale medical datasets while maintaining high quality.
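The abstract reports one F1 score per finding. As a minimal sketch of how such per-condition scores can be computed for a multi-label classifier, the snippet below assumes each report carries one binary label per finding. The abstract does not describe the authors' evaluation code, so the function name `per_label_f1`, the choice of binary F1 per label, and the toy labels and predictions are all illustrative assumptions; only three of the 18 findings are shown.

```python
from typing import Dict, List, Sequence


def per_label_f1(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
    labels: List[str],
) -> Dict[str, float]:
    """Compute a binary F1 score separately for each finding (column)."""
    scores: Dict[str, float] = {}
    for j, name in enumerate(labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined here as 0.0 when there are
        # no positives in either the labels or the predictions.
        scores[name] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical toy data: 4 reports, 3 findings (not taken from the paper).
labels = ["atelectasis", "lymphadenopathy", "interlobular septal thickening"]
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
y_pred = [[1, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

scores = per_label_f1(y_true, y_pred, labels)
for name, f1 in scores.items():
    print(f"{name}: F1 = {f1:.3f}")
```

Scoring each finding separately, rather than averaging across all labels, is what allows statements such as "F1 above 0.95 in 14 of 18 conditions" to be made.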

arXiv information

Authors	Yosuke Yamagishi, Yuta Nakamura, Tomohiro Kikuchi, Yuki Sonoda, Hiroshi Hirakawa, Shintaro Kano, Satoshi Nakamura, Shouhei Hanaoka, Takeharu Yoshikawa, Osamu Abe
Published	2024-12-20 13:59:11+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
