Gender-Neutral Large Language Models for Medical Applications: Reducing Bias in PubMed Abstracts

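Summary

This paper presents a pipeline for mitigating gender bias in large language models (LLMs) used in medical literature by neutralizing gendered occupational pronouns.
A dataset of 379,000 PubMed abstracts from 1965 to 1980 was processed to identify and modify pronouns tied to professions.
The authors developed a BERT-based model, "Modern Occupational Bias Elimination with Refined Training" (MOBERT), trained on these neutralized abstracts, and compared its performance with "1965BERT," trained on the original dataset.
MOBERT achieved a 70% inclusive replacement rate, while 1965BERT reached only 4%.
Further analysis of MOBERT showed that pronoun replacement accuracy correlated with the frequency of occupational terms in the training data.
The authors propose expanding the dataset and refining the pipeline to improve performance and ensure more equitable language modeling in medical applications.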

Summary (original)

This paper presents a pipeline for mitigating gender bias in large language models (LLMs) used in medical literature by neutralizing gendered occupational pronouns. A dataset of 379,000 PubMed abstracts from 1965-1980 was processed to identify and modify pronouns tied to professions. We developed a BERT-based model, ‘Modern Occupational Bias Elimination with Refined Training,’ or ‘MOBERT,’ trained on these neutralized abstracts, and compared its performance with ‘1965BERT,’ trained on the original dataset. MOBERT achieved a 70% inclusive replacement rate, while 1965BERT reached only 4%. A further analysis of MOBERT revealed that pronoun replacement accuracy correlated with the frequency of occupational terms in the training data. We propose expanding the dataset and refining the pipeline to improve performance and ensure more equitable language modeling in medical applications.
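
The abstract does not describe how the pipeline identifies and rewrites pronouns, so the following is only a minimal sketch of one plausible approach, not the authors' method. It assumes a small hand-written occupation list (`OCCUPATIONS`) and pronoun map (`PRONOUN_MAP`), both hypothetical, and replaces gendered pronouns only in sentences that mention an occupational term. It does not fix verb agreement (for example "he is" would become "they is"), and it always maps "her" to "their", which is wrong when "her" is an object pronoun.

```python
import re

# Hypothetical occupation list and pronoun map; neither comes from the paper.
OCCUPATIONS = {"physician", "nurse", "surgeon", "doctor"}
PRONOUN_MAP = {
    "he": "they", "she": "they",
    "him": "them", "his": "their",
    "her": "their",  # simplification: "her" can also be an object pronoun ("them")
    "himself": "themselves", "herself": "themselves",
}

# Whole-word, case-insensitive matchers for pronouns and occupational terms.
PRONOUN_RE = re.compile(r"\b(" + "|".join(PRONOUN_MAP) + r")\b", re.IGNORECASE)
OCCUPATION_RE = re.compile(
    r"\b(" + "|".join(sorted(OCCUPATIONS)) + r")s?\b", re.IGNORECASE
)


def _swap(match):
    """Return the neutral pronoun, preserving an initial capital letter."""
    word = match.group(0)
    neutral = PRONOUN_MAP[word.lower()]
    return neutral.capitalize() if word[0].isupper() else neutral


def neutralize(text):
    """Neutralize gendered pronouns in sentences that mention an occupation."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    rewritten = [
        PRONOUN_RE.sub(_swap, s) if OCCUPATION_RE.search(s) else s
        for s in sentences
    ]
    return " ".join(rewritten)


if __name__ == "__main__":
    print(neutralize("The physician said he would review his notes. She left."))
```

In this sketch the second sentence is left unchanged because it contains no occupational term; a real pipeline would likely need coreference resolution to decide which pronouns actually refer to a profession.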

arXiv information

Authors	Elizabeth Schaefer, Kirk Roberts
Published	2025-05-28 15:06:30+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
