Towards Fairer Health Recommendations: finding informative unbiased samples via Word Sense Disambiguation

Summary

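There are growing concerns about high-stakes applications that rely on models trained on biased data, which consequently produce biased predictions that often harm the most vulnerable.
In particular, biased medical data could lead health-related applications and recommender systems to produce outputs that jeopardize patient care and widen disparities in health outcomes.
A recent framework titled Fairness via AI argues that, rather than attempting to correct model biases, researchers should focus on the root causes by using AI to debias the data itself.
Inspired by this framework, we tackle bias detection in medical curricula using NLP models, including LLMs, and evaluate them on a gold-standard dataset of 4,105 excerpts drawn from a large corpus and annotated for bias by medical experts.
We build on prior work by co-authors that augments the set of negative samples with non-annotated text containing social identifier terms.
However, some of these terms, especially those related to race and ethnicity, can carry different meanings (e.g., "white matter of the spinal cord").
To address this issue, we propose using Word Sense Disambiguation models to improve dataset quality by removing irrelevant sentences.
We then evaluate fine-tuned variants of BERT models, as well as GPT models with zero- and few-shot prompting.
We find that LLMs, although considered SOTA on many NLP tasks, are unsuitable for bias detection, whereas fine-tuned BERT models generally perform well across all evaluated metrics.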
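
The filtering step can be illustrated with a minimal sketch. It is not the authors' pipeline: the summary does not say which Word Sense Disambiguation model was used, so this sketch swaps in a simplified Lesk-style gloss-overlap heuristic over a tiny hand-written sense inventory. The names `SENSE_INVENTORY`, `disambiguate`, and `filter_negative_candidates`, as well as the signature word lists, are hypothetical. A candidate sentence is kept as a negative sample only if at least one identifier term in it resolves to its social-identity sense.

```python
import re

# Hypothetical toy sense inventory: for each social identifier term, a set of
# "signature" words per sense. A real system would use a trained WSD model or a
# lexical resource instead of these hand-picked lists.
SOCIAL_CONTEXT = {
    "patient", "patients", "people", "race", "racial", "ethnic", "ethnicity",
    "population", "women", "men", "individuals", "americans", "community",
}
SENSE_INVENTORY = {
    "white": {
        "social_identity": SOCIAL_CONTEXT,
        "other": {"matter", "blood", "cell", "cells", "spinal", "cord", "brain",
                  "myelinated", "axons", "pale", "count"},
    },
    "black": {
        "social_identity": SOCIAL_CONTEXT,
        "other": {"box", "dark", "stool", "tarry", "lesion", "pigment"},
    },
}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def disambiguate(sentence, term):
    """Pick the sense of `term` whose signature overlaps most with the context.

    Returns None when no sense overlaps or when the best score is tied.
    """
    context = set(tokenize(sentence)) - {term}
    scores = {sense: len(context & signature)
              for sense, signature in SENSE_INVENTORY[term].items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


def filter_negative_candidates(sentences):
    """Keep sentences where at least one identifier term has its social sense."""
    kept = []
    for sentence in sentences:
        terms = {t for t in tokenize(sentence) if t in SENSE_INVENTORY}
        if any(disambiguate(sentence, t) == "social_identity" for t in terms):
            kept.append(sentence)
    return kept


candidates = [
    "White matter of the spinal cord contains myelinated axons.",
    "Screening guidelines apply equally to Black and White patients.",
    "The patient reported black, tarry stool.",
]
print(filter_negative_candidates(candidates))
# ['Screening guidelines apply equally to Black and White patients.']
```

The third sentence shows why overlap is scored per sense rather than by keyword match: "patient" points to the social sense, but "tarry" and "stool" outweigh it. Returning `None` on ties lets the filter drop ambiguous sentences conservatively.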

Summary (original)

There have been growing concerns around high-stake applications that rely on models trained with biased data, which consequently produce biased predictions, often harming the most vulnerable. In particular, biased medical data could cause health-related applications and recommender systems to create outputs that jeopardize patient care and widen disparities in health outcomes. A recent framework titled Fairness via AI posits that, instead of attempting to correct model biases, researchers must focus on their root causes by using AI to debias data. Inspired by this framework, we tackle bias detection in medical curricula using NLP models, including LLMs, and evaluate them on a gold standard dataset containing 4,105 excerpts annotated by medical experts for bias from a large corpus. We build on previous work by coauthors which augments the set of negative samples with non-annotated text containing social identifier terms. However, some of these terms, especially those related to race and ethnicity, can carry different meanings (e.g., ‘white matter of spinal cord’). To address this issue, we propose the use of Word Sense Disambiguation models to refine dataset quality by removing irrelevant sentences. We then evaluate fine-tuned variations of BERT models as well as GPT models with zero- and few-shot prompting. We found LLMs, considered SOTA on many NLP tasks, unsuitable for bias detection, while fine-tuned BERT models generally perform well across all evaluated metrics.
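
The difference between zero- and few-shot prompting can be illustrated with a small prompt builder. This is a hypothetical sketch: the instruction wording, the `biased`/`unbiased` label vocabulary, and the helpers `build_prompt`, `parse_label`, and `Example` are assumptions rather than the prompts used in the paper, and no GPT model is actually called. With no examples the prompt is zero-shot; passing labeled excerpts makes it few-shot.

```python
from dataclasses import dataclass

# Hypothetical instruction text; the paper's actual prompt is not reproduced here.
INSTRUCTION = (
    "You are reviewing excerpts from medical curricula. "
    "Reply 'biased' if the excerpt contains social bias, otherwise 'unbiased'."
)


@dataclass
class Example:
    text: str
    label: str  # assumed label vocabulary: "biased" or "unbiased"


def build_prompt(excerpt, examples=()):
    """Zero-shot when `examples` is empty, few-shot otherwise."""
    parts = [INSTRUCTION, ""]
    for ex in examples:
        parts.append(f"Excerpt: {ex.text}\nAnswer: {ex.label}\n")
    parts.append(f"Excerpt: {excerpt}\nAnswer:")
    return "\n".join(parts)


def parse_label(response):
    """Map a free-text model reply to 1 (biased), 0 (unbiased), or None."""
    answer = response.strip().lower()
    if answer.startswith("unbiased"):
        return 0
    if answer.startswith("biased"):
        return 1
    return None  # unparseable replies need separate handling


shots = [
    Example("[placeholder: excerpt labeled biased]", "biased"),
    Example("[placeholder: excerpt labeled unbiased]", "unbiased"),
]
print(build_prompt("[placeholder: excerpt to classify]", shots))
print(parse_label(" Biased."), parse_label("unbiased"), parse_label("unsure"))
# 1 0 None
```

Parsing only the start of the reply is a deliberate simplification; replies that do not begin with a label come back as `None` and would have to be counted or re-prompted separately when computing evaluation metrics.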

arXiv information

Authors	Gavin Butts, Pegah Emdad, Jethro Lee, Shannon Song, Chiman Salavati, Willmar Sosa Diaz, Shiri Dori-Hacohen, Fabricio Murai
Published	2024-09-11 17:10:20+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
