AAVENUE: Detecting LLM Biases on NLU Tasks in AAVE via a Novel Benchmark

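Abstract

Detecting bias in natural language understanding (NLU) for African American Vernacular English (AAVE) is important for building inclusive natural language processing (NLP) systems.
To address dialect-induced performance gaps, we introduce AAVENUE (AAVE Natural Language Understanding Evaluation), a benchmark for evaluating large language model (LLM) performance on NLU tasks in AAVE and Standard American English (SAE).
AAVENUE builds on and extends existing benchmarks such as VALUE, replacing deterministic syntactic and morphological transformations with a more flexible methodology based on LLM translation with few-shot prompting, which improves performance across our evaluation metrics when translating key tasks from the GLUE and SuperGLUE benchmarks.
We compare AAVENUE and VALUE translations using five popular LLMs and a comprehensive set of metrics that includes fluency, BARTScore, quality, coherence, and understandability.
In addition, we recruit fluent AAVE speakers to validate the authenticity of our translations.
Our evaluation shows that LLMs consistently perform better on SAE tasks than on their AAVE-translated versions, which underscores inherent biases and the need for more inclusive NLP models.
We have open-sourced our code on GitHub and built a website showcasing our work at https://aavenue.live.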

Abstract (original)

Detecting biases in natural language understanding (NLU) for African American Vernacular English (AAVE) is crucial to developing inclusive natural language processing (NLP) systems. To address dialect-induced performance discrepancies, we introduce AAVENUE (AAVE Natural Language Understanding Evaluation), a benchmark for evaluating large language model (LLM) performance on NLU tasks in AAVE and Standard American English (SAE). AAVENUE builds upon and extends existing benchmarks like VALUE, replacing deterministic syntactic and morphological transformations with a more flexible methodology leveraging LLM-based translation with few-shot prompting, improving performance across our evaluation metrics when translating key tasks from the GLUE and SuperGLUE benchmarks. We compare AAVENUE and VALUE translations using five popular LLMs and a comprehensive set of metrics including fluency, BARTScore, quality, coherence, and understandability. Additionally, we recruit fluent AAVE speakers to validate our translations for authenticity. Our evaluations reveal that LLMs consistently perform better on SAE tasks than AAVE-translated versions, underscoring inherent biases and highlighting the need for more inclusive NLP models. We have open-sourced our source code on GitHub and created a website to showcase our work at https://aavenue.live.
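The comparison described in the abstract (the same task scored once in SAE and once in its AAVE translation) can be illustrated with a small sketch. This is not the authors' evaluation code: the `TaskResult` type, the `accuracy_gap` helper, the task names, and all counts below are hypothetical placeholders chosen only to show how a per-task SAE-minus-AAVE accuracy gap might be computed.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Correct-answer counts for one task, scored on both dialect versions."""
    task: str          # hypothetical task label, not a task named in the source
    sae_correct: int   # items answered correctly on the SAE version
    aave_correct: int  # items answered correctly on the AAVE-translated version
    total: int         # number of items (same items in both versions)


def accuracy_gap(result: TaskResult) -> float:
    """Return SAE accuracy minus AAVE accuracy, in percentage points."""
    if result.total <= 0:
        raise ValueError("total must be positive")
    sae_acc = result.sae_correct / result.total
    aave_acc = result.aave_correct / result.total
    return 100.0 * (sae_acc - aave_acc)


# Placeholder counts, made up for illustration; NOT results from the paper.
toy_results = [
    TaskResult("task_a", sae_correct=90, aave_correct=84, total=100),
    TaskResult("task_b", sae_correct=75, aave_correct=75, total=100),
]

gaps = {r.task: accuracy_gap(r) for r in toy_results}
for r in toy_results:
    print(f"{r.task}: gap = {gaps[r.task]:+.1f} points")
```

A positive gap means the model scored higher on the SAE version of the task, and a gap of zero means the two versions were scored identically. Using the same items in both versions keeps the comparison paired, so any gap reflects the dialect change and not a change in content.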

arXiv information

Authors	Abhay Gupta, Philip Meng, Ece Yurtseven, Sean O’Brien, Kevin Zhu
Published	2024-12-04 13:43:28+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
