AfroBench: How Good are Large Language Models on African Languages?

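Summary

Large-scale multilingual evaluations such as MEGA include only a handful of African languages, because high-quality evaluation data is scarce and existing African datasets are hard to discover.
This lack of representation hinders comprehensive LLM evaluation across diverse languages and tasks.
To address these challenges, we introduce AfroBench, a multi-task benchmark for evaluating LLM performance across 64 African languages, 15 tasks, and 22 datasets.
AfroBench consists of nine natural language understanding datasets, six text generation datasets, six knowledge and question answering tasks, and one mathematical reasoning task.
We present results comparing the performance of prompted LLMs against fine-tuned baselines based on BERT and T5-style models.
Our results suggest large performance gaps between high-resource languages, such as English, and African languages across most tasks.
However, performance also varies with the availability of monolingual data resources.
Our findings confirm that performance on African languages remains a hurdle for current LLMs, underscoring the need for additional efforts to close this gap.
https://mcgill-nlp.github.io/AfroBench/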

Summary (original)

Large-scale multilingual evaluations, such as MEGA, often include only a handful of African languages due to the scarcity of high-quality evaluation data and the limited discoverability of existing African datasets. This lack of representation hinders comprehensive LLM evaluation across a diverse range of languages and tasks. To address these challenges, we introduce AfroBench — a multi-task benchmark for evaluating the performance of LLMs across 64 African languages, 15 tasks and 22 datasets. AfroBench consists of nine natural language understanding datasets, six text generation datasets, six knowledge and question answering tasks, and one mathematical reasoning task. We present results comparing the performance of prompting LLMs to fine-tuned baselines based on BERT and T5-style models. Our results suggest large gaps in performance between high-resource languages, such as English, and African languages across most tasks; but performance also varies based on the availability of monolingual data resources. Our findings confirm that performance on African languages continues to remain a hurdle for current LLMs, underscoring the need for additional efforts to close this gap. https://mcgill-nlp.github.io/AfroBench/

arXiv information

Authors	Jessica Ojo, Odunayo Ogundepo, Akintunde Oladipo, Kelechi Ogueji, Jimmy Lin, Pontus Stenetorp, David Ifeoluwa Adelani
Published	2025-02-26 15:16:47+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

