The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants

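Summary

We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants.
By greatly expanding the language coverage of natural language understanding (NLU) benchmarks, the dataset makes it possible to evaluate text models on high-, medium-, and low-resource languages.
Each question is based on a short passage from the Flores-200 dataset and has four multiple-choice answers.
The questions were carefully curated to discriminate between models with different levels of general language comprehension.
The English dataset alone proves difficult enough to challenge state-of-the-art language models.
Because the dataset is fully parallel, model performance can be compared directly across all languages.
We use the dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs).
We present extensive results and find that, despite significant cross-lingual transfer in English-centric LLMs, much smaller MLMs pretrained on balanced multilingual data still understand far more languages.
We also observe that larger vocabulary size and conscious vocabulary construction correlate with better performance on low-resource languages.
Overall, Belebele opens up new avenues for evaluating and analyzing the multilingual capabilities of NLP systems.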
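Because every question has four options and the data is fully parallel, a per-language comparison reduces to computing accuracy over the same questions in each language. The following is a minimal sketch of that idea, not the authors' evaluation code. The `MRCItem` schema, the `accuracy_by_language` helper, and the `always_first` baseline are hypothetical names introduced here for illustration, and they do not reflect the released dataset's actual field names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class MRCItem:
    """One multiple-choice reading comprehension question (hypothetical schema)."""
    passage: str
    question: str
    options: Sequence[str]  # the four candidate answers
    answer_index: int       # position of the correct option, 0 to 3


# A predictor maps an item to the index of the option it chooses.
Predictor = Callable[[MRCItem], int]


def accuracy_by_language(
    items_by_language: Dict[str, List[MRCItem]],
    predict: Predictor,
) -> Dict[str, float]:
    """Return the fraction of correctly answered items for each language."""
    scores: Dict[str, float] = {}
    for language, items in items_by_language.items():
        if not items:
            continue  # skip languages with no items rather than divide by zero
        correct = sum(1 for item in items if predict(item) == item.answer_index)
        scores[language] = correct / len(items)
    return scores


def always_first(item: MRCItem) -> int:
    """Trivial baseline (assumption for illustration): always pick option 0."""
    return 0
```

With four options per question, a uniformly random guesser is expected to score 25%, so per-language accuracies are usually read against that floor. Any real model would replace `always_first` with a function that scores the four options given the passage and question.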

Abstract (original)

We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. Significantly expanding the language coverage of natural language understanding (NLU) benchmarks, this dataset enables the evaluation of text models in high-, medium-, and low-resource languages. Each question is based on a short passage from the Flores-200 dataset and has four multiple-choice answers. The questions were carefully curated to discriminate between models with different levels of general language comprehension. The English dataset on its own proves difficult enough to challenge state-of-the-art language models. Being fully parallel, this dataset enables direct comparison of model performance across all languages. We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs). We present extensive results and find that despite significant cross-lingual transfer in English-centric LLMs, much smaller MLMs pretrained on balanced multilingual data still understand far more languages. We also observe that larger vocabulary size and conscious vocabulary construction correlate with better performance on low-resource languages. Overall, Belebele opens up new avenues for evaluating and analyzing the multilingual capabilities of NLP systems.

arXiv information

Authors	Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, Madian Khabsa
Published	2023-08-31 17:43:08+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google