MEGA: Multilingual Evaluation of Generative AI

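Summary

Title: MEGA: Multilingual Evaluation of Generative AI

Summary:
– Generative AI models perform impressively on many natural language processing tasks, such as language understanding, reasoning, and language generation.
– One of the most important questions in the AI community today concerns the capabilities and limits of these models, and it is clear that evaluating generative AI is very challenging.
– Most studies of generative large language models (LLMs) are restricted to English, so it is unclear how well these models understand and generate other languages.
– We present MEGA, the first comprehensive benchmarking of generative LLMs, which evaluates models on standard NLP benchmarks covering 8 diverse tasks and 33 typologically diverse languages.
– We also compare generative LLMs with state-of-the-art (SOTA) non-autoregressive models on these tasks to determine how well generative models perform relative to the previous generation of LLMs.
– We analyze model performance thoroughly across languages and discuss some of the reasons why generative LLMs are currently not optimal for all languages.
– We create a framework for evaluating generative LLMs in the multilingual setting and provide directions for future progress in the field.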

Abstract (original)

Generative AI models have impressive performance on many Natural Language Processing tasks such as language understanding, reasoning and language generation. One of the most important questions that is being asked by the AI community today is about the capabilities and limits of these models, and it is clear that evaluating generative AI is very challenging. Most studies on generative Large Language Models (LLMs) are restricted to English and it is unclear how capable these models are at understanding and generating other languages. We present the first comprehensive benchmarking of generative LLMs – MEGA, which evaluates models on standard NLP benchmarks, covering 8 diverse tasks and 33 typologically diverse languages. We also compare the performance of generative LLMs to State of the Art (SOTA) non-autoregressive models on these tasks to determine how well generative models perform compared to the previous generation of LLMs. We present a thorough analysis of the performance of models across languages and discuss some of the reasons why generative LLMs are currently not optimal for all languages. We create a framework for evaluating generative LLMs in the multilingual setting and provide directions for future progress in the field.

arXiv information

Authors	Kabir Ahuja, Rishav Hada, Millicent Ochieng, Prachi Jain, Harshita Diddee, Samuel Maina, Tanuja Ganu, Sameer Segal, Maxamed Axmed, Kalika Bali, Sunayana Sitaram
Published	2023-04-03 05:57:24+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, OpenAI
