IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages

Abstract

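As large language models (LLMs) are adopted more widely around the world, it is essential that they represent the world's linguistic diversity.
India is a linguistically diverse country of 1.4 billion people.
To facilitate research on multilingual LLM evaluation, we release IndicGenBench, the largest benchmark for evaluating LLMs on user-facing generation tasks across a diverse set of 29 Indic languages covering 13 scripts and 4 language families.
IndicGenBench consists of diverse generation tasks such as cross-lingual summarization, machine translation, and cross-lingual question answering.
IndicGenBench extends existing benchmarks to many Indic languages through human curation, providing multi-way parallel evaluation data for many under-represented Indic languages for the first time.
We evaluate a wide range of proprietary and open-source LLMs, including GPT-3.5, GPT-4, PaLM-2, mT5, Gemma, BLOOM, and LLaMA, on IndicGenBench in a variety of settings.
The largest PaLM-2 models perform best on most tasks, but there is a significant performance gap between every language and English, which shows that further research is needed to develop more inclusive multilingual language models.
IndicGenBench is released at www.github.com/google-research-datasets/indic-gen-bench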

Abstract (original)

As large language models (LLMs) see increasing adoption across the globe, it is imperative for LLMs to be representative of the linguistic diversity of the world. India is a linguistically diverse country of 1.4 billion people. To facilitate research on multilingual LLM evaluation, we release IndicGenBench, the largest benchmark for evaluating LLMs on user-facing generation tasks across a diverse set of 29 Indic languages covering 13 scripts and 4 language families. IndicGenBench is composed of diverse generation tasks like cross-lingual summarization, machine translation, and cross-lingual question answering. IndicGenBench extends existing benchmarks to many Indic languages through human curation, providing multi-way parallel evaluation data for many under-represented Indic languages for the first time. We evaluate a wide range of proprietary and open-source LLMs, including GPT-3.5, GPT-4, PaLM-2, mT5, Gemma, BLOOM, and LLaMA, on IndicGenBench in a variety of settings. The largest PaLM-2 models perform the best on most tasks; however, there is a significant performance gap in all languages compared to English, showing that further research is needed for the development of more inclusive multilingual language models. IndicGenBench is released at www.github.com/google-research-datasets/indic-gen-bench

arXiv information

Authors	Harman Singh, Nitish Gupta, Shikhar Bharadwaj, Dinesh Tewari, Partha Talukdar
Published	2024-04-25 17:57:36+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
