MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs

Summary

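Large language models (LLMs) excel in high-resource languages but face notable challenges in low-resource languages such as Mongolian.
This paper addresses these challenges by categorizing capabilities into language abilities (syntax and semantics) and cognitive abilities (knowledge and reasoning).
To evaluate these areas systematically, we developed MM-Eval, a specialized dataset based on Modern Mongolian Language Textbook I and enriched with the WebQSP and MGSM datasets.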
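Preliminary experiments on models including Qwen2-7B-Instruct, GLM4-9b-chat, Llama3.1-8B-Instruct, GPT-4, and DeepseekV2.5 revealed that: 1) all models performed better on syntactic tasks than on semantic tasks, highlighting a gap in deeper language understanding; and
2) knowledge tasks showed only a moderate decline, suggesting that models can transfer general knowledge from high-resource to low-resource contexts.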
The release of MM-Eval, comprising 569 syntax, 677 semantics, 344 knowledge, and 250 reasoning tasks, offers valuable insights for advancing NLP and LLMs in low-resource languages like Mongolian.
The dataset is available at https://github.com/joenahm/MM-Eval.

Summary (original)

Large language models (LLMs) excel in high-resource languages but face notable challenges in low-resource languages like Mongolian. This paper addresses these challenges by categorizing capabilities into language abilities (syntax and semantics) and cognitive abilities (knowledge and reasoning). To systematically evaluate these areas, we developed MM-Eval, a specialized dataset based on Modern Mongolian Language Textbook I and enriched with WebQSP and MGSM datasets. Preliminary experiments on models including Qwen2-7B-Instruct, GLM4-9b-chat, Llama3.1-8B-Instruct, GPT-4, and DeepseekV2.5 revealed that: 1) all models performed better on syntactic tasks than semantic tasks, highlighting a gap in deeper language understanding; and 2) knowledge tasks showed a moderate decline, suggesting that models can transfer general knowledge from high-resource to low-resource contexts. The release of MM-Eval, comprising 569 syntax, 677 semantics, 344 knowledge, and 250 reasoning tasks, offers valuable insights for advancing NLP and LLMs in low-resource languages like Mongolian. The dataset is available at https://github.com/joenahm/MM-Eval.
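The task counts in the abstract fit the two-level hierarchy it describes: language abilities (syntax, semantics) and cognitive abilities (knowledge, reasoning). As a minimal sketch, assuming a plain nested dictionary (a hypothetical layout chosen for illustration, not the format of the files in the MM-Eval repository), the per-ability totals and each category's share of the benchmark can be tallied like this:

```python
# Task counts per sub-category, as reported in the MM-Eval abstract.
# The nesting (ability -> sub-category) mirrors the paper's split into
# language abilities and cognitive abilities; the dict layout itself
# is an illustrative assumption, not the dataset's on-disk format.
MM_EVAL_TASKS: dict[str, dict[str, int]] = {
    "language": {"syntax": 569, "semantics": 677},
    "cognitive": {"knowledge": 344, "reasoning": 250},
}


def ability_totals(taxonomy: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum the task counts of every sub-category under each top-level ability."""
    return {ability: sum(subs.values()) for ability, subs in taxonomy.items()}


def category_shares(taxonomy: dict[str, dict[str, int]]) -> dict[str, float]:
    """Return each sub-category's fraction of the whole benchmark."""
    total = sum(sum(subs.values()) for subs in taxonomy.values())
    return {
        name: count / total
        for subs in taxonomy.values()
        for name, count in subs.items()
    }


if __name__ == "__main__":
    totals = ability_totals(MM_EVAL_TASKS)
    print(totals)                 # {'language': 1246, 'cognitive': 594}
    print(sum(totals.values()))   # 1840
    for name, share in category_shares(MM_EVAL_TASKS).items():
        print(f"{name:10s} {share:.1%}")
```

Under these counts, language-ability tasks account for 1,246 of the 1,840 items (about two thirds), which matters if per-category scores are averaged into one benchmark score without weighting.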

arXiv information

Authors	Mengyuan Zhang, Ruihui Wang, Bo Xia, Yuan Sun, Xiaobing Zhao
Published	2024-11-14 14:58:38+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
