OpenHuEval: Evaluating Large Language Model on Hungarian Specifics

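Summary

We introduce OpenHuEval, the first benchmark for LLMs that focuses on the Hungarian language and Hungarian specifics.
OpenHuEval is built from a large collection of Hungarian-specific materials drawn from multiple sources.
Its construction follows recent design principles for LLM evaluation: using real user queries from the internet, emphasizing assessment of LLMs' generative capabilities, and employing LLM-as-judge to improve the multidimensionality and accuracy of evaluation.
In total, OpenHuEval covers eight Hungarian-specific dimensions, with five tasks and 3953 questions.
As a result, OpenHuEval provides a comprehensive, in-depth, and scientifically accurate assessment of LLM performance in the context of the Hungarian language and its specifics.
We evaluated current mainstream LLMs, including both traditional LLMs and recently developed Large Reasoning Models (LRMs).
The results show a significant need for evaluation and model optimization tailored to the Hungarian language and its specifics.
We also established a framework for analyzing the thinking processes of LRMs with OpenHuEval, revealing intrinsic patterns and mechanisms of these models in non-English languages, with Hungarian serving as a representative example.
We will release OpenHuEval at https://github.com/opendatalab/OpenHuEval.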

Summary (original)

We introduce OpenHuEval, the first benchmark for LLMs focusing on the Hungarian language and specifics. OpenHuEval is constructed from a vast collection of Hungarian-specific materials sourced from multiple origins. In the construction, we incorporated the latest design principles for evaluating LLMs, such as using real user queries from the internet, emphasizing the assessment of LLMs’ generative capabilities, and employing LLM-as-judge to enhance the multidimensionality and accuracy of evaluations. Ultimately, OpenHuEval encompasses eight Hungarian-specific dimensions, featuring five tasks and 3953 questions. Consequently, OpenHuEval provides the comprehensive, in-depth, and scientifically accurate assessment of LLM performance in the context of the Hungarian language and its specifics. We evaluated current mainstream LLMs, including both traditional LLMs and recently developed Large Reasoning Models. The results demonstrate the significant necessity for evaluation and model optimization tailored to the Hungarian language and specifics. We also established the framework for analyzing the thinking processes of LRMs with OpenHuEval, revealing intrinsic patterns and mechanisms of these models in non-English languages, with Hungarian serving as a representative example. We will release OpenHuEval at https://github.com/opendatalab/OpenHuEval .

arXiv information

Authors	Haote Yang, Xingjian Wei, Jiang Wu, Noémi Ligeti-Nagy, Jiaxing Sun, Yinfan Wang, Zijian Győző Yang, Junyuan Gao, Jingchao Wang, Bowen Jiang, Shasha Wang, Nanjun Yu, Zihao Zhang, Shixin Hong, Hongwei Liu, Wei Li, Songyang Zhang, Dahua Lin, Lijun Wu, Gábor Prószéky, Conghui He
Published	2025-03-27 13:40:06+00:00

Source and services used

arxiv.jp, Google
