Domain Mastery Benchmark: An Ever-Updating Benchmark for Evaluating Holistic Domain Knowledge of Large Language Model–A Preliminary Release

Summary

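Title: Domain Mastery Benchmark: An Ever-Updating Benchmark for Evaluating Holistic Domain Knowledge of Large Language Models (Preliminary Release)
Summary: This paper addresses domain knowledge, meaning in-depth understanding of, expertise in, and familiarity with a specific subject, industry, field, or area of special interest. Existing benchmarks all lack an overall design for evaluating domain knowledge. Believing that the true ability of domain language understanding can only be fairly evaluated by a comprehensive, in-depth benchmark, the authors introduce DomMa, a Domain Mastery Benchmark. DomMa aims to test LLMs' understanding of domain knowledge and features a large data volume, extensive domain coverage, and a data set that is continually updated based on China's 112 first-level subject classifications. DomMa contains 100,000 questions in both Chinese and English, collected from graduate entrance examinations and undergraduate exams at Chinese universities. The authors also propose designs that make the benchmark and its evaluation process better suited to LLMs.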

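Key points:
– Benchmarks should provide a comprehensive evaluation of domain knowledge.
– DomMa aims to test LLMs' understanding of domain knowledge, featuring a large data volume and extensive domain coverage.
– DomMa's 100,000 questions are collected from graduate entrance examinations and undergraduate exams at Chinese universities and are provided in both Chinese and English.
– DomMa is continually updated based on China's 112 first-level subject classifications.
– DomMa proposes a benchmark and evaluation-process design better suited to LLMs.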

Abstract (original)

Domain knowledge refers to in-depth understanding of, expertise in, and familiarity with a specific subject, industry, field, or area of special interest. Existing benchmarks all lack an overall design for domain knowledge evaluation. Holding the belief that the real ability of domain language understanding can only be fairly evaluated by a comprehensive and in-depth benchmark, we introduce DomMa, a Domain Mastery Benchmark. DomMa aims to test Large Language Models (LLMs) on their understanding of domain knowledge; it features extensive domain coverage, a large data volume, and a continually updated data set based on China's 112 first-level subject classifications. DomMa consists of 100,000 questions in both Chinese and English, sourced from graduate entrance examinations and undergraduate exams at Chinese universities. We also propose designs to make the benchmark and evaluation process more suitable for LLMs.

arXiv information

Authors	Zhouhong Gu, Xiaoxuan Zhu, Haoning Ye, Lin Zhang, Zhuozhi Xiong, Zihan Li, Qianyu He, Sihang Jiang, Hongwei Feng, Yanghua Xiao
Published	2023-04-23 15:11:49+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
