Evaluating Multimodal Generative AI with Korean Educational Standards

Summary

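This paper introduces the Korean National Educational Test Benchmark (KoNET), a new benchmark designed to evaluate multimodal generative AI systems using Korean national educational tests.
KoNET consists of four exams: the Korean Elementary General Educational Development Test (KoEGED), its Middle (KoMGED) and High (KoHGED) counterparts, and the College Scholastic Ability Test (KoCSAT).
These exams are known for their rigorous standards and diverse questions, which supports a comprehensive analysis of AI performance across educational levels.
By focusing on Korean, KoNET offers insight into model performance in less-explored languages.
It evaluates a range of open-source, open-access, and closed-API models by examining difficulty, subject diversity, and human error rates.
The code and dataset builder will be fully open-sourced at https://github.com/naver-ai/KoNET.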

Abstract (original)

This paper presents the Korean National Educational Test Benchmark (KoNET), a new benchmark designed to evaluate Multimodal Generative AI Systems using Korean national educational tests. KoNET comprises four exams: the Korean Elementary General Educational Development Test (KoEGED), Middle (KoMGED), High (KoHGED), and College Scholastic Ability Test (KoCSAT). These exams are renowned for their rigorous standards and diverse questions, facilitating a comprehensive analysis of AI performance across different educational levels. By focusing on Korean, KoNET provides insights into model performance in less-explored languages. We assess a range of models – open-source, open-access, and closed APIs – by examining difficulties, subject diversity, and human error rates. The code and dataset builder will be made fully open-sourced at https://github.com/naver-ai/KoNET.

arXiv information

Authors	Sanghee Park, Geewook Kim
Published	2025-02-21 12:46:40+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
