CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models

Abstract

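How would a large language model (LLM) respond in ethically relevant contexts?
In this paper, we curate CMoralEval, a large benchmark for evaluating the morality of Chinese LLMs.
CMoralEval draws on two data sources: 1) a Chinese TV program that discusses Chinese moral norms through stories from society, and 2) a collection of Chinese moral anomies gathered from various newspapers and academic papers on morality.
With these sources, we aim to create a moral evaluation dataset characterized by diversity and authenticity.
We develop a morality taxonomy and a set of fundamental moral principles that are rooted in traditional Chinese culture and consistent with contemporary societal norms.
To make constructing and annotating CMoralEval instances efficient, we establish a platform with AI-assisted instance generation that streamlines the annotation process.
Together, these allow us to curate CMoralEval, which covers both explicit moral scenarios (14,964 instances) and moral dilemma scenarios (15,424 instances), each with instances from different data sources.
We conduct extensive experiments with CMoralEval to examine a variety of Chinese LLMs.
The experimental results show that CMoralEval is a challenging benchmark for Chinese LLMs.
The dataset is publicly available at https://github.com/tjunlp-lab/CMoralEval.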

Abstract (original)

What a large language model (LLM) would respond in ethically relevant context? In this paper, we curate a large benchmark CMoralEval for morality evaluation of Chinese LLMs. The data sources of CMoralEval are two-fold: 1) a Chinese TV program discussing Chinese moral norms with stories from the society and 2) a collection of Chinese moral anomies from various newspapers and academic papers on morality. With these sources, we aim to create a moral evaluation dataset characterized by diversity and authenticity. We develop a morality taxonomy and a set of fundamental moral principles that are not only rooted in traditional Chinese culture but also consistent with contemporary societal norms. To facilitate efficient construction and annotation of instances in CMoralEval, we establish a platform with AI-assisted instance generation to streamline the annotation process. These help us curate CMoralEval that encompasses both explicit moral scenarios (14,964 instances) and moral dilemma scenarios (15,424 instances), each with instances from different data sources. We conduct extensive experiments with CMoralEval to examine a variety of Chinese LLMs. Experiment results demonstrate that CMoralEval is a challenging benchmark for Chinese LLMs. The dataset is publicly available at \url{https://github.com/tjunlp-lab/CMoralEval}.

arXiv information

Authors	Linhao Yu, Yongqi Leng, Yufei Huang, Shang Wu, Haixin Liu, Xinmeng Ji, Jiahui Zhao, Jinwang Song, Tingting Cui, Xiaoqing Cheng, Tao Liu, Deyi Xiong
Published	2024-08-19 09:15:35+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
