LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models

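Summary

Large language models (LLMs) have made significant progress on natural language processing tasks and show considerable potential in the legal domain.
Legal applications, however, demand high standards of accuracy, reliability, and fairness.
Applying existing LLMs to legal systems without carefully evaluating their potential and limitations could pose significant risks in legal practice.
To this end, we introduce LexEval, a standardized and comprehensive Chinese legal benchmark.
The benchmark is notable in three respects.
(1) Ability modeling: we propose a new taxonomy of legal cognitive abilities to organize the different tasks.
(2) Scale: to our knowledge, LexEval is currently the largest Chinese legal evaluation dataset, comprising 23 tasks and 14,150 questions.
(3) Data: we draw on reformatted existing datasets, exam datasets, and datasets newly annotated by legal experts to evaluate the various capabilities of LLMs comprehensively.
LexEval focuses not only on the ability of LLMs to apply fundamental legal knowledge but also on the ethical issues involved in that application.
We evaluated 38 open-source and commercial LLMs and obtained some interesting findings.
The experiments and findings offer valuable insights into the challenges of, and potential solutions for, developing Chinese legal systems and LLM evaluation pipelines.
The LexEval dataset and leaderboard are publicly available at https://github.com/CSHaitao/LexEval and will be continuously updated.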

Summary (original)

Large language models (LLMs) have made significant progress in natural language processing tasks and demonstrate considerable potential in the legal domain. However, legal applications demand high standards of accuracy, reliability, and fairness. Applying existing LLMs to legal systems without careful evaluation of their potential and limitations could pose significant risks in legal practice. To this end, we introduce a standardized comprehensive Chinese legal benchmark LexEval. This benchmark is notable in the following three aspects: (1) Ability Modeling: We propose a new taxonomy of legal cognitive abilities to organize different tasks. (2) Scale: To our knowledge, LexEval is currently the largest Chinese legal evaluation dataset, comprising 23 tasks and 14,150 questions. (3) Data: We utilize formatted existing datasets, exam datasets and newly annotated datasets by legal experts to comprehensively evaluate the various capabilities of LLMs. LexEval not only focuses on the ability of LLMs to apply fundamental legal knowledge but also dedicates efforts to examining the ethical issues involved in their application. We evaluated 38 open-source and commercial LLMs and obtained some interesting findings. The experiments and findings offer valuable insights into the challenges and potential solutions for developing Chinese legal systems and LLM evaluation pipelines. The LexEval dataset and leaderboard are publicly available at https://github.com/CSHaitao/LexEval and will be continuously updated.

arXiv information

Authors	Haitao Li, You Chen, Qingyao Ai, Yueyue Wu, Ruizhe Zhang, Yiqun Liu
Published	2024-11-26 15:35:49+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
