Can Large Language Models Predict the Outcome of Judicial Decisions?

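Summary

Large language models (LLMs) have shown strong natural language processing (NLP) capabilities across diverse domains.
However, their application to specialized tasks such as legal judgment prediction (LJP) in low-resource languages like Arabic remains underexplored.
This work addresses that gap by building an Arabic LJP dataset, collected and preprocessed from Saudi commercial court judgments.
It benchmarks state-of-the-art open-source LLMs, including LLaMA-3.2-3B and LLaMA-3.1-8B, under zero-shot, one-shot, and LoRA fine-tuning configurations.
The evaluation framework combines quantitative metrics (such as BLEU, ROUGE, and BERT) with qualitative assessments (including coherence, legal language, and clarity) performed by an LLM.
The results show that fine-tuned smaller models reach performance comparable to larger models in task-specific contexts while offering significant resource efficiency.
The authors also investigate the effect of fine-tuning the model on a diverse set of instructions, offering insights for developing more human-centric and adaptable LLMs.
The dataset, code, and models have been made publicly available to provide a solid foundation for future research in Arabic legal NLP.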
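
To make the quantitative side of the evaluation concrete, here is a minimal sketch of a ROUGE-1 style unigram-overlap F1 score between a reference judgment and a model output. It is an illustration only, not the paper's evaluation code: the function name `rouge1_f1` is hypothetical, and whitespace tokenization is an assumption that a real Arabic pipeline would likely replace with proper normalization and tokenization.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference text and a candidate text.

    Assumption: tokens are obtained by whitespace splitting.
    """
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    if not ref_tokens or not cand_tokens:
        return 0.0

    # Clipped overlap: a token counts at most as often as it occurs in both texts.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)  # share of candidate tokens that match
    recall = overlap / len(ref_tokens)      # share of reference tokens recovered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "the court rejects the claim"
    candidate = "the court accepts the claim"
    print(round(rouge1_f1(reference, candidate), 3))
```

Overlap scores of this kind reward shared wording but cannot tell whether the predicted outcome is legally sound, which is presumably why the authors pair them with LLM-based qualitative assessments.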

Abstract (original)

Large Language Models (LLMs) have shown exceptional capabilities in Natural Language Processing (NLP) across diverse domains. However, their application in specialized tasks such as Legal Judgment Prediction (LJP) for low-resource languages like Arabic remains underexplored. In this work, we address this gap by developing an Arabic LJP dataset, collected and preprocessed from Saudi commercial court judgments. We benchmark state-of-the-art open-source LLMs, including LLaMA-3.2-3B and LLaMA-3.1-8B, under varying configurations such as zero-shot, one-shot, and fine-tuning using LoRA. Additionally, we employed a comprehensive evaluation framework that integrates both quantitative metrics (such as BLEU, ROUGE, and BERT) and qualitative assessments (including Coherence, Legal Language, Clarity, etc.) using an LLM. Our results demonstrate that fine-tuned smaller models achieve comparable performance to larger models in task-specific contexts while offering significant resource efficiency. Furthermore, we investigate the impact of fine-tuning the model on a diverse set of instructions, offering valuable insights into the development of a more human-centric and adaptable LLM. We have made the dataset, code, and models publicly available to provide a solid foundation for future research in Arabic legal NLP.
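
The abstract does not describe the LoRA setup, so the following is only a generic sketch of the idea behind LoRA, not the authors' configuration. A frozen weight matrix `W` of shape d×k is adapted by adding a scaled low-rank product `B·A`, with `B` of shape d×r and `A` of shape r×k, so only r·(d+k) values are trained. All names, shapes, and the `alpha` scaling value below are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, alpha):
    """Return W + (alpha / r) * B @ A, where r is the LoRA rank (rows of A)."""
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)  # low-rank update with the same shape as W
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_trainable_params(d, k, rank):
    """Number of trainable values in B (d x rank) and A (rank x k)."""
    return rank * (d + k)


if __name__ == "__main__":
    w = [[1.0, 2.0], [3.0, 4.0]]  # frozen pretrained weight (toy values)
    b = [[0.0], [0.0]]            # B starts at zero, so the update is initially zero
    a = [[5.0, 6.0]]
    print(lora_effective_weight(w, b, a, alpha=2.0))

    # Toy comparison for a hypothetical 4096 x 4096 layer at rank 8.
    print(4096 * 4096, lora_trainable_params(4096, 4096, 8))
```

The small number of trainable values relative to the full matrix is what makes this kind of fine-tuning cheap, in line with the resource-efficiency claim in the abstract.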

arXiv information

Authors	Mohamed Bayan Kmainasi, Ali Ezzat Shahroor, Amani Al-Ghraibah
Published	2025-02-28 18:27:21+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
