BenLLMEval: A Comprehensive Evaluation into the Potentials and Pitfalls of Large Language Models on Bengali NLP

Summary

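Large language models (LLMs) have emerged as one of the most important breakthroughs in NLP, owing to their strong performance in language generation and other language-specific tasks.
Although LLMs have been evaluated on a wide range of tasks, mostly in English, they have not yet been thoroughly evaluated in under-resourced languages such as Bengali (Bangla).
To address this gap, this paper introduces BenLLM-Eval, a comprehensive evaluation that benchmarks LLM performance in Bengali, a language with modest resources.
We select important and diverse Bengali NLP tasks (text summarization, question answering, paraphrasing, natural language inference, transliteration, text classification, and sentiment analysis) for zero-shot evaluation of three popular LLMs: GPT-3.5, LLaMA-2-13b-chat, and Claude-2.
Our experimental results show that on some Bengali NLP tasks, zero-shot LLMs achieve performance on par with, or even better than, current SOTA fine-tuned models.
On most tasks, however, their performance is considerably worse than current SOTA results, and open-source LLMs such as LLaMA-2-13b-chat perform markedly worse.
This calls for further efforts to better understand LLMs in modest-resourced languages like Bengali.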

Abstract (original)

Large Language Models (LLMs) have emerged as one of the most important breakthroughs in NLP for their impressive skills in language generation and other language-specific tasks. Though LLMs have been evaluated in various tasks, mostly in English, they have not yet undergone thorough evaluation in under-resourced languages such as Bengali (Bangla). To this end, this paper introduces BenLLM-Eval, which consists of a comprehensive evaluation of LLMs to benchmark their performance in the Bengali language that has modest resources. In this regard, we select various important and diverse Bengali NLP tasks, such as text summarization, question answering, paraphrasing, natural language inference, transliteration, text classification, and sentiment analysis for zero-shot evaluation of popular LLMs, namely, GPT-3.5, LLaMA-2-13b-chat, and Claude-2. Our experimental results demonstrate that while in some Bengali NLP tasks, zero-shot LLMs could achieve performance on par, or even better than current SOTA fine-tuned models; in most tasks, their performance is quite poor (with the performance of open-source LLMs like LLaMA-2-13b-chat being significantly bad) in comparison to the current SOTA results. Therefore, it calls for further efforts to develop a better understanding of LLMs in modest-resourced languages like Bengali.

arXiv information

Authors	Mohsinul Kabir,Mohammed Saidul Islam,Md Tahmid Rahman Laskar,Mir Tafseer Nayeem,M Saiful Bari,Enamul Hoque
Published	2024-03-19 17:11:41+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
