PromptBench: A Unified Library for Evaluation of Large Language Models

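Summary

Evaluating large language models (LLMs) is essential for assessing their performance and mitigating potential security risks.
This paper introduces PromptBench, a unified library for evaluating LLMs.
It consists of several key components that researchers can easily use and extend: prompt construction, prompt engineering, dataset and model loading, adversarial prompt attacks, dynamic evaluation protocols, and analysis tools.
PromptBench is designed as an open, general, and flexible codebase for research purposes, intended to facilitate original research on creating new benchmarks, deploying downstream applications, and designing new evaluation protocols.
The code is available at https://github.com/microsoft/promptbench and will be continuously supported.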

Summary (original)

The evaluation of large language models (LLMs) is crucial to assess their performance and mitigate potential security risks. In this paper, we introduce PromptBench, a unified library to evaluate LLMs. It consists of several key components that are easily used and extended by researchers: prompt construction, prompt engineering, dataset and model loading, adversarial prompt attack, dynamic evaluation protocols, and analysis tools. PromptBench is designed to be an open, general, and flexible codebase for research purposes that can facilitate original study in creating new benchmarks, deploying downstream applications, and designing new evaluation protocols. The code is available at: https://github.com/microsoft/promptbench and will be continuously supported.
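
To make the listed components more concrete, the following is a minimal sketch of how prompt construction, dataset loading, a model call, and an adversarial prompt perturbation might fit together in one evaluation loop. It does not use the PromptBench API: the dataset, the `toy_model` stand-in, and the `evaluate` and `perturb` helpers are all hypothetical names invented for illustration, and the "attack" is only a crude typo substitution rather than any attack the library implements.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy dataset of (text, gold label) pairs; not a real benchmark.
DATASET: List[Tuple[str, str]] = [
    ("a wonderful, heartfelt film", "positive"),
    ("a dull and lifeless script", "negative"),
    ("great acting and a wonderful score", "positive"),
    ("dull pacing ruins it", "negative"),
]

# Prompt construction: several candidate templates for the same task.
PROMPTS: List[str] = [
    "Classify the sentiment as positive or negative: {content}",
    "Is the following review positive or negative? {content}",
]


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call (assumption: keyword rules, not a real model)."""
    text = prompt.lower()
    # The toy model only "understands" the task if a task keyword is intact.
    if "sentiment" not in text and "review" not in text:
        return "unknown"
    return "negative" if "dull" in text else "positive"


def evaluate(
    model: Callable[[str], str],
    prompts: List[str],
    dataset: List[Tuple[str, str]],
) -> Dict[str, float]:
    """Return the accuracy of `model` for each prompt template."""
    scores: Dict[str, float] = {}
    for template in prompts:
        correct = 0
        for text, label in dataset:
            prediction = model(template.format(content=text))
            correct += int(prediction == label)
        scores[template] = correct / len(dataset)
    return scores


def perturb(template: str) -> str:
    """Crude typo-style perturbation, standing in for an adversarial prompt attack."""
    return template.replace("sentiment", "sentimnet").replace("review", "reveiw")


if __name__ == "__main__":
    clean = evaluate(toy_model, PROMPTS, DATASET)
    attacked = evaluate(toy_model, [perturb(p) for p in PROMPTS], DATASET)
    for template, accuracy in {**clean, **attacked}.items():
        print(f"{accuracy:.2f}  {template}")
```

Comparing the clean and perturbed scores per template is the kind of robustness analysis the abstract alludes to; a real study would swap in an actual model and dataset loader in place of these stubs.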

arXiv information

Authors	Kaijie Zhu, Qinlin Zhao, Hao Chen, Jindong Wang, Xing Xie
Published	2023-12-13 05:58:34+00:00

Source and services used

arxiv.jp, Google
