InstructEval: Systematic Evaluation of Instruction Selection Methods

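Summary

In-context learning (ICL) performs a task by prompting a large language model (LLM) with an instruction and a small set of annotated examples called demonstrations.
Recent work has shown that the precise details of the inputs used in an ICL prompt strongly affect performance, which has motivated instruction selection algorithms.
However, the effect of instruction choice remains poorly understood: existing analyses are limited to narrow subsets of models and tasks, which limits how far their insights generalize.
We develop InstructEval, an ICL evaluation suite for assessing these techniques thoroughly.
The suite includes 13 open-source LLMs of varying scales from four model families and covers nine tasks across three categories.
Using the suite, we evaluate the relative performance of seven popular instruction selection methods on five metrics relevant to ICL.
Our experiments show that curated manually written instructions, or simple instructions with no task-specific description, often yield better overall ICL performance than automatic instruction-induction methods, which points to a lack of generalizability in the latter.
We release the evaluation suite to benchmark instruction selection approaches and to enable more generalizable methods in this area.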

Summary (original)

In-context learning (ICL) performs tasks by prompting a large language model (LLM) using an instruction and a small set of annotated examples called demonstrations. Recent work has shown that precise details of the inputs used in the ICL prompt significantly impact performance, which has incentivized instruction selection algorithms. The effect of instruction-choice however is severely underexplored, with existing analyses restricted to shallow subsets of models and tasks, limiting the generalizability of their insights. We develop InstructEval, an ICL evaluation suite to conduct a thorough assessment of these techniques. The suite includes 13 open-sourced LLMs of varying scales from four model families, and covers nine tasks across three categories. Using the suite, we evaluate the relative performance of seven popular instruction selection methods over five metrics relevant to ICL. Our experiments reveal that using curated manually-written instructions or simple instructions without any task-specific descriptions often elicits superior ICL performance overall than that of automatic instruction-induction methods, pointing to a lack of generalizability among the latter. We release our evaluation suite for benchmarking instruction selection approaches and enabling more generalizable methods in this space.
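The abstract describes an ICL prompt as an instruction followed by a small set of annotated demonstrations. As a rough illustration only, the sketch below assembles such a prompt as a plain string. The function name `build_icl_prompt`, the `Input:`/`Output:` template, and the sentiment examples are all hypothetical choices made for this sketch; they are not taken from the InstructEval suite, which may format prompts differently.

```python
from typing import List, Tuple


def build_icl_prompt(instruction: str,
                     demonstrations: List[Tuple[str, str]],
                     query: str) -> str:
    """Join an instruction, labeled demonstrations, and an unanswered query.

    The "Input:/Output:" template is an assumption made for illustration.
    """
    parts = []
    # An empty instruction is skipped, giving a demonstrations-only prompt.
    if instruction:
        parts.append(instruction.strip())
    # Each demonstration is an (input text, gold label) pair.
    for text, label in demonstrations:
        parts.append(f"Input: {text}\nOutput: {label}")
    # The query ends with an open "Output:" for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Hypothetical sentiment-classification example.
demos = [("The film was a delight.", "positive"),
         ("I want my money back.", "negative")]
prompt = build_icl_prompt(
    "Classify the sentiment of the input as positive or negative.",
    demos,
    "A tedious, overlong mess.",
)
print(prompt)
```

In this framing, an instruction selection method only changes the `instruction` argument, while the demonstrations and the query stay fixed. That makes it straightforward to compare several instructions on the same model and task.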

arXiv information

Authors	Anirudh Ajith, Chris Pan, Mengzhou Xia, Ameet Deshpande, Karthik Narasimhan
Published	2023-07-16 10:14:44+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
