The language of prompting: What linguistic properties make a prompt successful?

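Summary

The latest generation of LLMs can be prompted to achieve impressive zero-shot or few-shot performance on many natural language processing tasks. Because that performance depends heavily on the choice of prompt, however, considerable effort has gone into crowdsourcing prompts and designing prompt optimisation methods. Even so, a systematic understanding of how the linguistic properties of prompts correlate with task performance is still lacking. In this study, we examine how pre-trained and instruction-tuned LLMs of different sizes perform on prompts that are semantically equivalent but differ in linguistic structure. We investigate both grammatical properties such as mood, tense, aspect and modality, and lexico-semantic variation through the use of synonyms. Our findings contradict the common assumption that LLMs perform best on lower-perplexity prompts that reflect language use in pre-training or instruction-tuning data. Prompts transfer poorly between datasets and between models, and performance cannot generally be explained by perplexity, word frequency, ambiguity or prompt length. Based on these results, we propose a more robust and comprehensive evaluation standard for prompting research.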

Summary (original)

The latest generation of LLMs can be prompted to achieve impressive zero-shot or few-shot performance in many NLP tasks. However, since performance is highly sensitive to the choice of prompts, considerable effort has been devoted to crowd-sourcing prompts or designing methods for prompt optimisation. Yet, we still lack a systematic understanding of how linguistic properties of prompts correlate with task performance. In this work, we investigate how LLMs of different sizes, pre-trained and instruction-tuned, perform on prompts that are semantically equivalent, but vary in linguistic structure. We investigate both grammatical properties such as mood, tense, aspect and modality, as well as lexico-semantic variation through the use of synonyms. Our findings contradict the common assumption that LLMs achieve optimal performance on lower perplexity prompts that reflect language use in pretraining or instruction-tuning data. Prompts transfer poorly between datasets or models, and performance cannot generally be explained by perplexity, word frequency, ambiguity or prompt length. Based on our results, we put forward a proposal for a more robust and comprehensive evaluation standard for prompting research.

arXiv information

Authors	Alina Leidinger, Robert van Rooij, Ekaterina Shutova
Published	2023-11-03 15:03:36+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
