Measuring and Benchmarking Large Language Models’ Capabilities to Generate Persuasive Language

Summary

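We are exposed to a great deal of information that tries to influence us, such as teaser messages, debates, politically framed news, and propaganda, all of which use persuasive language.
Given the recent interest in large language models (LLMs), we study the ability of LLMs to generate persuasive text.
In contrast to previous work that focuses on particular domains or types of persuasion, we conduct a general study across various domains to measure and benchmark the degree to which LLMs generate persuasive text, both when explicitly instructed to rewrite text to be more or less persuasive and when only instructed to paraphrase.
To this end, we construct a new dataset, Persuasive-Pairs, consisting of pairs of a short text and a version of it rewritten by an LLM to amplify or diminish persuasive language.
We multi-annotate the pairs on a relative scale for persuasive language.
This data is not only a valuable resource in itself; we also show that it can be used to train a regression model that predicts a persuasive-language score between text pairs.
This model can score and benchmark new LLMs across domains, making it easier to compare different LLMs.
Finally, we discuss the effects observed with different system prompts.
In particular, we find that different ‘personas’ in the system prompt of LLaMA3 substantially change the persuasive language in the text, even when the model is only instructed to paraphrase.
These findings underscore the importance of investigating persuasive language in LLM-generated text.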
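The summary mentions multi-annotating each pair on a relative scale for persuasive language. This page does not state the scale's range or how annotators' ratings are combined, so the following is only a minimal Python sketch under assumptions: a hypothetical symmetric integer scale from -3 to +3 (positive meaning the second text of the pair is more persuasive than the first) and plain averaging across annotators. `pair_score` and `swapped` are illustrative names, not the authors' code.

```python
# Hypothetical scale, not taken from the paper: integers from -3 to +3, where a
# positive rating means the SECOND text of the pair uses more persuasive
# language than the first, and 0 means no difference.
SCALE_MIN, SCALE_MAX = -3, 3


def pair_score(ratings: list[int]) -> float:
    """Average several annotators' relative ratings for one text pair."""
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        if not SCALE_MIN <= r <= SCALE_MAX:
            raise ValueError(f"rating {r} outside [{SCALE_MIN}, {SCALE_MAX}]")
    return sum(ratings) / len(ratings)


def swapped(score: float) -> float:
    """On a symmetric relative scale, showing the pair in the opposite order
    flips the sign of its score."""
    return -score


if __name__ == "__main__":
    s = pair_score([2, 3, 1])  # three annotators rate the same pair
    print(s, swapped(s))
```

The appeal of a relative (pairwise) scale in this sketch is that annotators compare two texts directly instead of assigning each an absolute persuasiveness value, and the sign convention makes pair order explicit.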

Summary (original)

We are exposed to much information trying to influence us, such as teaser messages, debates, politically framed news, and propaganda – all of which use persuasive language. With the recent interest in Large Language Models (LLMs), we study the ability of LLMs to produce persuasive text. As opposed to prior work which focuses on particular domains or types of persuasion, we conduct a general study across various domains to measure and benchmark to what degree LLMs produce persuasive text – both when explicitly instructed to rewrite text to be more or less persuasive and when only instructed to paraphrase. To this end, we construct a new dataset, Persuasive-Pairs, of pairs each consisting of a short text and of a text rewritten by an LLM to amplify or diminish persuasive language. We multi-annotate the pairs on a relative scale for persuasive language. This data is not only a valuable resource in itself, but we also show that it can be used to train a regression model to predict a score of persuasive language between text pairs. This model can score and benchmark new LLMs across domains, thereby facilitating the comparison of different LLMs. Finally, we discuss effects observed for different system prompts. Notably, we find that different ‘personas’ in the system prompt of LLaMA3 change the persuasive language in the text substantially, even when only instructed to paraphrase. These findings underscore the importance of investigating persuasive language in LLM generated text.
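The abstract describes training a regression model that scores text pairs and then using it to benchmark new LLMs. A minimal Python sketch of such a benchmarking loop follows, assuming the scorer is any callable mapping an (original, rewritten) pair to a float and that results are summarized as a mean per (LLM, instruction) group. `Rewrite`, `benchmark`, and `toy_scorer` are hypothetical names, and `toy_scorer` is a deliberately crude placeholder, not the paper's regression model.

```python
from collections import defaultdict
from typing import Callable, Iterable, NamedTuple


class Rewrite(NamedTuple):
    llm: str          # name of the LLM that produced the rewrite
    instruction: str  # assumed labels: "more", "less", or "paraphrase"
    original: str
    rewritten: str


# Maps (original, rewritten) to a relative persuasive-language score. The paper
# trains a regression model for this role; here it is only a parameter.
PairScorer = Callable[[str, str], float]


def benchmark(rewrites: Iterable[Rewrite], scorer: PairScorer) -> dict[tuple[str, str], float]:
    """Mean predicted score for each (llm, instruction) group."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for r in rewrites:
        key = (r.llm, r.instruction)
        totals[key] += scorer(r.original, r.rewritten)
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


def toy_scorer(original: str, rewritten: str) -> float:
    """Hypothetical placeholder, NOT the paper's model: change in '!' count."""
    return float(rewritten.count("!") - original.count("!"))


if __name__ == "__main__":
    rows = [
        Rewrite("model-a", "more", "Try it.", "Try it now!"),
        Rewrite("model-a", "paraphrase", "Try it.", "Give it a go."),
    ]
    print(benchmark(rows, toy_scorer))
```

Keeping the scorer as a parameter means a trained pairwise model can replace `toy_scorer` without touching the loop, which is what allows the same benchmark to be rerun on rewrites from any new LLM.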

arXiv information

Authors	Amalie Brogaard Pauli, Isabelle Augenstein, Ira Assent
Published	2024-06-25 17:40:47+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
