Implicit Causality-biases in humans and LLMs as a tool for benchmarking LLM discourse capabilities

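Summary

This paper compares data generated by monolingual and multilingual LLMs of various model sizes with data from human participants in an experimental setting that probes well-established discourse biases.
Beyond that comparison, we aim to develop a benchmark that assesses how LLMs handle discourse biases, as a robust proxy for more general discourse understanding capabilities.
More specifically, we investigated Implicit Causality verbs, for which psycholinguistic research has found that participants show biases with regard to three phenomena: (i) coreference relations (Experiment 1), (ii) coherence relations (Experiment 2), and (iii) the use of particular referring expressions (Experiments 3 and 4).
With regard to coreference biases, only the largest monolingual LLM (German Bloom 6.4B) displayed more human-like biases.
For coherence relations, no LLM displayed the explanation bias usually found in humans.
For referring expressions, all LLMs preferred to refer to subject arguments with simpler forms than to objects.
However, in contrast to recent studies of human biases, no bias effect on referring expressions was found.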

Summary (original)

In this paper, we compare data generated with mono- and multilingual LLMs spanning a range of model sizes with data provided by human participants in an experimental setting investigating well-established discourse biases. Beyond the comparison as such, we aim to develop a benchmark to assess the capabilities of LLMs with discourse biases as a robust proxy for more general discourse understanding capabilities. More specifically, we investigated Implicit Causality verbs, for which psycholinguistic research has found participants to display biases with regard to three phenomena: the establishment of (i) coreference relations (Experiment 1), (ii) coherence relations (Experiment 2), and (iii) the use of particular referring expressions (Experiments 3 and 4). With regard to coreference biases we found only the largest monolingual LLM (German Bloom 6.4B) to display more human-like biases. For coherence relation, no LLM displayed the explanation bias usually found for humans. For referring expressions, all LLMs displayed a preference for referring to subject arguments with simpler forms than to objects. However, no bias effect on referring expression was found, as opposed to recent studies investigating human biases.

arXiv information

Authors	Florian Kankowski, Torgrim Solstad, Sina Zarriess, Oliver Bott
Published	2025-01-22 16:07:24+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
