Bridging the Gap: In-Context Learning for Modeling Human Disagreement

Summary

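Large language models (LLMs) have shown strong performance on NLP classification tasks.
However, they typically rely on labels aggregated by majority voting, which can obscure the human disagreement inherent in subjective annotation.
This study examines whether LLMs can capture multiple perspectives and reflect annotator disagreement in subjective tasks such as hate speech and offensive language detection.
Using in-context learning (ICL) in zero-shot and few-shot settings, it evaluates four open-source LLMs across three label modeling strategies: aggregated hard labels, disaggregated hard labels, and disaggregated soft labels.
For few-shot prompting, it assesses demonstration selection methods based on textual similarity (BM25, PLM-based), annotation disagreement (entropy), and a combined ranking, along with example ordering strategies (random vs. curriculum-based).
The results show that multi-perspective generation is viable in zero-shot settings, whereas few-shot setups often fail to capture the full range of human judgments.
Prompt design and demonstration selection notably affect performance, while example ordering has limited impact.
These findings highlight the challenges of modeling subjectivity with LLMs and the importance of building more perspective-aware, socially intelligent models.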

Abstract (original)

Large Language Models (LLMs) have shown strong performance on NLP classification tasks. However, they typically rely on aggregated labels, often via majority voting, which can obscure the human disagreement inherent in subjective annotations. This study examines whether LLMs can capture multiple perspectives and reflect annotator disagreement in subjective tasks such as hate speech and offensive language detection. We use in-context learning (ICL) in zero-shot and few-shot settings, evaluating four open-source LLMs across three label modeling strategies: aggregated hard labels, and disaggregated hard and soft labels. In few-shot prompting, we assess demonstration selection methods based on textual similarity (BM25, PLM-based), annotation disagreement (entropy), a combined ranking, and example ordering strategies (random vs. curriculum-based). Results show that multi-perspective generation is viable in zero-shot settings, while few-shot setups often fail to capture the full spectrum of human judgments. Prompt design and demonstration selection notably affect performance, though example ordering has limited impact. These findings highlight the challenges of modeling subjectivity with LLMs and the importance of building more perspective-aware, socially intelligent models.
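
The abstract contrasts aggregated hard labels with disaggregated soft labels and uses entropy as a measure of annotation disagreement. The following is a minimal sketch of how those three quantities relate for a single annotated item. It is not the authors' implementation: the function names, the label set, and the example votes are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def soft_label(votes, classes):
    """Disaggregated soft label: the share of annotators choosing each class."""
    counts = Counter(votes)
    total = len(votes)
    return {c: counts.get(c, 0) / total for c in classes}


def majority_label(votes):
    """Aggregated hard label by majority vote.

    Ties are broken by first occurrence in `votes` (an arbitrary choice
    made for this sketch).
    """
    return Counter(votes).most_common(1)[0][0]


def disagreement_entropy(dist):
    """Shannon entropy (in bits) of a soft label; 0.0 means full agreement."""
    return sum(-p * log2(p) for p in dist.values() if p > 0)


# Hypothetical item labeled by three annotators.
classes = ["offensive", "not_offensive"]
votes = ["offensive", "offensive", "not_offensive"]

dist = soft_label(votes, classes)
hard = majority_label(votes)
score = disagreement_entropy(dist)
```

Here the majority vote keeps only `offensive`, while the soft label retains the two-to-one split and the entropy quantifies how contested the item is. Under the assumption that higher entropy marks more contested items, a score like this could be used to rank candidate demonstrations by disagreement.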

arXiv information

Authors	Benedetta Muscato, Yue Li, Gizem Gezici, Zhixue Zhao, Fosca Giannotti
Published	2025-06-06 14:24:29+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
