Wisdom of Instruction-Tuned Language Model Crowds. Exploring Model Label Variation

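Summary

Large language models (LLMs) show remarkable text classification capabilities and excel in zero-shot and few-shot learning (ZSL and FSL) scenarios.
However, because the models are trained on different datasets, performance on a given task varies widely from one model to another.
Recent studies emphasize the importance of accounting for human label variation in data annotation.
How this kind of label variation carries over to LLMs, however, remains unexplored.
Given this likely model specialization, the authors ask whether aggregated LLM labels improve over individual models, as aggregation does for human annotators.
They evaluate four recent instruction-tuned LLMs as annotators on five subjective tasks across four languages.
They use ZSL and FSL setups and apply label aggregation methods taken from human annotation.
Aggregated labels are indeed substantially better than those of any individual model, because aggregation benefits from each model's specialization in different tasks or languages.
Surprisingly, FSL does not surpass ZSL, since its performance depends on the quality of the selected examples.
Yet there seems to be no good information-theoretic strategy for selecting those examples.
No LLM method rivals even simple supervised models.
The authors also discuss the tradeoffs between LLM and human annotation in accuracy, cost, and moral/ethical considerations.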

Summary (original)

Large Language Models (LLMs) exhibit remarkable text classification capabilities, excelling in zero- and few-shot learning (ZSL and FSL) scenarios. However, since they are trained on different datasets, performance varies widely across tasks between those models. Recent studies emphasize the importance of considering human label variation in data annotation. However, how this human label variation also applies to LLMs remains unexplored. Given this likely model specialization, we ask: Do aggregate LLM labels improve over individual models (as for human annotators)? We evaluate four recent instruction-tuned LLMs as annotators on five subjective tasks across four languages. We use ZSL and FSL setups and label aggregation from human annotation. Aggregations are indeed substantially better than any individual model, benefiting from specialization in diverse tasks or languages. Surprisingly, FSL does not surpass ZSL, as it depends on the quality of the selected examples. However, there seems to be no good information-theoretical strategy to select those. We find that no LLM method rivals even simple supervised models. We also discuss the tradeoffs in accuracy, cost, and moral/ethical considerations between LLM and human annotation.
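
The abstract does not say which aggregation methods the authors use. As a rough illustration of the general idea only, the sketch below combines per-instance labels from several models by majority vote, one common aggregation baseline in human annotation. The model names, the label set, and the tie-breaking rule are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("need at least one label")
    return Counter(labels).most_common(1)[0][0]


def aggregate(predictions):
    """Aggregate labels instance by instance.

    predictions maps a model name to its list of labels, one per instance.
    All lists must have the same length.
    """
    columns = list(predictions.values())
    n_instances = len(columns[0])
    if any(len(col) != n_instances for col in columns):
        raise ValueError("every model must label every instance")
    # zip(*columns) yields, for each instance, the labels from all models.
    return [majority_vote(list(votes)) for votes in zip(*columns)]


# Hypothetical predictions from four models on three instances.
predictions = {
    "model_a": ["joy", "anger", "joy"],
    "model_b": ["joy", "sadness", "anger"],
    "model_c": ["anger", "anger", "joy"],
    "model_d": ["joy", "anger", "sadness"],
}
aggregated = aggregate(predictions)
print(aggregated)
```

Majority voting treats every model as equally reliable. Aggregation methods that estimate per-annotator reliability would weight models differently, which matters when models specialize in different tasks or languages.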

arXiv information

Authors	Flor Miriam Plaza-del-Arco, Debora Nozza, Dirk Hovy
Published	2024-04-15 09:00:26+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
