Donkii: Can Annotation Error Detection Methods Find Errors in Instruction-Tuning Datasets?

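Summary

Instruction tuning has become an integral part of training pipelines for large language models (LLMs) and has been shown to yield strong performance gains.
In an orthogonal line of research, annotation error detection (AED) has emerged as a tool for detecting quality issues in gold-standard labels.
So far, however, AED methods have been applied only in discriminative settings.
How well they generalize to generative settings, which are becoming widespread through generative LLMs, remains an open question.
In this work, we present Donkii, the first benchmark for AED on instruction-tuning data.
It comprises three instruction-tuning datasets enriched with annotations from experts and semi-automatic methods.
We find that all three datasets contain clear-cut errors, some of which propagate directly into instruction-tuned LLMs.
We propose four AED baselines for the generative setting and evaluate them comprehensively on the newly introduced benchmark data.
Our results show that choosing the right AED method and model size is crucial, and we derive practical recommendations from them.
Finally, we provide a first case study of how the quality of instruction-tuning datasets influences downstream performance.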

Abstract (original)

Instruction-tuning has become an integral part of training pipelines for Large Language Models (LLMs) and has been shown to yield strong performance gains. In an orthogonal line of research, Annotation Error Detection (AED) has emerged as a tool for detecting quality issues of gold-standard labels. But so far, the application of AED methods is limited to discriminative settings. It is an open question how well AED methods generalize to generative settings which are becoming widespread via generative LLMs. In this work, we present a first and new benchmark for AED on instruction-tuning data: Donkii. It encompasses three instruction-tuning datasets enriched with annotations by experts and semi-automatic methods. We find that all three datasets contain clear-cut errors that sometimes directly propagate into instruction-tuned LLMs. We propose four AED baselines for the generative setting and evaluate them comprehensively on the newly introduced dataset. Our results demonstrate that choosing the right AED method and model size is indeed crucial, thereby deriving practical recommendations. To gain insights, we provide a first case-study to examine how the quality of the instruction-tuning datasets influences downstream performance.
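
The abstract does not say which four AED baselines are proposed, so the following is only a minimal sketch of one generic way an error detector can score generative data: rank each instruction-output pair by the mean negative log-likelihood that some language model assigns to the reference output, and inspect the highest-scoring pairs first. The function names, the example ids, and the log-probability values are all hypothetical, and this scoring rule is an assumption for illustration, not necessarily one of the paper's baselines.

```python
def mean_nll(token_logprobs):
    """Average negative log-likelihood over the tokens of a reference output."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def rank_by_suspicion(examples):
    """Return example ids sorted from most to least suspicious.

    `examples` maps an example id to the per-token log-probabilities
    that a model assigned to that example's reference output.
    A higher mean NLL means the model found the output more surprising.
    """
    scored = [(mean_nll(logprobs), ex_id) for ex_id, logprobs in examples.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ex_id for _, ex_id in scored]


# Hypothetical per-token log-probabilities (made-up values, not real model output).
examples = {
    "ex1": [-0.1, -0.2, -0.1],
    "ex2": [-2.5, -3.0, -1.5],
    "ex3": [-0.5, -0.4, -0.6],
}
ranking = rank_by_suspicion(examples)
print(ranking)
```

A ranking like this is typically compared against expert error annotations, so that a detector is judged by how many annotated errors appear near the top of the list.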

arXiv information

Authors	Leon Weber-Genzel, Robert Litschko, Ekaterina Artemova, Barbara Plank
Published	2023-09-04 15:34:02+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
