The Unreasonable Effectiveness of Easy Training Data for Hard Tasks

Summary

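How can a model be trained to perform well on hard test data when hard training data is inherently difficult to label correctly?
This is known as the scalable oversight problem, and it has attracted growing attention as language models continue to improve.
In this paper, we present the surprising conclusion that current pretrained language models often generalize relatively well from easy data to hard data, in some cases performing as well as oracle models finetuned on hard data.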
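We demonstrate this kind of easy-to-hard generalization using simple finetuning methods such as in-context learning, linear classifier heads, and QLoRA, across seven different measures of datapoint hardness: six empirically diverse human hardness measures (such as grade level) and one model-based measure (loss-based).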
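Furthermore, we show that even when performance on hard data is what matters most, it can be better to collect easy data rather than hard data for finetuning, because hard data is generally noisier and costlier to collect.
Our experiments use open models of up to 70b parameters and four publicly available question-answering datasets, with questions ranging in difficulty from 3rd grade science questions to college-level STEM questions and general-knowledge trivia.
We conclude that easy-to-hard generalization in LMs is surprisingly strong for the tasks studied.
Our code is available at https://github.com/allenai/easy-to-hard-generalization.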

Abstract (original)

How can we train models to perform well on hard test data when hard training data is by definition difficult to label correctly? This question has been termed the scalable oversight problem and has drawn increasing attention as language models have continually improved. In this paper, we present the surprising conclusion that current pretrained language models often generalize relatively well from easy to hard data, even performing as well as oracle models finetuned on hard data. We demonstrate this kind of easy-to-hard generalization using simple finetuning methods like in-context learning, linear classifier heads, and QLoRA for seven different measures of datapoint hardness, including six empirically diverse human hardness measures (like grade level) and one model-based measure (loss-based). Furthermore, we show that even if one cares most about model performance on hard data, it can be better to collect easy data rather than hard data for finetuning, since hard data is generally noisier and costlier to collect. Our experiments use open models up to 70b in size and four publicly available question-answering datasets with questions ranging in difficulty from 3rd grade science questions to college level STEM questions and general-knowledge trivia. We conclude that easy-to-hard generalization in LMs is surprisingly strong for the tasks studied. Our code is available at: https://github.com/allenai/easy-to-hard-generalization
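To make the easy-to-hard setup concrete, the sketch below shows one way a dataset could be divided into an easy training split and a hard test split by ranking examples on a hardness measure such as grade level. It is only an illustration and is not taken from the authors' repository: the function name `split_by_hardness`, the `easy_frac` and `hard_frac` parameters, their default values, and the toy questions are all assumptions made for this example.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_hardness(
    examples: Sequence[T],
    hardness: Callable[[T], float],
    easy_frac: float = 0.3,  # assumed default, not a value from the paper
    hard_frac: float = 0.3,  # assumed default, not a value from the paper
) -> Tuple[List[T], List[T]]:
    """Return (easy, hard): the lowest- and highest-ranked examples by hardness."""
    if easy_frac <= 0 or hard_frac <= 0 or easy_frac + hard_frac > 1:
        raise ValueError("fractions must be positive and sum to at most 1")

    # Rank from easiest to hardest; sorted() is stable, so ties keep input order.
    ranked = sorted(examples, key=hardness)
    n_easy = round(len(ranked) * easy_frac)
    n_hard = round(len(ranked) * hard_frac)

    easy = ranked[:n_easy]
    # Slice from an explicit index so that n_hard == 0 yields an empty list.
    hard = ranked[len(ranked) - n_hard:]
    return easy, hard


# Toy data (hypothetical): each question carries a grade level as its hardness.
questions = [
    {"question": "toy question A", "grade": 8},
    {"question": "toy question B", "grade": 3},
    {"question": "toy question C", "grade": 12},
    {"question": "toy question D", "grade": 5},
    {"question": "toy question E", "grade": 4},
]

easy_train, hard_test = split_by_hardness(
    questions, hardness=lambda ex: ex["grade"], easy_frac=0.4, hard_frac=0.4
)
print([ex["grade"] for ex in easy_train])
print([ex["grade"] for ex in hard_test])
```

In such a setup, a model would be finetuned or prompted with examples from the easy split only and then evaluated on the hard split; the middle of the ranking is left out so that the two splits are clearly separated in difficulty.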

arXiv information

Authors	Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe
Published	2024-06-05 14:10:11+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
