Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance

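Abstract

When solving NLP tasks with limited labelled data, researchers can either use a general large language model without any further updates or tune a specialised small model on a small number of labelled samples.
This study addresses the open question of how many labelled samples a specialised small model needs to outperform a general large model, while taking performance variance into account.
By observing the behaviour of fine-tuning, instruction-tuning, prompting, and in-context learning on 7 language models, it identifies these performance break-even points across 8 representative text classification tasks with varying characteristics.
The specialised models often need only a few samples (on average $10$ to $1000$) to match or outperform the general ones.
At the same time, the number of required labels depends strongly on the characteristics of the dataset or task, and is significantly lower on multi-class datasets (up to $100$) than on binary datasets (up to $5000$).
When performance variance is taken into account, the number of required labels increases by $100$ to $200\%$ on average, and by up to $1500\%$ in specific cases.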

Abstract (original)

When solving NLP tasks with limited labelled data, researchers can either use a general large language model without further update, or use a small number of labelled examples to tune a specialised smaller model. In this work, we address the research gap of how many labelled samples are required for the specialised small models to outperform general large models, while taking the performance variance into consideration. By observing the behaviour of fine-tuning, instruction-tuning, prompting and in-context learning on 7 language models, we identify such performance break-even points across 8 representative text classification tasks of varying characteristics. We show that the specialised models often need only few samples (on average $10 – 1000$) to be on par or better than the general ones. At the same time, the number of required labels strongly depends on the dataset or task characteristics, with this number being significantly lower on multi-class datasets (up to $100$) than on binary datasets (up to $5000$). When performance variance is taken into consideration, the number of required labels increases on average by $100 – 200\%$ and even up to $1500\%$ in specific cases.
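
The break-even idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' procedure: the function name `break_even`, the data layout, the variance rule (small-model mean minus one standard deviation must reach the large-model score plus its standard deviation), and all numbers are assumptions made up for illustration.

```python
from statistics import mean, stdev


def break_even(small_runs, large_score, large_std=0.0, use_variance=False):
    """Return the smallest number of labelled samples at which the
    specialised small model matches or beats the general large model,
    or None if it never does.

    small_runs: dict mapping sample count -> list of scores from
                repeated runs (hypothetical layout, not from the paper).
    """
    for n in sorted(small_runs):
        scores = small_runs[n]
        m = mean(scores)
        if use_variance:
            s = stdev(scores) if len(scores) > 1 else 0.0
            # Assumed rule: pessimistic small model vs optimistic large model.
            if m - s >= large_score + large_std:
                return n
        elif m >= large_score:
            return n
    return None


# Made-up scores for illustration only; these are not results from the paper.
runs = {
    10: [0.60, 0.70, 0.65],
    50: [0.78, 0.84, 0.81],
    100: [0.85, 0.86, 0.87],
    500: [0.90, 0.91, 0.92],
}

plain = break_even(runs, large_score=0.80)
robust = break_even(runs, large_score=0.80, large_std=0.01, use_variance=True)
print(plain, robust)
```

With these invented numbers, comparing means alone gives a break-even point at 50 samples, while the variance-aware rule pushes it to 100, which shows how accounting for variance can raise the number of required labels.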

arXiv information

Authors	Branislav Pecher, Ivan Srba, Maria Bielikova
Published	2024-04-26 08:20:40+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
