Vygotsky Distance: Measure for Benchmark Task Similarity

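Summary

Evaluation plays a significant role in modern natural language processing.
Most current NLP benchmarks consist of arbitrary sets of tasks that neither guarantee that a model will generalize once applied outside the test set nor try to minimize the resources needed for model evaluation.
This paper presents a theoretical instrument and a practical algorithm for calculating the similarity between benchmark tasks. The authors call this similarity measure the "Vygotsky distance".
The core idea of the measure is that it is based on the relative performance of "students" on a given task, rather than on the properties of the task itself.
If two tasks are close to each other in terms of Vygotsky distance, models tend to show similar relative performance on them.
Knowing the Vygotsky distance between tasks therefore makes it possible to significantly reduce the number of evaluation tasks while maintaining high validation quality.
Experiments on various benchmarks, including GLUE, SuperGLUE, CLUE, and RussianSuperGLUE, demonstrate that the vast majority of NLP benchmarks could be at least 40% smaller in terms of the tasks they include.
Most importantly, the Vygotsky distance can also be used to validate new tasks, which increases the generalization potential of future NLP models.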
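
The abstract does not give the formal definition of the distance, so the following is only an illustrative sketch of the underlying idea: comparing tasks by how they rank the same models. It assumes that "relative performance" can be approximated by pairwise model orderings and uses the normalized Kendall tau distance (the fraction of model pairs that two tasks order differently) as a stand-in; the paper's actual measure may differ. The function name `rank_disagreement`, the task names, and all scores are hypothetical.

```python
from itertools import combinations


def rank_disagreement(scores_a, scores_b):
    """Fraction of model pairs that two tasks order differently.

    This is the normalized Kendall tau distance, used here as an
    assumed stand-in for a relative-performance task distance.
    Ties on either task are counted as agreement.
    """
    # Only models evaluated on both tasks can be compared.
    models = sorted(set(scores_a) & set(scores_b))
    pairs = list(combinations(models, 2))
    if not pairs:
        raise ValueError("need at least two models scored on both tasks")

    discordant = 0
    for m, n in pairs:
        diff_a = scores_a[m] - scores_a[n]
        diff_b = scores_b[m] - scores_b[n]
        # Opposite signs mean the two tasks rank this pair differently.
        if diff_a * diff_b < 0:
            discordant += 1
    return discordant / len(pairs)


# Hypothetical scores for three models on three tasks (not from the paper).
scores = {
    "task_x": {"model_1": 0.91, "model_2": 0.85, "model_3": 0.70},
    "task_y": {"model_1": 0.64, "model_2": 0.58, "model_3": 0.41},
    "task_z": {"model_1": 0.52, "model_2": 0.77, "model_3": 0.80},
}

d_xy = rank_disagreement(scores["task_x"], scores["task_y"])
d_xz = rank_disagreement(scores["task_x"], scores["task_z"])
print(d_xy, d_xz)
```

In this toy data, `task_x` and `task_y` rank the three models identically even though the absolute scores differ, while `task_z` reverses the ranking. Under the assumption above, a pair of tasks with a distance near zero carries largely redundant information about model ordering, which is the intuition behind dropping one of them from a benchmark.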

Abstract (original)

Evaluation plays a significant role in modern natural language processing. Most modern NLP benchmarks consist of arbitrary sets of tasks that neither guarantee any generalization potential for the model once applied outside the test set nor try to minimize the resource consumption needed for model evaluation. This paper presents a theoretical instrument and a practical algorithm to calculate similarity between benchmark tasks; we call this similarity measure ‘Vygotsky distance’. The core idea of this similarity measure is that it is based on the relative performance of the ‘students’ on a given task, rather than on the properties of the task itself. If two tasks are close to each other in terms of Vygotsky distance, the models tend to have similar relative performance on them. Thus, knowing the Vygotsky distance between tasks, one can significantly reduce the number of evaluation tasks while maintaining a high validation quality. Experiments on various benchmarks, including GLUE, SuperGLUE, CLUE, and RussianSuperGLUE, demonstrate that a vast majority of NLP benchmarks could be at least 40% smaller in terms of the tasks included. Most importantly, Vygotsky distance could also be used for the validation of new tasks, thus increasing the generalization potential of future NLP models.

arXiv information

Authors	Maxim K. Surkov, Ivan P. Yamshchikov
Published	2024-02-26 12:09:43+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
