Self-Improving-Leaderboard(SIL): A Call for Real-World Centric Natural Language Processing Leaderboards

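Summary

Leaderboard systems let researchers evaluate natural language processing (NLP) models objectively, and they are typically used to identify the models that perform best on a specific task in a predetermined setting.
However, we argue that evaluation on a specific test dataset is only one of many indicators of a model's performance.
In this paper, we argue that leaderboard competitions should also aim to identify the models that perform best in real-world settings.
We highlight three problems with current leaderboard systems: (1) the use of a single, static test set, (2) the discrepancy between testing and real-world application, and (3) the tendency of leaderboard-centric competition to be biased toward the test set.
As a solution, we propose a new paradigm for leaderboard systems that addresses these problems.
Through this study, we hope to prompt a paradigm shift toward more real-world-centric leaderboard competitions.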

Abstract (original)

Leaderboard systems allow researchers to objectively evaluate Natural Language Processing (NLP) models and are typically used to identify models that exhibit superior performance on a given task in a predetermined setting. However, we argue that evaluation on a given test dataset is just one of many performance indications of the model. In this paper, we claim leaderboard competitions should also aim to identify models that exhibit the best performance in a real-world setting. We highlight three issues with current leaderboard systems: (1) the use of a single, static test set, (2) discrepancy between testing and real-world application, and (3) the tendency for leaderboard-centric competition to be biased towards the test set. As a solution, we propose a new paradigm of leaderboard systems that addresses these issues of current leaderboard systems. Through this study, we hope to induce a paradigm shift towards more real-world-centric leaderboard competitions.

arXiv information

Authors	Chanjun Park, Hyeonseok Moon, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim
Published	2023-03-20 06:13:03+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
