Reproducibility in NLP: What Have We Learned from the Checklist?

Summary

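Scientific progress in NLP depends on the reproducibility of researchers' claims.
In 2020, the *CL conferences created the NLP Reproducibility Checklist, which authors complete at submission time to remind them of key information to include.
We provide the first analysis of the Checklist by examining 10,405 anonymous responses to it.
First, we find evidence that reporting of information on efficiency, validation performance, summary statistics, and hyperparameters increased after the Checklist was introduced.
Further, submissions with more Yes responses have higher acceptance rates.
The 44% of submissions that collect new data are 5% less likely to be accepted than those that do not.
The average reviewer-rated reproducibility of these submissions is also 2% lower than that of the other submissions.
Only 46% of submissions claim to open-source their code. However, those that do have reproducibility scores 8% higher than those that do not, the largest difference for any Checklist item.
We discuss what can be inferred about the state of reproducibility in NLP and offer a set of recommendations for future conferences, including a) allowing code and appendices to be submitted one week after the deadline, and b) measuring dataset reproducibility with a checklist of data collection practices.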

Summary (original)

Scientific progress in NLP rests on the reproducibility of researchers’ claims. The *CL conferences created the NLP Reproducibility Checklist in 2020 to be completed by authors at submission to remind them of key information to include. We provide the first analysis of the Checklist by examining 10,405 anonymous responses to it. First, we find evidence of an increase in reporting of information on efficiency, validation performance, summary statistics, and hyperparameters after the Checklist’s introduction. Further, we show acceptance rate grows for submissions with more Yes responses. We find that the 44% of submissions that gather new data are 5% less likely to be accepted than those that did not; the average reviewer-rated reproducibility of these submissions is also 2% lower relative to the rest. We find that only 46% of submissions claim to open-source their code, though submissions that do have 8% higher reproducibility score relative to those that do not, the most for any item. We discuss what can be inferred about the state of reproducibility in NLP, and provide a set of recommendations for future conferences, including: a) allowing submitting code and appendices one week after the deadline, and b) measuring dataset reproducibility by a checklist of data collection practices.
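The abstract reports several group comparisons as relative differences. One example is the 8% higher reproducibility score for submissions that open-source their code, relative to those that do not. The sketch below shows one way to compute such a relative difference. It also shows how to compute acceptance rate broken down by number of Yes responses. It is a minimal illustration under stated assumptions. The record fields (`yes_count`, `accepted`, `open_source`, `repro_score`), the helper names, and all values are hypothetical toy data, not the paper's data or code, and the paper's exact computation may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records for illustration only; these are NOT the paper's data.
SUBMISSIONS = [
    {"yes_count": 10, "accepted": True, "open_source": True, "repro_score": 4.0},
    {"yes_count": 10, "accepted": True, "open_source": True, "repro_score": 3.5},
    {"yes_count": 6, "accepted": False, "open_source": False, "repro_score": 3.0},
    {"yes_count": 6, "accepted": True, "open_source": False, "repro_score": 3.5},
    {"yes_count": 3, "accepted": False, "open_source": False, "repro_score": 2.5},
]


def relative_difference(group_value: float, rest_value: float) -> float:
    """Percent difference of group_value relative to rest_value (hypothetical helper)."""
    return (group_value - rest_value) / rest_value * 100.0


def acceptance_rate_by_yes_count(records):
    """Map each number of Yes responses to the fraction of those submissions accepted."""
    buckets = defaultdict(list)
    for r in records:
        buckets[r["yes_count"]].append(r["accepted"])
    # sum() over booleans counts the accepted submissions in each bucket
    return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}


def mean_score(records, open_source: bool) -> float:
    """Average reproducibility score of submissions with the given open_source flag."""
    return mean(r["repro_score"] for r in records if r["open_source"] == open_source)


if __name__ == "__main__":
    print(acceptance_rate_by_yes_count(SUBMISSIONS))
    with_code = mean_score(SUBMISSIONS, True)
    without_code = mean_score(SUBMISSIONS, False)
    print(f"{relative_difference(with_code, without_code):.1f}% relative difference")
```

With these invented records, submissions that open-source code average 3.75 against 3.0 for the rest. That is a 25% relative difference, and the number reflects only the toy data. A relative difference is not the same as a gap in percentage points. For example, "5% less likely to be accepted" computed relatively differs from a 5-point drop in acceptance rate.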

arXiv information

Authors	Ian Magnusson, Noah A. Smith, Jesse Dodge
Published	2023-06-16 00:39:25+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
