SciRepEval: A Multi-Format Benchmark for Scientific Document Representations

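Summary

Learned representations of scientific documents serve as valuable input features for downstream tasks without further fine-tuning.
However, existing benchmarks for evaluating these representations fail to capture the diversity of relevant tasks.
In response, we introduce SciRepEval, the first comprehensive benchmark for training and evaluating scientific document representations.
It includes 24 challenging and realistic tasks across four formats (classification, regression, ranking, and search), 8 of which are new.
We then use this benchmark to study and improve the generalization ability of scientific document representation models.
We show that state-of-the-art models such as SPECTER and SciNCL struggle to generalize across task formats, and that simple multi-task training does not improve them.
However, a new approach that learns multiple embeddings per document, each tailored to a different format, improves performance.
We experiment with task-format-specific control codes and adapters and find that they outperform the existing single-embedding state of the art by more than 2 absolute points.
We release the resulting family of multi-format models, called SPECTER2, for the community to use and build on.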

Abstract (original)

Learned representations of scientific documents can serve as valuable input features for downstream tasks without further fine-tuning. However, existing benchmarks for evaluating these representations fail to capture the diversity of relevant tasks. In response, we introduce SciRepEval, the first comprehensive benchmark for training and evaluating scientific document representations. It includes 24 challenging and realistic tasks, 8 of which are new, across four formats: classification, regression, ranking and search. We then use this benchmark to study and improve the generalization ability of scientific document representation models. We show how state-of-the-art models like SPECTER and SciNCL struggle to generalize across the task formats, and that simple multi-task training fails to improve them. However, a new approach that learns multiple embeddings per document, each tailored to a different format, can improve performance. We experiment with task-format-specific control codes and adapters and find they outperform the existing single-embedding state-of-the-art by over 2 points absolute. We release the resulting family of multi-format models, called SPECTER2, for the community to use and build on.
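
The idea of one embedding per task format can be illustrated with a minimal sketch. It assumes a control code is simply prepended to the document text before encoding. The code strings (`[CLF]`, `[RGN]`, `[PRX]`, `[QRY]`), the function names, and the character-count `toy_encode` function are illustrative placeholders, not the released SPECTER2 models or their API; a real system would use a trained transformer encoder in place of `toy_encode`.

```python
import math

# Hypothetical control codes, one per task format named in the abstract.
CONTROL_CODES = {
    "classification": "[CLF]",
    "regression": "[RGN]",
    "ranking": "[PRX]",
    "search": "[QRY]",
}


def toy_encode(text: str, dim: int = 8) -> list[float]:
    """Stand-in for a trained encoder: normalized character-bucket counts."""
    counts = [0.0] * dim
    for char in text:
        counts[ord(char) % dim] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def embed_all_formats(document: str, dim: int = 8) -> dict[str, list[float]]:
    """Return one embedding per task format by prepending its control code."""
    return {
        fmt: toy_encode(f"{code} {document}", dim)
        for fmt, code in CONTROL_CODES.items()
    }


title = "SciRepEval: A Multi-Format Benchmark for Scientific Document Representations"
embeddings = embed_all_formats(title)
for fmt, vector in embeddings.items():
    print(fmt, [round(v, 3) for v in vector])
```

The same document yields four distinct vectors, so a downstream task can pick the one matching its format. In this toy version the control code only shifts a few character counts, whereas a trained model would presumably let the code condition the whole representation.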

arXiv information

Authors	Amanpreet Singh, Mike D’Arcy, Arman Cohan, Doug Downey, Sergey Feldman
Published	2023-11-13 18:25:27+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
