GSAP-NER: A Novel Task, Corpus, and Baseline for Scholarly Entity Extraction Focused on Machine Learning Models and Datasets

Abstract

Named Entity Recognition (NER) models play a crucial role in various NLP tasks, including information extraction (IE) and text understanding. In academic writing, references to machine learning models and datasets are fundamental components of various computer science publications and necessitate accurate models for identification. Despite the advancements in NER, existing ground truth datasets do not treat fine-grained types like ML model and model architecture as separate entity types, and consequently, baseline models cannot recognize them as such. In this paper, we release a corpus of 100 manually annotated full-text scientific publications and a first baseline model for 10 entity types centered around ML models and datasets. In order to provide a nuanced understanding of how ML models and datasets are mentioned and utilized, our dataset also contains annotations for informal mentions like ‘our BERT-based model’ or ‘an image CNN’. You can find the ground truth dataset and code to replicate model training at https://data.gesis.org/gsap/gsap-ner.

arXiv information

Authors	Wolfgang Otto, Matthäus Zloch, Lu Gan, Saurav Karmakar, Stefan Dietze
Published	2023-11-16 12:43:02+00:00

