OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages

Abstract

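We introduce OpenNER 1.0, a standardized collection of openly available named entity recognition (NER) datasets.
OpenNER comprises 34 datasets covering 51 languages, annotated under a variety of named entity ontologies.
We fix annotation format problems, convert the original datasets to a uniform representation, map entity type names so they are more consistent across corpora, and release the collection in a structure that supports research on multilingual and multi-ontology NER.
To compare the performance of recent models and to facilitate future NER research, we provide baseline models built on three pretrained multilingual language models.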

Abstract (original)

We present OpenNER 1.0, a standardized collection of openly available named entity recognition (NER) datasets. OpenNER contains 34 datasets spanning 51 languages, annotated in varying named entity ontologies. We correct annotation format issues, standardize the original datasets into a uniform representation, map entity type names to be more consistent across corpora, and provide the collection in a structure that enables research in multilingual and multi-ontology NER. We provide baseline models using three pretrained multilingual language models to compare the performance of recent models and facilitate future research in NER.

arXiv information

Authors	Chester Palen-Michel, Maxwell Pickering, Maya Kruse, Jonne Sälevä, Constantine Lignos
Published	2024-12-12 18:55:53+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
