RED$^{\rm FM}$: a Filtered and Multilingual Relation Extraction Dataset

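Summary

Relation extraction (RE) is the task of identifying relationships between entities in text; it makes relational facts retrievable and bridges the gap between natural language and structured knowledge.
However, current RE models often rely on small datasets with low coverage of relation types, especially for languages other than English.
In this paper, we address this problem and provide two new resources that enable the training and evaluation of multilingual RE systems.
First, we present SRED$^{\rm FM}$, an automatically annotated dataset covering 18 languages, 400 relation types, and 13 entity types, with more than 40 million triplet instances in total.
Second, we propose RED$^{\rm FM}$, a smaller, human-revised dataset for seven languages that enables the evaluation of multilingual RE systems.
To demonstrate the usefulness of these new datasets, we experiment with mREBEL, the first end-to-end multilingual RE model, which extracts triplets, including entity types, in multiple languages.
We release our resources and model checkpoints at https://www.github.com/babelscape/rebel.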

Summary (original)

Relation Extraction (RE) is a task that identifies relationships between entities in a text, enabling the acquisition of relational facts and bridging the gap between natural language and structured knowledge. However, current RE models often rely on small datasets with low coverage of relation types, particularly when working with languages other than English. In this paper, we address the above issue and provide two new resources that enable the training and evaluation of multilingual RE systems. First, we present SRED$^{\rm FM}$, an automatically annotated dataset covering 18 languages, 400 relation types, 13 entity types, totaling more than 40 million triplet instances. Second, we propose RED$^{\rm FM}$, a smaller, human-revised dataset for seven languages that allows for the evaluation of multilingual RE systems. To demonstrate the utility of these novel datasets, we experiment with the first end-to-end multilingual RE model, mREBEL, that extracts triplets, including entity types, in multiple languages. We release our resources and model checkpoints at https://www.github.com/babelscape/rebel
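
To make the notion of a triplet that includes entity types concrete, here is a minimal sketch of how one might represent such a record in Python. It is an illustration only, not the format used by SRED$^{\rm FM}$, RED$^{\rm FM}$, or mREBEL: the class names, field names, the helper group_by_relation, the entity type labels ("LOC", "PER"), and the example facts are all assumptions introduced for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    surface: str  # text span as it appears in the sentence
    label: str    # entity type; the label names used here are hypothetical


@dataclass(frozen=True)
class Triplet:
    head: Entity    # subject of the relation
    relation: str   # relation type name
    tail: Entity    # object of the relation


def group_by_relation(triplets):
    """Collect triplets into a dict keyed by relation type."""
    grouped = {}
    for t in triplets:
        grouped.setdefault(t.relation, []).append(t)
    return grouped


# Illustrative facts only; they are not taken from either dataset.
example = [
    Triplet(Entity("Rome", "LOC"), "capital of", Entity("Italy", "LOC")),
    Triplet(Entity("Dante", "PER"), "place of birth", Entity("Florence", "LOC")),
]

grouped = group_by_relation(example)
```

Keeping the entity type on each argument, rather than on the triplet as a whole, lets the same entity mention carry its type into every relation it takes part in.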

arXiv information

Authors	Pere-Lluís Huguet Cabot, Simone Tedeschi, Axel-Cyrille Ngonga Ngomo, Roberto Navigli
Published	2023-06-19 09:25:27+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
