Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark

Summary

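State-of-the-art entity linking (EL) systems entrench a popularity bias, yet no dataset in a language other than English focuses on tail and emerging entities.
We present Hansel, a new Chinese benchmark that fills the gap in non-English few-shot and zero-shot EL challenges.
Hansel's test set is human-annotated and reviewed, and was created with a novel method for collecting zero-shot EL datasets.
It covers 10,000 diverse documents drawn from news, social media posts, and other web articles, with Wikidata as the target knowledge base.
We demonstrate that the existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot).
We then establish a strong baseline that scores an R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset.
We also show that our baseline achieves competitive results on the TAC-KBP2015 Chinese Entity Linking task.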

Abstract (original)

Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new benchmark in Chinese that fills the vacancy of non-English few-shot and zero-shot EL challenges. The test set of Hansel is human annotated and reviewed, created with a novel method for collecting zero-shot EL datasets. It covers 10K diverse documents in news, social media posts and other web articles, with Wikidata as its target Knowledge Base. We demonstrate that the existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that scores a R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We also show that our baseline achieves competitive results on TAC-KBP2015 Chinese Entity Linking task.
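
The R@1 figures quoted above are recall-at-1 scores: the share of mentions for which the top-ranked candidate is the gold entity. The sketch below illustrates that calculation on toy data. The function name `recall_at_k`, the mention IDs, and the QIDs are hypothetical and are not taken from the paper or its released code.

```python
from typing import Dict, List


def recall_at_k(predictions: Dict[str, List[str]], gold: Dict[str, str], k: int = 1) -> float:
    """Fraction of mentions whose gold entity appears in the top-k ranked candidates."""
    if not gold:
        raise ValueError("gold must contain at least one mention")
    hits = sum(
        1
        for mention_id, entity in gold.items()
        # A mention with no predictions counts as a miss.
        if entity in predictions.get(mention_id, [])[:k]
    )
    return hits / len(gold)


# Hypothetical toy data: mention IDs mapped to ranked Wikidata-style QIDs.
predictions = {
    "m1": ["Q1", "Q2"],  # gold entity ranked first
    "m2": ["Q5", "Q3"],  # gold entity ranked second
    "m3": ["Q7"],        # gold entity ranked first
    "m4": [],            # no candidates returned
}
gold = {"m1": "Q1", "m2": "Q3", "m3": "Q7", "m4": "Q9"}

r_at_1 = recall_at_k(predictions, gold, k=1)
r_at_2 = recall_at_k(predictions, gold, k=2)
print(f"R@1 = {r_at_1:.2f}, R@2 = {r_at_2:.2f}")  # R@1 = 0.50, R@2 = 0.75
```

With one gold entity per mention, R@1 is the same as top-1 accuracy, which is why it is a natural headline number for an entity linking benchmark.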

arXiv information

Authors	Zhenran Xu, Zifei Shan, Yuxin Li, Baotian Hu, Bing Qin
Published	2023-10-29 14:35:27+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
