CLIMATELI: Evaluating Entity Linking on Climate Change Data

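Summary

Climate change (CC) is an urgent topic of global importance that is drawing growing attention across research fields, from the social sciences to natural language processing (NLP).
CC is also discussed in a wide range of settings and communication platforms, from academic publications to social media forums.
Understanding who and what is mentioned in such data is an important first step toward gaining new insights into CC.
We present CLIMATELI (CLIMATe Entity LInking), the first manually annotated CC dataset, which links 3,087 entity spans to Wikipedia.
Using CLIMATELI, we evaluate existing entity linking (EL) systems on CC topics across various genres and propose automated filtering methods for CC entities.
We find that EL models lag notably behind humans at both the token level and the entity level.
Whether non-nominal and/or non-CC entities are retained or excluded from the evaluation has a particularly strong effect on model performance.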

Abstract (original)

Climate Change (CC) is a pressing topic of global importance, attracting increasing attention across research fields, from social sciences to Natural Language Processing (NLP). CC is also discussed in various settings and communication platforms, from academic publications to social media forums. Understanding who and what is mentioned in such data is a first critical step to gaining new insights into CC. We present CLIMATELI (CLIMATe Entity LInking), the first manually annotated CC dataset that links 3,087 entity spans to Wikipedia. Using CLIMATELI (CLIMATe Entity LInking), we evaluate existing entity linking (EL) systems on the CC topic across various genres and propose automated filtering methods for CC entities. We find that the performance of EL models notably lags behind humans at both token and entity levels. Testing within the scope of retaining or excluding non-nominal and/or non-CC entities particularly impacts the models’ performances.
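
The abstract reports results at both the token level and the entity level. As a rough illustration of how those two views can differ, here is a minimal sketch. It is not the authors' scorer: the span format, the exact-match criterion, the helper names (`prf`, `entity_level`, `token_level`) and the toy data are all assumptions made for this example, since the abstract does not spell out the matching rules.

```python
from typing import Dict, List, Set, Tuple

# Assumed span format: (start_token, end_token_exclusive, wikipedia_title).
Span = Tuple[int, int, str]


def prf(tp: int, n_pred: int, n_gold: int) -> Dict[str, float]:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"precision": p, "recall": r, "f1": f}


def entity_level(gold: List[Span], pred: List[Span]) -> Dict[str, float]:
    """A predicted span counts only if its boundaries and title both match."""
    gold_set, pred_set = set(gold), set(pred)
    return prf(len(gold_set & pred_set), len(pred_set), len(gold_set))


def token_level(gold: List[Span], pred: List[Span]) -> Dict[str, float]:
    """Every token inside a span is scored as its own (index, title) pair."""

    def explode(spans: List[Span]) -> Set[Tuple[int, str]]:
        return {(i, title) for start, end, title in spans for i in range(start, end)}

    gold_tokens, pred_tokens = explode(gold), explode(pred)
    return prf(len(gold_tokens & pred_tokens), len(pred_tokens), len(gold_tokens))


# Toy data (hypothetical): the system links the second mention correctly
# but covers only the first token of the two-token first mention.
gold = [(0, 2, "Climate_change"), (5, 6, "Paris_Agreement")]
pred = [(0, 1, "Climate_change"), (5, 6, "Paris_Agreement")]

entity_scores = entity_level(gold, pred)
token_scores = token_level(gold, pred)
```

On this toy input the truncated span earns no credit under the entity-level view but partial credit under the token-level view, which is why the two numbers can diverge for the same system output.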

arXiv information

Authors	Shijia Zhou, Siyao Peng, Barbara Plank
Published	2024-06-24 15:36:00+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
