Cross-Domain Keyword Extraction with Keyness Patterns

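Summary

Domain dependence and annotation subjectivity make supervised keyword extraction difficult.
Starting from the premise that second-order keyness patterns exist at the community level and can be learned from annotated keyword extraction datasets, the paper proposes a supervised ranking approach that ranks keywords by keyness patterns. These patterns consist of independent features (such as sublanguage domain and term length) and three categories of dependent features (heuristic features, specificity features, and representativeness features).
The approach uses two convolutional-neural-network-based models to learn keyness patterns from keyword datasets, and it counters annotation subjectivity by training the two models with a bootstrap sampling strategy.
Experiments show that the approach achieves state-of-the-art performance in general supervised keyword extraction, with an average top-10 F-measure of 0.316 on ten keyword datasets. It also shows robust cross-domain performance, with an average top-10 F-measure of 0.346 on four datasets that were excluded from training.
The authors attribute this cross-domain robustness to three factors: community-level keyness patterns are limited in number and moderately independent of language domains; independent features are distinguished from dependent features; and the sampling training strategy balances excess risk against the lack of negative training data.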
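The summary says the two models are trained with a bootstrap sampling strategy but gives no procedural detail. As a rough illustration only, the sketch below shows generic bootstrap resampling (drawing with replacement) to build one training replicate per model. The function name `bootstrap_replicates`, its parameters, and the toy data are assumptions made for this example, not the authors' procedure.

```python
import random


def bootstrap_replicates(dataset, n_models=2, seed=0):
    """Return one bootstrap replicate of `dataset` per model.

    Each replicate has the same size as `dataset` and is drawn with
    replacement, so some items repeat and others are left out.
    Hypothetical helper: the paper's actual sampling scheme may differ.
    """
    rng = random.Random(seed)  # local RNG keeps the sketch reproducible
    return [rng.choices(dataset, k=len(dataset)) for _ in range(n_models)]


# Toy annotated examples: (candidate term, is_keyword) pairs.
examples = [("keyness", 1), ("dataset", 0), ("term length", 1), ("the", 0)]
for i, replicate in enumerate(bootstrap_replicates(examples)):
    print(f"model {i}: {len(replicate)} training examples")
```

Because each model sees a different resample, no single annotator's choices dominate both training sets, which is one plausible reading of how resampling could soften annotation subjectivity.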

Abstract (original)

Domain dependence and annotation subjectivity pose challenges for supervised keyword extraction. Based on the premises that second-order keyness patterns are existent at the community level and learnable from annotated keyword extraction datasets, this paper proposes a supervised ranking approach to keyword extraction that ranks keywords with keyness patterns consisting of independent features (such as sublanguage domain and term length) and three categories of dependent features: heuristic features, specificity features, and representavity features. The approach uses two convolutional-neural-network based models to learn keyness patterns from keyword datasets and overcomes annotation subjectivity by training the two models with a bootstrap sampling strategy. Experiments demonstrate that the approach not only achieves state-of-the-art performance on ten keyword datasets in general supervised keyword extraction with an average top-10-F-measure of 0.316, but also robust cross-domain performance with an average top-10-F-measure of 0.346 on four datasets that are excluded in the training process. Such cross-domain robustness is attributed to the fact that community-level keyness patterns are limited in number and temperately independent of language domains, the distinction between independent features and dependent features, and the sampling training strategy that balances excess risk and lack of negative training data.
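The reported scores are average top-10 F-measures. As a minimal sketch of how such a metric is commonly computed, the code below takes the top k ranked candidates, compares them with the gold keywords by exact lowercase match, and returns the harmonic mean of precision and recall. The function name, the matching rule, and the toy inputs are assumptions for illustration; the paper's evaluation may normalise or stem terms differently.

```python
def top_k_f_measure(ranked, gold, k=10):
    """F-measure between the top-k ranked candidates and the gold keywords.

    Assumptions: exact match after lowercasing, duplicates in the ranking
    are collapsed, and precision is divided by the number of candidates
    actually returned (at most k).
    """
    top = list(dict.fromkeys(c.lower() for c in ranked))[:k]
    gold_set = {g.lower() for g in gold}
    hits = sum(1 for c in top if c in gold_set)
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy example: 2 of 5 predictions are correct, 2 of 3 gold keywords are found.
ranked = ["keyword extraction", "neural network", "keyness", "dataset", "term length"]
gold = ["keyword extraction", "keyness", "domain adaptation"]
print(round(top_k_f_measure(ranked, gold, k=5), 3))  # 0.5
```

A per-dataset score would be the mean of this value over documents, and an "average top-10 F-measure" across datasets would then average those per-dataset means.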

arXiv information

Authors	Dongmei Zhou, Xuri Tang
Published	2024-09-27 13:19:19+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
