Automatic Aspect Extraction from Scientific Texts


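This paper presents a cross-domain dataset of scientific texts in Russian, annotated with aspects such as Task, Contribution, Method, and Conclusion, together with a baseline aspect extraction algorithm based on a fine-tuned multilingual BERT model.
The code and the dataset are available at \url{}.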


Being able to extract the main points, key insights, and other important information from scientific papers, referred to here as aspects, might facilitate the process of conducting a scientific literature review. The aim of our research is therefore to create a tool for automatic aspect extraction from Russian-language scientific texts of any domain. In this paper, we present a cross-domain dataset of scientific texts in Russian, annotated with aspects such as Task, Contribution, Method, and Conclusion, as well as a baseline algorithm for aspect extraction based on the multilingual BERT model fine-tuned on our data. We show that aspect representation differs somewhat across domains, but that our model, although trained on a limited number of scientific domains, still generalizes to new domains, as our cross-domain experiments demonstrate. The code and the dataset are available at \url{}.
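The abstract does not say how annotated aspects are encoded for the model. As a minimal sketch, assuming aspect extraction is framed as token-level sequence labeling with BIO tags (a common setup when fine-tuning BERT-style models, not confirmed by the text above), the conversion between annotated spans and per-token labels could look like this. The function names `spans_to_bio` and `bio_to_spans`, the span format, and the example sentence are all hypothetical, and the sketch ignores overlapping or nested aspects.

```python
from typing import List, Tuple

# Aspect names taken from the abstract; the BIO encoding itself is an assumption.
ASPECTS = ("Task", "Contribution", "Method", "Conclusion")

# Hypothetical span format: (start_token, end_token_exclusive, aspect_name).
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Turn non-overlapping aspect spans into one BIO label per token."""
    labels = ["O"] * len(tokens)
    for start, end, aspect in spans:
        if aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {aspect}")
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span out of range: {(start, end)}")
        labels[start] = f"B-{aspect}"
        for i in range(start + 1, end):
            labels[i] = f"I-{aspect}"
    return labels


def bio_to_spans(labels: List[str]) -> List[Span]:
    """Recover aspect spans from BIO labels, e.g. from model predictions."""
    spans: List[Span] = []
    start, current = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, label in enumerate(labels + ["O"]):
        continues = label.startswith("I-") and label[2:] == current
        if continues:
            continue
        if current is not None:
            spans.append((start, i, current))
            start, current = None, None
        if label.startswith("B-"):
            start, current = i, label[2:]
        # An "I-" tag with no matching open span is ignored in this sketch.
    return spans


tokens = ["We", "propose", "a", "method", "for", "aspect", "extraction"]
gold = [(1, 4, "Contribution"), (5, 7, "Task")]
labels = spans_to_bio(tokens, gold)
recovered = bio_to_spans(labels)
```

Under this assumed encoding, the per-token labels would be the training targets for a token classification head, and `bio_to_spans` would map predicted labels back to aspect spans for evaluation.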


Authors: Anna Marshalova, Elena Bruches, Tatiana Batura
Published: 2023-10-06 07:59:54+00:00


Categories: cs.AI, cs.CL, cs.LG