Benchmarking Multilabel Topic Classification in the Kyrgyz Language

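Summary

Kyrgyz is a severely underrepresented language in terms of modern natural language processing resources.
In this work, we present a new public benchmark for topic classification in Kyrgyz. We introduce a dataset built from data collected from the news site 24.KG and annotated, and we present several baseline models for news classification in the multilabel setting.
We train and evaluate both classical statistical models and neural models, report the scores, discuss the results, and propose directions for future work.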

Abstract (original)

Kyrgyz is a very underrepresented language in terms of modern natural language processing resources. In this work, we present a new public benchmark for topic classification in Kyrgyz, introducing a dataset based on collected and annotated data from the news site 24.KG and presenting several baseline models for news classification in the multilabel setting. We train and evaluate both classical statistical and neural models, reporting the scores, discussing the results, and proposing directions for future work.

arXiv information

Authors	Anton Alekseev, Sergey I. Nikolenko, Gulnara Kabaeva
Published	2023-08-30 11:02:26+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
