CLASSLA-Stanza: The Next Step for Linguistic Processing of South Slavic Languages

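Abstract

We present CLASSLA-Stanza, a pipeline for automatic linguistic annotation of the South Slavic languages, built on the Stanza natural language processing pipeline.
We describe CLASSLA-Stanza's main improvements over Stanza and detail the model training process for the latest 2.1 release of the pipeline.
We also report performance scores produced by the pipeline for different languages and varieties.
CLASSLA-Stanza performs consistently well across all supported languages, and on every supported task it either outperforms its parent pipeline Stanza or extends it.
Finally, we present the pipeline's new functionality for efficient processing of web data, along with the reasons that led to its implementation.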

Abstract (original)

We present CLASSLA-Stanza, a pipeline for automatic linguistic annotation of the South Slavic languages, which is based on the Stanza natural language processing pipeline. We describe the main improvements in CLASSLA-Stanza with respect to Stanza, and give a detailed description of the model training process for the latest 2.1 release of the pipeline. We also report performance scores produced by the pipeline for different languages and varieties. CLASSLA-Stanza exhibits consistently high performance across all the supported languages and outperforms or expands its parent pipeline Stanza at all the supported tasks. We also present the pipeline’s new functionality enabling efficient processing of web data and the reasons that led to its implementation.

arXiv information

Authors	Luka Terčon, Nikola Ljubešić
Published	2023-08-11 15:24:37+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
