Sheffield’s Submission to the AmericasNLP Shared Task on Machine Translation into Indigenous Languages

Summary

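This paper describes the University of Sheffield's submission to AmericasNLP 2023, a shared task on machine translation into indigenous languages that comprises translation from Spanish into eleven indigenous languages.
Our approach consists of extending, training, and ensembling different variations of NLLB-200.
We use data provided by the organizers and data from various other sources, such as constitutions, handbooks, news articles, and backtranslations generated from monolingual data.
On the dev set, our best submission outperforms the baseline by 11% average chrF across all languages, with substantial improvements particularly for Aymara, Guarani, and Quechua.
On the test set, we achieve the highest average chrF of all submissions, rank first in four of the eleven languages, and have at least one submission ranked in the top 3 for every language.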

Summary (original)

In this paper we describe the University of Sheffield’s submission to the AmericasNLP 2023 Shared Task on Machine Translation into Indigenous Languages which comprises the translation from Spanish to eleven indigenous languages. Our approach consists of extending, training, and ensembling different variations of NLLB-200. We use data provided by the organizers and data from various other sources such as constitutions, handbooks, news articles, and backtranslations generated from monolingual data. On the dev set, our best submission outperforms the baseline by 11% average chrF across all languages, with substantial improvements particularly for Aymara, Guarani and Quechua. On the test set, we achieve the highest average chrF of all the submissions, we rank first in four of the eleven languages, and at least one of our submissions ranks in the top 3 for all languages.
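The results above are reported in chrF, a character n-gram F-score. As a rough illustration of what that metric measures, here is a minimal Python sketch assuming the common defaults of Popović's definition: character n-grams of order 1 to 6, whitespace ignored, and β = 2 so that recall is weighted more heavily than precision. The function names are hypothetical, this is not the shared task's official scorer, and dedicated implementations such as sacrebleu's chrF differ in details like handling of very short strings and corpus-level aggregation.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale (hypothetical helper).

    Averages character n-gram precision and recall over orders 1..max_order,
    then combines them into an F-beta score.
    """
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        matches = sum((hyp & ref).values())  # clipped matches: min count per n-gram
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)


def average_chrf(scores_by_language: dict) -> float:
    """Unweighted mean of per-language chrF scores (assumed aggregation)."""
    return sum(scores_by_language.values()) / len(scores_by_language)
```

For example, `chrf("abc", "abc")` returns `100.0` and `chrf("abc", "xyz")` returns `0.0`. Averaging per-language scores with equal weight is one plausible reading of "average chrF across all languages"; the abstract does not state how the average is computed.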

arXiv information

Authors	Edward Gow-Smith, Danae Sánchez Villegas
Published	2025-02-27 15:47:57+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
