Rethinking the Reasonability of the Test Set for Simultaneous Machine Translation

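Summary

Simultaneous machine translation (SimulMT) models start translating before the source sentence ends, so the translation is monotonically aligned with the source sentence.
General full-sentence translation test sets, however, are produced by offline translation of the entire source sentence and are not designed for SimulMT evaluation, which raises the question of whether they underestimate the performance of SimulMT models.
In this paper, we manually annotate a monotonic test set, called SiMuST-C, based on the MuST-C English-Chinese test set.
Our human evaluation confirms the acceptability of the annotated test set.
Evaluations on three different SimulMT models verify that the underestimation problem can be alleviated on this test set.
Further experiments show that finetuning on an automatically extracted monotonic training set improves SimulMT models by up to 3 BLEU points.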

Abstract (original)

Simultaneous machine translation (SimulMT) models start translation before the end of the source sentence, making the translation monotonically aligned with the source sentence. However, the general full-sentence translation test set is acquired by offline translation of the entire source sentence, which is not designed for SimulMT evaluation, making us rethink whether this will underestimate the performance of SimulMT models. In this paper, we manually annotate a monotonic test set based on the MuST-C English-Chinese test set, denoted as SiMuST-C. Our human evaluation confirms the acceptability of our annotated test set. Evaluations on three different SimulMT models verify that the underestimation problem can be alleviated on our test set. Further experiments show that finetuning on an automatically extracted monotonic training set improves SimulMT models by up to 3 BLEU points.

arXiv information

Authors	Mengge Liu, Wen Zhang, Xiang Li, Jian Luan, Bin Wang, Yuhang Guo, Shuoying Chen
Published	2023-03-13 12:14:30+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
