AI-Assisted Human Evaluation of Machine Translation

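Summary

Every year, research teams spend large sums evaluating the quality of machine translation systems (WMT, among others).
The cost is high because the evaluation requires a great deal of expert human labor.
In the recently adopted Error Span Annotation (ESA) protocol, annotators mark the erroneous parts of a translation and then assign a final score.
Much of the annotators' time goes to scanning the translation for possible errors.
In this work, we assist annotators by pre-filling the error annotations using recall-oriented automatic quality estimation.
With this AI assistance, we obtain annotations of the same quality while cutting the time per span annotation by half (71 s/error span $\rightarrow$ 31 s/error span).
The biggest advantage of the ESA$^\mathrm{AI}$ protocol is that annotators are accurately primed (by the pre-filled error spans) before they assign the final score.
This also alleviates potential automation bias, which we confirm to be low.
In addition, filtering out examples that the AI deems very likely to be correct reduces the annotation budget by almost 25\%.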
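
The filtering idea in the last sentence can be illustrated with a minimal sketch. It is not the authors' implementation: the `Segment` structure, the `p_correct` score, the 0.95 threshold, and the toy data are all assumptions made for illustration. Only the 71 s and 31 s per-span times come from the abstract.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """One translated segment awaiting human annotation (hypothetical structure)."""
    source: str
    translation: str
    # Assumed QE output: estimated probability in [0, 1] that the translation is error-free.
    p_correct: float
    # Error spans pre-filled by the QE system as (start, end) character offsets.
    prefilled_spans: List[Tuple[int, int]] = field(default_factory=list)


def split_by_confidence(segments, threshold=0.95):
    """Split segments into those sent to annotators and those skipped.

    A segment is skipped only when the QE system is confident it is correct
    (p_correct >= threshold) and it pre-filled no error spans.
    """
    to_annotate, skipped = [], []
    for seg in segments:
        if seg.p_correct >= threshold and not seg.prefilled_spans:
            skipped.append(seg)
        else:
            to_annotate.append(seg)
    return to_annotate, skipped


def relative_reduction(before, after):
    """Fraction saved when a cost drops from `before` to `after`."""
    return (before - after) / before


# Toy data, invented for this example.
segments = [
    Segment("src 1", "hyp 1", p_correct=0.99),
    Segment("src 2", "hyp 2", p_correct=0.40, prefilled_spans=[(0, 3)]),
    Segment("src 3", "hyp 3", p_correct=0.97),
    Segment("src 4", "hyp 4", p_correct=0.80),
]
to_annotate, skipped = split_by_confidence(segments)
budget_saved = len(skipped) / len(segments)  # share of segments never shown to annotators
time_saved = relative_reduction(71, 31)      # per-span times reported in the abstract

print(f"annotate {len(to_annotate)}, skip {len(skipped)}, budget saved {budget_saved:.0%}")
print(f"time per error span reduced by {time_saved:.1%}")
```

In this sketch the threshold controls the trade-off: a higher value skips fewer segments and saves less budget, but lowers the risk of leaving a faulty translation unannotated. The reported drop from 71 s to 31 s per error span works out to a reduction of about 56%.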

Summary (original)

Annually, research teams spend large amounts of money to evaluate the quality of machine translation systems (WMT, inter alia). This is expensive because it requires a lot of expert human labor. The recently adopted annotation protocol, Error Span Annotation (ESA), has annotators marking erroneous parts of the translation and then assigning a final score. A lot of the annotator time is spent on scanning the translation for possible errors. In our work, we help the annotators by pre-filling the error annotations with recall-oriented automatic quality estimation. With this AI assistance, we obtain annotations at the same quality level while cutting down the time per span annotation by half (71s/error span $\rightarrow$ 31s/error span). The biggest advantage of ESA$^\mathrm{AI}$ protocol is an accurate priming of annotators (pre-filled error spans) before they assign the final score. This also alleviates a potential automation bias, which we confirm to be low. In addition, the annotation budget can be reduced by almost 25\% with filtering of examples that the AI deems to be very likely to be correct.

arXiv information

Authors	Vilém Zouhar, Tom Kocmi, Mrinmaya Sachan
Published	2024-09-17 14:18:11+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
