Don’t Rank, Combine! Combining Machine Translation Hypotheses Using Quality Estimation

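Abstract

Neural machine translation systems estimate the probability of a target sentence given a source sentence, but these estimates may not match human preferences.
This work introduces QE-fusion, a method that uses a quality estimation (QE) metric, which correlates better with human judgments, to synthesize improved translations.
QE-fusion draws on a pool of candidates sampled from a model and combines spans from different candidates using a QE metric such as CometKiwi.
The authors compare QE-fusion against beam search and against recent reranking techniques such as Minimum Bayes Risk decoding and QE-reranking.
The method consistently improves translation quality, measured by COMET and BLEURT scores, when applied to large language models (LLMs) used for translation (PolyLM, XGLM, Llama2, and Mistral) and to multilingual translation models (NLLB), over five language pairs.
The improvements are larger for LLMs, which the authors attribute to the LLMs' ability to generate diverse outputs.
The approach produces novel translations in over half of the cases and consistently outperforms the other methods across candidate pool sizes from 5 to 200.
The authors also establish empirically that QE-fusion scales linearly with the number of candidates in the pool.
QE-fusion is thus an effective way to improve LLM-based translation without costly retraining of the LLMs.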

Abstract (original)

Neural machine translation systems estimate probabilities of target sentences given source sentences, yet these estimates may not align with human preferences. This work introduces QE-fusion, a method utilizing a quality estimation metric (QE) that better correlates with human judgments to synthesize improved translations. QE-fusion leverages a candidate pool sampled from a model, combining spans from different candidates using QE metrics such as CometKiwi. We compare QE-fusion against beam search and recent reranking techniques, such as Minimum Bayes Risk decoding or QE-reranking. Our method consistently improves translation quality in terms of COMET and BLEURT scores when applied to large language models (LLMs) used for translation (PolyLM, XGLM, Llama2, and Mistral) and to multilingual translation models (NLLB), over five language pairs. Notably, QE-fusion exhibits larger improvements for LLMs due to their ability to generate diverse outputs. We demonstrate that our approach generates novel translations in over half of the cases and consistently outperforms other methods across varying numbers of candidates (5-200). Furthermore, we empirically establish that QE-fusion scales linearly with the number of candidates in the pool. QE-fusion proves effective in enhancing LLM-based translation without the need for costly retraining of LLMs.
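The contrast the abstract draws between reranking (pick one candidate) and fusion (combine spans from several) can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' procedure: the abstract does not say how spans are found or searched, so the greedy span swapping below is an assumption made for brevity, and the names `qe_rerank`, `qe_fuse`, and `toy_qe` are hypothetical. In particular, `toy_qe` is a stand-in that counts preferred words; a real setup would call a reference-free QE metric such as CometKiwi instead.

```python
from difflib import SequenceMatcher
from typing import Callable, List

# (source, hypothesis) -> quality score; higher is better.
Scorer = Callable[[str, str], float]


def qe_rerank(source: str, candidates: List[str], qe_score: Scorer) -> str:
    """QE-reranking: return the single candidate with the highest QE score."""
    return max(candidates, key=lambda hyp: qe_score(source, hyp))


def qe_fuse(source: str, candidates: List[str], qe_score: Scorer) -> str:
    """Greedy span fusion (an assumed simplification, not the paper's search).

    Start from the top-ranked candidate, then try replacing each span where
    another candidate diverges, keeping a swap only if the QE score improves.
    """
    best = qe_rerank(source, candidates, qe_score)
    best_score = qe_score(source, best)
    base = best.split()
    for other in candidates:
        other_tokens = other.split()
        matcher = SequenceMatcher(None, base, other_tokens, autojunk=False)
        # Walk the diff right to left so that indices of spans further left
        # remain valid even after a swap changes the length of the base.
        for tag, i1, i2, j1, j2 in reversed(matcher.get_opcodes()):
            if tag == "equal":
                continue
            trial = base[:i1] + other_tokens[j1:j2] + base[i2:]
            score = qe_score(source, " ".join(trial))
            if score > best_score:
                base, best_score = trial, score
    return " ".join(base)


# Hypothetical toy scorer: counts tokens from a fixed set of "good" words.
# It ignores the source entirely and only stands in for a real QE metric.
GOOD_WORDS = {"cat", "sat", "mat"}


def toy_qe(source: str, hypothesis: str) -> float:
    return float(sum(token in GOOD_WORDS for token in hypothesis.split()))


pool = ["the cat quietly sits on a mat", "the dog quietly sat on a rug"]
print("rerank:", qe_rerank("(source sentence)", pool, toy_qe))
print("fusion:", qe_fuse("(source sentence)", pool, toy_qe))
```

With this toy scorer, reranking can only return the first candidate, while the fused output takes "sat" from the second candidate and so yields a sentence that appears in neither input. That is the sense in which the abstract speaks of novel translations.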

arXiv information

Authors	Giorgos Vernikos, Andrei Popescu-Belis
Published	2024-01-12 16:52:41+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
