Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding

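Summary

Minimum Bayes Risk (MBR) decoding can significantly improve the translation performance of Multilingual Large Language Models (MLLMs).
However, MBR decoding is computationally expensive. This paper shows how Direct Preference Optimization (DPO), a recently developed Reinforcement Learning (RL) technique, can be used to fine-tune MLLMs so that they obtain the gains of MBR without additional computation at inference time.
The fine-tuned models perform significantly better on multiple NMT test sets than base MLLMs without preference optimization.
The method improves the translation performance of MLLMs using relatively small monolingual fine-tuning sets.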
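To make the cost of MBR decoding concrete, here is a minimal sketch of MBR selection over a set of sampled translations. It is an illustration under stated assumptions, not the paper's implementation: `unigram_f1` is a toy stand-in utility chosen so the example runs with the standard library only, and the names `mbr_select` and `samples` are hypothetical. A real system would plug in a proper translation quality metric.

```python
from collections import Counter
from typing import Callable, List


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Toy utility (assumption): unigram-overlap F1 on whitespace tokens."""
    hyp_counts = Counter(hypothesis.split())
    ref_counts = Counter(reference.split())
    overlap = sum((hyp_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates: List[str],
               utility: Callable[[str, str], float] = unigram_f1) -> str:
    """Pick the candidate with the highest average utility against the others."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]

    def expected_utility(i: int) -> float:
        # Every other candidate acts as a pseudo-reference,
        # so scoring all N candidates takes on the order of N^2 utility calls.
        total = sum(utility(candidates[i], candidates[j])
                    for j in range(len(candidates)) if j != i)
        return total / (len(candidates) - 1)

    best_index = max(range(len(candidates)), key=expected_utility)
    return candidates[best_index]


# Hypothetical sampled translations for one source sentence.
samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a dog ran away",
]
print(mbr_select(samples))  # the cat sat on a mat
```

The quadratic number of utility calls per source sentence, on top of sampling many candidates, is the inference-time cost that the fine-tuning approach aims to avoid.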

Abstract (original)

Minimum Bayes Risk (MBR) decoding can significantly improve translation performance of Multilingual Large Language Models (MLLMs). However, MBR decoding is computationally expensive and in this paper, we show how recently developed Reinforcement Learning (RL) technique, Direct Preference Optimization (DPO) can be used to fine-tune MLLMs so that we get the gains from MBR without the additional computation in inference. Our fine-tuned models have significantly improved performance on multiple NMT test sets compared to base MLLMs without preference optimization. Our method boosts the translation performance of MLLMs using relatively small monolingual fine-tuning sets.
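For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. It is a minimal illustration, not the authors' training code: the function name `dpo_loss`, the argument names, and the default `beta=0.1` are assumptions, and how the preferred and dispreferred translations are chosen is not specified in the abstract.

```python
import math


def dpo_loss(policy_logp_chosen: float,
             policy_logp_rejected: float,
             ref_logp_chosen: float,
             ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full output sequence
    under either the model being trained (policy) or the frozen reference model.
    """
    # How much more the policy favors each output than the reference does.
    chosen_gain = policy_logp_chosen - ref_logp_chosen
    rejected_gain = policy_logp_rejected - ref_logp_rejected
    z = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(z)) written as softplus(-z) for numerical stability.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# When the policy equals the reference, the loss is log(2).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# Favoring the chosen output more than the reference does lowers the loss.
print(dpo_loss(-10.0, -17.0, -12.0, -15.0))
```

Minimizing this loss pushes the model to raise the likelihood of the preferred output relative to the dispreferred one, with `beta` controlling how far it may drift from the reference model.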

arXiv information

Authors	Guangyu Yang,Jinghong Chen,Weizhe Lin,Bill Byrne
Published	2023-11-14 18:43:51+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
