Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding

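Summary

Minimum Bayes Risk (MBR) decoding can significantly improve the translation performance of Multilingual Large Language Models (MLLMs).
However, MBR decoding is computationally expensive.
We show how Direct Preference Optimization (DPO), a recently developed reinforcement learning technique, can fine-tune MLLMs to obtain the gains of MBR without any additional computation at inference time.
Our method uses only a small monolingual fine-tuning set and yields significantly improved performance on multiple NMT test sets compared with MLLMs without DPO.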

Abstract (original)

Minimum Bayes Risk (MBR) decoding can significantly improve translation performance of Multilingual Large Language Models (MLLMs). However, MBR decoding is computationally expensive. We show how the recently developed Reinforcement Learning technique, Direct Preference Optimization (DPO), can fine-tune MLLMs to get the gains of MBR without any additional computation in inference. Our method uses only a small monolingual fine-tuning set and yields significantly improved performance on multiple NMT test sets compared to MLLMs without DPO.
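To make the cost argument concrete, the following is a minimal sketch of MBR decoding over a set of sampled translations. It is an illustration under stated assumptions, not the authors' implementation: the helper names `overlap_f1` and `mbr_decode` are hypothetical, and the unigram-overlap utility is a toy stand-in for whatever translation metric is actually used. Each candidate is scored against every other candidate, so the number of utility calls grows quadratically with the number of samples, which is the inference-time cost the abstract refers to.

```python
from collections import Counter


def overlap_f1(hyp: str, ref: str) -> float:
    """Toy utility (assumption): unigram-overlap F1 on whitespace tokens."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    common = sum((h & r).values())  # multiset intersection size
    if common == 0:
        return 0.0
    precision = common / sum(h.values())
    recall = common / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(candidates, utility=overlap_f1):
    """Return the candidate with the highest average utility against the others.

    The other candidates act as pseudo-references, so this makes
    n * (n - 1) utility calls for n candidates.
    """
    n = len(candidates)
    if n == 1:
        return candidates[0]

    def expected_utility(i):
        total = sum(utility(candidates[i], candidates[j])
                    for j in range(n) if j != i)
        return total / (n - 1)

    best_index = max(range(n), key=expected_utility)
    return candidates[best_index]


# Hypothetical samples standing in for translations drawn from a model.
samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a cat is sitting on the mat",
    "dogs bark loudly",
]
best = mbr_decode(samples)
print(best)
```

The selected hypothesis is the one that agrees most with the rest of the sample set, so an outlier such as the last sample is never chosen.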

arXiv information

Authors	Guangyu Yang, Jinghong Chen, Weizhe Lin, Bill Byrne
Published	2024-04-12 14:07:38+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
