Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings

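Abstract

Reinforcement learning (RL) has proven to be an effective and robust way to train neural machine translation systems, particularly when combined with strong reward models that accurately assess translation quality.
However, most work has focused on RL methods that use sentence-level feedback. Because the model receives only a single score for the whole sentence, the reward is sparse and the learning signal is inefficient.
To address this, we propose a new approach that uses RL methods to exploit fine-grained, token-level quality assessments together with error severity levels.
Specifically, we use xCOMET, a state-of-the-art quality estimation system, as the token-level reward model.
We run experiments on small and large translation datasets with standard encoder-decoder and large language model-based machine translation systems, comparing the effect of sentence-level and fine-grained reward signals on translation quality.
Our results show that training with token-level rewards improves translation quality over the baselines across language pairs, according to both automatic and human evaluation.
In addition, optimizing token-level rewards improves training stability, as shown by a steady increase in mean reward over training epochs.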

Abstract (original)

Reinforcement learning (RL) has been proven to be an effective and robust method for training neural machine translation systems, especially when paired with powerful reward models that accurately assess translation quality. However, most research has focused on RL methods that use sentence-level feedback, leading to inefficient learning signals due to the reward sparsity problem — the model receives a single score for the entire sentence. To address this, we propose a novel approach that leverages fine-grained, token-level quality assessments along with error severity levels using RL methods. Specifically, we use xCOMET, a state-of-the-art quality estimation system, as our token-level reward model. We conduct experiments on small and large translation datasets with standard encoder-decoder and large language models-based machine translation systems, comparing the impact of sentence-level versus fine-grained reward signals on translation quality. Our results show that training with token-level rewards improves translation quality across language pairs over baselines according to both automatic and human evaluation. Furthermore, token-level reward optimization improves training stability, evidenced by a steady increase in mean rewards over training epochs.
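
The contrast between a sparse sentence-level reward and a dense token-level reward built from error spans can be illustrated with a small sketch. This is not the authors' implementation: the `ErrorSpan` type, the helper functions, and the penalty values in `SEVERITY_PENALTY` are hypothetical and chosen only for illustration. The abstract does not state the actual severity mapping or how xCOMET output is converted into rewards.

```python
from dataclasses import dataclass

# Hypothetical severity-to-penalty mapping. These values are illustrative
# assumptions, not the mapping used in the paper.
SEVERITY_PENALTY = {"minor": -1.0, "major": -5.0, "critical": -10.0}


@dataclass(frozen=True)
class ErrorSpan:
    """A predicted error over tokens [start, end) with a severity label."""
    start: int
    end: int
    severity: str


def token_rewards(num_tokens, spans):
    """Dense signal: one reward per token, 0.0 for tokens outside any span.

    Where spans overlap, the most severe (most negative) penalty wins.
    """
    rewards = [0.0] * num_tokens
    for span in spans:
        if not (0 <= span.start < span.end <= num_tokens):
            raise ValueError(f"span out of range: {span}")
        penalty = SEVERITY_PENALTY[span.severity]
        for i in range(span.start, span.end):
            rewards[i] = min(rewards[i], penalty)
    return rewards


def sentence_level_rewards(num_tokens, score):
    """Sparse signal: a single score delivered at the final token."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return [0.0] * (num_tokens - 1) + [score]


# A six-token hypothesis with two overlapping error spans.
spans = [ErrorSpan(1, 3, "minor"), ErrorSpan(2, 4, "major")]
dense = token_rewards(6, spans)
sparse = sentence_level_rewards(6, sum(dense))
print(dense)   # [0.0, -1.0, -5.0, -5.0, 0.0, 0.0]
print(sparse)  # [0.0, 0.0, 0.0, 0.0, 0.0, -11.0]
```

In the sparse case the learner sees only that the sentence scored -11.0, while the dense case shows which tokens incurred the penalty and how severe each error was.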

arXiv information

Authors	Miguel Moura Ramos, Tomás Almeida, Daniel Vareta, Filipe Azevedo, Sweta Agrawal, Patrick Fernandes, André F. T. Martins
Published	2025-04-16 13:31:12+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
