Fine-Grained and Multi-Dimensional Metrics for Document-Level Machine Translation

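Summary

Large language models (LLMs) excel at a wide range of NLP tasks, including machine translation (MT), but most studies focus on sentence-level translation.
This work investigates the inherent ability of instruction-tuned LLMs to perform document-level translation (docMT).
Unlike earlier approaches that require specialized techniques, we evaluate LLMs by prompting them directly to translate an entire document in a single pass.
Our results show that this method improves translation quality over translating each sentence separately, even without document-level fine-tuning.
However, this advantage is not reflected in BLEU scores, which often favor sentence-based translations.
We propose using the LLM-as-a-judge paradigm, in which GPT-4 assesses document coherence, accuracy, and fluency in a more nuanced way than n-gram-based metrics.
Overall, our work shows that instruction-tuned LLMs can effectively leverage document context for translation.
However, we caution against using BLEU scores to evaluate docMT, because they often give misleading results and fail to capture the quality of document-level translation.
Code and the outputs from GPT4-as-a-judge are available at https://github.com/EIT-NLP/BLEUless_DocMT

Summary (original)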

Large language models (LLMs) have excelled in various NLP tasks, including machine translation (MT), yet most studies focus on sentence-level translation. This work investigates the inherent capability of instruction-tuned LLMs for document-level translation (docMT). Unlike prior approaches that require specialized techniques, we evaluate LLMs by directly prompting them to translate entire documents in a single pass. Our results show that this method improves translation quality compared to translating sentences separately, even without document-level fine-tuning. However, this advantage is not reflected in BLEU scores, which often favor sentence-based translations. We propose using the LLM-as-a-judge paradigm for evaluation, where GPT-4 is used to assess document coherence, accuracy, and fluency in a more nuanced way than n-gram-based metrics. Overall, our work demonstrates that instruction-tuned LLMs can effectively leverage document context for translation. However, we caution against using BLEU scores for evaluating docMT, as they often provide misleading outcomes, failing to capture the quality of document-level translation. Code and the outputs from GPT4-as-a-judge are available at https://github.com/EIT-NLP/BLEUless_DocMT
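
The contrast the abstract draws between translating sentences separately and translating a whole document in a single pass can be sketched as two ways of building prompts. This is a minimal illustration only: the function names, the prompt wording, and the two-sentence example document are assumptions made here, not the prompts used by the authors.

```python
from typing import List


def sentence_level_prompts(sentences: List[str], src: str, tgt: str) -> List[str]:
    """Build one prompt per sentence; each prompt carries no surrounding context."""
    # Hypothetical prompt wording, chosen for illustration.
    return [
        f"Translate the following {src} sentence into {tgt}:\n{sentence}"
        for sentence in sentences
    ]


def document_level_prompt(sentences: List[str], src: str, tgt: str) -> str:
    """Build a single prompt containing the whole document for one-pass translation."""
    document = "\n".join(sentences)
    # Hypothetical prompt wording, chosen for illustration.
    return f"Translate the following {src} document into {tgt}:\n{document}"


# Illustrative two-sentence document (not taken from the paper).
doc = ["Anna bought a bike.", "It was red."]
per_sentence = sentence_level_prompts(doc, "English", "German")
whole_doc = document_level_prompt(doc, "English", "German")
```

In the sentence-level setup, the prompt for "It was red." does not contain the sentence that introduces the bike, so the model has nothing to resolve the pronoun against. The document-level prompt keeps both sentences together, which is the kind of context the abstract says instruction-tuned LLMs can exploit.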

arXiv information

Authors	Yirong Sun, Dawei Zhu, Yanjun Chen, Erjia Xiao, Xinghao Chen, Xiaoyu Shen
Published	2025-03-14 13:12:38+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
