BatchGEMBA: Token-Efficient Machine Translation Evaluation with Batched Prompting and Prompt Compression

Abstract

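Recent advances in Large Language Model (LLM)-based evaluation of natural language generation have focused largely on single-example prompting, which incurs significant token overhead and computational inefficiency.
In this work, we introduce BatchGEMBA-MQM, a framework that integrates batched prompting with the GEMBA-MQM metric for machine translation evaluation.
Our approach aggregates multiple translation examples into a single prompt, reducing token usage by a factor of 2 to 4 (depending on the batch size) relative to single-example prompting.
In addition, we propose a batching-aware prompt compression model that achieves a further token reduction of 13-15% on average and also shows an ability to mitigate batching-induced quality degradation.
Evaluations across several LLMs (GPT-4o, GPT-4o-mini, Mistral Small, Phi4, and CommandR7B) show that batching generally hurts quality, whereas prompt compression does not degrade it further and in some cases recovers the lost quality.
For instance, with a batch size of 4, GPT-4o retains over 90% of its baseline performance when compression is applied, compared with a 44.6% drop without compression.
To support future research in this area, we plan to release our code and trained models at https://github.com/NL2G/batchgemba.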
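
To make the batching idea concrete, here is a minimal sketch of how several translation examples might be packed into one prompt so that the shared instructions are paid for only once. It is not the authors' implementation: the `Example` class, the `INSTRUCTIONS` text, the prompt layout, and the whitespace-based `rough_token_count` are all assumptions made for illustration, and the real GEMBA-MQM prompt is considerably longer.

```python
from dataclasses import dataclass


@dataclass
class Example:
    source: str
    translation: str


# Hypothetical instruction text, standing in for the much longer
# GEMBA-MQM prompt (which is not reproduced here).
INSTRUCTIONS = (
    "Identify translation errors in each translation below and classify "
    "each error as critical, major, or minor."
)


def single_prompts(examples):
    """One prompt per example: the instructions are repeated every time."""
    return [
        f"{INSTRUCTIONS}\n\nSource: {ex.source}\nTranslation: {ex.translation}"
        for ex in examples
    ]


def batched_prompt(examples):
    """One prompt for the whole batch: the instructions appear once."""
    blocks = [
        f"Example {i}\nSource: {ex.source}\nTranslation: {ex.translation}"
        for i, ex in enumerate(examples, start=1)
    ]
    return INSTRUCTIONS + "\n\n" + "\n\n".join(blocks)


def chunk(examples, batch_size):
    """Split the example list into consecutive batches."""
    return [
        examples[i:i + batch_size]
        for i in range(0, len(examples), batch_size)
    ]


def rough_token_count(text):
    """Crude proxy for token count (assumption): whitespace-separated words."""
    return len(text.split())


examples = [
    Example("Guten Morgen.", "Good morning."),
    Example("Wie geht es dir?", "How are you?"),
    Example("Ich habe Hunger.", "I am hungry."),
    Example("Das Wetter ist gut.", "The weather is good."),
]

single_total = sum(rough_token_count(p) for p in single_prompts(examples))
batched_total = sum(
    rough_token_count(batched_prompt(batch)) for batch in chunk(examples, 4)
)
print(f"single-example prompts: {single_total} words")
print(f"batched prompts:        {batched_total} words")
```

The saving grows with the length of the shared instructions relative to each example, which is why a long few-shot prompt benefits far more than this toy one. A real measurement would use the target model's own tokenizer rather than a word count.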

Abstract (original)

Recent advancements in Large Language Model (LLM)-based Natural Language Generation evaluation have largely focused on single-example prompting, resulting in significant token overhead and computational inefficiencies. In this work, we introduce BatchGEMBA-MQM, a framework that integrates batched prompting with the GEMBA-MQM metric for machine translation evaluation. Our approach aggregates multiple translation examples into a single prompt, reducing token usage by 2-4 times (depending on the batch size) relative to single-example prompting. Furthermore, we propose a batching-aware prompt compression model that achieves an additional token reduction of 13-15% on average while also showing ability to help mitigate batching-induced quality degradation. Evaluations across several LLMs (GPT-4o, GPT-4o-mini, Mistral Small, Phi4, and CommandR7B) and varying batch sizes reveal that while batching generally negatively affects quality (but sometimes not substantially), prompt compression does not degrade further, and in some cases, recovers quality loss. For instance, GPT-4o retains over 90% of its baseline performance at a batch size of 4 when compression is applied, compared to a 44.6% drop without compression. We plan to release our code and trained models at https://github.com/NL2G/batchgemba to support future research in this domain.
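
The two savings quoted in the abstract can be combined in a quick back-of-the-envelope calculation. The sketch below assumes that the 13-15% compression saving applies multiplicatively on top of the already batched prompt; the abstract does not spell out the baseline for that percentage, so treat this reading and the helper name `relative_token_usage` as assumptions.

```python
def relative_token_usage(batch_reduction_factor, compression_saving):
    """Token usage as a fraction of single-example prompting.

    Assumption: compression removes `compression_saving` of the tokens
    that remain after batching, so the two effects multiply.
    """
    if batch_reduction_factor <= 0:
        raise ValueError("batch_reduction_factor must be positive")
    if not 0 <= compression_saving < 1:
        raise ValueError("compression_saving must be in [0, 1)")
    return (1.0 / batch_reduction_factor) * (1.0 - compression_saving)


# Endpoints of the ranges quoted in the abstract.
for factor, saving in [(2, 0.13), (4, 0.15)]:
    usage = relative_token_usage(factor, saving)
    print(
        f"{factor}x batching reduction with {saving:.0%} compression: "
        f"{usage:.1%} of single-example tokens"
    )
```

Under this reading, the combination lands at roughly a fifth to under half of the single-example token budget, depending on where in the quoted ranges a given setup falls.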

arXiv information

Authors	Daniil Larionov, Steffen Eger
Published	2025-03-04 16:20:52+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
