Exploring the Correlation between Human and Machine Evaluation of Simultaneous Speech Translation

Summary

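Evaluating the performance of interpreting services is a complex task, given the subtle nature of spoken language translation, the strategies interpreters apply, and the diverse expectations of users.
The complexity of this task becomes even more pronounced when automatic evaluation methods are applied.
This is especially true because, owing to the strategies interpreters employ, interpreted texts show less linearity between the source and target languages.
This study aims to assess the reliability of automatic metrics for evaluating simultaneous interpretation by analyzing their correlation with human evaluations.
We focus on one particular feature of interpretation quality: translation accuracy, or faithfulness.
As a benchmark, we use human assessments by language experts and evaluate how well sentence embeddings and large language models correlate with them.
We quantify the semantic similarity between the source and translated texts without relying on a reference translation.
The results suggest that GPT models, particularly GPT-3.5 with direct prompting, show the strongest correlation with human judgment of semantic similarity between source and target texts, even when evaluating short text segments.
In addition, the study finds that the size of the context window has a notable effect on this correlation.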

Summary (original)

Assessing the performance of interpreting services is a complex task, given the nuanced nature of spoken language translation, the strategies that interpreters apply, and the diverse expectations of users. The complexity of this task becomes even more pronounced when automated evaluation methods are applied. This is particularly true because interpreted texts exhibit less linearity between the source and target languages due to the strategies employed by the interpreter. This study aims to assess the reliability of automatic metrics in evaluating simultaneous interpretations by analyzing their correlation with human evaluations. We focus on a particular feature of interpretation quality, namely translation accuracy or faithfulness. As a benchmark, we use human assessments performed by language experts, and evaluate how well sentence embeddings and Large Language Models correlate with them. We quantify semantic similarity between the source and translated texts without relying on a reference translation. The results suggest GPT models, particularly GPT-3.5 with direct prompting, demonstrate the strongest correlation with human judgment in terms of semantic similarity between source and target texts, even when evaluating short textual segments. Additionally, the study reveals that the size of the context window has a notable impact on this correlation.
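
The reference-free setup described in the abstract (score each source/target pair directly, then check how well those scores track expert ratings) can be illustrated with a small sketch. It assumes cosine similarity over sentence embeddings as the automatic score and Pearson and Spearman coefficients as the correlation measures; the abstract does not state which similarity function or coefficient the authors use. The vectors and ratings below are made-up toy values, not data from the paper, and in practice the vectors would come from a multilingual embedding model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (dev_x * dev_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Hypothetical embeddings for four source segments and their renditions.
    source_vecs = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.1, 0.2, 0.9], [0.7, 0.6, 0.1]]
    target_vecs = [[0.8, 0.2, 0.1], [0.9, 0.1, 0.2], [0.1, 0.3, 0.8], [0.6, 0.6, 0.3]]
    # Hypothetical expert accuracy ratings for the same four segments.
    human_ratings = [5, 1, 4, 3]

    metric_scores = [cosine_similarity(s, t) for s, t in zip(source_vecs, target_vecs)]
    print("Pearson: ", pearson(metric_scores, human_ratings))
    print("Spearman:", spearman(metric_scores, human_ratings))
```

No reference translation appears anywhere in the computation: each score compares a source segment with its rendition, and only the final correlation step brings in the human ratings. An LLM-based score could be dropped into `metric_scores` in place of the cosine values without changing the correlation step.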

arXiv information

Authors	Xiaoman Wang, Claudio Fantinuoli
Published	2024-06-14 14:47:19+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
