DoCIA: An Online Document-Level Context Incorporation Agent for Speech Translation

Summary

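Document-level context is crucial for handling discourse challenges in text-to-text document-level machine translation (MT).
Although noise from automatic speech recognition (ASR) introduces additional discourse challenges, the integration of document-level context in speech translation (ST) remains insufficient.
In this paper, we develop DoCIA, an online framework that improves ST performance by incorporating document-level context.
DoCIA decomposes the ST pipeline into four stages.
Document-level context is integrated into the ASR refinement, MT, and MT refinement stages through auxiliary modules based on LLMs (large language models).
Furthermore, DoCIA exploits document-level information at multiple levels while keeping computational overhead to a minimum.
In addition, a simple yet effective determination mechanism is introduced to prevent hallucinations caused by excessive refinement and to ensure the reliability of the final results.
Experimental results show that DoCIA substantially outperforms conventional ST baselines on both sentence-level and discourse metrics across four LLMs, demonstrating its effectiveness in improving ST performance.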

Summary (original)

Document-level context is crucial for handling discourse challenges in text-to-text document-level machine translation (MT). Despite the increased discourse challenges introduced by noise from automatic speech recognition (ASR), the integration of document-level context in speech translation (ST) remains insufficiently explored. In this paper, we develop DoCIA, an online framework that enhances ST performance by incorporating document-level context. DoCIA decomposes the ST pipeline into four stages. Document-level context is integrated into the ASR refinement, MT, and MT refinement stages through auxiliary LLM (large language model)-based modules. Furthermore, DoCIA leverages document-level information in a multi-level manner while minimizing computational overhead. Additionally, a simple yet effective determination mechanism is introduced to prevent hallucinations from excessive refinement, ensuring the reliability of the final results. Experimental results show that DoCIA significantly outperforms traditional ST baselines in both sentence and discourse metrics across four LLMs, demonstrating its effectiveness in improving ST performance.
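As a rough illustration of the online, four-stage pipeline described above, the Python sketch below chains ASR, ASR refinement, MT and MT refinement over a stream of audio segments. Everything concrete in it is an assumption rather than DoCIA's actual implementation: the first stage is assumed to be plain sentence-level ASR, the stage functions are toy placeholders for the LLM-based modules, the document-level context is simply a sliding window of previous outputs (hypothetical `window=3`), and the determination mechanism is stood in for by a hypothetical `accept_refinement` check that discards a refinement whose `difflib` similarity to its input falls below 0.5.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, List

# Hypothetical stage signature: (text, document context) -> new text.
# In DoCIA these would be LLM-based modules; here they are plain callables.
Stage = Callable[[str, List[str]], str]


def accept_refinement(before: str, after: str, min_similarity: float = 0.5) -> str:
    """Hypothetical stand-in for the determination mechanism: keep the refined
    text only if it stays close to the unrefined text, otherwise fall back to
    the unrefined text to guard against hallucinated over-refinement."""
    ratio = SequenceMatcher(None, before, after).ratio()
    return after if ratio >= min_similarity else before


@dataclass
class OnlinePipeline:
    asr: Callable[[bytes], str]
    refine_asr: Stage
    translate: Stage
    refine_mt: Stage
    window: int = 3  # assumed number of previous segments used as context
    src_history: List[str] = field(default_factory=list)
    tgt_history: List[str] = field(default_factory=list)

    def process(self, audio_segment: bytes) -> str:
        # Document-level context: only the most recent segments (assumption).
        src_ctx = self.src_history[-self.window:]
        tgt_ctx = self.tgt_history[-self.window:]
        raw = self.asr(audio_segment)                                  # stage 1: ASR
        src = accept_refinement(raw, self.refine_asr(raw, src_ctx))    # stage 2: ASR refinement
        draft = self.translate(src, tgt_ctx)                           # stage 3: MT
        final = accept_refinement(draft, self.refine_mt(draft, tgt_ctx))  # stage 4: MT refinement
        # Online setting: store outputs so later segments can use them as context.
        self.src_history.append(src)
        self.tgt_history.append(final)
        return final


# Toy usage: the ASR fix is accepted, the unrelated "refined" translation is rejected.
pipe = OnlinePipeline(
    asr=lambda audio: audio.decode("utf-8"),
    refine_asr=lambda text, ctx: text.replace("there model", "their model"),
    translate=lambda text, ctx: text.upper(),
    refine_mt=lambda text, ctx: "completely unrelated output",
)
result = pipe.process(b"we trained there model")
print(result)  # WE TRAINED THEIR MODEL
```

The similarity threshold is only one simple way to decide whether to keep a refinement; the paper's own determination mechanism may use a different criterion.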

arXiv information

Authors	Xinglin Lyu, Wei Tang, Yuang Li, Xiaofeng Zhao, Ming Zhu, Junhui Li, Yunfei Lu, Min Zhang, Daimeng Wei, Hao Yang, Min Zhang
Published	2025-04-07 14:26:49+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
