Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach

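Summary

Direct speech translation (ST) models often struggle with rare words.
Mistranslating these words can have serious consequences, harming both translation quality and user trust.
Rare words are inherently hard for neural models to translate because the learning signal for them is sparse, yet in real-world scenarios translations of past recordings on similar topics are often available.
To make use of these resources, the authors propose a retrieval-and-demonstration approach that improves rare word translation accuracy in direct ST models.
First, they adapt existing ST models to incorporate retrieved examples for rare word translation, so that the model benefits from prepended examples in a way similar to in-context learning.
Next, they develop a cross-modal (speech-to-speech, speech-to-text, text-to-text) retriever to find suitable examples.
They show that standard ST models can be effectively adapted to use such examples, improving rare word translation accuracy over the baseline by 17.6% with gold examples and by 8.5% with retrieved examples.
In addition, the speech-to-speech retrieval approach outperforms the other modalities and is more robust to unseen speakers.
The code is publicly available (https://github.com/SiqiLii/Retrieve-and-Demonstration-ST).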

Abstract (original)

Direct speech translation (ST) models often struggle with rare words. Incorrect translation of these words can have severe consequences, impacting translation quality and user trust. While rare word translation is inherently challenging for neural models due to sparse learning signals, real-world scenarios often allow access to translations of past recordings on similar topics. To leverage these valuable resources, we propose a retrieval-and-demonstration approach to enhance rare word translation accuracy in direct ST models. First, we adapt existing ST models to incorporate retrieved examples for rare word translation, which allows the model to benefit from prepended examples, similar to in-context learning. We then develop a cross-modal (speech-to-speech, speech-to-text, text-to-text) retriever to locate suitable examples. We demonstrate that standard ST models can be effectively adapted to leverage examples for rare word translation, improving rare word translation accuracy over the baseline by 17.6% with gold examples and 8.5% with retrieved examples. Moreover, our speech-to-speech retrieval approach outperforms other modalities and exhibits higher robustness to unseen speakers. Our code is publicly available (https://github.com/SiqiLii/Retrieve-and-Demonstration-ST).
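
The retrieve-then-prepend idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the `Example` record, the two-dimensional embeddings, the cosine-similarity ranking, and the `|||` / `###` separators are all assumptions made for illustration, and plain text stands in for the speech inputs that the actual system encodes.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """A past utterance with its translation (hypothetical record layout)."""
    embedding: List[float]  # stand-in for an encoded utterance
    source: str             # transcript, standing in for the speech itself
    translation: str


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: List[float], pool: List[Example]) -> Example:
    """Return the pool example whose embedding is closest to the query."""
    return max(pool, key=lambda ex: cosine(query, ex.embedding))


def build_input(query_text: str, example: Example) -> str:
    """Prepend the retrieved pair as a demonstration (separators are assumed)."""
    return f"{example.source} ||| {example.translation} ### {query_text}"


pool = [
    Example([1.0, 0.0], "The patient has sarcoidosis.", "Der Patient hat Sarkoidose."),
    Example([0.0, 1.0], "We visited the museum.", "Wir haben das Museum besucht."),
]

# A query whose (toy) embedding lies close to the first example.
best = retrieve([0.9, 0.1], pool)
model_input = build_input("Sarcoidosis affects the lungs.", best)
print(model_input)
```

In this toy setup the retrieved demonstration contains the rare word and its translation, which is the signal the adapted model is meant to exploit when translating the new utterance.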

arXiv information

Authors	Siqi Li, Danni Liu, Jan Niehues
Published	2024-10-01 13:06:20+00:00

Source and services used

arxiv.jp, Google
