Plug, Play, and Fuse: Zero-Shot Joint Decoding via Word-Level Re-ranking Across Diverse Vocabularies

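Abstract

Recent advances in NLP have produced models with specialized strengths, such as processing multimodal inputs or excelling in specific domains.
However, real-world tasks such as multimodal translation often require a combination of these strengths, for example handling both translation and image processing.
Individual translation and vision models are powerful, but they typically lack the ability to perform both tasks in a single system.
Combining these models poses challenges, particularly because their vocabularies differ, which restricts traditional ensemble methods to post-generation techniques such as N-best list re-ranking.
In this work, we propose a novel zero-shot ensembling strategy that integrates different models during the decoding phase without any additional training.
Our approach re-ranks beams during decoding by combining scores at the word level, using heuristics to predict when a word is complete.
We demonstrate the effectiveness of this method in machine translation scenarios, showing that it can generate translations that are both speech- and image-aware while also improving overall translation quality (the code will be released upon paper acceptance).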

Abstract (original)

Recent advancements in NLP have resulted in models with specialized strengths, such as processing multimodal inputs or excelling in specific domains. However, real-world tasks, like multimodal translation, often require a combination of these strengths, such as handling both translation and image processing. While individual translation and vision models are powerful, they typically lack the ability to perform both tasks in a single system. Combining these models poses challenges, particularly due to differences in their vocabularies, which limit the effectiveness of traditional ensemble methods to post-generation techniques like N-best list re-ranking. In this work, we propose a novel zero-shot ensembling strategy that allows for the integration of different models during the decoding phase without the need for additional training. Our approach re-ranks beams during decoding by combining scores at the word level, using heuristics to predict when a word is completed. We demonstrate the effectiveness of this method in machine translation scenarios, showing that it enables the generation of translations that are both speech- and image-aware while also improving overall translation quality (We will release the code upon paper acceptance.).
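
To make the word-level re-ranking idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the generating model emits SentencePiece-style subword tokens in which a leading "▁" marks the start of a word, and it treats a word as complete once a later token starts a new word; that heuristic is an assumption made for illustration, since the abstract does not specify the one used. The names Hypothesis, completed_words, rerank, toy_scorer, and weight are all hypothetical, and toy_scorer stands in for a second model with a different vocabulary that can only score whole words.

```python
from dataclasses import dataclass
from typing import Callable, List

WORD_START = "\u2581"  # assumed SentencePiece-style word-boundary marker


@dataclass
class Hypothesis:
    tokens: List[str]  # subword tokens produced by the generating model
    log_prob: float    # generating model's cumulative log-probability


def completed_words(tokens: List[str]) -> List[str]:
    """Return the words that are known to be finished.

    Assumed heuristic: a word is complete once a later token starts a
    new word, so the last word in progress is always excluded.
    """
    words: List[str] = []
    for tok in tokens:
        if tok.startswith(WORD_START) or not words:
            words.append(tok.lstrip(WORD_START))
        else:
            words[-1] += tok  # continuation piece of the current word
    return words[:-1]


def rerank(
    beams: List[Hypothesis],
    score_words: Callable[[List[str]], float],
    weight: float = 0.5,
) -> List[Hypothesis]:
    """Sort beams by an interpolation of both models' scores.

    The second model never sees subword tokens, only completed words,
    so the two vocabularies do not need to match.
    """
    def combined(h: Hypothesis) -> float:
        words = completed_words(h.tokens)
        return (1.0 - weight) * h.log_prob + weight * score_words(words)

    return sorted(beams, key=combined, reverse=True)


def toy_scorer(words: List[str]) -> float:
    """Hypothetical second model: penalize words it finds implausible."""
    preferred = {"the", "bat", "flies"}
    return sum(0.0 if w in preferred else -2.0 for w in words)


beams = [
    Hypothesis(["\u2581the", "\u2581club", "\u2581f"], -1.0),
    Hypothesis(["\u2581the", "\u2581b", "at", "\u2581f"], -1.2),
]
ranked = rerank(beams, toy_scorer)
print([" ".join(completed_words(h.tokens)) for h in ranked])
# prints ['the bat', 'the club']
```

In this toy run the generating model alone prefers "the club" (-1.0 versus -1.2), but the interpolated score moves "the bat" to the top because the second scorer penalizes "club". The unfinished word beginning with "f" is left out of the second model's input in both beams.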

arXiv information

Authors	Sai Koneru, Matthias Huck, Miriam Exel, Jan Niehues
Published	2024-11-04 12:17:42+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
