E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation

Summary

Title: E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation
Summary:

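– Text image machine translation (TIMT) aims to translate text embedded in images from a source language into a target language.
– Existing two-stage cascade architectures and one-stage end-to-end architectures each have their own problems.
– Cascade models can benefit from large-scale optical character recognition (OCR) and MT datasets, but the two-stage architecture is redundant.
– End-to-end models are efficient but suffer from a shortage of training data.
– To address this, our paper proposes an end-to-end TIMT model that makes full use of the knowledge in existing OCR and MT datasets. Specifically, we build a new modal adapter that effectively connects the OCR encoder to the MT decoder.
– An end-to-end TIMT loss and a cross-modal contrastive loss are combined to align the feature distributions of the OCR and MT tasks.
– Extensive experiments show that the proposed method, with a lighter and faster architecture, outperforms existing two-stage cascade models and one-stage end-to-end models.
– In addition, ablation studies verify that the proposed modal adapter is effective at connecting a variety of OCR and MT models.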
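
The joint objective described above (an end-to-end TIMT loss plus a cross-modal contrastive loss) can be illustrated with a minimal sketch. The summary does not give the exact form of the contrastive term, so this example assumes an InfoNCE-style loss over cosine similarities. The names `info_nce`, `joint_loss`, `weight`, and `temperature` are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(image_feats, text_feats, temperature=0.1):
    """Assumed InfoNCE-style contrastive loss (not necessarily the paper's form).

    image_feats[i] stands for the adapted OCR-encoder feature of sample i and
    text_feats[i] for the MT-side feature of the same sentence. Each image
    feature is pulled toward its own text feature and pushed away from the
    text features of the other samples in the batch.
    """
    n = len(image_feats)
    total = 0.0
    for i in range(n):
        logits = [cosine(image_feats[i], text_feats[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / n


def joint_loss(timt_loss, image_feats, text_feats, weight=1.0, temperature=0.1):
    """Translation loss plus a weighted contrastive term (weight is an assumption)."""
    return timt_loss + weight * info_nce(image_feats, text_feats, temperature)


if __name__ == "__main__":
    image_batch = [[1.0, 0.0], [0.0, 1.0]]
    text_batch = [[0.9, 0.1], [0.2, 0.8]]
    print(joint_loss(2.0, image_batch, text_batch, weight=0.5))
```

In this sketch the contrastive term is small when each image feature is closest to its paired text feature and large when the pairs are mismatched, which is the alignment effect the summary attributes to the cross-modal loss.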

Abstract (original)

Text image machine translation (TIMT) aims to translate texts embedded in images from one source language to another target language. Existing methods, both two-stage cascade and one-stage end-to-end architectures, suffer from different issues. The cascade models can benefit from the large-scale optical character recognition (OCR) and MT datasets but the two-stage architecture is redundant. The end-to-end models are efficient but suffer from training data deficiency. To this end, in our paper, we propose an end-to-end TIMT model fully making use of the knowledge from existing OCR and MT datasets to pursue both an effective and efficient framework. More specifically, we build a novel modal adapter effectively bridging the OCR encoder and MT decoder. End-to-end TIMT loss and cross-modal contrastive loss are utilized jointly to align the feature distribution of the OCR and MT tasks. Extensive experiments show that the proposed method outperforms the existing two-stage cascade models and one-stage end-to-end models with a lighter and faster architecture. Furthermore, the ablation studies verify the generalization of our method, where the proposed modal adapter is effective to bridge various OCR and MT models.

arXiv information

Authors	Cong Ma, Yaping Zhang, Mei Tu, Yang Zhao, Yu Zhou, Chengqing Zong
Published	2023-05-09 04:25:52+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
