E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation

Summary

[Title] E2TIMT: An Efficient and Effective Modal Adapter for Text Image Machine Translation

[Summary]

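– Text image machine translation (TIMT) aims to translate text embedded in images from a source language into a target language.
– Existing methods fall into two groups, two-stage cascade models and one-stage end-to-end models, and each group has its own problem.
– Cascade models can benefit from large-scale optical character recognition (OCR) and machine translation (MT) datasets, but their two-stage architecture is redundant.
– End-to-end models are efficient but suffer from a shortage of training data.
– The paper proposes an end-to-end TIMT model that makes full use of the knowledge in existing OCR and MT datasets, aiming for a framework that is both effective and efficient.
– Specifically, the authors build a novel modal adapter that bridges the OCR encoder and the MT decoder.
– An end-to-end TIMT loss and a cross-modal contrastive loss are used jointly to align the feature distributions of the OCR and MT tasks.
– Extensive experiments show that the proposed method outperforms existing two-stage cascade models and one-stage end-to-end models while using a lighter and faster architecture.
– In addition, ablation studies verify that the method generalizes: the proposed modal adapter is effective at bridging various OCR and MT models.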
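
The joint objective described in the bullets above can be sketched as follows. The abstract does not give the exact form of the cross-modal contrastive loss, so this sketch assumes a symmetric InfoNCE-style loss over paired image-side and text-side feature vectors, which is a common choice and not necessarily the authors' formulation. The function names, the `temperature` and `weight` parameters, and the pooling of features to one vector per sample are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def log_softmax_at(logits, target):
    """Log-probability of index `target` under a softmax over `logits`."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return logits[target] - log_z


def contrastive_loss(image_feats, text_feats, temperature=0.1):
    """Assumed symmetric InfoNCE loss over a batch of paired features.

    image_feats[i] stands in for the adapter output of sample i and
    text_feats[i] for the MT-side feature of the same sample; every
    other pairing in the batch is treated as a negative.
    """
    img = [l2_normalize(v) for v in image_feats]
    txt = [l2_normalize(v) for v in text_feats]
    n = len(img)
    # Temperature-scaled cosine similarity between every image/text pair.
    sim = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Image-to-text direction: each row should peak on its diagonal entry.
    loss_i2t = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Text-to-image direction: each column should peak on its diagonal entry.
    loss_t2i = -sum(log_softmax_at([sim[i][j] for i in range(n)], j)
                    for j in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


def joint_loss(timt_loss, image_feats, text_feats, weight=1.0, temperature=0.1):
    """Translation loss plus weighted contrastive loss (the weighting is an assumption)."""
    return timt_loss + weight * contrastive_loss(image_feats, text_feats, temperature)
```

With this sketch, a batch whose image-side and text-side features already match pair by pair yields a contrastive term close to zero, while a batch whose pairs are swapped is penalized heavily, which is the alignment pressure the summary describes.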

Abstract (original)

Text image machine translation (TIMT) aims to translate texts embedded in images from one source language to another target language. Existing methods, both two-stage cascade and one-stage end-to-end architectures, suffer from different issues. The cascade models can benefit from the large-scale optical character recognition (OCR) and MT datasets but the two-stage architecture is redundant. The end-to-end models are efficient but suffer from training data deficiency. To this end, in our paper, we propose an end-to-end TIMT model fully making use of the knowledge from existing OCR and MT datasets to pursue both an effective and efficient framework. More specifically, we build a novel modal adapter effectively bridging the OCR encoder and MT decoder. End-to-end TIMT loss and cross-modal contrastive loss are utilized jointly to align the feature distribution of the OCR and MT tasks. Extensive experiments show that the proposed method outperforms the existing two-stage cascade models and one-stage end-to-end models with a lighter and faster architecture. Furthermore, the ablation studies verify the generalization of our method, where the proposed modal adapter is effective to bridge various OCR and MT models.

arXiv information

Authors	Cong Ma, Yaping Zhang, Mei Tu, Yang Zhao, Yu Zhou, Chengqing Zong
Published	2023-05-10 02:37:32+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
