Translation-Enhanced Multilingual Text-to-Image Generation

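Summary

Because annotated image-caption data is scarce in other languages, research on text-to-image generation (TTI) still focuses mainly on English.
In the long run, this could widen inequitable access to TTI technology.
In this work, we therefore investigate multilingual TTI (termed mTTI) and the current potential of neural machine translation (NMT) for bootstrapping mTTI systems.
We make two key contributions.
1) Relying on a multilingual multi-modal encoder, we provide a systematic empirical study of standard cross-lingual NLP methods as applied to mTTI: Translate Train, Translate Test, and Zero-Shot Transfer.
2) We propose Ensemble Adapter (EnsAd), a novel parameter-efficient approach that learns to weigh and consolidate multilingual text knowledge within the mTTI framework, mitigating the language gap and thereby improving mTTI performance.
Evaluations on the standard mTTI datasets COCO-CN, Multi30K Task2, and LAION-5B demonstrate the potential of translation-enhanced mTTI systems and validate the benefits of the proposed EnsAd, which yields consistent gains across all datasets.
Further investigation of model variants, ablation studies, and qualitative analyses provides additional insight into the inner workings of the proposed mTTI approaches.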

Summary (original)

Research on text-to-image generation (TTI) still predominantly focuses on the English language due to the lack of annotated image-caption data in other languages; in the long run, this might widen inequitable access to TTI technology. In this work, we thus investigate multilingual TTI (termed mTTI) and the current potential of neural machine translation (NMT) to bootstrap mTTI systems. We provide two key contributions. 1) Relying on a multilingual multi-modal encoder, we provide a systematic empirical study of standard methods used in cross-lingual NLP when applied to mTTI: Translate Train, Translate Test, and Zero-Shot Transfer. 2) We propose Ensemble Adapter (EnsAd), a novel parameter-efficient approach that learns to weigh and consolidate the multilingual text knowledge within the mTTI framework, mitigating the language gap and thus improving mTTI performance. Our evaluations on standard mTTI datasets COCO-CN, Multi30K Task2, and LAION-5B demonstrate the potential of translation-enhanced mTTI systems and also validate the benefits of the proposed EnsAd which derives consistent gains across all datasets. Further investigations on model variants, ablation studies, and qualitative analyses provide additional insights on the inner workings of the proposed mTTI approaches.
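The idea of weighing and consolidating multilingual text knowledge can be illustrated with a small sketch. This is not the paper's EnsAd architecture, which the abstract does not specify; it is a toy example under stated assumptions: the source sentence and its NMT translations have already been encoded into fixed-size vectors, the weights come from a fixed scaled dot-product score rather than learned parameters, and the function names `softmax`, `dot`, and `consolidate` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def consolidate(source_vec, translation_vecs):
    """Fuse a source-sentence embedding with embeddings of its translations.

    Assumption: each candidate is scored by its scaled dot product with the
    source embedding. A trained adapter would learn this scoring instead.
    Returns the fused vector and the weight given to each candidate.
    """
    candidates = [source_vec] + list(translation_vecs)
    scale = math.sqrt(len(source_vec))
    scores = [dot(source_vec, c) / scale for c in candidates]
    weights = softmax(scores)
    fused = [
        sum(w * c[i] for w, c in zip(weights, candidates))
        for i in range(len(source_vec))
    ]
    return fused, weights
```

Calling `consolidate([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` gives the translation that agrees with the source the same weight as the source itself, and a smaller weight to the one that diverges. A learned, parameter-efficient module would replace the fixed scoring rule while keeping the same weigh-then-combine shape.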

arXiv information

Authors	Yaoyiran Li, Ching-Yun Chang, Stephen Rawls, Ivan Vulić, Anna Korhonen
Published	2023-05-30 17:03:52+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
