Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

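Summary

Multimodal learning has recently surged in both image-to-text and text-to-image generation.
However, this success is mostly limited to English, and other languages lag far behind.
Because non-English multimodal data is low-resource (that is, large-scale, high-quality image-text data is lacking), building competitive counterparts in other languages is very difficult.
In this work, we propose MPM, an effective training paradigm for training large multimodal models in low-resource languages.
MPM demonstrates that multilingual language models can pivot zero-shot multimodal learning across languages.
Specifically, a multimodal model built on a strong multilingual large language model and pretrained on English-only image-text data generalizes well to other languages in a zero-shot manner, for both image-to-text and text-to-image generation, and even surpasses models trained on image-text data in the native language.
Taking Chinese as a practical case of MPM, we build VisCPM, a set of large multimodal models for image-to-text and text-to-image generation, which achieves state-of-the-art (open-source) performance in Chinese.
To facilitate future research, we open-source the code and model weights at https://github.com/OpenBMB/VisCPM.git.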

Summary (original)

Recently there has been a significant surge in multimodal learning in terms of both image-to-text and text-to-image generation. However, the success is typically limited to English, leaving other languages largely behind. Building a competitive counterpart in other languages is highly challenging due to the low-resource nature of non-English multimodal data (i.e., lack of large-scale, high-quality image-text data). In this work, we propose MPM, an effective training paradigm for training large multimodal models in low-resource languages. MPM demonstrates that Multilingual language models can Pivot zero-shot Multimodal learning across languages. Specifically, based on a strong multilingual large language model, multimodal models pretrained on English-only image-text data can well generalize to other languages in a zero-shot manner for both image-to-text and text-to-image generation, even surpassing models trained on image-text data in native languages. Taking Chinese as a practice of MPM, we build large multimodal models VisCPM in image-to-text and text-to-image generation, which achieve state-of-the-art (open-source) performance in Chinese. To facilitate future research, we open-source codes and model weights at https://github.com/OpenBMB/VisCPM.git.

arXiv information

Authors	Jinyi Hu, Yuan Yao, Chongyi Wang, Shan Wang, Yinxu Pan, Qianyu Chen, Tianyu Yu, Hanghao Wu, Yue Zhao, Haoye Zhang, Xu Han, Yankai Lin, Jiao Xue, Dahai Li, Zhiyuan Liu, Maosong Sun
Published	2023-08-23 09:55:41+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
