Investigating the Effect of Parallel Data in the Cross-Lingual Transfer for Vision-Language Encoders

Summary

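Most pre-trained vision-language (VL) models, and the training data for downstream tasks, are available only in English.
Multilingual VL tasks are therefore solved with cross-lingual transfer: either by fine-tuning a multilingual pre-trained model or by transferring the text encoder using parallel data.
We study the alternative approach: transferring an already trained encoder using parallel data.
We investigate two properties of the parallel data that previous work left out of focus: its domain and the number of languages.
Our results show that although machine-translated task data are the best on average, caption-like authentic parallel data outperformed them in some languages.
Further, we show that most languages benefit from multilingual training.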

Summary (original)

Most pre-trained Vision-Language (VL) models and training data for the downstream tasks are only available in English. Therefore, multilingual VL tasks are solved using cross-lingual transfer: fine-tune a multilingual pre-trained model or transfer the text encoder using parallel data. We study the alternative approach: transferring an already trained encoder using parallel data. We investigate the effect of parallel data: domain and the number of languages, which were out of focus in previous work. Our results show that while machine-translated task data are the best on average, caption-like authentic parallel data outperformed them in some languages. Further, we show that most languages benefit from multilingual training.
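The abstract does not state the training objective used for the transfer. As an illustration only, the sketch below assumes one common way of transferring an already trained English text encoder with parallel data, multilingual knowledge distillation with a mean-squared-error loss: the frozen English encoder (the teacher) embeds the English side of each pair, and a student encoder is trained so that both the English sentence and its translation land on that embedding. This is not necessarily the method used in the paper; the function names, the dictionary-based toy encoders, and the vectors are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    pairs: Sequence[Tuple[str, str]],
    teacher: Callable[[str], Vector],
    student: Callable[[str], Vector],
) -> float:
    """Average distillation loss over (english, translation) pairs.

    The frozen teacher's embedding of the English sentence is the target
    for the student's embedding of the English sentence AND of its
    translation, so translations are pulled into the teacher's space.
    """
    total = 0.0
    for english, translation in pairs:
        target = teacher(english)  # teacher is frozen: it is only queried
        total += mse(student(english), target)
        total += mse(student(translation), target)
    return total / len(pairs)


# Hypothetical toy encoders: lookup tables standing in for real models.
teacher_table: Dict[str, List[float]] = {
    "a dog": [1.0, 0.0],
    "a cat": [0.0, 1.0],
}
student_table: Dict[str, List[float]] = {
    "a dog": [1.0, 0.0],
    "ein Hund": [0.5, 0.0],  # translation not yet aligned with the teacher
    "a cat": [0.0, 1.0],
    "eine Katze": [0.0, 1.0],
}
pairs = [("a dog", "ein Hund"), ("a cat", "eine Katze")]

loss = distillation_loss(pairs, teacher_table.__getitem__, student_table.__getitem__)
print(loss)  # 0.0625
```

Only the loss is shown here; in a real setup it would be minimised with gradient descent over the student encoder's parameters while the teacher stays fixed.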

arXiv information

Authors	Andrei-Alexandru Manea, Jindřich Libovický
Published	2025-04-30 14:19:15+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
