Two Stacks Are Better Than One: A Comparison of Language Modeling and Translation as Multilingual Pretraining Objectives

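Summary

Pretrained language models (PLMs) deliver impressive performance and have captured the attention of the NLP community.
Establishing best practices for pretraining has therefore become a major focus of much NLP research, especially because insights developed for monolingual English models do not necessarily carry over to more complex multilingual models.
One important caveat about the current state of the art is that different studies are rarely comparable: they often involve different parameter counts, training data, and evaluation methodology.
This paper proposes a comparison of multilingual pretraining objectives in a controlled methodological environment.
The authors ensure that training data and model architectures are comparable, and discuss the downstream performance across 6 languages observed in probing and fine-tuning scenarios.
They make two key observations: (1) the architecture determines which pretraining objective is optimal;
(2) multilingual translation is a very effective pretraining objective under the right conditions.
Code, data, and model weights are available at https://github.com/Helsinki-NLP/lm-vs-mt.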

Summary (original)

Pretrained language models (PLMs) display impressive performance and have captured the attention of the NLP community. Establishing the best practices in pretraining has therefore become a major point of focus for much of NLP research, especially since the insights developed for monolingual English models need not carry over to more complex multilingual models. One significant caveat of the current state of the art is that different works are rarely comparable: they often discuss different parameter counts, training data, and evaluation methodology. This paper proposes a comparison of multilingual pretraining objectives in a controlled methodological environment. We ensure that training data and model architectures are comparable, and discuss the downstream performance across 6 languages that we observe in probing and fine-tuning scenarios. We make two key observations: (1) the architecture dictates which pretraining objective is optimal; (2) multilingual translation is a very effective pretraining objective under the right conditions. We make our code, data, and model weights available at https://github.com/Helsinki-NLP/lm-vs-mt.

arXiv information

Authors	Zihao Li, Shaoxiong Ji, Timothee Mickus, Vincent Segonne, Jörg Tiedemann
Published	2024-07-22 09:16:30+00:00

Sources and services used

arxiv.jp, Google
