Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect

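Summary

Building neural text encoders for written Swiss German is difficult because training data is scarce and the dialects vary.
This paper builds on several existing multilingual encoders and adapts them to Swiss German through continued pre-training.
Evaluation on three diverse downstream tasks shows that simply adding a Swiss German adapter to a modular encoder reaches 97.5% of the performance of fully monolithic adaptation.
For the task of retrieving Swiss German sentences given Standard German queries, adapting a character-level model proves more effective than the other adaptation strategies.
The code and the models trained for the experiments are available at https://github.com/ZurichNLP/swiss-german-text-encoders.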

Abstract (original)

Creating neural text encoders for written Swiss German is challenging due to a dearth of training data combined with dialectal variation. In this paper, we build on several existing multilingual encoders and adapt them to Swiss German using continued pre-training. Evaluation on three diverse downstream tasks shows that simply adding a Swiss German adapter to a modular encoder achieves 97.5% of fully monolithic adaptation performance. We further find that for the task of retrieving Swiss German sentences given Standard German queries, adapting a character-level model is more effective than the other adaptation strategies. We release our code and the models trained for our experiments at https://github.com/ZurichNLP/swiss-german-text-encoders

arXiv information

Authors	Jannis Vamvas, Noëmi Aepli, Rico Sennrich
Published	2024-01-25 18:59:32+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
