NLNDE at SemEval-2023 Task 12: Adaptive Pretraining and Source Language Selection for Low-Resource Multilingual Sentiment Analysis

Summary

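This paper describes the system developed for SemEval-2023 Task 12, "Sentiment Analysis for Low-resource African Languages using Twitter Dataset". Sentiment analysis is one of the most widely studied applications in natural language processing. However, most prior work focuses on a small number of high-resource languages. Building reliable sentiment analysis systems for low-resource languages remains difficult because the training data available for this task is limited.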

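This work proposes language-adaptive and task-adaptive pretraining on African texts, and studies transfer learning with source language selection on top of an African language-centric pretrained language model. The key findings are: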

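– Adapting the pretrained model to the target language and task using a small but relevant corpus improves performance by more than 10 F1 points.
– Selecting source languages with positive transfer gains during training avoids harmful interference from dissimilar languages and leads to better results in multilingual and cross-lingual settings.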

In the shared task, the system won 8 of the 15 tracks and performed best in the multilingual evaluation in particular.
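
The source language selection idea in the second finding can be illustrated with a minimal sketch. It assumes that the "transfer gain" of a source language is the difference between the target-language F1 obtained when training with that source language added and the F1 obtained when training on the target language alone; the abstract does not give the exact definition, so this is an assumption. The function name `select_source_languages`, the language labels, and all scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def select_source_languages(
    baseline_f1: float, f1_with_source: Dict[str, float]
) -> List[str]:
    """Return source languages with positive transfer gain, highest gain first.

    baseline_f1: target-language F1 when training on the target language only.
    f1_with_source: target-language F1 when each source language is added.
    (Assumed definition: gain = F1 with source - baseline F1.)
    """
    gains = {src: f1 - baseline_f1 for src, f1 in f1_with_source.items()}
    positive = [src for src, gain in gains.items() if gain > 0]
    return sorted(positive, key=lambda src: gains[src], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    baseline = 70.0
    with_source = {"lang_a": 72.5, "lang_b": 69.0, "lang_c": 70.8}
    print(select_source_languages(baseline, with_source))
```

With these made-up scores, `lang_b` is dropped because adding it lowers the target F1, which is the kind of harmful interference the selection step is meant to avoid.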

Abstract (original)

This paper describes our system developed for the SemEval-2023 Task 12 ‘Sentiment Analysis for Low-resource African Languages using Twitter Dataset’. Sentiment analysis is one of the most widely studied applications in natural language processing. However, most prior work still focuses on a small number of high-resource languages. Building reliable sentiment analysis systems for low-resource languages remains challenging, due to the limited training data in this task. In this work, we propose to leverage language-adaptive and task-adaptive pretraining on African texts and study transfer learning with source language selection on top of an African language-centric pretrained language model. Our key findings are: (1) Adapting the pretrained model to the target language and task using a small yet relevant corpus improves performance remarkably by more than 10 F1 score points. (2) Selecting source languages with positive transfer gains during training can avoid harmful interference from dissimilar languages, leading to better results in multilingual and cross-lingual settings. In the shared task, our system wins 8 out of 15 tracks and, in particular, performs best in the multilingual evaluation.

arXiv information

Authors	Mingyang Wang, Heike Adel, Lukas Lange, Jannik Strötgen, Hinrich Schütze
Published	2023-04-28 21:02:58+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
