Federated Learning Based Multilingual Emoji Prediction In Clean and Attack Scenarios

Abstract

Federated learning is a growing field in the machine learning community due to its decentralized and private design. In federated learning, model training is distributed over multiple clients, giving access to large amounts of client data while preserving privacy. A server then aggregates the training done on these clients without accessing their data. Such data could be emojis, which are widely used on social media services and instant messaging platforms to express users' sentiments. This paper proposes federated learning-based multilingual emoji prediction in both clean and attack scenarios. Emoji prediction data were crawled from Twitter and from the SemEval emoji datasets. These data are used to train and evaluate transformer models of different sizes, including a sparsely activated transformer, under one of two assumptions: clean data on all clients, or data poisoned via a label-flipping attack on some clients. Experimental results on these models show that federated learning, in both clean and attack scenarios, performs similarly to centralized training in multilingual emoji prediction on seen and unseen languages under different data sources and distributions. In addition to the privacy and distributed benefits of federated learning, our trained transformers perform better than other techniques on the SemEval emoji dataset.
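The abstract does not name the server's aggregation rule or the exact form of the label-flipping attack. As a minimal sketch, assuming FedAvg-style averaging weighted by each client's dataset size and a uniform label flip on poisoned clients, the two mechanisms it describes could look like this. The functions `flip_labels` and `fedavg`, and the flat list-of-floats parameter representation, are hypothetical illustrations, not the authors' code.

```python
import random
from typing import List, Sequence


def flip_labels(labels: Sequence[int], num_classes: int, seed: int = 0) -> List[int]:
    """Hypothetical label-flipping attack: map every label to a different class.

    Assumes labels are integers in [0, num_classes) and the replacement class
    is chosen uniformly among the num_classes - 1 wrong classes.
    """
    rng = random.Random(seed)
    flipped = []
    for y in labels:
        # Draw from 0..num_classes-2, then shift past y so the result never equals y.
        wrong = rng.randrange(num_classes - 1)
        flipped.append(wrong if wrong < y else wrong + 1)
    return flipped


def fedavg(client_weights: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Assumed FedAvg-style aggregation: size-weighted average of client parameters.

    The server only sees parameter vectors, never the clients' raw data.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one parameter vector per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for i, w in enumerate(weights):
            # Each client contributes in proportion to its local dataset size.
            global_weights[i] += (n / total) * w
    return global_weights


if __name__ == "__main__":
    # A poisoned client would train on flipped labels before sending its update.
    print(flip_labels([0, 1, 2, 3], num_classes=4))
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

For example, `fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])` returns `[2.5, 3.5]`, because the second client holds three times as much data. Under plain size-weighted averaging, a client that trained on flipped labels still contributes in proportion to its data size, which is how label flipping can influence the global model.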

arXiv information

Authors	Karim Gamal, Ahmed Gaber, Hossam Amer
Published	2023-07-07 00:51:43+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
