Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks

Summary

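Federated learning (FL) has attracted considerable attention for training speech-to-text (S2T) systems, including automatic speech recognition (ASR) and speech translation (ST), because it helps protect privacy and satisfy legal regulations. However, the FL approach commonly used for S2T tasks (\textsc{FedAvg}) typically incurs heavy communication overhead, since every round of interaction exchanges the whole model, and its performance degrades under data heterogeneity among clients. To address these issues, we propose a personalized federated S2T framework with two components: \textsc{FedLoRA}, a lightweight LoRA module used for client-side tuning and for interaction with the server, which minimizes communication overhead; and \textsc{FedMem}, a global model equipped with a $k$-nearest-neighbor ($k$NN) classifier that captures client-specific distributional shifts to achieve personalization and overcome data heterogeneity. Extensive experiments with Conformer and Whisper backbone models on the CoVoST and GigaSpeech benchmarks show that our approach significantly reduces communication overhead on all S2T tasks and effectively personalizes the global model to overcome data heterogeneity.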

Summary (original)

To protect privacy and meet legal regulations, federated learning (FL) has gained significant attention for training speech-to-text (S2T) systems, including automatic speech recognition (ASR) and speech translation (ST). However, the commonly used FL approach (i.e., \textsc{FedAvg}) in S2T tasks typically suffers from extensive communication overhead due to multi-round interactions based on the whole model and performance degradation caused by data heterogeneity among clients. To address these issues, we propose a personalized federated S2T framework that introduces \textsc{FedLoRA}, a lightweight LoRA module for client-side tuning and interaction with the server to minimize communication overhead, and \textsc{FedMem}, a global model equipped with a $k$-nearest-neighbor ($k$NN) classifier that captures client-specific distributional shifts to achieve personalization and overcome data heterogeneity. Extensive experiments based on Conformer and Whisper backbone models on CoVoST and GigaSpeech benchmarks show that our approach significantly reduces the communication overhead on all S2T tasks and effectively personalizes the global model to overcome data heterogeneity.
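
To make the communication argument concrete, the following is a minimal sketch (not the paper's implementation) of why exchanging only LoRA factors is cheaper than exchanging a full weight matrix, followed by a FedAvg-style weighted average of those factors on the server. The function names, the layer size (512 x 512), the rank (8), and the two toy clients are all assumptions chosen for illustration. Averaging the two factors separately is a simplification; the abstract does not specify how the server aggregates them.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in a LoRA update B (d_out x rank) times A (rank x d_in)."""
    return rank * (d_out + d_in)


def full_param_count(d_out: int, d_in: int) -> int:
    """Parameters in the dense weight matrix the LoRA update is attached to."""
    return d_out * d_in


def weighted_average(
    updates: List[Tuple[int, Dict[str, Matrix]]]
) -> Dict[str, Matrix]:
    """FedAvg-style average of per-client matrices, weighted by sample count.

    Each entry of `updates` is (num_samples, {name: matrix}). All clients are
    assumed (hypothetically) to send the same names with the same shapes.
    """
    total = sum(n for n, _ in updates)
    first = updates[0][1]
    averaged: Dict[str, Matrix] = {}
    for name, ref in first.items():
        rows, cols = len(ref), len(ref[0])
        acc = [[0.0] * cols for _ in range(rows)]
        for n, params in updates:
            w = n / total  # client weight proportional to its data size
            for i in range(rows):
                for j in range(cols):
                    acc[i][j] += w * params[name][i][j]
        averaged[name] = acc
    return averaged


# Payload comparison for one hypothetical 512 x 512 layer with rank 8.
full = full_param_count(512, 512)
lora = lora_param_count(512, 512, 8)
print(f"full layer: {full} params, LoRA factors: {lora} params")

# Two hypothetical clients, each sending only its LoRA factors.
client_a = (100, {"lora_A": [[1.0, 2.0]], "lora_B": [[0.0], [4.0]]})
client_b = (300, {"lora_A": [[5.0, 6.0]], "lora_B": [[8.0], [0.0]]})
global_lora = weighted_average([client_a, client_b])
print(global_lora)
```

In this toy setting the per-layer payload shrinks from 262144 to 8192 parameters, a 32-fold reduction; the actual savings depend on the backbone, the layers adapted, and the chosen rank.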

arXiv information

Authors	Yichao Du, Zhirui Zhang, Linan Yue, Xu Huang, Yuqing Zhang, Tong Xu, Linli Xu, Enhong Chen
Published	2024-01-18 15:39:38+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
