Training dynamic models using early exits for automatic speech recognition on resource-constrained devices

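Summary

The ability to adjust a neural model's computational load dynamically during inference is essential for on-device processing, where computational resources are limited and vary over time.
Early-exit architectures, in which additional exit branches are attached to intermediate layers of the encoder, offer a promising solution.
For self-attention models in automatic speech recognition (ASR), early-exit architectures make it possible to build dynamic models whose size and architecture adapt to different levels of computational resources and ASR performance requirements.
Previous work on early-exit ASR models has relied on pre-trained self-supervised models fine-tuned with an early-exit loss.
This paper presents an experimental comparison between fine-tuning a pre-trained backbone and training a model from scratch with the early-exit objective.
Experiments on public datasets show that early-exit models trained from scratch not only preserve performance when fewer encoder layers are used, but also achieve higher task accuracy than single-exit or pre-trained models.
The paper also explores an exit selection strategy based on posterior probabilities as an alternative to the conventional frame-based entropy approach.
The results provide insight into the training dynamics of early-exit architectures for ASR models, in particular the effectiveness of the training strategies and exit selection methods.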

Abstract (original)

The ability to dynamically adjust the computational load of neural models during inference is crucial for on-device processing scenarios characterised by limited and time-varying computational resources. A promising solution is presented by early-exit architectures, in which additional exit branches are appended to intermediate layers of the encoder. In self-attention models for automatic speech recognition (ASR), early-exit architectures enable the development of dynamic models capable of adapting their size and architecture to varying levels of computational resources and ASR performance demands. Previous research on early-exiting ASR models has relied on pre-trained self-supervised models, fine-tuned with an early-exit loss. In this paper, we undertake an experimental comparison between fine-tuning pre-trained backbones and training models from scratch with the early-exiting objective. Experiments conducted on public datasets reveal that early-exit models trained from scratch not only preserve performance when using fewer encoder layers but also exhibit enhanced task accuracy compared to single-exit or pre-trained models. Furthermore, we explore an exit selection strategy grounded in posterior probabilities as an alternative to the conventional frame-based entropy approach. Results provide insights into the training dynamics of early-exit architectures for ASR models, particularly the efficacy of training strategies and exit selection methods.
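The two exit selection ideas named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract gives no formulas, so the function names, the thresholds, the toy posteriors, and in particular the use of the mean per-frame maximum probability as the "posterior-based" score are hypothetical stand-ins rather than the authors' method. Each exit is represented as a list of per-frame probability distributions, and the earliest exit that looks confident enough is chosen, with the final exit as a fallback.

```python
import math

def mean_frame_entropy(frame_posteriors):
    """Average Shannon entropy (in nats) over the frames produced by one exit."""
    total = 0.0
    for probs in frame_posteriors:
        total += -sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(frame_posteriors)

def mean_max_posterior(frame_posteriors):
    """Average of the highest per-frame probability (illustrative confidence proxy)."""
    return sum(max(probs) for probs in frame_posteriors) / len(frame_posteriors)

def select_exit_by_entropy(exit_outputs, threshold):
    """Return the index of the first exit whose mean frame entropy is below threshold."""
    for index, posteriors in enumerate(exit_outputs):
        if mean_frame_entropy(posteriors) < threshold:
            return index
    return len(exit_outputs) - 1  # no exit was confident enough: use the last one

def select_exit_by_confidence(exit_outputs, threshold):
    """Return the index of the first exit whose confidence proxy reaches threshold."""
    for index, posteriors in enumerate(exit_outputs):
        if mean_max_posterior(posteriors) >= threshold:
            return index
    return len(exit_outputs) - 1  # fallback to the deepest exit

# Toy posteriors (hypothetical): three exits, two frames each, three output tokens.
toy_exits = [
    [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]],  # shallow exit, nearly uniform
    [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]],  # middle exit, fairly peaked
    [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],  # final exit, very peaked
]

if __name__ == "__main__":
    print(select_exit_by_entropy(toy_exits, threshold=0.6))
    print(select_exit_by_confidence(toy_exits, threshold=0.9))
```

In a deployed model the exits would be evaluated lazily, one encoder block at a time, so that computation stops as soon as an exit is accepted; the precomputed list above only keeps the sketch short.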

arXiv information

Authors	George August Wright,Umberto Cappellazzo,Salah Zaiem,Desh Raj,Lucas Ondel Yang,Daniele Falavigna,Mohamed Nabih Ali,Alessio Brutti
Published	2024-02-22 15:10:06+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
