Self-Learning for Personalized Keyword Spotting on Ultra-Low-Power Audio Sensors

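Abstract

This paper proposes a self-learning framework for incrementally training (fine-tuning) a personalized Keyword Spotting (KWS) model after it has been deployed on ultra-low-power smart audio sensors.
It addresses the fundamental problem of missing labeled training data by assigning pseudo-labels to newly recorded audio frames, based on a similarity score computed against a small number of user recordings.
Experiments with multiple KWS models of up to 0.5M parameters on two public datasets show accuracy improvements of up to +19.2% and +16.0% over the initial models, which were pretrained on a large set of generic keywords.
The labeling task is demonstrated on a sensor system composed of a low-power microphone and an energy-efficient microcontroller (MCU).
By efficiently exploiting the heterogeneous processing engines of the MCU, the always-on labeling task runs in real time at an average power cost of up to 8.2 mW.
On the same platform, the energy cost of on-device training is estimated to be 10x lower than the labeling energy when a new utterance is sampled every 5 s with a DS-CNN-S model or every 16.4 s with a DS-CNN-M model.
These empirical results pave the way to self-adaptive personalized KWS sensors at the extreme edge.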

Abstract (original)

This paper proposes a self-learning framework to incrementally train (fine-tune) a personalized Keyword Spotting (KWS) model after the deployment on ultra-low power smart audio sensors. We address the fundamental problem of the absence of labeled training data by assigning pseudo-labels to the new recorded audio frames based on a similarity score with respect to few user recordings. By experimenting with multiple KWS models with a number of parameters up to 0.5M on two public datasets, we show an accuracy improvement of up to +19.2% and +16.0% vs. the initial models pretrained on a large set of generic keywords. The labeling task is demonstrated on a sensor system composed of a low-power microphone and an energy-efficient Microcontroller (MCU). By efficiently exploiting the heterogeneous processing engines of the MCU, the always-on labeling task runs in real-time with an average power cost of up to 8.2 mW. On the same platform, we estimate an energy cost for on-device training 10x lower than the labeling energy if sampling a new utterance every 5 s or 16.4 s with a DS-CNN-S or a DS-CNN-M model. Our empirical result paves the way to self-adaptive personalized KWS sensors at the extreme edge.
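The pseudo-labeling step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the use of cosine similarity, the averaging of the user recordings into a single prototype, the two thresholds (`pos_threshold`, `neg_threshold`), the label names, and the toy three-dimensional embeddings. In a real system the embeddings would come from the KWS model's feature extractor.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def pseudo_label(frame_embedding, prototype, pos_threshold=0.8, neg_threshold=0.3):
    """Assign a pseudo-label from the similarity to the user prototype.

    Hypothetical rule (an assumption, not the paper's exact method):
    high similarity -> "keyword", low similarity -> "not_keyword",
    anything in between is left unlabeled (None) and would be discarded.
    """
    score = cosine_similarity(frame_embedding, prototype)
    if score >= pos_threshold:
        return "keyword"
    if score <= neg_threshold:
        return "not_keyword"
    return None


# Toy embeddings standing in for the few recordings provided by the user.
user_recordings = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.0, 0.0, 0.1]]
prototype = mean_vector(user_recordings)

# Toy embeddings standing in for newly recorded, unlabeled audio frames.
new_frames = [[0.95, 0.1, 0.05], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
labels = [pseudo_label(frame, prototype) for frame in new_frames]
print(labels)
```

Only frames that receive a label would be added to the fine-tuning set. Leaving the ambiguous middle band unlabeled is one simple way to limit the number of wrong pseudo-labels, at the cost of collecting training samples more slowly.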

arXiv information

Authors	Manuele Rusci, Francesco Paci, Marco Fariselli, Eric Flamand, Tinne Tuytelaars
Published	2024-08-22 15:17:02+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
