Handling the Alignment for Wake Word Detection: A Comparison Between Alignment-Based, Alignment-Free and Hybrid Approaches

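Abstract

Wake word detection is present in most smart-home and portable devices.
It lets these devices "wake up" when called, at a low cost in power and computation.
This paper focuses on understanding the role of alignment in developing a wake word system that responds to a generic phrase.
We discuss three approaches.
The first is alignment-based: the model is trained with frame-wise cross-entropy.
The second is alignment-free: the model is trained with CTC.
The third, which we propose, is a hybrid solution: the model is trained on a small set of aligned data and then tuned on a sizeable unaligned dataset.
We compare the three approaches and evaluate how different aligned-to-unaligned ratios affect hybrid training.
Our results show that the alignment-free system outperforms the alignment-based system at the target operating point, and that a model meeting our initial constraints can be trained with a small fraction of the data (20%).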

Abstract (original)

Wake word detection exists in most intelligent homes and portable devices. It offers these devices the ability to ‘wake up’ when summoned at a low cost of power and computing. This paper focuses on understanding alignment’s role in developing a wake-word system that answers a generic phrase. We discuss three approaches. The first is alignment-based, where the model is trained with frame-wise cross-entropy. The second is alignment-free, where the model is trained with CTC. The third, proposed by us, is a hybrid solution in which the model is trained with a small set of aligned data and then tuned with a sizeable unaligned dataset. We compare the three approaches and evaluate the impact of the different aligned-to-unaligned ratios for hybrid training. Our results show that the alignment-free system performs better than the alignment-based for the target operating point, and with a small fraction of the data (20%), we can train a model that complies with our initial constraints.
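
To make the difference between the first two approaches concrete, here is a minimal pure-Python sketch of the two training losses named in the abstract. It is an illustration under stated assumptions, not the authors' implementation: the function names `framewise_cross_entropy` and `ctc_loss`, the two-class setup (index 0 as the CTC blank, index 1 as a single wake word token), and the toy uniform posteriors are all hypothetical. Frame-wise cross-entropy needs one label per frame, which is exactly the alignment, while CTC needs only the target label sequence and sums over every alignment that collapses to it.

```python
import math


def framewise_cross_entropy(log_probs, frame_labels):
    """Mean negative log-likelihood given one label per frame.

    Requires an alignment: frame_labels[t] is the class of frame t.
    """
    assert len(log_probs) == len(frame_labels)
    total = sum(frame[y] for frame, y in zip(log_probs, frame_labels))
    return -total / len(frame_labels)


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_loss(log_probs, targets, blank=0):
    """Negative log-likelihood of `targets`, summed over all alignments.

    Standard CTC forward recursion over the blank-extended target sequence.
    No per-frame labels are needed, only the label sequence itself.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in targets:
        ext += [label, blank]
    size = len(ext)

    # A valid path starts in the leading blank or the first label.
    alpha = [-math.inf] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for frame in log_probs[1:]:
        new_alpha = [-math.inf] * size
        for s in range(size):
            candidates = [alpha[s]]              # stay on the same symbol
            if s >= 1:
                candidates.append(alpha[s - 1])  # advance by one symbol
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = _logsumexp(candidates) + frame[ext[s]]
        alpha = new_alpha

    # A valid path ends in the last label or the trailing blank.
    tail = [alpha[-1]] + ([alpha[-2]] if size > 1 else [])
    return -_logsumexp(tail)


# Toy posteriors: 2 frames, 2 classes, uniform probabilities (assumed data).
uniform = [[math.log(0.5), math.log(0.5)] for _ in range(2)]

# Alignment-based: every frame must be labeled (here both frames are class 1).
print(round(framewise_cross_entropy(uniform, [1, 1]), 4))  # 0.6931

# Alignment-free: only the sequence [1] is given; three of the four
# two-frame paths collapse to it, so the loss is -log(0.75).
print(round(ctc_loss(uniform, [1]), 4))  # 0.2877
```

The two printed values are not directly comparable, since the first is a per-frame mean and the second a whole-sequence likelihood. A hybrid schedule in the spirit of the abstract would use a frame-level objective on the small aligned subset and then continue training on the larger unaligned set; the exact losses and schedule the authors use are not specified here.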

arXiv information

Authors	Vinicius Ribeiro, Yiteng Huang, Yuan Shangguan, Zhaojun Yang, Li Wan, Ming Sun
Published	2023-06-07 15:04:41+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
