Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition

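Summary

We present a new approach centered on the decoding stage of automatic speech recognition (ASR) that improves multilingual performance, particularly for low-resource languages.
Using a cross-lingual embedding clustering method, we construct a hierarchical softmax (H-Softmax) decoder so that similar tokens in different languages share similar decoder representations.
This addresses a limitation of the earlier Huffman-based H-Softmax method, which relied on shallow features to assess token similarity.
Experiments on a downsampled dataset covering 15 languages demonstrate that the approach improves low-resource multilingual ASR accuracy.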

Summary (original)

We present a novel approach centered on the decoding stage of Automatic Speech Recognition (ASR) that enhances multilingual performance, especially for low-resource languages. It utilizes a cross-lingual embedding clustering method to construct a hierarchical Softmax (H-Softmax) decoder, which enables similar tokens across different languages to share similar decoder representations. It addresses the limitations of the previous Huffman-based H-Softmax method, which relied on shallow features in token similarity assessments. Through experiments on a downsampled dataset of 15 languages, we demonstrate the effectiveness of our approach in improving low-resource multilingual ASR accuracy.
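
To make the idea of an embedding-clustered H-Softmax more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the clustering algorithm, the depth of the hierarchy, or where the token embeddings come from, so everything here is an assumption for illustration: plain k-means, a two-level hierarchy, and random vectors standing in for cross-lingual token embeddings. All function and variable names (`kmeans`, `hsoftmax_log_probs`, `token_emb`, and so on) are hypothetical.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (an assumed choice); returns one cluster id per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each embedding to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; leave empty clusters alone.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    # Renumber so that every remaining cluster id owns at least one token.
    remap = {c: i for i, c in enumerate(sorted(set(labels)))}
    return [remap[l] for l in labels]

def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hsoftmax_log_probs(h, cluster_w, token_w, labels):
    """Two-level H-Softmax: log P(token) = log P(cluster) + log P(token | cluster)."""
    log_p_cluster = log_softmax([dot(w, h) for w in cluster_w])
    out = [0.0] * len(labels)
    for c in range(len(cluster_w)):
        members = [i for i, l in enumerate(labels) if l == c]
        # Softmax restricted to the tokens that belong to cluster c.
        within = log_softmax([dot(token_w[i], h) for i in members])
        for i, lp in zip(members, within):
            out[i] = log_p_cluster[c] + lp
    return out

# Toy setup: random vectors stand in for real cross-lingual token embeddings.
rng = random.Random(0)
vocab, dim, k = 12, 8, 3

def rand_vec():
    return [rng.gauss(0, 1) for _ in range(dim)]

token_emb = [rand_vec() for _ in range(vocab)]
labels = kmeans(token_emb, k)            # tokens with nearby embeddings share a cluster
n_clusters = max(labels) + 1
cluster_w = [rand_vec() for _ in range(n_clusters)]  # first-level output weights
token_w = [rand_vec() for _ in range(vocab)]         # second-level output weights
h = rand_vec()                                       # one decoder hidden state

log_p = hsoftmax_log_probs(h, cluster_w, token_w, labels)
total = sum(math.exp(lp) for lp in log_p)  # should be 1 up to rounding
```

Because each token's probability is the product of its cluster's probability and its probability within that cluster, the values still form a valid distribution over the whole vocabulary. In this sketch, tokens placed in the same cluster share the same first-level decision, which is one way to read the abstract's statement that similar tokens share similar decoder representations.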

arXiv information

Authors	Zhengdong Yang, Qianying Liu, Sheng Li, Fei Cheng, Chenhui Chu
Published	2025-01-29 12:44:30+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
