Benchmarking Rotary Position Embeddings for Automatic Speech Recognition

Summary

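Rotary Position Embedding (RoPE) encodes both relative and absolute position information in Transformer-based models by applying rotation matrices to the input vectors of a sequence.
RoPE has outperformed other position embedding methods on natural language processing tasks, but its effectiveness in speech processing applications has not yet been studied in depth.
This work presents a comprehensive evaluation of RoPE across a range of automatic speech recognition (ASR) tasks.
The experimental results show that, on ASR tasks, RoPE consistently achieves lower error rates than the relative position embedding that is currently in wide use.
To encourage further research, the authors release the implementation and all experimental recipes through the SpeechBrain toolkit.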

Summary (original)

Rotary Position Embedding (RoPE) encodes relative and absolute positional information in Transformer-based models through rotation matrices applied to input vectors within sequences. While RoPE has demonstrated superior performance compared to other positional embedding technologies in natural language processing tasks, its effectiveness in speech processing applications remains understudied. In this work, we conduct a comprehensive evaluation of RoPE across diverse automatic speech recognition (ASR) tasks. Our experimental results demonstrate that for ASR tasks, RoPE consistently achieves lower error rates compared to the currently widely used relative positional embedding. To facilitate further research, we release the implementation and all experimental recipes through the SpeechBrain toolkit.
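The abstract's description of RoPE as rotation matrices applied to input vectors can be illustrated with a minimal NumPy sketch. This is not the SpeechBrain implementation released by the authors; the function name `rope_rotate`, the frequency base of 10000, and the pairing of adjacent feature dimensions are assumptions chosen for illustration, following a common formulation of RoPE.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Rotate each feature pair of x by an angle proportional to its position.

    x:         array of shape (seq_len, dim), dim must be even
    positions: array of shape (seq_len,) with the position of each row
    base:      frequency base (assumed value, a common default)
    """
    x = np.asarray(x, dtype=float)
    positions = np.asarray(positions, dtype=float)
    seq_len, dim = x.shape
    assert dim % 2 == 0, "feature dimension must be even"

    # One rotation frequency per pair of adjacent features.
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = positions[:, None] * inv_freq[None, :]    # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)

    # Apply a 2-D rotation to each (even, odd) feature pair.
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


# Illustrative check with hypothetical query/key vectors.
rng = np.random.default_rng(0)
q = rng.standard_normal((1, 8))
k = rng.standard_normal((1, 8))

def score(m, n):
    """Attention score between q placed at position m and k at position n."""
    return (rope_rotate(q, [m]) @ rope_rotate(k, [n]).T).item()

# Same offset (4) at different absolute positions.
print(np.isclose(score(3, 7), score(10, 14)))
```

Each vector is rotated according to its absolute position, yet the dot product between a rotated query and a rotated key depends only on the offset between their positions. That is the sense in which RoPE carries both absolute and relative information, and the two scores compared above should agree up to floating-point error.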

arXiv information

Authors	Shucong Zhang, Titouan Parcollet, Rogier van Dalen, Sourav Bhattacharya
Published	2025-01-10 15:30:46+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
