On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems

Summary

Title: On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems

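Summary:
- The performance of neural network-based speech enhancement systems is primarily influenced by the model architecture, whereas training time and computational resource utilization are primarily affected by training parameters such as the batch size.
- Noisy and reverberant speech mixtures can have different durations, so a batching strategy is required to handle variable size inputs during training, in particular for state-of-the-art end-to-end systems.
- Such strategies usually strive for a compromise between zero-padding and data randomization, and can be combined with a dynamic batch size so that each batch contains a more consistent amount of data.
- However, the effect of these strategies on resource utilization and, more importantly, on network performance is not well documented.
- This paper systematically investigates the effect of different batching strategies and batch sizes on the training statistics and speech enhancement performance of a Conv-TasNet, evaluated in both matched and mismatched conditions.
- The results show that using a small batch size during training improves performance in both conditions for all batching strategies.
- Moreover, sorted or bucket batching with a dynamic batch size reduces training time and GPU memory usage while achieving performance similar to random batching with a fixed batch size.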
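The idea of bucket batching with a dynamic batch size can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `bucket_batches` and the parameters `num_buckets` and `max_batch_samples` are assumptions introduced here, and input lengths are taken to be counts of audio samples. Items of similar length are grouped into buckets, shuffled within each bucket, and then packed into batches whose padded size stays under a fixed budget, so short inputs form larger batches than long ones.

```python
import random


def bucket_batches(lengths, num_buckets, max_batch_samples, seed=0):
    """Group item indices into batches (hypothetical helper, for illustration).

    lengths: length of each input, e.g. number of audio samples per mixture.
    num_buckets: number of length buckets (assumed parameter).
    max_batch_samples: budget on the padded batch size, i.e.
        (number of items) * (longest item in the batch) (assumed parameter).
    """
    rng = random.Random(seed)
    # Sort indices by length and split them into contiguous buckets,
    # so each bucket holds inputs of similar length.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    bucket_size = max(1, -(-len(order) // num_buckets))  # ceiling division
    buckets = [order[i:i + bucket_size] for i in range(0, len(order), bucket_size)]

    batches = []
    for bucket in buckets:
        rng.shuffle(bucket)  # randomize the order inside each bucket
        batch, longest = [], 0
        for idx in bucket:
            new_longest = max(longest, lengths[idx])
            # Start a new batch if adding this item would exceed the budget.
            if batch and new_longest * (len(batch) + 1) > max_batch_samples:
                batches.append(batch)
                batch, new_longest = [], lengths[idx]
            batch.append(idx)
            longest = new_longest
        if batch:
            batches.append(batch)
    rng.shuffle(batches)  # randomize the order in which batches are visited
    return batches


# Example with made-up lengths (in samples) and a made-up budget.
lengths = [16000, 48000, 32000, 8000, 64000, 24000, 40000, 12000]
batches = bucket_batches(lengths, num_buckets=2, max_batch_samples=128000)
for b in batches:
    print(len(b), "items, padded size", len(b) * max(lengths[i] for i in b))
```

Setting `num_buckets` to 1 reduces this sketch to random batching with a dynamic batch size, while a very large `num_buckets` approaches sorted batching, which reflects the compromise between zero-padding and randomization mentioned above.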

Summary (original)

The performance of neural network-based speech enhancement systems is primarily influenced by the model architecture, whereas training times and computational resource utilization are primarily affected by training parameters such as the batch size. Since noisy and reverberant speech mixtures can have different duration, a batching strategy is required to handle variable size inputs during training, in particular for state-of-the-art end-to-end systems. Such strategies usually strive for a compromise between zero-padding and data randomization, and can be combined with a dynamic batch size for a more consistent amount of data in each batch. However, the effect of these strategies on resource utilization and more importantly network performance is not well documented. This paper systematically investigates the effect of different batching strategies and batch sizes on the training statistics and speech enhancement performance of a Conv-TasNet, evaluated in both matched and mismatched conditions. We find that using a small batch size during training improves performance in both conditions for all batching strategies. Moreover, using sorted or bucket batching with a dynamic batch size allows for reduced training time and GPU memory usage while achieving similar performance compared to random batching with a fixed batch size.
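The zero-padding side of this compromise can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not a result from the paper: the input lengths are randomly generated, the batch size of 8 is arbitrary, and the helper names `padding_fraction` and `fixed_size_batches` are hypothetical. It compares the fraction of each padded batch that is zero-padding under random batching and under sorted batching, both with a fixed batch size.

```python
import random


def padding_fraction(lengths, batches):
    """Fraction of all padded batch elements that are zero-padding."""
    padded = sum(len(b) * max(lengths[i] for i in b) for b in batches)
    useful = sum(lengths[i] for b in batches for i in b)
    return 1.0 - useful / padded


def fixed_size_batches(order, batch_size):
    """Split an ordering of indices into consecutive fixed-size batches."""
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


rng = random.Random(0)
# Made-up input lengths in samples (assumption, for illustration only).
lengths = [rng.randint(16000, 160000) for _ in range(1000)]

random_order = list(range(len(lengths)))
rng.shuffle(random_order)
sorted_order = sorted(range(len(lengths)), key=lambda i: lengths[i])

random_pad = padding_fraction(lengths, fixed_size_batches(random_order, 8))
sorted_pad = padding_fraction(lengths, fixed_size_batches(sorted_order, 8))
print(f"random batching: {random_pad:.1%} zero-padding")
print(f"sorted batching: {sorted_pad:.1%} zero-padding")
```

Sorted batching minimizes padding for a given batch size, but every batch then always contains the same inputs, which is the loss of randomization the abstract refers to.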

arXiv information

Authors	Philippe Gonzalez, Tommy Sonne Alstrøm, Tobias May
Published	2023-03-31 11:37:26+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
