Towards Speaker Identification with Minimal Dataset and Constrained Resources using 1D-Convolution Neural Network

Abstract

Voice recognition and speaker identification are vital for applications in security and personal assistants. This paper presents a lightweight 1D-Convolutional Neural Network (1D-CNN) designed to perform speaker identification on minimal datasets. Our approach achieves a validation accuracy of 97.87%, leveraging data augmentation techniques to handle background noise and limited training samples. Future improvements include testing on larger datasets and integrating transfer learning methods to enhance generalizability. We provide all code, the custom dataset, and the trained models to facilitate reproducibility. These resources are available on our GitHub repository: https://github.com/IrfanNafiz/RecMe.
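
The abstract mentions data augmentation to cope with background noise and few training samples, but does not spell out the procedure. One common form of such augmentation is mixing a noise clip into each utterance at a chosen signal-to-noise ratio (SNR). The sketch below illustrates that idea only; it is not taken from the RecMe repository, and the function names, the 16 kHz sample rate, the synthetic tone, and the 10 dB target are all assumptions made for illustration.

```python
import math
import random


def mean_power(samples):
    """Average of squared sample values (signal power)."""
    return sum(s * s for s in samples) / len(samples)


def mix_noise_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mix has the requested SNR in dB.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop the noise clip so it covers the whole utterance.
    looped = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech = mean_power(speech)
    p_noise = mean_power(looped)
    if p_speech == 0 or p_noise == 0:
        return list(speech)  # nothing to scale against
    # Pick scale so that p_speech / (scale**2 * p_noise) == 10 ** (snr_db / 10).
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, looped)]


# Assumed stand-ins: a 440 Hz tone as a one-second "utterance" at 16 kHz,
# and Gaussian noise as the background clip.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
augmented = mix_noise_at_snr(speech, noise, snr_db=10)
```

Calling the function several times per recording with different noise clips and SNR values yields multiple training examples from one utterance, which is the usual motivation for this kind of augmentation when data is scarce.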

arXiv information

Authors	Irfan Nafiz Shahan, Pulok Ahmed Auvi
Published	2024-11-22 17:18:08+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
