Multi-class Decoding of Attended Speaker Direction Using Electroencephalogram and Audio Spatial Spectrum

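Summary

Decoding the direction of the speaker a listener is attending to from the listener's electroencephalogram (EEG) signals is essential for developing brain-computer interfaces that improve quality of life for people with hearing impairment.
Previous work has concentrated on binary directional focus decoding, that is, deciding whether the attended speaker is to the listener's left or right.
Effective speech processing, however, requires a finer-grained estimate of the attended speaker's direction.
In addition, the spatial information in the audio has not been exploited effectively, which leaves decoding results suboptimal.
On a recently presented dataset with 14 directional focus classes, the paper finds that models relying on EEG input alone reach considerably lower accuracy in both the leave-one-subject-out and leave-one-trial-out scenarios.
Integrating the audio spatial spectrum with the EEG features effectively improves decoding accuracy.
CNN, LSM-CNN, and Deformer models are used to decode the directional focus from the listener's EEG signals and the audio spatial spectrum.
With a 1-second decision window, the proposed Sp-EEG-Deformer model achieves 14-class decoding accuracies of 55.35% in the leave-one-subject-out scenario and 57.19% in the leave-one-trial-out scenario.
The experimental results show that decoding accuracy increases as the number of candidate directions decreases.
These findings suggest that the proposed dual-modal directional focus decoding strategy is effective.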
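The two evaluation scenarios named in the summary can be illustrated with a small sketch. This is not the authors' code: the helper `leave_one_group_out`, the toy `subjects` and `trials` lists, and the choice to treat each (subject, trial) pair as one held-out fold are all assumptions made for illustration. The chance level of 1/14 for a 14-class task is included to put the reported accuracies in context.

```python
from typing import Hashable, Iterator, List, Sequence, Tuple


def leave_one_group_out(
    groups: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices) per distinct group."""
    for held_out in sorted(set(groups)):
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical metadata: one entry per 1-second decision window.
subjects = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
trials = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]

# Leave-one-subject-out: every window of one subject is held out per fold.
loso_folds = list(leave_one_group_out(subjects))

# Leave-one-trial-out (assumed grouping): one (subject, trial) pair is held
# out per fold, so the same subject's other trials remain in training.
loto_folds = list(leave_one_group_out(list(zip(subjects, trials))))

# Chance level for a 14-class decision, as a fraction.
chance_level = 1 / 14

print(len(loso_folds), len(loto_folds), round(chance_level * 100, 2))
```

In the leave-one-subject-out folds the model never sees the test subject during training, whereas in the leave-one-trial-out folds, as grouped here, it does see other trials from that subject.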

Summary (original)

Decoding the directional focus of an attended speaker from listeners’ electroencephalogram (EEG) signals is essential for developing brain-computer interfaces to improve the quality of life for individuals with hearing impairment. Previous works have concentrated on binary directional focus decoding, i.e., determining whether the attended speaker is on the left or right side of the listener. However, a more precise decoding of the exact direction of the attended speaker is necessary for effective speech processing. Additionally, audio spatial information has not been effectively leveraged, resulting in suboptimal decoding results. In this paper, it is found that on the recently presented dataset with 14-class directional focus, models relying exclusively on EEG inputs exhibit significantly lower accuracy when decoding the directional focus in both leave-one-subject-out and leave-one-trial-out scenarios. By integrating audio spatial spectra with EEG features, the decoding accuracy can be effectively improved. The CNN, LSM-CNN, and Deformer models are employed to decode the directional focus from listeners’ EEG signals and audio spatial spectra. The proposed Sp-EEG-Deformer model achieves notable 14-class decoding accuracies of 55.35% and 57.19% in leave-one-subject-out and leave-one-trial-out scenarios with a decision window of 1 second, respectively. Experiment results indicate increased decoding accuracy as the number of alternative directions reduces. These findings suggest the efficacy of our proposed dual modal directional focus decoding strategy.

arXiv information

Authors	Yuanming Zhang, Jing Lu, Fei Chen, Haoliang Du, Xia Gao, Zhibin Lin
Published	2025-01-09 13:56:49+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
