Decoding EEG Speech Perception with Transformers and VAE-based Data Augmentation

Abstract

Decoding speech from non-invasive brain signals, such as electroencephalography (EEG), has the potential to advance brain-computer interfaces (BCIs), with applications in silent communication and assistive technologies for individuals with speech impairments. However, EEG-based speech decoding faces major challenges, including noisy data, limited datasets, and poor performance on complex tasks such as speech perception. This study attempts to address these challenges in two ways: it employs variational autoencoders (VAEs) for EEG data augmentation to improve data quality, and it applies a state-of-the-art (SOTA) sequence-to-sequence deep learning architecture, originally successful in electromyography (EMG) tasks, to EEG-based speech decoding. Additionally, we adapt this architecture for word classification tasks. Using the Brennan dataset, which contains EEG recordings of subjects listening to narrated speech, we preprocess the data and evaluate both classification and sequence-to-sequence models on EEG-to-words/sentences tasks. Our experiments show that VAEs have the potential to reconstruct artificial EEG data for augmentation. Meanwhile, our sequence-to-sequence model achieves more promising performance in generating sentences than our classification model, though both remain challenging tasks. These findings lay the groundwork for future research on EEG speech perception decoding, with possible extensions to speech production tasks such as silent or imagined speech.

arXiv information

Authors	Terrance Yu-Hao Chen, Yulin Chen, Pontus Soederhaell, Sadrishya Agrawal, Kateryna Shapovalenko
Published	2025-01-08 08:55:10+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
