DARNet: Dual Attention Refinement Network with Spatiotemporal Construction for Auditory Attention Detection

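Summary

At a cocktail party, humans show a remarkable ability to direct their attention.
Auditory attention detection (AAD) seeks to identify the attended speaker by analyzing brain signals such as EEG.
However, current AAD algorithms overlook the spatial distribution information in EEG signals and cannot capture long-range latent dependencies, which limits their ability to decode brain activity.
To address these issues, the paper proposes DARNet, a dual attention refinement network with spatiotemporal construction for AAD. It consists of a spatiotemporal construction module, a dual attention refinement module, and a feature fusion & classifier module.
Specifically, the spatiotemporal construction module builds more expressive spatiotemporal feature representations by capturing the spatial distribution characteristics of EEG signals.
The dual attention refinement module extracts temporal patterns at different levels of the EEG signal and strengthens the model's ability to capture long-range latent dependencies.
The feature fusion & classifier module aggregates the temporal patterns and dependencies from the different levels and produces the final classification result.
Experiments show that, compared with state-of-the-art models, DARNet improves average classification accuracy on the DTU dataset by 5.9% for 0.1 s, 4.6% for 1 s, and 3.9% for 2 s.
DARNet also greatly reduces the number of required parameters while maintaining this classification performance.
Compared with state-of-the-art models, DARNet reduces the parameter count by 91%.
Code is available at https://github.com/fchest/DARNet.git.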
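
The summary does not spell out how the attention in DARNet is formulated. As a rough illustration of why attention helps with long-range dependencies, the following is a minimal sketch of generic scaled dot-product self-attention over a sequence of time steps. It is not the authors' implementation: the function names `softmax` and `self_attention`, the identity query/key/value projections, and the toy data are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Scaled dot-product self-attention with identity projections.

    `sequence` is a list of T feature vectors, one per time step.
    Queries, keys, and values are all the input itself (an assumption
    for brevity); a trained model would use learned projections.
    Returns the attended outputs and the T x T attention weights.
    """
    dim = len(sequence[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in sequence:
        # Similarity of this time step to every time step, near or far.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in sequence
        ]
        row = softmax(scores)
        weights.append(row)
        # Weighted average of all time steps' feature vectors.
        outputs.append([
            sum(w * value[d] for w, value in zip(row, sequence))
            for d in range(dim)
        ])
    return outputs, weights


# Hypothetical toy sequence: steps 0 and 5 share a pattern but are far apart.
toy_sequence = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
    [0.0, 1.0],
    [0.0, 1.0],
    [1.0, 0.0],
]
toy_outputs, toy_weights = self_attention(toy_sequence)
```

In this toy case, step 0 weights the distant step 5 as heavily as itself and more heavily than its immediate neighbour, because the weights depend on feature similarity rather than on distance in time.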

Summary (original)

At a cocktail party, humans exhibit an impressive ability to direct their attention. The auditory attention detection (AAD) approach seeks to identify the attended speaker by analyzing brain signals, such as EEG signals. However, current AAD algorithms overlook the spatial distribution information within EEG signals and lack the ability to capture long-range latent dependencies, limiting the model’s ability to decode brain activity. To address these issues, this paper proposes a dual attention refinement network with spatiotemporal construction for AAD, named DARNet, which consists of the spatiotemporal construction module, dual attention refinement module, and feature fusion & classifier module. Specifically, the spatiotemporal construction module aims to construct more expressive spatiotemporal feature representations, by capturing the spatial distribution characteristics of EEG signals. The dual attention refinement module aims to extract different levels of temporal patterns in EEG signals and enhance the model’s ability to capture long-range latent dependencies. The feature fusion & classifier module aims to aggregate temporal patterns and dependencies from different levels and obtain the final classification results. The experimental results indicate that compared to the state-of-the-art models, DARNet achieves an average classification accuracy improvement of 5.9% for 0.1s, 4.6% for 1s, and 3.9% for 2s on the DTU dataset. While maintaining excellent classification performance, DARNet significantly reduces the number of required parameters. Compared to the state-of-the-art models, DARNet reduces the parameter count by 91%. Code is available at: https://github.com/fchest/DARNet.git.

arXiv information

Authors	Sheng Yan, Cunhang Fan, Hongyu Zhang, Xiaoke Yang, Jianhua Tao, Zhao Lv
Published	2024-11-18 16:25:53+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
