A Multimodal Approach for Dementia Detection from Spontaneous Speech with Tensor Fusion Layer

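Abstract

Alzheimer's disease (AD) is a progressive neurological disorder: its symptoms develop gradually over many years. It is also the main cause of dementia, which affects memory, thinking skills, and other mental abilities. Research interest has recently shifted toward detecting AD from spontaneous speech, because it is a time-efficient procedure. However, existing state-of-the-art multimodal approaches rely on early or late fusion and do not account for inter- and intra-modal interactions. To address these limitations, we propose deep neural networks that are trainable end to end and capture both inter- and intra-modal interactions. First, each audio file is converted into a three-channel image consisting of the log-Mel spectrogram, its delta, and its delta-delta. Next, each transcript is passed through a BERT model followed by a gated self-attention layer. Similarly, each image is passed through a Swin Transformer followed by a separate gated self-attention layer. Acoustic features are also extracted from each audio file. Finally, the representation vectors from the different modalities are fed to a tensor fusion layer to capture the inter-modal interactions. Extensive experiments on the ADReSS Challenge dataset indicate that the proposed approaches improve on existing work, reaching accuracy and F1-score of up to 86.25% and 85.48%, respectively.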
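The audio-to-image step described above can be illustrated with a minimal NumPy sketch. It assumes a log-Mel spectrogram is already available as a 2-D array (Mel bands by time frames) and approximates the delta with a plain finite difference along the time axis; speech toolkits usually compute deltas with a smoothed, windowed regression instead, so this is a simplification and not the authors' implementation. The names `delta` and `to_three_channel_image` are hypothetical.

```python
import numpy as np

def delta(features):
    """Finite-difference approximation of the time derivative (axis 1)."""
    # Assumption: a simple gradient stands in for the windowed delta used in practice.
    return np.gradient(features, axis=1)

def to_three_channel_image(log_mel):
    """Stack log-Mel, delta, and delta-delta into shape (3, n_mels, n_frames)."""
    d1 = delta(log_mel)  # first-order delta
    d2 = delta(d1)       # second-order delta (delta-delta)
    return np.stack([log_mel, d1, d2], axis=0)

# Hypothetical input: 4 Mel bands x 6 frames, values rising linearly over time.
log_mel = np.tile(np.arange(6, dtype=float), (4, 1))
image = to_three_channel_image(log_mel)
print(image.shape)
```

The resulting array has one channel per feature type, which is the layout an image model such as a Swin Transformer would expect after resizing and normalization.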

Abstract (original)

Alzheimer’s disease (AD) is a progressive neurological disorder, meaning that the symptoms develop gradually throughout the years. It is also the main cause of dementia, which affects memory, thinking skills, and mental abilities. Nowadays, researchers have moved their interest towards AD detection from spontaneous speech, since it constitutes a time-effective procedure. However, existing state-of-the-art works proposing multimodal approaches do not take into consideration the inter- and intra-modal interactions and propose early and late fusion approaches. To tackle these limitations, we propose deep neural networks, which can be trained in an end-to-end trainable way and capture the inter- and intra-modal interactions. Firstly, each audio file is converted to an image consisting of three channels, i.e., log-Mel spectrogram, delta, and delta-delta. Next, each transcript is passed through a BERT model followed by a gated self-attention layer. Similarly, each image is passed through a Swin Transformer followed by an independent gated self-attention layer. Acoustic features are extracted also from each audio file. Finally, the representation vectors from the different modalities are fed to a tensor fusion layer for capturing the inter-modal interactions. Extensive experiments conducted on the ADReSS Challenge dataset indicate that our introduced approaches obtain valuable advantages over existing research initiatives reaching Accuracy and F1-score up to 86.25% and 85.48% respectively.
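The fusion step can be sketched as follows. This is a minimal NumPy illustration of the general tensor-fusion idea: each modality vector is extended with a constant 1, and the outer product of the extended vectors is flattened, so the result contains unimodal, bimodal, and trimodal product terms. Fusing exactly three vectors (text, image, acoustic) and the function name `tensor_fusion` are assumptions made for illustration; the abstract does not specify how the modalities are combined or their dimensions.

```python
import numpy as np

def tensor_fusion(text_vec, image_vec, acoustic_vec):
    """Flattened three-way outer product of the 1-augmented input vectors."""
    # Appending 1 keeps the unimodal and bimodal terms in the product.
    t = np.append(text_vec, 1.0)
    v = np.append(image_vec, 1.0)
    a = np.append(acoustic_vec, 1.0)
    # fused[i, j, k] = t[i] * v[j] * a[k]
    fused = np.einsum("i,j,k->ijk", t, v, a)
    return fused.reshape(-1)

# Hypothetical toy vectors; real representation vectors would be much larger.
text_vec = np.array([0.5, -1.0])
image_vec = np.array([2.0, 0.0, 1.0])
acoustic_vec = np.array([3.0])
fused = tensor_fusion(text_vec, image_vec, acoustic_vec)
print(fused.shape)
```

The output length is the product of the augmented dimensions, here 3 x 4 x 2 = 24, which is why tensor fusion is typically applied to low-dimensional representation vectors and followed by dense layers.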

arXiv information

Authors	Loukas Ilias, Dimitris Askounis, John Psarras
Published	2022-11-08 16:43:58+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
