MultiConAD: A Unified Multilingual Conversational Dataset for Early Alzheimer’s Detection

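Summary

Dementia is a progressive cognitive syndrome, with Alzheimer’s disease (AD) as its leading cause.
Conversation-based AD detection offers a cost-effective alternative to clinical methods, because language dysfunction is an early biomarker of AD.
However, most prior research has framed AD detection as a binary classification problem, which limits the ability to identify Mild Cognitive Impairment (MCI), a crucial stage for early intervention.
In addition, studies rely primarily on single-language datasets, mainly in English, which restricts cross-language generalizability.
To address this gap, the authors make three key contributions.
First, they introduce a new multilingual dataset for AD detection by unifying 16 publicly available dementia-related conversational datasets.
The corpus spans English, Spanish, Chinese, and Greek, and it incorporates both audio and text data derived from a variety of cognitive assessment tasks.
Second, they perform finer-grained classification that includes MCI, and they evaluate various classifiers using sparse and dense text representations.
Third, they conduct experiments in monolingual and multilingual settings, finding that some languages benefit from multilingual training while others perform better when trained independently.
The study highlights the challenges of multilingual AD detection and enables future research on both language-specific approaches and techniques aimed at improving model generalization and robustness.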

Summary (original)

Dementia is a progressive cognitive syndrome with Alzheimer’s disease (AD) as the leading cause. Conversation-based AD detection offers a cost-effective alternative to clinical methods, as language dysfunction is an early biomarker of AD. However, most prior research has framed AD detection as a binary classification problem, limiting the ability to identify Mild Cognitive Impairment (MCI), a crucial stage for early intervention. Also, studies primarily rely on single-language datasets, mainly in English, restricting cross-language generalizability. To address this gap, we make three key contributions. First, we introduce a novel, multilingual dataset for AD detection by unifying 16 publicly available dementia-related conversational datasets. This corpus spans English, Spanish, Chinese, and Greek and incorporates both audio and text data derived from a variety of cognitive assessment tasks. Second, we perform finer-grained classification, including MCI, and evaluate various classifiers using sparse and dense text representations. Third, we conduct experiments in monolingual and multilingual settings, finding that some languages benefit from multilingual training while others perform better independently. This study highlights the challenges in multilingual AD detection and enables future research on both language-specific approaches and techniques aimed at improving model generalization and robustness.

arXiv information

Authors	Arezo Shakeri, Mina Farmanbar, Krisztian Balog
Published	2025-02-26 15:12:37+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
