Whispers of Sound-Enhancing Information Extraction from Depression Patients’ Unstructured Data through Audio and Text Emotion Recognition and Llama Fine-tuning

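Summary

This study proposes an innovative multimodal fusion model, built on a teacher-student architecture, to improve the accuracy of depression classification.
By introducing multi-head attention and weighted multimodal transfer learning, the model addresses the limitations of conventional methods in feature fusion and in allocating weights across modalities.
Using the DAIC-WOZ dataset, the student fusion model, guided by a text teacher model and an audio teacher model, substantially improves classification accuracy.
Ablation experiments show that the proposed model achieves an F1 score of 99.1% on the test set, substantially outperforming unimodal and conventional approaches.
The method captures the complementarity between text and audio features while dynamically adjusting each teacher model's contribution to strengthen generalization.
The experimental results highlight the robustness and adaptability of the proposed framework when handling complex multimodal data.
The work offers a new technical framework for multimodal large-model learning in depression analysis, along with new insights into overcoming the limitations of existing methods in modality fusion and feature extraction.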

Abstract (original)

This study proposes an innovative multimodal fusion model based on a teacher-student architecture to enhance the accuracy of depression classification. Our designed model addresses the limitations of traditional methods in feature fusion and modality weight allocation by introducing multi-head attention mechanisms and weighted multimodal transfer learning. Leveraging the DAIC-WOZ dataset, the student fusion model, guided by textual and auditory teacher models, achieves significant improvements in classification accuracy. Ablation experiments demonstrate that the proposed model attains an F1 score of 99.1% on the test set, significantly outperforming unimodal and conventional approaches. Our method effectively captures the complementarity between textual and audio features while dynamically adjusting the contributions of the teacher models to enhance generalization capabilities. The experimental results highlight the robustness and adaptability of the proposed framework in handling complex multimodal data. This research provides a novel technical framework for multimodal large model learning in depression analysis, offering new insights into addressing the limitations of existing methods in modality fusion and feature extraction.
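
The abstract describes a student model guided by a text teacher and an audio teacher whose contributions are adjusted dynamically, but it does not give the loss or the weighting rule. The following is only a minimal sketch of one common way to set up weighted two-teacher distillation, not the authors' method. The function names, the confidence-based weighting rule, and the `temperature` and `alpha` parameters are all assumptions made for illustration, and the multi-head attention fusion is not shown.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label] + eps)


def teacher_weights(text_logits, audio_logits, label):
    """Assumed weighting rule: trust each teacher in proportion to the
    probability it assigns to the true class, normalised to sum to 1."""
    conf_text = softmax(text_logits)[label]
    conf_audio = softmax(audio_logits)[label]
    total = conf_text + conf_audio
    return conf_text / total, conf_audio / total


def distillation_loss(student_logits, text_logits, audio_logits, label,
                      temperature=2.0, alpha=0.5):
    """Hard-label cross-entropy plus a weighted soft-label term per teacher."""
    w_text, w_audio = teacher_weights(text_logits, audio_logits, label)
    student_soft = softmax(student_logits, temperature)
    kd = (w_text * kl_divergence(softmax(text_logits, temperature), student_soft)
          + w_audio * kl_divergence(softmax(audio_logits, temperature), student_soft))
    ce = cross_entropy(softmax(student_logits), label)
    # temperature**2 keeps the soft-label term on a comparable scale.
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kd


if __name__ == "__main__":
    # Hypothetical logits for a two-class (not depressed / depressed) problem.
    loss = distillation_loss([1.0, 0.0], [2.0, 0.0], [0.5, 0.0], label=0)
    print(f"combined loss: {loss:.4f}")
```

In this sketch the teacher that is more confident in the correct class receives the larger weight for that sample, which is one simple reading of "dynamically adjusting the contributions of the teacher models"; a learned gating network would be another.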

arXiv information

Authors	Lindy Gan,Yifan Huang,Xiaoyang Gao,Jiaming Tan,Fujun Zhao,Tao Yang
Published	2025-01-28 09:30:29+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
