Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition

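Summary

Cross-corpus speech emotion recognition (SER) aims to generalize the ability to infer speech emotion from a labeled corpus to an unlabeled one. Existing methods are typically based on unsupervised domain adaptation (UDA) and try to learn corpus-invariant features through global distribution alignment, but the resulting features are often mixed with corpus-specific features or lack class discriminability. To address these problems, we propose the Emotion Decoupling aNd Alignment learning framework (EMO-DNA), a UDA method for cross-corpus SER that learns emotion-relevant, corpus-invariant features. EMO-DNA has two novel components: contrastive emotion decoupling and dual-level emotion alignment. Contrastive emotion decoupling uses a contrastive decoupling loss to make emotion-relevant features more separable from corpus-specific ones. Dual-level emotion alignment uses adaptive threshold pseudo-labeling to select confident target samples for class-level alignment and also performs corpus-level alignment; together these guide the model toward class-discriminative, corpus-invariant features across corpora. Extensive experiments show that EMO-DNA outperforms state-of-the-art methods in several cross-corpus scenarios. Source code is available at https://github.com/Jiaxin-Ye/Emo-DNA.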

Abstract (original)

Cross-corpus speech emotion recognition (SER) seeks to generalize the ability of inferring speech emotion from a well-labeled corpus to an unlabeled one, which is a rather challenging task due to the significant discrepancy between two corpora. Existing methods, typically based on unsupervised domain adaptation (UDA), struggle to learn corpus-invariant features by global distribution alignment, but unfortunately, the resulting features are mixed with corpus-specific features or not class-discriminative. To tackle these challenges, we propose a novel Emotion Decoupling aNd Alignment learning framework (EMO-DNA) for cross-corpus SER, a novel UDA method to learn emotion-relevant corpus-invariant features. The novelties of EMO-DNA are two-fold: contrastive emotion decoupling and dual-level emotion alignment. On one hand, our contrastive emotion decoupling achieves decoupling learning via a contrastive decoupling loss to strengthen the separability of emotion-relevant features from corpus-specific ones. On the other hand, our dual-level emotion alignment introduces an adaptive threshold pseudo-labeling to select confident target samples for class-level alignment, and performs corpus-level alignment to jointly guide model for learning class-discriminative corpus-invariant features across corpora. Extensive experimental results demonstrate the superior performance of EMO-DNA over the state-of-the-art methods in several cross-corpus scenarios. Source code is available at https://github.com/Jiaxin-Ye/Emo-DNA.
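As a rough illustration of the pseudo-labeling step used for class-level alignment, the sketch below shows one way to select confident target samples with a per-class threshold. The abstract does not state the thresholding rule, so everything here is an assumption made for illustration: the function name select_pseudo_labels, the base_threshold parameter, and the choice to scale each class threshold by that class's mean confidence. It is not the authors' implementation; see the linked repository for that.

```python
def select_pseudo_labels(probs, base_threshold=0.9):
    """Pick confident target samples using a per-class threshold.

    probs: list of per-sample class-probability lists (hypothetical model output).
    Returns (kept_indices, kept_labels).
    Assumption: classes the model is less sure about on average get a
    proportionally lower threshold, so they are not starved of samples.
    """
    if not probs:
        return [], []

    # Predicted label and its confidence for each target sample.
    labels = [max(range(len(p)), key=p.__getitem__) for p in probs]
    confs = [max(p) for p in probs]

    # Mean confidence per predicted class (0.0 if a class is never predicted).
    num_classes = len(probs[0])
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for label, conf in zip(labels, confs):
        sums[label] += conf
        counts[label] += 1
    class_conf = [s / n if n else 0.0 for s, n in zip(sums, counts)]

    # Scale the base threshold by each class's confidence relative to the
    # most confident class.
    top = max(class_conf)
    thresholds = [base_threshold * c / top for c in class_conf]

    kept = [i for i, (label, conf) in enumerate(zip(labels, confs))
            if conf >= thresholds[label]]
    return kept, [labels[i] for i in kept]


# Toy two-class example with four unlabeled target samples.
target_probs = [[0.95, 0.05], [0.6, 0.4], [0.3, 0.7], [0.45, 0.55]]
kept, pseudo = select_pseudo_labels(target_probs, base_threshold=0.8)
print(kept, pseudo)
```

With a single fixed threshold of 0.8, only the first sample would be kept. The per-class scaling also admits the third sample, which is the most confident prediction for the weaker class.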

arXiv information

Authors	Jiaxin Ye, Yujie Wei, Xin-Cheng Wen, Chenglong Ma, Zhizhong Huang, Kunhong Liu, Hongming Shan
Published	2023-08-04 08:15:17+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
