Hybrid Multimodal Feature Extraction, Mining and Fusion for Sentiment Analysis

Abstract

In this paper, we present our solutions for the Multimodal Sentiment Analysis Challenge (MuSe) 2022, which comprises the MuSe-Humor, MuSe-Reaction and MuSe-Stress sub-challenges. MuSe 2022 focuses on humor detection, emotional reactions and multimodal emotional stress, utilising different modalities and datasets. In our work, several kinds of multimodal features are extracted, including acoustic, visual, text and biological features. These features are fused by TEMMA and GRU with self-attention mechanism frameworks. Our contributions are as follows: 1) several new audio features, facial expression features and paragraph-level text embeddings are extracted to improve accuracy; 2) the accuracy and reliability of multimodal sentiment prediction are substantially improved by mining and blending the multimodal features; 3) effective data augmentation strategies are applied in model training to alleviate sample imbalance and prevent the model from learning biased subject characteristics. For the MuSe-Humor sub-challenge, our model obtains an AUC score of 0.8932. For the MuSe-Reaction sub-challenge, the Pearson's correlation coefficient of our approach on the test set is 0.3879, which outperforms all other participants. For the MuSe-Stress sub-challenge, our approach outperforms the baseline in both arousal and valence on the test dataset, reaching a final combined result of 0.5151.
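The two metrics named in the abstract, AUC and Pearson's correlation coefficient, can be illustrated with a minimal Python sketch. This is not the challenge's official scoring code: the function names `pearson_r` and `roc_auc` and the toy inputs are assumptions made for illustration, and how the challenge aggregates scores across classes or dimensions is not shown here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Labels are assumed to be 0 (negative) or 1 (positive).
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise AUC above is quadratic in the number of samples, which is fine for a small illustration; a rank-based implementation would be the usual choice for full test sets.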

arXiv information

Authors	Jia Li, Ziyang Zhang, Junjie Lang, Yueqi Jiang, Liuwei An, Peng Zou, Yangyang Xu, Sheng Gao, Jie Lin, Chunxiao Fan, Xiao Sun, Meng Wang
Published	2022-08-05 09:07:58+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
