CorMulT: A Semi-supervised Modality Correlation-aware Multimodal Transformer for Sentiment Analysis

Summary

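Multimodal sentiment analysis is an active research area that combines multiple data modalities, such as text, image, and audio, to analyze human emotions, and it benefits a variety of applications.
Existing multimodal sentiment analysis methods fall into three groups: modality interaction-based methods, modality transformation-based methods, and modality similarity-based methods.
However, most of these methods rely heavily on strong correlations between modalities and cannot fully uncover and exploit those correlations to improve sentiment analysis.
As a result, they usually perform poorly when identifying the sentiment of multimodal data whose modalities are only weakly correlated.
To address this issue, the authors propose a two-stage semi-supervised model called the Correlation-aware Multimodal Transformer (CorMulT), which consists of a pre-training stage and a prediction stage.
In the pre-training stage, a modality correlation contrastive learning module is designed to efficiently learn modality correlation coefficients between different modalities.
In the prediction stage, the learned correlation coefficients are fused with the modality representations to predict sentiment.
In experiments on the popular multimodal dataset CMU-MOSEI, CorMulT clearly outperforms state-of-the-art multimodal sentiment analysis methods.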
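
The two stages described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the code assumes an InfoNCE-style contrastive loss for the pre-training stage and uses a rescaled cosine similarity as a stand-in for a learned correlation coefficient in the prediction stage. All function names (`cosine`, `info_nce`, `correlation_weighted_fusion`) and the `temperature` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Assumed pre-training objective: the i-th anchor (one modality) should
    match the i-th positive (another modality); the other items in the batch
    act as negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


def correlation_weighted_fusion(text, audio, image):
    """Assumed prediction-stage fusion: scale each non-text modality by its
    correlation with the text representation before summing."""
    def corr(u, v):
        # Map cosine similarity from [-1, 1] to a coefficient in [0, 1].
        return (cosine(u, v) + 1.0) / 2.0

    c_text_audio = corr(text, audio)
    c_text_image = corr(text, image)
    fused = [
        t + c_text_audio * a + c_text_image * i
        for t, a, i in zip(text, audio, image)
    ]
    return fused, (c_text_audio, c_text_image)
```

In this sketch, a weakly correlated modality contributes less to the fused vector, which is one simple way to read the abstract's claim that correlation coefficients help on weakly correlated data. A real model would learn the coefficients rather than compute them from cosine similarity.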

Summary (original)

Multimodal sentiment analysis is an active research area that combines multiple data modalities, e.g., text, image and audio, to analyze human emotions and benefits a variety of applications. Existing multimodal sentiment analysis methods can be classified as modality interaction-based methods, modality transformation-based methods and modality similarity-based methods. However, most of these methods highly rely on the strong correlations between modalities, and cannot fully uncover and utilize the correlations between modalities to enhance sentiment analysis. Therefore, these methods usually achieve bad performance for identifying the sentiment of multimodal data with weak correlations. To address this issue, we proposed a two-stage semi-supervised model termed Correlation-aware Multimodal Transformer (CorMulT) which consists pre-training stage and prediction stage. At the pre-training stage, a modality correlation contrastive learning module is designed to efficiently learn modality correlation coefficients between different modalities. At the prediction stage, the learned correlation coefficients are fused with modality representations to make the sentiment prediction. According to the experiments on the popular multimodal dataset CMU-MOSEI, CorMulT obviously surpasses state-of-the-art multimodal sentiment analysis methods.

arXiv information

Authors	Yangmin Li, Ruiqi Zhu, Wengen Li
Published	2024-07-09 17:07:29+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
