Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment

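Summary

Multimodal emotion recognition (MER), which draws on both speech and text, has become a key area of human-computer interaction and calls for sophisticated methods of multimodal integration.
Aligning features across these modalities is a major challenge, and most existing approaches adopt a single alignment strategy.
This narrow focus not only limits model performance but also fails to address the complexity and ambiguity inherent in emotional expression.
In response, this paper introduces the Multi-Granularity Cross-Modal Alignment (MGCMA) framework, which combines distribution-based, instance-based, and token-based alignment modules.
The framework enables multi-level perception of emotional information across modalities.
Experiments on IEMOCAP show that the proposed method outperforms current state-of-the-art techniques.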

Summary (original)

Multimodal emotion recognition (MER), leveraging speech and text, has emerged as a pivotal domain within human-computer interaction, demanding sophisticated methods for effective multimodal integration. The challenge of aligning features across these modalities is significant, with most existing approaches adopting a singular alignment strategy. Such a narrow focus not only limits model performance but also fails to address the complexity and ambiguity inherent in emotional expressions. In response, this paper introduces a Multi-Granularity Cross-Modal Alignment (MGCMA) framework, distinguished by its comprehensive approach encompassing distribution-based, instance-based, and token-based alignment modules. This framework enables a multi-level perception of emotional information across modalities. Our experiments on IEMOCAP demonstrate that our proposed method outperforms current state-of-the-art techniques.
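
The abstract names three alignment granularities but does not say how each is formulated. As a rough illustration only, and not the authors' MGCMA implementation, the sketch below shows one common way each level could be expressed as a loss between speech and text embeddings: a Gaussian-statistics distance at the distribution level, a symmetric contrastive (InfoNCE-style) loss at the instance level, and a cross-attention similarity at the token level. All function names, the temperature, the embedding sizes, and the equal weighting of the three terms are assumptions made for this example.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def distribution_alignment(speech, text):
    # Assumption: fit a diagonal Gaussian to each modality's batch and use the
    # squared 2-Wasserstein distance between the two Gaussians.
    mu_s, mu_t = speech.mean(axis=0), text.mean(axis=0)
    sd_s, sd_t = speech.std(axis=0), text.std(axis=0)
    return float(np.sum((mu_s - mu_t) ** 2) + np.sum((sd_s - sd_t) ** 2))

def instance_alignment(speech, text, temperature=0.1):
    # Assumption: row i of `speech` and row i of `text` come from the same
    # utterance; all other rows in the batch act as negatives.
    s, t = l2_normalize(speech), l2_normalize(text)
    logits = s @ t.T / temperature
    diag = np.arange(len(s))
    speech_to_text = log_softmax(logits, axis=1)[diag, diag]
    text_to_speech = log_softmax(logits, axis=0)[diag, diag]
    return float(-(speech_to_text.mean() + text_to_speech.mean()) / 2)

def token_alignment(speech_frames, text_tokens):
    # Assumption: each text token attends over all speech frames, and the loss
    # is one minus the mean cosine similarity to the attended frame summary.
    scale = np.sqrt(text_tokens.shape[-1])
    attn = np.exp(log_softmax(text_tokens @ speech_frames.T / scale, axis=-1))
    attended = attn @ speech_frames
    cos = np.sum(l2_normalize(text_tokens) * l2_normalize(attended), axis=-1)
    return float(1.0 - cos.mean())

# Toy data standing in for encoder outputs (hypothetical shapes).
rng = np.random.default_rng(0)
speech_utt = rng.normal(size=(8, 16))     # 8 utterance-level speech embeddings
text_utt = rng.normal(size=(8, 16))       # 8 paired utterance-level text embeddings
speech_frames = rng.normal(size=(50, 16)) # 50 speech frames of one utterance
text_tokens = rng.normal(size=(12, 16))   # 12 text tokens of the same utterance

# Equal weighting of the three terms is an arbitrary choice for this sketch.
total = (distribution_alignment(speech_utt, text_utt)
         + instance_alignment(speech_utt, text_utt)
         + token_alignment(speech_frames, text_tokens))
print(total)
```

In a real system these terms would be computed on learned encoder outputs and added to the emotion classification loss; how the paper weights or parameterizes them is not stated in the abstract.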

arXiv information

Authors	Xuechen Wang, Shiwan Zhao, Haoqin Sun, Hui Wang, Jiaming Zhou, Yong Qin
Published	2024-12-30 09:30:41+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
