Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction

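Abstract

Emotion recognition is a crucial task for understanding human conversation.
It becomes more challenging with multimodal data such as language, voice, and facial expressions.
A typical solution exploits global and local context information to predict the emotion label of every single sentence, i.e., utterance, in a dialogue.
Specifically, the global representation can be captured by modeling cross-modal interactions at the conversation level.
The local representation is often inferred from the speakers' temporal information or emotional shifts, which neglects vital factors at the utterance level.
In addition, most existing approaches take fused features of multiple modalities as a unified input without leveraging modality-specific representations.
Motivated by these problems, we propose the Relational Temporal Graph Neural Network with Auxiliary Cross-Modality Interaction (CORECT), a novel neural network framework that effectively captures conversation-level cross-modality interactions and utterance-level temporal dependencies in a modality-specific manner for conversation understanding.
Extensive experiments demonstrate the effectiveness of CORECT through its state-of-the-art results on the IEMOCAP and CMU-MOSEI datasets for the multimodal ERC task.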

Abstract (original)

Emotion recognition is a crucial task for human conversation understanding. It becomes more challenging with the notion of multimodal data, e.g., language, voice, and facial expressions. As a typical solution, the global and local context information is exploited to predict the emotional label for every single sentence, i.e., utterance, in the dialogue. Specifically, the global representation could be captured via modeling of cross-modal interactions at the conversation level. The local one is often inferred using the temporal information of speakers or emotional shifts, which neglects vital factors at the utterance level. Additionally, most existing approaches take fused features of multiple modalities in a unified input without leveraging modality-specific representations. Motivated by these problems, we propose the Relational Temporal Graph Neural Network with Auxiliary Cross-Modality Interaction (CORECT), a novel neural network framework that effectively captures conversation-level cross-modality interactions and utterance-level temporal dependencies in a modality-specific manner for conversation understanding. Extensive experiments demonstrate the effectiveness of CORECT via its state-of-the-art results on the IEMOCAP and CMU-MOSEI datasets for the multimodal ERC task.

arXiv information

Authors	Cam-Van Thi Nguyen, Anh-Tuan Mai, The-Son Le, Hai-Dang Kieu, Duc-Trong Le
Published	2024-01-30 08:01:42+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
