Resolving Sentiment Discrepancy for Multimodal Sentiment Detection via Semantics Completion and Decomposition

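Summary

With the rapid increase in social media posts in recent years, the need to detect sentiment in multimodal (image and text) content has grown quickly.
Because posts are user-generated, the image and text of the same post can express different or even contradictory sentiments, which can produce a sentiment discrepancy.
However, existing work mainly adopts a single-branch fusion structure that captures the sentiment shared consistently by image and text.
Ignoring the discrepant sentiment, or modeling it only implicitly, compromises the unimodal encodings and limits performance.
This paper proposes a semantics Completion and Decomposition (CoDe) network to address this problem.
The semantics completion module complements the image and text representations with the semantics of the OCR text embedded in the image, which helps bridge the sentiment gap.
The semantics decomposition module decomposes the image and text representations using exclusive projection and contrastive learning, thereby explicitly capturing the discrepant sentiment between modalities.
Finally, the image and text representations are fused by cross-attention and combined with the learned discrepant sentiment for the final classification.
Extensive experiments on four multimodal sentiment datasets demonstrate the superiority of CoDe over SOTA methods.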

Summary (original)

With the proliferation of social media posts in recent years, the need to detect sentiments in multimodal (image-text) content has grown rapidly. Since posts are user-generated, the image and text from the same post can express different or even contradictory sentiments, leading to potential \textbf{sentiment discrepancy}. However, existing works mainly adopt a single-branch fusion structure that primarily captures the consistent sentiment between image and text. The ignorance or implicit modeling of discrepant sentiment results in compromised unimodal encoding and limited performances. In this paper, we propose a semantics Completion and Decomposition (CoDe) network to resolve the above issue. In the semantics completion module, we complement image and text representations with the semantics of the OCR text embedded in the image, helping bridge the sentiment gap. In the semantics decomposition module, we decompose image and text representations with exclusive projection and contrastive learning, thereby explicitly capturing the discrepant sentiment between modalities. Finally, we fuse image and text representations by cross-attention and combine them with the learned discrepant sentiment for final classification. Extensive experiments conducted on four multimodal sentiment datasets demonstrate the superiority of CoDe against SOTA methods.
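
The abstract names two building blocks, exclusive projection and cross-attention, without giving their formulas. The following is a minimal sketch of what such operations commonly look like, not the authors' implementation. It assumes that "exclusive projection" can be read as removing from one vector its component along another (an orthogonal projection), and that fusion uses standard scaled dot-product attention. The function names `exclusive_projection` and `cross_attention` are hypothetical, and the contrastive learning objective and the OCR completion step are omitted.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def exclusive_projection(x, y):
    """Split x into a part parallel to y and a part orthogonal to y.

    Assumption: the parallel part stands in for sentiment shared with the
    other modality, the orthogonal part for modality-exclusive sentiment.
    """
    scale = dot(x, y) / dot(y, y)
    shared = [scale * b for b in y]
    exclusive = [a - s for a, s in zip(x, shared)]
    return shared, exclusive


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one modality
    and keys/values come from the other."""
    d = len(keys[0])
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        fused.append([
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ])
    return fused


# Toy vectors standing in for an image and a text representation (made up).
image_vec = [1.0, 2.0, 0.0]
text_vec = [1.0, 0.0, 0.0]
shared_part, exclusive_part = exclusive_projection(image_vec, text_vec)
```

In this toy case the exclusive part of `image_vec` is orthogonal to `text_vec`, which is the property an orthogonal decomposition guarantees. Whether CoDe uses exactly this form is not stated in the abstract.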

arXiv information

Authors	Daiqing Wu, Dongbao Yang, Huawen Shen, Can Ma, Yu Zhou
Published	2024-07-09 16:46:58+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
