CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models

Summary

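Social media abounds with multimodal sarcasm, and identifying the target of sarcasm is particularly difficult because the underlying incongruity is implicit and not directly evident in the text or image modalities.
Current methods for Multimodal Sarcasm Target Identification (MSTI) mainly focus on superficial indicators in an end-to-end manner, overlooking the nuanced understanding of multimodal sarcasm conveyed through both text and image.
This paper proposes a versatile MSTI framework with a coarse-to-fine paradigm that strengthens sarcasm explainability through reasoning and pre-training knowledge.
Inspired by the strong multimodal reasoning ability of Large Multimodal Models (LMMs), we first use LMMs to generate competing rationales for coarser-grained pre-training of a small language model on multimodal sarcasm detection.
We then propose fine-tuning the model for finer-grained sarcasm target identification.
As a result, our framework can uncover the intricate targets within multimodal sarcasm while mitigating the negative impact of potential noise inherent in LMMs.
Experimental results show that our model far outperforms state-of-the-art MSTI methods and also exhibits marked explainability in deciphering sarcasm.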

Summary (original)

Social media abounds with multimodal sarcasm, and identifying sarcasm targets is particularly challenging due to the implicit incongruity not directly evident in the text and image modalities. Current methods for Multimodal Sarcasm Target Identification (MSTI) predominantly focus on superficial indicators in an end-to-end manner, overlooking the nuanced understanding of multimodal sarcasm conveyed through both the text and image. This paper proposes a versatile MSTI framework with a coarse-to-fine paradigm, by augmenting sarcasm explainability with reasoning and pre-training knowledge. Inspired by the powerful capacity of Large Multimodal Models (LMMs) on multimodal reasoning, we first engage LMMs to generate competing rationales for coarser-grained pre-training of a small language model on multimodal sarcasm detection. We then propose fine-tuning the model for finer-grained sarcasm target identification. Our framework is thus empowered to adeptly unveil the intricate targets within multimodal sarcasm and mitigate the negative impact posed by potential noise inherently in LMMs. Experimental results demonstrate that our model far outperforms state-of-the-art MSTI methods, and markedly exhibits explainability in deciphering sarcasm as well.

arXiv information

Authors	Hongzhan Lin, Zixin Chen, Ziyang Luo, Mingfei Cheng, Jing Ma, Guang Chen
Published	2024-05-01 08:44:44+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
