DocMSU: A Comprehensive Benchmark for Document-level Multimodal Sarcasm Understanding

Abstract

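Multimodal Sarcasm Understanding (MSU) has a wide range of applications in the news domain, such as public opinion analysis and forgery detection.
However, existing MSU benchmarks and approaches typically focus on sentence-level MSU.
In document-level news, sarcasm clues are sparse or small and are often hidden in long text.
Moreover, compared with sentence-level comments such as tweets, which mostly concentrate on a few trends or hot topics (e.g., sports events), news content is considerably more diverse.
Models built for sentence-level MSU may fail to capture sarcasm clues in document-level news.
To fill this gap, we present a comprehensive benchmark for Document-level Multimodal Sarcasm Understanding (DocMSU).
Our dataset contains 102,588 news items with text-image pairs, covering 9 diverse topics such as health and business. The proposed large-scale and diverse DocMSU significantly facilitates research on document-level MSU in real-world scenarios.
To address the new challenges posed by DocMSU, we introduce a fine-grained sarcasm comprehension method that aligns pixel-level image features with word-level textual features in documents.
Experiments demonstrate the effectiveness of our method and show that it can serve as a baseline approach for the challenging DocMSU.
Our code and dataset are available at https://github.com/Dulpy/DocMSU.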

Abstract (original)

Multimodal Sarcasm Understanding (MSU) has a wide range of applications in the news field such as public opinion analysis and forgery detection. However, existing MSU benchmarks and approaches usually focus on sentence-level MSU. In document-level news, sarcasm clues are sparse or small and are often concealed in long text. Moreover, compared to sentence-level comments like tweets, which mainly focus on only a few trends or hot topics (e.g., sports events), content in the news is considerably diverse. Models created for sentence-level MSU may fail to capture sarcasm clues in document-level news. To fill this gap, we present a comprehensive benchmark for Document-level Multimodal Sarcasm Understanding (DocMSU). Our dataset contains 102,588 pieces of news with text-image pairs, covering 9 diverse topics such as health, business, etc. The proposed large-scale and diverse DocMSU significantly facilitates the research of document-level MSU in real-world scenarios. To take on the new challenges posed by DocMSU, we introduce a fine-grained sarcasm comprehension method to properly align the pixel-level image features with word-level textual features in documents. Experiments demonstrate the effectiveness of our method, showing that it can serve as a baseline approach to the challenging DocMSU. Our code and dataset are available at https://github.com/Dulpy/DocMSU.

arXiv information

Authors	Hang Du, Guoshun Nan, Sicheng Zhang, Binzhu Xie, Junrui Xu, Hehe Fan, Qimei Cui, Xiaofeng Tao, Xudong Jiang
Published	2023-12-26 12:24:14+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
