Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes

Summary

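Memes have become a powerful form of communication that combines visual and textual elements to convey humor, satire, and cultural messages.
Prior work has concentrated on emotion classification, meme generation, propagation, interpretation, figurative language, and sociolinguistics, while deeper meme comprehension and meme-text retrieval have often been overlooked.
To address these gaps, the study introduces ClassicMemes-50-templates (CM50), a large-scale dataset of more than 33,000 memes built around 50 popular meme templates.
It also presents an automated, knowledge-grounded annotation pipeline that uses large vision-language models to generate high-quality image captions, meme captions, and literary device labels, avoiding the labor-intensive work of manual annotation.
In addition, it proposes a meme-text retrieval CLIP model (mtrCLIP) that uses cross-modal embeddings to strengthen meme analysis and significantly improve retrieval performance.
The contributions are (1) a new dataset for large-scale meme research, (2) a scalable meme annotation framework, and (3) a fine-tuned CLIP for meme-text retrieval, all aimed at advancing the understanding and analysis of memes at scale.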

Abstract (original)

Memes have emerged as a powerful form of communication, integrating visual and textual elements to convey humor, satire, and cultural messages. Existing research has focused primarily on aspects such as emotion classification, meme generation, propagation, interpretation, figurative language, and sociolinguistics, but has often overlooked deeper meme comprehension and meme-text retrieval. To address these gaps, this study introduces ClassicMemes-50-templates (CM50), a large-scale dataset consisting of over 33,000 memes, centered around 50 popular meme templates. We also present an automated knowledge-grounded annotation pipeline leveraging large vision-language models to produce high-quality image captions, meme captions, and literary device labels, overcoming the labor-intensive demands of manual annotation. Additionally, we propose a meme-text retrieval CLIP model (mtrCLIP) that utilizes cross-modal embedding to enhance meme analysis, significantly improving retrieval performance. Our contributions include: (1) a novel dataset for large-scale meme study, (2) a scalable meme annotation framework, and (3) a fine-tuned CLIP for meme-text retrieval, all aimed at advancing the understanding and analysis of memes at scale.
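
As a rough illustration of what meme-text retrieval over cross-modal embeddings involves, the sketch below ranks meme embeddings against a text embedding by cosine similarity and scores the result with Recall@k. Everything in it is an assumption made for illustration: the three-dimensional vectors are toy values rather than outputs of mtrCLIP, the identifiers meme_a, meme_b, and meme_c are hypothetical, and Recall@k is simply a common retrieval metric, not necessarily the one the authors report.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_memes(text_vec, meme_vecs):
    """Return meme ids sorted from most to least similar to the text embedding."""
    scores = {meme_id: cosine(text_vec, vec) for meme_id, vec in meme_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def recall_at_k(queries, meme_vecs, k):
    """Fraction of queries whose target meme appears in the top-k results."""
    hits = sum(
        1 for text_vec, target in queries
        if target in rank_memes(text_vec, meme_vecs)[:k]
    )
    return hits / len(queries)


# Toy embeddings (hypothetical values, not produced by any real model).
meme_vecs = {
    "meme_a": [0.9, 0.1, 0.0],
    "meme_b": [0.1, 0.9, 0.1],
    "meme_c": [0.0, 0.2, 0.9],
}

# Each query pairs a text embedding with the id of the meme it should retrieve.
queries = [
    ([0.8, 0.2, 0.1], "meme_a"),
    ([0.2, 0.7, 0.3], "meme_b"),
    ([0.5, 0.6, 0.1], "meme_c"),  # deliberately mismatched, so it ranks last
]

if __name__ == "__main__":
    for k in (1, 3):
        print(f"Recall@{k}: {recall_at_k(queries, meme_vecs, k):.2f}")
```

In a real setting the vectors would come from the image and text encoders of a CLIP-style model, and fine-tuning would aim to move matching meme and text embeddings closer together so that more targets land in the top k.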

arXiv information

Authors	Shiling Deng, Serge Belongie, Peter Ebert Christensen
Published	2025-01-23 17:18:30+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
