Improved Fine-Tuning of Large Multimodal Models for Hateful Meme Detection

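Summary

Hateful memes have become a significant concern on the Internet, necessitating robust automated detection systems.
While large multimodal models have shown strong generalization across various tasks, they generalize poorly to hateful meme detection because memes are dynamic and tied to emerging social trends and breaking news.
Recent work further highlights the limitations of conventional supervised fine-tuning of large multimodal models in this context.
To address these challenges, we propose Large Multimodal Model Retrieval-Guided Contrastive Learning (LMM-RGCL), a novel two-stage fine-tuning framework designed to improve both in-domain accuracy and cross-domain generalization.
Experimental results on six widely used meme classification datasets show that LMM-RGCL achieves state-of-the-art performance, outperforming agent-based systems such as VPD-PALI-X-55B.
Furthermore, our method generalizes effectively to out-of-domain memes under low-resource settings, surpassing models such as GPT-4o.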

Summary (original)

Hateful memes have become a significant concern on the Internet, necessitating robust automated detection systems. While large multimodal models have shown strong generalization across various tasks, they exhibit poor generalization to hateful meme detection due to the dynamic nature of memes tied to emerging social trends and breaking news. Recent work further highlights the limitations of conventional supervised fine-tuning for large multimodal models in this context. To address these challenges, we propose Large Multimodal Model Retrieval-Guided Contrastive Learning (LMM-RGCL), a novel two-stage fine-tuning framework designed to improve both in-domain accuracy and cross-domain generalization. Experimental results on six widely used meme classification datasets demonstrate that LMM-RGCL achieves state-of-the-art performance, outperforming agent-based systems such as VPD-PALI-X-55B. Furthermore, our method effectively generalizes to out-of-domain memes under low-resource settings, surpassing models like GPT-4o.
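The abstract names retrieval-guided contrastive learning but does not spell out the objective. The sketch below is a rough illustration of the general idea under our own assumptions, not the authors' exact loss: for a query meme embedding it retrieves the k most similar training embeddings, treats the closest same-label neighbour as the positive and the differently labelled neighbours as hard negatives, and scores them with an InfoNCE-style loss. The helper names `retrieve` and `retrieval_guided_contrastive_loss`, the top-k cosine retrieval, and the temperature value are all hypothetical choices for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_idx: int, embeddings: List[Sequence[float]], k: int) -> List[int]:
    """Hypothetical retriever: indices of the k most similar items, most similar first."""
    query = embeddings[query_idx]
    scored = [(cosine(query, e), i) for i, e in enumerate(embeddings) if i != query_idx]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


def retrieval_guided_contrastive_loss(
    query_idx: int,
    embeddings: List[Sequence[float]],
    labels: List[int],
    k: int = 4,
    temperature: float = 0.1,
) -> float:
    """Assumed InfoNCE-style loss over retrieved positives and hard negatives."""
    neighbours = retrieve(query_idx, embeddings, k)
    positives = [i for i in neighbours if labels[i] == labels[query_idx]]
    negatives = [i for i in neighbours if labels[i] != labels[query_idx]]
    # No contrast is possible without at least one positive and one negative.
    if not positives or not negatives:
        return 0.0
    query = embeddings[query_idx]
    # neighbours are sorted by similarity, so positives[0] is the closest same-label item.
    logits = [cosine(query, embeddings[positives[0]]) / temperature]
    logits += [cosine(query, embeddings[j]) / temperature for j in negatives]
    # -log softmax of the positive logit, computed with a stable log-sum-exp.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]
```

With toy 2-D embeddings, the loss is small when the nearest neighbour shares the query's label and grows when a differently labelled meme sits closer than any same-label one, which is the "hard negative" case this kind of objective is meant to push apart.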

arXiv information

Authors	Jingbiao Mei, Jinghong Chen, Guangyu Yang, Weizhe Lin, Bill Byrne
Published	2025-02-18 17:07:29+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
