MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms

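Abstract

Social media platforms are hubs for multimodal information exchange spanning text, images, and videos, which makes it difficult for machines to understand the information and emotions tied to interactions in online spaces. Multimodal Large Language Models (MLLMs) have emerged as a promising solution to these challenges, but they struggle to accurately interpret human emotions and complex content such as misinformation. This paper introduces MM-Soc, a comprehensive benchmark designed to evaluate how well MLLMs understand multimodal social media content. MM-Soc compiles prominent multimodal datasets, incorporates a new large-scale YouTube tagging dataset, and covers a variety of tasks including misinformation detection, hate speech detection, and social context generation. Through a thorough evaluation of ten size variants of four open-source MLLMs, we identified significant performance disparities, highlighting the need for advances in models' social understanding capabilities. Our analysis shows that, in a zero-shot setting, various types of MLLMs generally have difficulty handling social media tasks. However, MLLMs show performance gains after fine-tuning, suggesting that improvement is possible. Our code and data are available at https://github.com/claws-lab/MMSoc.git.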

Abstract (original)

Social media platforms are hubs for multimodal information exchange, encompassing text, images, and videos, making it challenging for machines to comprehend the information or emotions associated with interactions in online spaces. Multimodal Large Language Models (MLLMs) have emerged as a promising solution to these challenges, yet they struggle to accurately interpret human emotions and complex content such as misinformation. This paper introduces MM-Soc, a comprehensive benchmark designed to evaluate MLLMs’ understanding of multimodal social media content. MM-Soc compiles prominent multimodal datasets and incorporates a novel large-scale YouTube tagging dataset, targeting a range of tasks from misinformation detection, hate speech detection, and social context generation. Through our exhaustive evaluation on ten size-variants of four open-source MLLMs, we have identified significant performance disparities, highlighting the need for advancements in models’ social understanding capabilities. Our analysis reveals that, in a zero-shot setting, various types of MLLMs generally exhibit difficulties in handling social media tasks. However, MLLMs demonstrate performance improvements post fine-tuning, suggesting potential pathways for improvement. Our code and data are available at https://github.com/claws-lab/MMSoc.git.

arXiv information

Authors	Yiqiao Jin, Minje Choi, Gaurav Verma, Jindong Wang, Srijan Kumar
Published	2024-09-02 02:41:26+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
