MuSeD: A Multimodal Spanish Dataset for Sexism Detection in Social Media Videos

Summary

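Sexism is generally defined as prejudice and discrimination based on sex or gender, and it affects every sector of society, from social institutions to relationships and individual behavior.
Social media platforms amplify the impact of sexism by conveying discriminatory content not only through text but across multiple modalities, which underscores the critical need for multimodal approaches to analyzing sexism online.
With the rise of social media platforms where users share short videos, sexism is increasingly spreading through video content.
Automatically detecting sexism in videos is a challenging task, because identifying sexist content requires analyzing a combination of verbal, audio, and visual elements.
In this study, (1) we introduce MuSeD, a new multimodal Spanish dataset for sexism detection consisting of approximately 11 hours of videos extracted from TikTok and BitChute;
(2) we propose an innovative annotation framework for analyzing the contribution of textual and multimodal labels to the classification of sexist and non-sexist content;
and (3) we evaluate a range of large language models (LLMs) and multimodal LLMs on the task of sexism detection.
We find that visual information plays a key role in labeling sexist content, for both humans and models.
Models effectively detect explicit sexism.
However, they struggle with implicit cases such as stereotypes, which are also the instances where annotators show low agreement.
This highlights the inherent difficulty of the task, since identifying implicit sexism depends on social and cultural context.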

Summary (original)

Sexism is generally defined as prejudice and discrimination based on sex or gender, affecting every sector of society, from social institutions to relationships and individual behavior. Social media platforms amplify the impact of sexism by conveying discriminatory content not only through text but also across multiple modalities, highlighting the critical need for a multimodal approach to the analysis of sexism online. With the rise of social media platforms where users share short videos, sexism is increasingly spreading through video content. Automatically detecting sexism in videos is a challenging task, as it requires analyzing the combination of verbal, audio, and visual elements to identify sexist content. In this study, (1) we introduce MuSeD, a new Multimodal Spanish dataset for Sexism Detection consisting of approximately 11 hours of videos extracted from TikTok and BitChute; (2) we propose an innovative annotation framework for analyzing the contribution of textual and multimodal labels in the classification of sexist and non-sexist content; and (3) we evaluate a range of large language models (LLMs) and multimodal LLMs on the task of sexism detection. We find that visual information plays a key role in labeling sexist content for both humans and models. Models effectively detect explicit sexism; however, they struggle with implicit cases, such as stereotypes, instances where annotators also show low agreement. This highlights the inherent difficulty of the task, as identifying implicit sexism depends on the social and cultural context.
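
The abstract reports low annotator agreement on implicit cases such as stereotypes, but it does not say which agreement statistic the authors use. As an illustration only, the sketch below computes Cohen's kappa, one common chance-corrected agreement measure for two annotators. Using kappa here is an assumption. The function name `cohens_kappa` and the toy "explicit" and "implicit" label lists are hypothetical and are not drawn from MuSeD.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Illustrative sketch only; not the MuSeD authors' evaluation code.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        # Both annotators used one identical label for every item; treat as full agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (NOT from the dataset): two annotators agree on every
# "explicit" item but disagree on a third of the "implicit" items.
explicit_a = ["sexist", "sexist", "non-sexist", "sexist", "non-sexist", "non-sexist"]
explicit_b = ["sexist", "sexist", "non-sexist", "sexist", "non-sexist", "non-sexist"]
implicit_a = ["sexist", "non-sexist", "sexist", "non-sexist", "sexist", "non-sexist"]
implicit_b = ["sexist", "sexist", "sexist", "non-sexist", "non-sexist", "non-sexist"]

print(round(cohens_kappa(explicit_a, explicit_b), 3))  # 1.0
print(round(cohens_kappa(implicit_a, implicit_b), 3))  # 0.333
```

In this toy case the annotators agree on 4 of 6 implicit items (raw agreement of about 0.67). After correcting for the 0.5 agreement expected by chance, however, kappa drops to about 0.33. This is the kind of gap the word "low agreement" usually refers to.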

arXiv information

Authors	Laura De Grazia, Pol Pastells, Mauro Vázquez Chas, Desmond Elliott, Danae Sánchez Villegas, Mireia Farrús, Mariona Taulé
Published	2025-04-15 13:16:46+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
