MVTamperBench: Evaluating Robustness of Vision-Language Models

Summary

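Multimodal large language models (MLLMs) have brought major advances in video understanding, but their vulnerability to adversarial tampering and manipulation remains underexplored.
To address this gap, we introduce MVTamperBench, a benchmark that systematically evaluates the robustness of MLLMs against five common tampering techniques: rotation, masking, substitution, repetition, and dropping.
The benchmark is built from 3.4K original videos, expanded to more than 17K tampered clips spanning 19 video tasks.
MVTamperBench challenges models to detect manipulations of spatial and temporal coherence.
An evaluation of 45 recent MLLMs from more than 15 model families shows that resilience varies substantially across tampering types and that a larger parameter count does not necessarily guarantee robustness.
MVTamperBench sets a new benchmark for developing tamper-resilient MLLMs in safety-critical applications, including clickbait detection, preventing the distribution of harmful content, and enforcing policies on media platforms.
We release all code and data to foster open research on trustworthy video understanding.
Code: https://amitbcp.github.io/MVTamperBench/ Data: https://huggingface.co/datasets/Srikant86/MVTamperBench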
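
The five tampering types named above can be illustrated with a minimal sketch. This is not the benchmark's own code: the function names, the 180-degree rotation angle, the all-black mask, and the representation of a video as a NumPy array of shape (frames, height, width, channels) are all assumptions made for illustration. Each function tampers with the frame range `[start, end)`.

```python
import numpy as np

# Hypothetical helpers; a video is assumed to be an array of shape (T, H, W, C).

def rotate_segment(video, start, end):
    """Rotate the frames in [start, end) by 180 degrees (assumed angle)."""
    out = video.copy()
    out[start:end] = np.rot90(video[start:end], k=2, axes=(1, 2))
    return out

def mask_segment(video, start, end):
    """Black out the frames in [start, end) (assumed full-frame mask)."""
    out = video.copy()
    out[start:end] = 0
    return out

def substitute_segment(video, start, end, donor):
    """Replace the frames in [start, end) with the first frames of another video."""
    out = video.copy()
    out[start:end] = donor[: end - start]
    return out

def repeat_segment(video, start, end):
    """Play the frames in [start, end) twice in a row; the clip gets longer."""
    return np.concatenate([video[:end], video[start:end], video[end:]], axis=0)

def drop_segment(video, start, end):
    """Remove the frames in [start, end); the clip gets shorter."""
    return np.concatenate([video[:start], video[end:]], axis=0)
```

Rotation and masking break spatial coherence inside the segment, while substitution, repetition, and dropping break temporal coherence, which matches the two kinds of manipulation the summary says models are asked to detect.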

Summary (original)

Multimodal Large Language Models (MLLMs) have driven major advances in video understanding, yet their vulnerability to adversarial tampering and manipulations remains underexplored. To address this gap, we introduce MVTamperBench, a benchmark that systematically evaluates MLLM robustness against five prevalent tampering techniques: rotation, masking, substitution, repetition, and dropping. Built from 3.4K original videos, expanded to over 17K tampered clips spanning 19 video tasks, MVTamperBench challenges models to detect manipulations in spatial and temporal coherence. We evaluate 45 recent MLLMs from 15+ model families, revealing substantial variability in resilience across tampering types and showing that larger parameter counts do not necessarily guarantee robustness. MVTamperBench sets a new benchmark for developing tamper-resilient MLLMs in safety-critical applications, including detecting clickbait, preventing harmful content distribution, and enforcing policies on media platforms. We release all code and data to foster open research in trustworthy video understanding. Code: https://amitbcp.github.io/MVTamperBench/ Data: https://huggingface.co/datasets/Srikant86/MVTamperBench

arXiv information

Authors	Amit Agarwal, Srikant Panda, Angeline Charles, Bhargava Kumar, Hitesh Patel, Priyaranjan Pattnayak, Taki Hasan Rafi, Tejaswini Kumar, Dong-Kyu Chae
Published	2025-01-17 18:18:21+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
