FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models

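Abstract

The rapid development of generative AI is a double-edged sword: it makes content creation easier, but it also makes image manipulation easier and harder to detect.
Current image forgery detection and localization (IFDL) methods are generally effective, but they tend to face two challenges: 1) a black-box nature, with unknown detection principles, and 2) limited generalization across diverse tampering methods (e.g., Photoshop, DeepFake, AIGC editing).
To address these issues, we propose the explainable IFDL task and design FakeShield, a multi-modal framework that can evaluate image authenticity, generate tampered-region masks, and provide a basis for its judgment from pixel-level and image-level tampering clues.
In addition, we use GPT-4o to enhance existing IFDL datasets, creating the Multi-Modal Tamper Description dataSet (MMTD-Set) for training FakeShield's tampering analysis capabilities.
We also incorporate a Domain Tag-guided Explainable Forgery Detection Module (DTE-FDM) and a Multi-modal Forgery Localization Module (MFLM) to handle detection and interpretation across various tampering types and to achieve forgery localization guided by detailed textual descriptions.
Extensive experiments demonstrate that FakeShield effectively detects and localizes various tampering techniques, offering an explainable and superior solution compared with previous IFDL methods.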

Abstract (original)

The rapid development of generative AI is a double-edged sword, which not only facilitates content creation but also makes image manipulation easier and more difficult to detect. Although current image forgery detection and localization (IFDL) methods are generally effective, they tend to face two challenges: \textbf{1)} black-box nature with unknown detection principle, \textbf{2)} limited generalization across diverse tampering methods (e.g., Photoshop, DeepFake, AIGC-Editing). To address these issues, we propose the explainable IFDL task and design FakeShield, a multi-modal framework capable of evaluating image authenticity, generating tampered region masks, and providing a judgment basis based on pixel-level and image-level tampering clues. Additionally, we leverage GPT-4o to enhance existing IFDL datasets, creating the Multi-Modal Tamper Description dataSet (MMTD-Set) for training FakeShield’s tampering analysis capabilities. Meanwhile, we incorporate a Domain Tag-guided Explainable Forgery Detection Module (DTE-FDM) and a Multi-modal Forgery Localization Module (MFLM) to address various types of tamper detection interpretation and achieve forgery localization guided by detailed textual descriptions. Extensive experiments demonstrate that FakeShield effectively detects and localizes various tampering techniques, offering an explainable and superior solution compared to previous IFDL methods.

arXiv information

Authors	Zhipei Xu, Xuanyu Zhang, Runyi Li, Zecheng Tang, Qing Huang, Jian Zhang
Published	2024-11-05 13:14:23+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
