Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection

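Summary

The proliferation of deepfake faces poses large potential negative impacts on our daily lives.
Despite substantial progress in deepfake detection over the years, the generalizability of existing methods to forgeries from unseen datasets or created by emerging generative models remains limited.
In this paper, inspired by the zero-shot strengths of vision-language models (VLMs), we propose a novel approach that repurposes a well-trained VLM for general deepfake detection.
Motivated by the model reprogramming paradigm, which steers model predictions through input perturbations, the method reprograms a pre-trained VLM (e.g., CLIP) solely by manipulating its input, without tuning its internal parameters.
First, learnable visual perturbations are used to refine feature extraction for deepfake detection.
Next, information from face embeddings is exploited to build sample-level adaptive text prompts, which improves performance.
Extensive experiments on several popular benchmark datasets show that (1) the cross-dataset and cross-manipulation performance of deepfake detection can be improved significantly and consistently (e.g., over 88% AUC in the cross-dataset setting from FF++ to WildDeepfake);
(2) this superior performance is achieved with fewer trainable parameters, making the approach promising for real-world applications.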
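As a rough illustration of input-side reprogramming, the toy sketch below trains only an additive perturbation `delta` on the input of a frozen encoder and scores the result CLIP-style against two frozen class embeddings. Everything in it is an assumption for illustration: the tanh random-projection encoder (`W_frozen`) stands in for CLIP's image encoder, `T_frozen` stands in for encoded "real"/"fake" text prompts, and the data is synthetic. It is not the authors' implementation, which operates on images with a real pre-trained CLIP.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IN, D_EMB = 32, 16  # hypothetical toy sizes, not CLIP's real dimensions

# Frozen "image encoder": fixed random projection followed by tanh (stand-in for CLIP).
W_frozen = rng.normal(scale=1.0 / np.sqrt(D_IN), size=(D_EMB, D_IN))

# Frozen unit-norm "text embeddings" for the class prompts (row 0: real, row 1: fake).
T_frozen = rng.normal(size=(2, D_EMB))
T_frozen /= np.linalg.norm(T_frozen, axis=1, keepdims=True)

LOGIT_SCALE = 5.0  # assumed temperature, analogous to CLIP's logit scale


def forward(x, delta):
    """Class probabilities and hidden activations for a batch x of shape (N, D_IN)."""
    h = np.tanh((x + delta) @ W_frozen.T)        # perturbed input -> frozen encoder, (N, D_EMB)
    logits = LOGIT_SCALE * h @ T_frozen.T         # similarity to each class embedding, (N, 2)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability; softmax is unchanged
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return p, h


def loss_and_grad(x, y, delta):
    """Mean cross-entropy and its gradient with respect to delta only (chain rule by hand)."""
    n = x.shape[0]
    p, h = forward(x, delta)
    loss = -np.mean(np.log(p[np.arange(n), y] + 1e-12))
    g_logits = p.copy()
    g_logits[np.arange(n), y] -= 1.0
    g_logits /= n                                 # dL/dlogits, (N, 2)
    g_h = LOGIT_SCALE * g_logits @ T_frozen       # dL/dh, (N, D_EMB)
    g_z = g_h * (1.0 - h**2)                      # back through tanh
    g_delta = (g_z @ W_frozen).sum(axis=0)        # delta is shared by all samples, (D_IN,)
    return loss, g_delta


def make_toy_data(n_per_class=64):
    """Synthetic 'real' (label 0) and 'fake' (label 1) inputs; fakes carry a fixed artifact pattern."""
    artifact = rng.normal(size=D_IN)
    real = rng.normal(size=(n_per_class, D_IN))
    fake = rng.normal(size=(n_per_class, D_IN)) + artifact
    x = np.vstack([real, fake])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return x, y


def train_perturbation(x, y, steps=200, lr=0.01):
    """Plain gradient descent on delta; the encoder and text embeddings are never updated."""
    delta = np.zeros(D_IN)
    history = []
    for _ in range(steps):
        loss, g = loss_and_grad(x, y, delta)
        history.append(loss)
        delta -= lr * g
    return delta, history


x, y = make_toy_data()
W_before = W_frozen.copy()
delta, history = train_perturbation(x, y)
print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

Only `delta` changes during training, while `W_frozen` and `T_frozen` keep their initial values; that is what keeps the trainable-parameter count small in this style of method. The step size is kept small so that plain gradient descent reliably lowers the loss; with a real model the gradient would come from an autodiff framework rather than the hand-derived chain rule used here.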

Summary (original)

The proliferation of deepfake faces poses huge potential negative impacts on our daily lives. Despite substantial advancements in deepfake detection over these years, the generalizability of existing methods against forgeries from unseen datasets or created by emerging generative models remains constrained. In this paper, inspired by the zero-shot advantages of Vision-Language Models (VLMs), we propose a novel approach that repurposes a well-trained VLM for general deepfake detection. Motivated by the model reprogramming paradigm that manipulates the model prediction via input perturbations, our method can reprogram a pre-trained VLM model (e.g., CLIP) solely based on manipulating its input without tuning the inner parameters. First, learnable visual perturbations are used to refine feature extraction for deepfake detection. Then, we exploit information of face embedding to create sample-level adaptive text prompts, improving the performance. Extensive experiments on several popular benchmark datasets demonstrate that (1) the cross-dataset and cross-manipulation performances of deepfake detection can be significantly and consistently improved (e.g., over 88% AUC in cross-dataset setting from FF++ to WildDeepfake); (2) the superior performances are achieved with fewer trainable parameters, making it a promising approach for real-world applications.
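One way to read "sample-level adaptive text prompts" is as class embeddings that are shifted per sample by a function of that sample's face embedding, similar in spirit to conditional prompt learning (CoCoOp). The sketch below assumes exactly that and simplifies further by adding the shift to final text embeddings instead of to prompt tokens fed through a text encoder. `BASE_TEXT`, `A_proj`, the tanh offset, and all sizes are hypothetical; the face embedding is assumed to come from some frozen face encoder. This is not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(1)
D_FACE, D_EMB = 8, 16  # hypothetical sizes

# Frozen base text embeddings for the class prompts (row 0: "real", row 1: "fake"), assumed precomputed.
BASE_TEXT = rng.normal(size=(2, D_EMB))
# Hypothetical learnable projection from a face embedding to a prompt offset.
A_proj = rng.normal(scale=0.1, size=(D_FACE, D_EMB))
LOGIT_SCALE = 100.0  # assumed temperature


def l2_normalize(v, axis=-1):
    """Scale vectors along `axis` to unit length."""
    return v / np.linalg.norm(v, axis=axis, keepdims=True)


def adaptive_class_embeddings(face_emb):
    """Per-sample class embeddings: shared base prompts plus a face-conditioned offset.

    face_emb: (N, D_FACE) -> returns (N, 2, D_EMB), each row L2-normalised.
    """
    offset = np.tanh(face_emb @ A_proj)                   # (N, D_EMB), one offset per sample
    prompts = BASE_TEXT[None, :, :] + offset[:, None, :]  # same offset added to both classes
    return l2_normalize(prompts)


def classify(image_feat, face_emb, scale=LOGIT_SCALE):
    """Softmax over cosine similarities between each image feature and its own class embeddings."""
    img = l2_normalize(image_feat)                        # (N, D_EMB)
    class_emb = adaptive_class_embeddings(face_emb)       # (N, 2, D_EMB)
    logits = scale * np.einsum("nd,nkd->nk", img, class_emb)  # (N, 2)
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


image_feat = rng.normal(size=(4, D_EMB))  # stand-in for (perturbed) CLIP image features
face_emb = rng.normal(size=(4, D_FACE))   # stand-in for face-encoder outputs
probs = classify(image_feat, face_emb)
class_emb = adaptive_class_embeddings(face_emb)
print(probs)
```

Because the offset depends on each sample's face embedding, two faces are scored against slightly different "real"/"fake" embeddings, whereas a fixed prompt would give every sample the same pair. Only `A_proj` (and, in a fuller version, any prompt context) would be trained.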

arXiv information

Authors	Kaiqing Lin, Yuzhen Lin, Weixiang Li, Taiping Yao, Bin Li
Published	2025-04-11 13:57:48+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
