FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models

Summary

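Face anti-spoofing (FAS) is crucial for protecting face recognition systems from presentation attacks.
Previous methods treated this task as a classification problem and offered neither interpretability nor the reasoning behind their predictions.
Recently, multimodal large language models (MLLMs) have shown strong capabilities in perception, reasoning, and decision-making on visual tasks.
However, there is currently no universal, comprehensive MLLM or dataset designed specifically for the FAS task.
To address this gap, we propose FaceShield, an MLLM for FAS, along with the corresponding pre-training and supervised fine-tuning (SFT) datasets, FaceShield-pre10K and FaceShield-sft45K.
FaceShield can determine the authenticity of a face, identify the type of spoofing attack, provide reasoning for its judgment, and detect the attack region.
Specifically, we employ spoof-aware vision perception (SAVP), which incorporates both the original image and auxiliary information based on prior knowledge.
We then use a prompt-guided vision token masking (PVTM) strategy to randomly mask vision tokens, which improves the model's generalization ability.
We conducted extensive experiments on three benchmark datasets, demonstrating that FaceShield significantly outperforms previous deep learning models and general MLLMs on four FAS tasks.
The instruction datasets, protocols, and code will be released soon.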

Abstract (original)

Face anti-spoofing (FAS) is crucial for protecting facial recognition systems from presentation attacks. Previous methods approached this task as a classification problem, lacking interpretability and reasoning behind the predicted results. Recently, multimodal large language models (MLLMs) have shown strong capabilities in perception, reasoning, and decision-making in visual tasks. However, there is currently no universal and comprehensive MLLM or dataset specifically designed for the FAS task. To address this gap, we propose FaceShield, an MLLM for FAS, along with the corresponding pre-training and supervised fine-tuning (SFT) datasets, FaceShield-pre10K and FaceShield-sft45K. FaceShield is capable of determining the authenticity of faces, identifying types of spoofing attacks, providing reasoning for its judgments, and detecting attack areas. Specifically, we employ spoof-aware vision perception (SAVP) that incorporates both the original image and auxiliary information based on prior knowledge. We then use a prompt-guided vision token masking (PVTM) strategy to randomly mask vision tokens, thereby improving the model's generalization ability. We conducted extensive experiments on three benchmark datasets, demonstrating that FaceShield significantly outperforms previous deep learning models and general MLLMs on four FAS tasks, i.e., coarse-grained classification, fine-grained classification, reasoning, and attack localization. Our instruction datasets, protocols, and code will be released soon.
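
The abstract says only that PVTM randomly masks vision tokens under prompt guidance; it does not give the procedure. The Python sketch below is one possible reading, not the authors' method. It assumes (hypothetically) that tokens most similar to a prompt embedding are protected and that the remaining tokens are zeroed out at random. The function name mask_vision_tokens and the parameters keep_top and mask_prob are illustrative inventions.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mask_vision_tokens(tokens, prompt_vec, keep_top=0.5, mask_prob=0.5, rng=None):
    """Hypothetical prompt-guided random masking (an assumption, not the paper's PVTM).

    tokens     : list of vision-token vectors (lists of floats)
    prompt_vec : a single vector standing in for the prompt embedding
    keep_top   : fraction of tokens, ranked by similarity to the prompt, never masked
    mask_prob  : probability of zeroing each remaining token
    Returns (new_tokens, masked_indices).
    """
    rng = rng or random.Random()
    scores = [cosine(t, prompt_vec) for t in tokens]
    n_keep = math.ceil(keep_top * len(tokens))
    # Rank token indices from most to least prompt-relevant.
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    protected = set(order[:n_keep])

    new_tokens, masked = [], []
    for i, tok in enumerate(tokens):
        if i not in protected and rng.random() < mask_prob:
            new_tokens.append([0.0] * len(tok))  # masked token becomes a zero vector
            masked.append(i)
        else:
            new_tokens.append(list(tok))
    return new_tokens, masked


if __name__ == "__main__":
    toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    toy_prompt = [1.0, 0.0]
    out, idx = mask_vision_tokens(toy_tokens, toy_prompt, rng=random.Random(0))
    print("masked indices:", idx)
```

In a real model the tokens would be tensors from a vision encoder and masking would typically be applied only during training; the list-based version here is meant only to make the idea concrete.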

arXiv information

Authors	Hongyang Wang, Yichen Shi, Zhuofu Tao, Yuhao Gao, Liepiao Zhang, Xun Lin, Jun Feng, Xiaochen Yuan, Zitong Yu, Xiaochun Cao
Published	2025-05-14 14:10:43+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
