Deep Convolutional Pooling Transformer for Deepfake Detection

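Abstract

Deepfakes have recently drawn considerable public attention because of security and privacy concerns in social media digital forensics.
As the Deepfake videos spreading across the Internet become more realistic, traditional detection techniques can no longer distinguish real from fake.
Most existing deep learning methods use a convolutional neural network as the backbone and focus mainly on local features and relations within the face image.
However, local features and relations alone are insufficient for a model to learn enough general information for Deepfake detection during training.
As a result, existing Deepfake detection methods have hit a bottleneck in further improving detection performance.
To address this problem, we propose a deep convolutional Transformer that incorporates decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance their efficacy.
We also use image keyframes, which have rarely been discussed, in model training to improve performance, and we visualize the gap in feature quantity between keyframes and normal image frames caused by video compression.
Finally, we run extensive experiments on several Deepfake benchmark datasets to demonstrate transferability.
The proposed solution consistently outperforms several state-of-the-art baselines in both within-dataset and cross-dataset experiments.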

Abstract (original)

Recently, Deepfake has drawn considerable public attention due to security and privacy concerns in social media digital forensics. As the wildly spreading Deepfake videos on the Internet become more realistic, traditional detection techniques have failed in distinguishing between the real and fake. Most existing deep learning methods mainly focus on local features and relations within the face image using convolutional neural networks as a backbone. However, local features and relations are insufficient for model training to learn enough general information for Deepfake detection. Therefore, the existing Deepfake detection methods have reached a bottleneck to further improving the detection performance. To address this issue, we propose a deep convolutional Transformer to incorporate the decisive image features both locally and globally. Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance the efficacy. Moreover, we employ the barely discussed image keyframes in model training for performance improvement and visualize the feature quantity gap between the key and normal image frames caused by video compression. We finally illustrate the transferability with extensive experiments on several Deepfake benchmark datasets. The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
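
As a rough illustration of the re-attention step mentioned in the abstract, the sketch below assumes it follows the head-mixing form used in DeepViT, where the attention maps of all heads are recombined through a learnable matrix before being applied to the values. The function names, the toy inputs, and the fixed matrix `theta` are all hypothetical. The normalization layer that usually follows the mixing is omitted, and this is not the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(q, k):
    """Attention weights for one head: softmax(q k^T / sqrt(d)), row by row."""
    d = len(q[0])
    return [
        softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
        for qi in q
    ]


def re_attention(maps, theta):
    """Mix attention maps across heads: mixed[g] = sum_h theta[h][g] * maps[h]."""
    heads = len(maps)
    n, m = len(maps[0]), len(maps[0][0])
    return [
        [
            [sum(theta[h][g] * maps[h][i][j] for h in range(heads)) for j in range(m)]
            for i in range(n)
        ]
        for g in range(heads)
    ]


def apply_attention(weights, v):
    """Weighted sum of value vectors for every query token."""
    return [
        [sum(w * vj[c] for w, vj in zip(row, v)) for c in range(len(v[0]))]
        for row in weights
    ]


# Toy input (assumption): 2 heads, 3 tokens, feature size 2 per head.
queries = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]],
]
keys = queries      # self-attention on the toy tokens
values = queries

# In a trained model theta is learned; here it is fixed, and each column
# sums to 1 so that every mixed attention row still sums to 1.
theta = [[0.7, 0.2],
         [0.3, 0.8]]

maps = [attention_map(q, k) for q, k in zip(queries, keys)]
mixed = re_attention(maps, theta)
out = [apply_attention(w, v) for w, v in zip(mixed, values)]
```

Setting `theta` to the identity matrix reduces this to ordinary multi-head self-attention, which makes the effect of the mixing step easy to inspect in isolation.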

arXiv information

Authors	Tianyi Wang, Harry Cheng, Kam Pui Chow, Liqiang Nie
Published	2022-09-12 15:05:41+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
