‘Glitch in the Matrix!’: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization

Summary

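[Title] 'Glitch in the Matrix!': A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization

[Summary]

– Most deepfake detection methods focus on detecting spatial and/or spatio-temporal changes in facial attributes.
– This is because existing benchmark datasets contain mostly visual-only modifications.
– However, a sophisticated deepfake may include small segments of audio or audio-visual manipulation that completely change the meaning of the content.
– To address this gap, the study proposes and benchmarks a new dataset, Localized Audio Visual DeepFake (LAV-DF), consisting of strategic content-driven audio, visual, and audio-visual manipulations.
– The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture that efficiently captures multimodal manipulations.
– The baseline is further improved (BA-TFD+) by replacing the backbone with a Multiscale Vision Transformer and guiding training with contrastive, frame classification, boundary matching, and multimodal boundary matching loss functions.
– Quantitative analysis demonstrates the superiority of BA-TFD+ on temporal forgery localization and deepfake detection tasks across several benchmark datasets, including the newly proposed dataset.
– The dataset, models, and code are available at https://github.com/ControlNet/LAV-DF.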

Abstract (original)

Most deepfake detection methods focus on detecting spatial and/or spatio-temporal changes in facial attributes. This is because available benchmark datasets contain mostly visual-only modifications. However, a sophisticated deepfake may include small segments of audio or audio-visual manipulations that can completely change the meaning of the content. To address this gap, we propose and benchmark a new dataset, Localized Audio Visual DeepFake (LAV-DF), consisting of strategic content-driven audio, visual and audio-visual manipulations. The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture which efficiently captures multimodal manipulations. We further improve the baseline method (i.e. BA-TFD+) by replacing the backbone with a Multiscale Vision Transformer and guiding the training process with contrastive, frame classification, boundary matching and multimodal boundary matching loss functions. The quantitative analysis demonstrates the superiority of BA-TFD+ on temporal forgery localization and deepfake detection tasks using several benchmark datasets including our newly proposed dataset. The dataset, models and code are available at https://github.com/ControlNet/LAV-DF.

arXiv information

Authors	Zhixi Cai, Shreya Ghosh, Abhinav Dhall, Tom Gedeon, Kalin Stefanov, Munawar Hayat
Published	2023-05-05 05:33:57+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
