MumPy: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection

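Summary

The task of video inpainting detection is to expose inpainted regions at the pixel level within a video sequence.
Existing methods usually focus on exploiting spatial and temporal inconsistencies.
However, they typically combine spatial and temporal clues with fixed operations, which limits their applicability across different scenarios.
This paper introduces a new Multilateral Temporal-view Pyramid Transformer (MumPy) that combines spatial-temporal clues flexibly.
The method uses a newly designed multilateral temporal-view encoder to extract various collaborations of spatial-temporal clues, and introduces a deformable window-based temporal-view interaction module to increase the diversity of these collaborations.
A multi-pyramid decoder then aggregates the different types of features and generates detection maps.
By adjusting the contribution strength of spatial and temporal clues, the method can effectively identify inpainted regions.
The authors validate the method on existing datasets and also introduce a new, challenging, large-scale video inpainting dataset based on the YouTube-VOS dataset, built with several more recent inpainting methods.
The results show the superiority of the method in both in-domain and cross-domain evaluation scenarios.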

Abstract (original)

The task of video inpainting detection is to expose the pixel-level inpainted regions within a video sequence. Existing methods usually focus on leveraging spatial and temporal inconsistencies. However, these methods typically employ fixed operations to combine spatial and temporal clues, limiting their applicability in different scenarios. In this paper, we introduce a novel Multilateral Temporal-view Pyramid Transformer (MumPy) that collaborates spatial-temporal clues flexibly. Our method utilizes a newly designed multilateral temporal-view encoder to extract various collaborations of spatial-temporal clues and introduces a deformable window-based temporal-view interaction module to enhance the diversity of these collaborations. Subsequently, we develop a multi-pyramid decoder to aggregate the various types of features and generate detection maps. By adjusting the contribution strength of spatial and temporal clues, our method can effectively identify inpainted regions. We validate our method on existing datasets and also introduce a new challenging and large-scale Video Inpainting dataset based on the YouTube-VOS dataset, which employs several more recent inpainting methods. The results demonstrate the superiority of our method in both in-domain and cross-domain evaluation scenarios.

arXiv information

Authors	Ying Zhang,Yuezun Li,Bo Peng,Jiaran Zhou,Huiyu Zhou,Junyu Dong
Published	2024-08-29 14:43:25+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
