MMVP: Motion-Matrix-based Video Prediction

Abstract

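The central challenge in video prediction is that a system must infer the future motion of objects from image frames while keeping their appearance consistent across frames.
This work addresses that challenge with Motion-Matrix-based Video Prediction (MMVP), an end-to-end trainable, two-stream video prediction framework.
Unlike previous methods, which usually handle motion prediction and appearance maintenance within the same set of modules, MMVP decouples motion and appearance information by constructing appearance-agnostic motion matrices.
The motion matrices represent the temporal similarity of every pair of feature patches in the input frames and are the sole input to MMVP's motion prediction module.
This design improves both the accuracy and the efficiency of video prediction and reduces model size.
Extensive experiments show that MMVP outperforms state-of-the-art systems on public datasets by a non-negligible margin (about 1 dB in PSNR on UCF Sports) with a significantly smaller model (84% of the size or less).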
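
To make the idea of a motion matrix more concrete, here is a minimal sketch of pairwise patch similarity between consecutive feature maps. The abstract does not state which similarity measure MMVP uses, so cosine similarity is an assumption here. The function name `motion_matrix` is hypothetical, and random arrays stand in for the output of a real feature encoder.

```python
import numpy as np

def motion_matrix(feat_prev, feat_next, eps=1e-8):
    """Pairwise patch similarity between two feature maps (hypothetical helper).

    feat_prev, feat_next: arrays of shape (C, H, W), one feature vector per patch.
    Returns an (H*W, H*W) array; entry [i, j] compares patch i of feat_prev
    with patch j of feat_next. Cosine similarity is an assumption, not
    something the abstract specifies.
    """
    c, h, w = feat_prev.shape
    a = feat_prev.reshape(c, h * w).T  # (H*W, C): one row per patch
    b = feat_next.reshape(c, h * w).T
    # Normalize each patch vector so the dot product becomes a cosine.
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a @ b.T

# Toy stand-in for encoder features: T=4 frames, C=8 channels, 6x6 patches.
rng = np.random.default_rng(0)
frames = rng.standard_normal((4, 8, 6, 6))

# One matrix per pair of consecutive frames.
matrices = np.stack([motion_matrix(frames[t], frames[t + 1]) for t in range(3)])
print(matrices.shape)
```

In this sketch each matrix only records how similar patches are to one another, not the patch features themselves, which is one way to read "appearance-agnostic". How MMVP actually builds and consumes the matrices is described in the paper, not here.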

Abstract (original)

A central challenge of video prediction lies where the system has to reason the objects’ future motions from image frames while simultaneously maintaining the consistency of their appearances across frames. This work introduces an end-to-end trainable two-stream video prediction framework, Motion-Matrix-based Video Prediction (MMVP), to tackle this challenge. Unlike previous methods that usually handle motion prediction and appearance maintenance within the same set of modules, MMVP decouples motion and appearance information by constructing appearance-agnostic motion matrices. The motion matrices represent the temporal similarity of each and every pair of feature patches in the input frames, and are the sole input of the motion prediction module in MMVP. This design improves video prediction in both accuracy and efficiency, and reduces the model size. Results of extensive experiments demonstrate that MMVP outperforms state-of-the-art systems on public data sets by non-negligible large margins (about 1 db in PSNR, UCF Sports) in significantly smaller model sizes (84% the size or smaller).
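
For readers unfamiliar with the metric behind the "about 1 dB" figure, the following is a minimal sketch of the standard PSNR definition, 10 * log10(MAX^2 / MSE). The function name `psnr`, the toy arrays, and the assumption that pixel values lie in [0, 1] are illustrative only and are not taken from the paper's evaluation code.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: a constant error of 0.1 gives an MSE of about 0.01.
target = np.zeros((4, 4))
baseline = psnr(np.full((4, 4), 0.1), target)

# Shrinking the error so that the MSE falls by a factor of 10**-0.1
# (roughly 21% lower) raises PSNR by about 1 dB.
scale = np.sqrt(10 ** -0.1)
improved = psnr(np.full((4, 4), 0.1 * scale), target)
print(round(improved - baseline, 6))
```

Because PSNR is logarithmic in the mean squared error, a 1 dB gain corresponds to roughly a 21% reduction in MSE regardless of the starting point.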

arXiv information

Authors	Yiqi Zhong, Luming Liang, Ilya Zharkov, Ulrich Neumann
Published	2023-08-31 00:51:45+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
