DUT: Learning Video Stabilization by Simply Watching Unstable Videos

Summary

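Previous deep learning-based video stabilizers need large sets of paired unstable and stable videos for training, and such pairs are difficult to collect. Traditional trajectory-based stabilizers, by contrast, split the task into several sub-tasks and solve them one after another, but their reliance on hand-crafted features makes them fragile in textureless and occluded regions. This paper tackles video stabilization with deep unsupervised learning, borrowing the divide-and-conquer idea from traditional stabilizers while using the representation power of DNNs to handle the challenges of real-world scenes. Technically, DUT consists of a trajectory estimation stage and a trajectory smoothing stage. In the trajectory estimation stage, it first estimates the motion of keypoints, then initializes the grid motion with a novel multi-homography estimation strategy and refines it with a motion refinement network, and finally obtains grid-based trajectories through temporal association. In the trajectory smoothing stage, a novel network predicts dynamic smoothing kernels, which adapt well to trajectories with different dynamic patterns. The training objectives are built from the spatial and temporal coherence of keypoints and grid vertices, which yields an unsupervised training scheme. Experiments on public benchmarks show that DUT outperforms state-of-the-art methods both qualitatively and quantitatively. The source code is available at https://github.com/Annbless/DUTCode.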
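
The trajectory smoothing stage can be illustrated with a minimal sketch. It is not the DUT implementation: the function names (`accumulate`, `gaussian_kernel`, `smooth`) are hypothetical, the trajectory is one-dimensional, and a fixed Gaussian kernel stands in for the per-frame kernels that the paper's network predicts. It only shows the general idea of turning per-frame motion into a trajectory and smoothing it with one kernel per frame.

```python
import math


def accumulate(motions):
    """Integrate per-frame displacements into a trajectory that starts at 0.0."""
    trajectory = [0.0]
    for m in motions:
        trajectory.append(trajectory[-1] + m)
    return trajectory


def gaussian_kernel(radius, sigma):
    """Normalized Gaussian weights; an assumed stand-in for a predicted kernel."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(trajectory, kernels, radius):
    """Weighted temporal average; kernels[t] holds 2*radius+1 weights for frame t."""
    n = len(trajectory)
    smoothed = []
    for t in range(n):
        acc = 0.0
        norm = 0.0
        for k in range(-radius, radius + 1):
            s = t + k
            if 0 <= s < n:  # skip neighbours that fall outside the clip
                w = kernels[t][k + radius]
                acc += w * trajectory[s]
                norm += w
        smoothed.append(acc / norm)  # renormalize near the clip boundaries
    return smoothed


# Toy example: a steady pan of 1 px/frame with alternating jitter of 0.5 px.
motions = [1.0 + (0.5 if i % 2 == 0 else -0.5) for i in range(20)]
raw = accumulate(motions)
radius = 3
# One kernel per frame; a learned model could supply a different one for each t.
kernels = [gaussian_kernel(radius, sigma=2.0) for _ in raw]
stable = smooth(raw, kernels, radius)
# Per-frame correction that a warping step would apply to this grid vertex.
offsets = [s - r for s, r in zip(stable, raw)]
```

Because `smooth` takes a separate kernel for every frame, swapping the fixed Gaussian for frame-dependent weights requires no change to the smoothing loop itself.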

Summary (original)

Previous deep learning-based video stabilizers require a large scale of paired unstable and stable videos for training, which are difficult to collect. Traditional trajectory-based stabilizers, on the other hand, divide the task into several sub-tasks and tackle them subsequently, which are fragile in textureless and occluded regions regarding the usage of hand-crafted features. In this paper, we attempt to tackle the video stabilization problem in a deep unsupervised learning manner, which borrows the divide-and-conquer idea from traditional stabilizers while leveraging the representation power of DNNs to handle the challenges in real-world scenarios. Technically, DUT is composed of a trajectory estimation stage and a trajectory smoothing stage. In the trajectory estimation stage, we first estimate the motion of keypoints, initialize and refine the motion of grids via a novel multi-homography estimation strategy and a motion refinement network, respectively, and get the grid-based trajectories via temporal association. In the trajectory smoothing stage, we devise a novel network to predict dynamic smoothing kernels for trajectory smoothing, which can well adapt to trajectories with different dynamic patterns. We exploit the spatial and temporal coherence of keypoints and grid vertices to formulate the training objectives, resulting in an unsupervised training scheme. Experiment results on public benchmarks show that DUT outperforms state-of-the-art methods both qualitatively and quantitatively. The source code is available at https://github.com/Annbless/DUTCode.

arXiv information

Authors	Yufei Xu, Jing Zhang, Stephen J. Maybank, Dacheng Tao
Publication date	2022-06-09 08:30:07+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
