Video alignment using unsupervised learning of local and global features

Summary

Title:
– Video alignment using unsupervised learning of local and global features

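Summary:
– This paper tackles video alignment: matching the frames of two videos that contain similar actions.
– The main challenge is to establish accurate frame correspondences even though the two videos differ in how the action is executed and in appearance.
– To address this, the authors introduce an unsupervised method that uses global and local features of the frames.
– Effective features are extracted for each video frame with three machine vision tools (person detection, pose estimation, and a VGG network), then processed and combined into a multidimensional time series.
– The resulting time series are used to align videos of the same action with a new variant of dynamic time warping called Diagonalized Dynamic Time Warping (DDTW).
– The main advantage of the approach is that no training is required, so it can be applied to new types of actions without collecting training samples.
– For evaluation, video synchronization and phase classification tasks are considered on the Penn Action dataset, and a new metric, Enclosed Area Error (EAE), is proposed for evaluating video synchronization.
– The results show that the method outperforms previous state-of-the-art methods such as TCC and other self-supervised and supervised methods.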
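The summary says the per-frame time series are aligned with DDTW, a new variant of dynamic time warping, but it does not describe what DDTW changes. The following is therefore only a minimal sketch of classic DTW, the baseline technique DDTW builds on, and not the authors' DDTW. It assumes each frame is a fixed-length feature vector and that frames are compared with Euclidean distance; `dtw_align` and `euclidean` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two per-frame feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(x: Sequence[Vector], y: Sequence[Vector]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW between two multidimensional time series (not DDTW).

    Returns the total alignment cost and the warping path as (i, j) frame pairs.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(x[i - 1], y[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # x advances while y repeats a frame
                cost[i][j - 1],      # y advances while x repeats a frame
                cost[i - 1][j - 1],  # both advance (diagonal step)
            )
    # Backtrack from (n, m) to recover the frame correspondences.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        # Diagonal is listed first, so it wins ties in min().
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path
```

For example, `dtw_align([[0.0], [1.0], [2.0]], [[0.0], [0.0], [1.0], [2.0]])` returns cost `0.0` and path `[(0, 0), (0, 1), (1, 2), (2, 3)]`: the first frame of the shorter sequence is matched to both repeated frames of the longer one, which is how DTW absorbs differences in execution speed.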

Summary (original)

In this paper, we tackle the problem of video alignment, the process of matching the frames of a pair of videos containing similar actions. The main challenge in video alignment is that accurate correspondence should be established despite the differences in the execution processes and appearances between the two videos. We introduce an unsupervised method for alignment that uses global and local features of the frames. In particular, we introduce effective features for each video frame by means of three machine vision tools: person detection, pose estimation, and a VGG network. Then the features are processed and combined to construct a multidimensional time series that represents the video. The resulting time series are used to align videos of the same actions using a novel version of dynamic time warping named Diagonalized Dynamic Time Warping (DDTW). The main advantage of our approach is that no training is required, which makes it applicable to any new type of action without any need to collect training samples for it. For evaluation, we considered video synchronization and phase classification tasks on the Penn Action dataset. Also, for an effective evaluation of the video synchronization task, we present a new metric called Enclosed Area Error (EAE). The results show that our method outperforms previous state-of-the-art methods, such as TCC and other self-supervised and supervised methods.
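The abstract states that per-frame features from person detection, pose estimation, and a VGG network are processed and combined into one multidimensional time series, but it does not say how. The sketch below assumes one plausible scheme, z-score normalization of each feature dimension over time followed by per-frame concatenation, purely for illustration. `build_time_series`, `zscore_columns`, and the stream names in the usage example are hypothetical, and the real feature extractors are not shown.

```python
import math
from itertools import chain
from typing import Dict, List, Sequence


def zscore_columns(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Normalize each feature dimension to zero mean and unit variance over time."""
    n_frames = len(frames)
    n_dims = len(frames[0])
    out = [list(f) for f in frames]
    for d in range(n_dims):
        col = [f[d] for f in frames]
        mean = sum(col) / n_frames
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n_frames)
        for t in range(n_frames):
            # A constant dimension carries no temporal information; map it to 0.
            out[t][d] = (col[t] - mean) / std if std > 0 else 0.0
    return out


def build_time_series(streams: Dict[str, Sequence[Sequence[float]]]) -> List[List[float]]:
    """Concatenate per-frame features from several tools into one vector per frame.

    Assumption (not from the paper): `streams` maps a hypothetical tool name to
    a list of per-frame vectors, and every stream covers the same frames.
    """
    lengths = {len(v) for v in streams.values()}
    if len(lengths) != 1:
        raise ValueError("all feature streams must have the same, non-zero set of frames")
    n_frames = lengths.pop()
    # Sort by name so the dimension order is deterministic.
    normalized = [zscore_columns(streams[name]) for name in sorted(streams)]
    return [list(chain.from_iterable(s[t] for s in normalized)) for t in range(n_frames)]
```

For example, `build_time_series({"bbox": [[0.0], [2.0]], "pose": [[1.0, 5.0], [3.0, 5.0]]})` returns `[[-1.0, -1.0, 0.0], [1.0, 1.0, 0.0]]`. Normalizing each stream first keeps a high-dimensional descriptor (such as VGG activations) from dominating the frame distance purely because of its scale.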

arXiv information

Authors	Niloofar Fakhfour, Mohammad ShahverdiKondori, Hoda Mohammadzade
Published	2023-04-13 22:20:54+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
