Video-based Person Re-identification with Long Short-Term Representation Learning

Summary

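Video-based person re-identification (V-ReID) aims to retrieve specific persons from raw videos captured by non-overlapping cameras.
As a fundamental task, it underpins many multimedia and computer vision applications.
However, because of variations in persons and scenes, many obstacles must still be overcome to achieve high performance.
In this work, we observe that both long-term and short-term information about a person is important for robust video representations.
We therefore propose a new deep learning framework, Long Short-Term Representation Learning (LSTRL), for effective V-ReID.
Specifically, to extract long-term representations, we propose a Multi-granularity Appearance Extractor (MAE), which captures appearance at four granularities across multiple frames.
To extract short-term representations, we propose a Bi-direction Motion Estimator (BME), which efficiently extracts reciprocal motion information from consecutive frames.
MAE and BME are plug-and-play modules that can easily be inserted into existing networks for efficient feature learning.
As a result, they significantly improve the feature representation ability for V-ReID.
Extensive experiments on three widely used benchmarks show that the proposed approach outperforms most state-of-the-art methods.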
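The MAE is only described at a high level here, so the following is a minimal, hypothetical sketch of the general idea behind multi-granularity, long-term appearance pooling; it is not the authors' module. It assumes each frame has already been reduced to a `[height][channels]` feature map (width pooled away) and that the four granularities are 1, 2, 3 and 4 horizontal stripes. The names `stripe_pool` and `multi_granularity_descriptor` are illustrative only.

```python
from typing import List, Sequence

# Hypothetical layout: frame[h][c] is the feature at row h, channel c.
Frame = List[List[float]]


def stripe_pool(frame: Frame, num_stripes: int) -> List[List[float]]:
    """Average-pool one frame into `num_stripes` horizontal stripes."""
    height = len(frame)
    channels = len(frame[0])
    pooled = []
    for s in range(num_stripes):
        # Integer stripe boundaries; assumes height >= num_stripes.
        start = s * height // num_stripes
        end = (s + 1) * height // num_stripes
        rows = frame[start:end]
        pooled.append([sum(r[c] for r in rows) / len(rows) for c in range(channels)])
    return pooled


def multi_granularity_descriptor(
    clip: List[Frame], granularities: Sequence[int] = (1, 2, 3, 4)
) -> List[float]:
    """Concatenate stripe features at several granularities, each averaged over all frames."""
    descriptor: List[float] = []
    for g in granularities:
        # Pool every frame into g stripes, then average each stripe across frames.
        per_frame = [stripe_pool(f, g) for f in clip]
        for s in range(g):
            channels = len(per_frame[0][s])
            descriptor.extend(
                sum(pf[s][c] for pf in per_frame) / len(per_frame) for c in range(channels)
            )
    return descriptor
```

For a clip of two 4-row, 1-channel frames the descriptor has 1 + 2 + 3 + 4 = 10 entries. In this sketch, averaging across every frame of the clip is what makes the representation long-term, while the coarse-to-fine stripes stand in for the four granularities.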

Summary (original)

Video-based person Re-Identification (V-ReID) aims to retrieve specific persons from raw videos captured by non-overlapping cameras. As a fundamental task, it underpins many multimedia and computer vision applications. However, due to the variations of persons and scenes, there are still many obstacles that must be overcome for high performance. In this work, we notice that both the long-term and short-term information of persons are important for robust video representations. Thus, we propose a novel deep learning framework named Long Short-Term Representation Learning (LSTRL) for effective V-ReID. More specifically, to extract long-term representations, we propose a Multi-granularity Appearance Extractor (MAE), in which appearances at four granularities are effectively captured across multiple frames. Meanwhile, to extract short-term representations, we propose a Bi-direction Motion Estimator (BME), in which reciprocal motion information is efficiently extracted from consecutive frames. The MAE and BME are plug-and-play and can be easily inserted into existing networks for efficient feature learning. As a result, they significantly improve the feature representation ability for V-ReID. Extensive experiments on three widely used benchmarks show that our proposed approach can deliver better performance than most state-of-the-art methods.
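The abstract does not specify how the BME computes its reciprocal motion information, so the sketch below swaps in plain bidirectional frame differencing on per-frame feature vectors as a stand-in; it is not the authors' estimator. For each frame it takes the difference to the next frame (forward) and to the previous frame (backward) and averages their absolute values, so the cue depends only on consecutive frames, matching the short-term role described above. The function name `bidirectional_motion` and the per-frame vector input are assumptions.

```python
from typing import List

Vector = List[float]  # hypothetical: one pooled feature vector per frame


def bidirectional_motion(clip: List[Vector]) -> List[Vector]:
    """Per-frame motion cue built from forward and backward feature differences."""
    num_frames = len(clip)
    motion: List[Vector] = []
    for t in range(num_frames):
        cues = []
        if t + 1 < num_frames:  # forward direction: frame t -> t+1
            cues.append([b - a for a, b in zip(clip[t], clip[t + 1])])
        if t - 1 >= 0:  # backward direction: frame t -> t-1
            cues.append([b - a for a, b in zip(clip[t], clip[t - 1])])
        if not cues:  # single-frame clip: no motion can be estimated
            motion.append([0.0] * len(clip[t]))
            continue
        # Average absolute differences so that, e.g., constant-velocity motion
        # (forward = +v, backward = -v) does not cancel out.
        motion.append(
            [sum(abs(c[i]) for c in cues) / len(cues) for i in range(len(clip[t]))]
        )
    return motion
```

For example, `bidirectional_motion([[0.0, 1.0], [2.0, 1.0], [2.0, 4.0]])` yields one motion vector per frame; the first and last frames only have one neighbour, so they use a single direction.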

arXiv information

Authors	Xuehu Liu, Pingping Zhang, Huchuan Lu
Published	2023-08-07 16:22:47+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
