A Generalized Framework for Video Instance Segmentation

Summary

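Handling long videos with complex, heavily occluded sequences has recently emerged as a new challenge in the video instance segmentation (VIS) community.
Existing methods, however, are limited in how well they address this challenge.
We argue that the biggest bottleneck in current approaches is the discrepancy between training and inference.
To bridge this gap effectively, we propose a \textbf{Gen}eralized framework for \textbf{VIS}, namely \textbf{GenVIS}, which achieves state-of-the-art performance on challenging benchmarks without complicated architectures or extra post-processing.
The key contribution of GenVIS is its learning strategy.
Specifically, we propose a query-based training pipeline for sequential learning that uses a novel target label assignment strategy.
To close the remaining gap, we introduce a memory that effectively acquires information from previous states.
Thanks to this new perspective, which focuses on building relationships between separate frames or clips, GenVIS can run flexibly in both online and semi-online modes.
We evaluate our method on the popular VIS benchmarks YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS), achieving state-of-the-art results.
Notably, we greatly outperform the previous state of the art on the long VIS benchmark (OVIS), improving by 5.6 AP with a ResNet-50 backbone.
Code will be available at https://github.com/miranheo/GenVIS.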

Abstract (original)

Recently, handling long videos of complex and occluded sequences has emerged as a new challenge in the video instance segmentation (VIS) community. However, existing methods show limitations in addressing the challenge. We argue that the biggest bottleneck in current approaches is the discrepancy between the training and the inference. To effectively bridge the gap, we propose a \textbf{Gen}eralized framework for \textbf{VIS}, namely \textbf{GenVIS}, that achieves the state-of-the-art performance on challenging benchmarks without designing complicated architectures or extra post-processing. The key contribution of GenVIS is the learning strategy. Specifically, we propose a query-based training pipeline for sequential learning, using a novel target label assignment strategy. To further fill the remaining gaps, we introduce a memory that effectively acquires information from previous states. Thanks to the new perspective, which focuses on building relationships between separate frames or clips, GenVIS can be flexibly executed in both online and semi-online manner. We evaluate our methods on popular VIS benchmarks, YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS), achieving state-of-the-art results. Notably, we greatly outperform the state-of-the-art on the long VIS benchmark (OVIS), improving 5.6 AP with ResNet-50 backbone. Code will be available at https://github.com/miranheo/GenVIS.

arXiv information

Authors	Miran Heo, Sukjun Hwang, Jeongseok Hyun, Hanjung Kim, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim
Published	2022-11-16 11:17:19+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
