A Generalized Framework for Video Instance Segmentation

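Summary

Handling long videos with complex, occluded sequences has recently emerged as a new challenge for the video instance segmentation (VIS) community.
Existing methods, however, are limited in how well they address this challenge.
We argue that the biggest bottleneck in current approaches is the discrepancy between training and inference.
To bridge this gap, we propose GenVIS, a generalized framework for VIS that achieves state-of-the-art performance on challenging benchmarks without complicated architectures or extra post-processing.
The key contribution of GenVIS is its learning strategy, which includes a query-based training pipeline for sequential learning with a novel target label assignment.
We also introduce a memory that effectively acquires information from previous states.
Because this perspective focuses on building relationships between separate frames or clips, GenVIS can run flexibly in both online and semi-online modes.
We evaluate the approach on popular VIS benchmarks and achieve state-of-the-art results on YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS).
Notably, on the long-video benchmark OVIS, GenVIS outperforms the previous state of the art by 5.6 AP with a ResNet-50 backbone.
Code is available at https://github.com/miranheo/GenVIS.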
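
The difference between online and semi-online execution can be illustrated with a minimal sketch. This is not the GenVIS implementation: the helper names `split_into_clips` and `run_sequentially`, the `step` callback, and the choice of non-overlapping clips are all assumptions made for illustration. The only idea taken from the abstract is that a video is processed as a sequence of frames or clips while information is carried over from previous states.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")
State = TypeVar("State")
Output = TypeVar("Output")


def split_into_clips(frames: Sequence[Frame], clip_len: int) -> Iterator[List[Frame]]:
    """Yield consecutive, non-overlapping clips (an assumption of this sketch).

    clip_len == 1 corresponds to online (frame-by-frame) processing,
    clip_len > 1 to semi-online (clip-by-clip) processing.
    """
    if clip_len < 1:
        raise ValueError("clip_len must be at least 1")
    for start in range(0, len(frames), clip_len):
        yield list(frames[start:start + clip_len])


def run_sequentially(
    frames: Sequence[Frame],
    clip_len: int,
    step: Callable[[List[Frame], Optional[State]], Tuple[Output, State]],
) -> List[Output]:
    """Process clips in order, passing the previous state to each step.

    `step` is a hypothetical stand-in for a model call; it receives the
    current clip and the state left by the previous clip (None at first).
    """
    state: Optional[State] = None
    outputs: List[Output] = []
    for clip in split_into_clips(frames, clip_len):
        output, state = step(clip, state)
        outputs.append(output)
    return outputs


def count_frames_seen(clip: List[int], state: Optional[int]) -> Tuple[int, int]:
    """Toy step: the carried state is the number of frames seen so far."""
    seen = (state or 0) + len(clip)
    return seen, seen


video = list(range(5))  # five dummy frames
online = run_sequentially(video, clip_len=1, step=count_frames_seen)
semi_online = run_sequentially(video, clip_len=2, step=count_frames_seen)
```

In this toy setup the same `step` function serves both modes, and only `clip_len` changes. This mirrors the flexibility the summary describes, though the actual model, memory, and label assignment are not represented here.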

Abstract (original)

The handling of long videos with complex and occluded sequences has recently emerged as a new challenge in the video instance segmentation (VIS) community. However, existing methods have limitations in addressing this challenge. We argue that the biggest bottleneck in current approaches is the discrepancy between training and inference. To effectively bridge this gap, we propose a Generalized framework for VIS, namely GenVIS, that achieves state-of-the-art performance on challenging benchmarks without designing complicated architectures or requiring extra post-processing. The key contribution of GenVIS is the learning strategy, which includes a query-based training pipeline for sequential learning with a novel target label assignment. Additionally, we introduce a memory that effectively acquires information from previous states. Thanks to the new perspective, which focuses on building relationships between separate frames or clips, GenVIS can be flexibly executed in both online and semi-online manner. We evaluate our approach on popular VIS benchmarks, achieving state-of-the-art results on YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS). Notably, we greatly outperform the state-of-the-art on the long VIS benchmark (OVIS), improving 5.6 AP with ResNet-50 backbone. Code is available at https://github.com/miranheo/GenVIS.

arXiv information

Authors	Miran Heo, Sukjun Hwang, Jeongseok Hyun, Hanjung Kim, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim
Published	2023-03-24 15:26:13+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
