Tracking Anything with Decoupled Video Segmentation

Summary

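Training data for video segmentation are expensive to annotate.
This hinders extending end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary settings.
To "track anything" without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation.
With this design, we only need an image-level model for the target task (which is cheaper to train) and a universal temporal propagation model that is trained once and generalizes across tasks.
To combine these two modules effectively, we use bi-directional propagation to fuse segmentation hypotheses from different frames in a (semi-)online manner, producing a coherent segmentation.
We show that this decoupled formulation compares favorably with end-to-end approaches on several data-scarce tasks, including large-vocabulary video panoptic segmentation, open-world video segmentation, referring video segmentation, and unsupervised video object segmentation.
Code is available at: https://hkchengrex.github.io/Tracking-Anything-with-DEVA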

Summary (original)

Training data for video segmentation are expensive to annotate. This impedes extensions of end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary settings. To ‘track anything’ without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation. Due to this design, we only need an image-level model for the target task (which is cheaper to train) and a universal temporal propagation model which is trained once and generalizes across tasks. To effectively combine these two modules, we use bi-directional propagation for (semi-)online fusion of segmentation hypotheses from different frames to generate a coherent segmentation. We show that this decoupled formulation compares favorably to end-to-end approaches in several data-scarce tasks including large-vocabulary video panoptic segmentation, open-world video segmentation, referring video segmentation, and unsupervised video object segmentation. Code is available at: https://hkchengrex.github.io/Tracking-Anything-with-DEVA
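The decoupling described above can be illustrated with a minimal sketch, assuming masks are represented as sets of pixel coordinates. Here `segment_image` stands in for the task-specific image-level model and `propagate` for the class-agnostic temporal propagation model; these names, along with `merge`, `track`, `detect_every`, and the 0.5 IoU threshold, are hypothetical choices for illustration and not DEVA's actual API. The sketch is purely online (it never looks at future frames) and uses greedy IoU matching, so it is a simplified stand-in for the paper's bi-directional fusion rather than a reproduction of it.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical representation: a mask is a frozen set of (row, col) pixels.
Pixel = Tuple[int, int]
Mask = FrozenSet[Pixel]


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two pixel-set masks (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def merge(tracks: Dict[int, Mask], detections: List[Mask], next_id: int,
          iou_threshold: float = 0.5) -> Tuple[Dict[int, Mask], int]:
    """Greedily associate new image-level detections with propagated tracks.

    Matched tracks keep their ID but take the detected mask; unmatched
    detections start new tracks; unmatched tracks keep their propagated mask.
    """
    merged = dict(tracks)
    # All (IoU, track ID, detection index) candidates, best overlap first.
    pairs = sorted(
        ((iou(m, d), tid, i) for tid, m in tracks.items()
         for i, d in enumerate(detections)),
        reverse=True,
    )
    matched_tracks, matched_dets = set(), set()
    for score, tid, i in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap even less
        if tid not in matched_tracks and i not in matched_dets:
            merged[tid] = detections[i]
            matched_tracks.add(tid)
            matched_dets.add(i)
    for i, d in enumerate(detections):
        if i not in matched_dets:
            merged[next_id] = d
            next_id += 1
    return merged, next_id


def track(frames: List[object],
          segment_image: Callable[[object], List[Mask]],
          propagate: Callable[[Dict[int, Mask], object], Dict[int, Mask]],
          detect_every: int = 5) -> List[Dict[int, Mask]]:
    """Online loop: propagate every frame, fold in image-level masks periodically."""
    tracks: Dict[int, Mask] = {}
    next_id = 0
    outputs = []
    for t, frame in enumerate(frames):
        if tracks:
            # Class-agnostic step: carry existing masks into the new frame.
            tracks = propagate(tracks, frame)
        if t % detect_every == 0:
            # Task-specific step: run the image model and merge its output.
            tracks, next_id = merge(tracks, segment_image(frame), next_id)
        outputs.append(dict(tracks))
    return outputs


# Toy usage: one object moves one pixel to the right per frame.
def shift_right(tracks: Dict[int, Mask], frame: object) -> Dict[int, Mask]:
    """Toy propagation model that assumes every object moves one pixel right."""
    return {k: frozenset((y, x + 1) for y, x in m) for k, m in tracks.items()}


frames = [frozenset({(0, x), (0, x + 1)}) for x in range(4)]
# Here each "frame" is its ground-truth mask, so the toy image model returns it.
result = track(frames, lambda f: [f], shift_right, detect_every=2)
print([sorted(r) for r in result])  # [[0], [0], [0], [0]]
```

In this toy run the object keeps ID 0 across all frames, even on frames where the image model is not called. Because the propagation step never needs class labels, replacing `segment_image` with a model for a different task changes what gets tracked without retraining the propagation step, which is the point of the decoupled design.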

arXiv information

Authors	Ho Kei Cheng, Seoung Wug Oh, Brian Price, Alexander Schwing, Joon-Young Lee
Published	2023-09-07 17:59:41+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
