TarViS: A Unified Approach for Target-based Video Segmentation

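Summary

The general field of video segmentation is currently split into distinct tasks spread across multiple benchmarks. Current methods, however, are task-specific and cannot be applied to other tasks in a general way. In this paper, inspired by recent approaches with multi-task capability, we propose TarViS, a novel, unified network architecture that can be applied to any task that requires segmenting a set of arbitrarily defined "targets" in a video. Our approach is flexible with respect to how a task defines its targets: it models targets as abstract "queries" and uses them to predict pixel-precise target masks. A single TarViS model can be trained jointly on datasets spanning different tasks, and it can hot-swap between tasks during inference without any task-specific retraining. To demonstrate its effectiveness, we apply TarViS to four different tasks: Video Instance Segmentation (VIS), Video Panoptic Segmentation (VPS), Video Object Segmentation (VOS) and Point Exemplar-guided Tracking (PET). Our unified, jointly trained model achieves state-of-the-art performance on 5 of the 7 benchmarks spanning these four tasks, and competitive performance on the remaining two.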

Summary (original)

The general domain of video segmentation is currently fragmented into different tasks spanning multiple benchmarks. Despite rapid progress in the state-of-the-art, current methods are overwhelmingly task-specific and cannot conceptually generalize to other tasks. Inspired by recent approaches with multi-task capability, we propose TarViS: a novel, unified network architecture that can be applied to any task that requires segmenting a set of arbitrarily defined ‘targets’ in video. Our approach is flexible with respect to how tasks define these targets, since it models the latter as abstract ‘queries’ which are then used to predict pixel-precise target masks. A single TarViS model can be trained jointly on a collection of datasets spanning different tasks, and can hot-swap between tasks during inference without any task-specific retraining. To demonstrate its effectiveness, we apply TarViS to four different tasks, namely Video Instance Segmentation (VIS), Video Panoptic Segmentation (VPS), Video Object Segmentation (VOS) and Point Exemplar-guided Tracking (PET). Our unified, jointly trained model achieves state-of-the-art performance on 5/7 benchmarks spanning these four tasks, and competitive performance on the remaining two.
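The abstract states that TarViS models targets as abstract queries which are then decoded into pixel-precise masks. The sketch below illustrates one common way such a decoder head can work: each query embedding is compared with every per-pixel video feature by a dot product, and the resulting logits are thresholded into masks, as in mask-classification models. This is an assumption for illustration, not the TarViS implementation. The function name `predict_target_masks`, the tensor shapes and the 0.5 threshold are hypothetical.

```python
import numpy as np


def predict_target_masks(queries: np.ndarray, pixel_features: np.ndarray,
                         threshold: float = 0.5) -> np.ndarray:
    """Hypothetical head that decodes a set of target queries into video masks.

    queries:        (N, C) array, one C-dim embedding per target.
    pixel_features: (T, H, W, C) array of per-pixel video features.
    Returns a boolean array of shape (N, T, H, W), one mask tube per target.
    """
    if queries.shape[-1] != pixel_features.shape[-1]:
        raise ValueError("query and pixel feature dimensions must match")
    # Dot product between every query and every pixel embedding -> mask logits.
    logits = np.einsum("nc,thwc->nthw", queries, pixel_features)
    # Sigmoid maps logits to per-pixel probabilities; threshold to get masks.
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((4, 8, 8, 16))  # T=4 frames, 8x8 pixels, C=16
    # Hypothetical query sets: the same head accepts any number of targets.
    class_queries = rng.standard_normal((3, 16))   # e.g. one query per semantic class
    object_queries = rng.standard_normal((2, 16))  # e.g. one query per tracked object
    print(predict_target_masks(class_queries, features).shape)   # (3, 4, 8, 8)
    print(predict_target_masks(object_queries, features).shape)  # (2, 4, 8, 8)
```

Because the head only sees a generic (N, C) query set, switching tasks in this sketch means changing how the queries are built, not retraining the mask decoder. This mirrors the hot-swapping the abstract describes, under the stated assumptions.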

arXiv information

Authors	Ali Athar, Alexander Hermans, Jonathon Luiten, Deva Ramanan, Bastian Leibe
Published	2023-01-06 18:59:52+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
