We’re Not Using Videos Effectively: An Updated Domain Adaptive Video Segmentation Baseline

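Summary

Unsupervised domain adaptation for semantic segmentation (DAS) has been studied extensively, with the goal of adapting a model trained on images from a labeled source domain to an unlabeled target domain.
Most prior work treats this as a frame-level Image-DAS problem, while a few Video-DAS works additionally try to exploit the temporal signal present in adjacent frames.
However, Video-DAS works have historically studied a different set of benchmarks from Image-DAS, with minimal cross-benchmarking.
In this work, we address that gap.
Surprisingly, (1) even after carefully controlling for data and model architecture, state-of-the-art Image-DAS methods (HRDA and HRDA+MIC) outperform Video-DAS methods on established Video-DAS benchmarks (+14.5 mIoU on Viper$\rightarrow$CityscapesSeq, +19.0 mIoU on Synthia$\rightarrow$CityscapesSeq), and (2) naive combinations of Image-DAS and Video-DAS techniques yield only marginal improvements across datasets.
To avoid siloed progress between Image-DAS and Video-DAS, we open-source a codebase that supports a comprehensive set of Video-DAS and Image-DAS methods on a common benchmark.
The code is available at https://github.com/SimarKareer/UnifiedVideoDA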

Abstract (original)

There has been abundant work in unsupervised domain adaptation for semantic segmentation (DAS) seeking to adapt a model trained on images from a labeled source domain to an unlabeled target domain. While the vast majority of prior work has studied this as a frame-level Image-DAS problem, a few Video-DAS works have sought to additionally leverage the temporal signal present in adjacent frames. However, Video-DAS works have historically studied a distinct set of benchmarks from Image-DAS, with minimal cross-benchmarking. In this work, we address this gap. Surprisingly, we find that (1) even after carefully controlling for data and model architecture, state-of-the-art Image-DAS methods (HRDA and HRDA+MIC) outperform Video-DAS methods on established Video-DAS benchmarks (+14.5 mIoU on Viper$\rightarrow$CityscapesSeq, +19.0 mIoU on Synthia$\rightarrow$CityscapesSeq), and (2) naive combinations of Image-DAS and Video-DAS techniques only lead to marginal improvements across datasets. To avoid siloed progress between Image-DAS and Video-DAS, we open-source our codebase with support for a comprehensive set of Video-DAS and Image-DAS methods on a common benchmark. Code available at https://github.com/SimarKareer/UnifiedVideoDA

arXiv information

Authors	Simar Kareer, Vivek Vijaykumar, Harsh Maheshwari, Prithvijit Chattopadhyay, Judy Hoffman, Viraj Prabhu
Published	2024-02-06 18:35:26+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
