OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation

Summary

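Referring video object segmentation (RVOS) aims to segment an object in a video following a human instruction.
Current state-of-the-art methods follow an offline pattern, in which each clip independently interacts with the text embedding to achieve cross-modal understanding.
These methods usually argue that the offline pattern is necessary for RVOS, yet they model only limited temporal association within each clip.
In this work, we break with the previous offline belief and propose a simple yet effective online model using explicit query propagation, named OnlineRefer.
Specifically, our approach leverages target cues that gather semantic information and a position prior, improving both the accuracy and the ease of referring predictions for the current frame.
Furthermore, we generalize the online model into a semi-online framework so that it is compatible with video-based backbones.
To show the effectiveness of our method, we evaluate it on four benchmarks: Refer-Youtube-VOS, Refer-DAVIS17, A2D-Sentences, and JHMDB-Sentences.
Without bells and whistles, OnlineRefer with a Swin-L backbone achieves 63.5 J&F on Refer-Youtube-VOS and 64.8 J&F on Refer-DAVIS17, outperforming all other offline methods.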
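The summary describes explicit query propagation only at a high level. The following is a minimal sketch of the control flow, assuming that the target cue is a query embedding plus a reference box: in online mode, the queries output for frame t-1 become the input cues for frame t; in semi-online mode, video is processed clip by clip and queries are carried between clips. All names here (`Frame`, `TargetQuery`, `toy_decoder`, `online_inference`, `semi_online_inference`) are hypothetical, `toy_decoder` is a trivial stand-in rather than OnlineRefer's architecture, and carrying the last frame's queries into the next clip is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized to [0, 1]


@dataclass
class Frame:
    """Toy frame: a feature vector plus the centre where the referred object appears."""
    features: List[float]
    center: Tuple[float, float]


@dataclass
class TargetQuery:
    """Hypothetical target cue: semantic embedding plus a box used as a position prior."""
    embedding: List[float]
    box: Box


# Hypothetical decoder signature: (frame, text embedding, input queries) -> refined queries.
Decoder = Callable[[Frame, List[float], List[TargetQuery]], List[TargetQuery]]


def toy_decoder(frame: Frame, text: List[float],
                queries: List[TargetQuery]) -> List[TargetQuery]:
    """Stand-in for a cross-modal decoder (NOT the paper's architecture).

    Blends each query embedding with frame and text features, and moves its box
    halfway from the prior position toward the object's centre in this frame.
    """
    refined = []
    for q in queries:
        emb = [0.5 * e + 0.25 * f + 0.25 * t
               for e, f, t in zip(q.embedding, frame.features, text)]
        cx, cy, w, h = q.box
        ox, oy = frame.center
        refined.append(TargetQuery(emb, (cx + 0.5 * (ox - cx), cy + 0.5 * (oy - cy), w, h)))
    return refined


def online_inference(frames: Sequence[Frame], text: List[float],
                     init_queries: List[TargetQuery],
                     decoder: Decoder) -> List[List[TargetQuery]]:
    """Frame by frame: the queries output for frame t-1 are the input cues for frame t."""
    queries, per_frame = init_queries, []
    for frame in frames:
        queries = decoder(frame, text, queries)
        per_frame.append(queries)
    return per_frame


def semi_online_inference(frames: Sequence[Frame], text: List[float],
                          init_queries: List[TargetQuery], decoder: Decoder,
                          clip_len: int) -> List[List[TargetQuery]]:
    """Clip by clip: every frame in a clip starts from the same propagated queries,
    and (an assumption here) the clip's last-frame outputs seed the next clip."""
    if clip_len < 1:
        raise ValueError("clip_len must be >= 1")
    queries, per_frame = init_queries, []
    for start in range(0, len(frames), clip_len):
        clip_outputs = [decoder(f, text, queries) for f in frames[start:start + clip_len]]
        per_frame.extend(clip_outputs)
        queries = clip_outputs[-1]
    return per_frame


if __name__ == "__main__":
    # The referred object drifts to the right across four frames.
    frames = [Frame([1.0, 0.0], (0.2 + 0.1 * i, 0.5)) for i in range(4)]
    init = [TargetQuery([0.0, 0.0], (0.5, 0.5, 0.2, 0.2))]
    text = [0.0, 1.0]
    print([q[0].box[0] for q in online_inference(frames, text, init, toy_decoder)])
    print([q[0].box[0] for q in semi_online_inference(frames, text, init, toy_decoder, 2)])
```

In this sketch the online loop keeps only one set of queries between steps, so each frame's prediction starts from where the target was last seen. The semi-online variant trades that per-frame update for clip-level processing, which is the setting where a video backbone that consumes several frames at once would fit.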

Summary (original)

Referring video object segmentation (RVOS) aims at segmenting an object in a video following human instruction. Current state-of-the-art methods fall into an offline pattern, in which each clip independently interacts with text embedding for cross-modal understanding. They usually present that the offline pattern is necessary for RVOS, yet model limited temporal association within each clip. In this work, we break up the previous offline belief and propose a simple yet effective online model using explicit query propagation, named OnlineRefer. Specifically, our approach leverages target cues that gather semantic information and position prior to improve the accuracy and ease of referring predictions for the current frame. Furthermore, we generalize our online model into a semi-online framework to be compatible with video-based backbones. To show the effectiveness of our method, we evaluate it on four benchmarks, i.e., Refer-Youtube-VOS, Refer-DAVIS17, A2D-Sentences, and JHMDB-Sentences. Without bells and whistles, our OnlineRefer with a Swin-L backbone achieves 63.5 J&F and 64.8 J&F on Refer-Youtube-VOS and Refer-DAVIS17, outperforming all other offline methods.

arXiv information

Authors	Dongming Wu, Tiancai Wang, Yuang Zhang, Xiangyu Zhang, Jianbing Shen
Published	2023-07-18 15:43:35+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

