MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions

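Abstract

This paper addresses motion expression guided video segmentation: segmenting objects in video content based on a sentence that describes the objects' motion.
Existing referring video object datasets typically focus on salient objects and use language expressions containing excessive static attributes, which may allow the target object to be identified from a single frame.
These datasets downplay the importance of motion in video content for language-guided video object segmentation.
To investigate the feasibility of using motion expressions to ground and segment objects in videos, we propose a large-scale dataset called MeViS, which contains numerous motion expressions that indicate target objects in complex environments.
We benchmarked 5 existing referring video object segmentation (RVOS) methods and conducted a comprehensive comparison on the MeViS dataset.
The results show that current RVOS methods cannot effectively address motion expression guided video segmentation.
We further analyze the challenges and propose a baseline approach for the proposed MeViS dataset.
The goal of our benchmark is to provide a platform that enables the development of effective language-guided video segmentation algorithms that use motion expressions as a primary cue for object segmentation in complex video scenes.
The proposed MeViS dataset has been released at https://henghuiding.github.io/MeViS.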

Abstract (original)

This paper strives for motion expressions guided video segmentation, which focuses on segmenting objects in video content based on a sentence describing the motion of the objects. Existing referring video object datasets typically focus on salient objects and use language expressions that contain excessive static attributes that could potentially enable the target object to be identified in a single frame. These datasets downplay the importance of motion in video content for language-guided video object segmentation. To investigate the feasibility of using motion expressions to ground and segment objects in videos, we propose a large-scale dataset called MeViS, which contains numerous motion expressions to indicate target objects in complex environments. We benchmarked 5 existing referring video object segmentation (RVOS) methods and conducted a comprehensive comparison on the MeViS dataset. The results show that current RVOS methods cannot effectively address motion expression-guided video segmentation. We further analyze the challenges and propose a baseline approach for the proposed MeViS dataset. The goal of our benchmark is to provide a platform that enables the development of effective language-guided video segmentation algorithms that leverage motion expressions as a primary cue for object segmentation in complex video scenes. The proposed MeViS dataset has been released at https://henghuiding.github.io/MeViS.

arXiv information

Authors	Henghui Ding, Chang Liu, Shuting He, Xudong Jiang, Chen Change Loy
Published	2023-08-16 17:58:34+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
