Segment Anything Meets Point Tracking

Summary

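The Segment Anything Model (SAM) has established itself as a powerful zero-shot image segmentation model that generates masks from interactive prompts such as points.
This paper presents SAM-PT, a method that extends SAM's capabilities to track and segment anything in dynamic videos.
SAM-PT uses robust, sparse point selection and propagation techniques for mask generation, and shows that a SAM-based segmentation tracker can achieve strong zero-shot performance across popular video object segmentation benchmarks such as DAVIS, YouTube-VOS, and MOSE.
In contrast to traditional object-centric mask propagation strategies, we use point propagation to exploit local structure information that is agnostic to object semantics.
We highlight the merits of point-based tracking through direct evaluation on the zero-shot open-world Unidentified Video Objects (UVO) benchmark.
To further strengthen our approach, we use K-Medoids clustering for point initialization and track both positive and negative points to clearly distinguish the target object.
We also employ multiple mask decoding passes to refine the masks and devise a point re-initialization strategy to improve tracking accuracy.
Our code integrates different point trackers and video segmentation benchmarks and will be released at https://github.com/SysCV/sam-pt.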
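
To make the point-initialization idea concrete, the following is a minimal sketch of K-Medoids clustering over the foreground pixels of a binary mask, where the resulting medoids serve as candidate positive query points. It is not the authors' implementation: the function names (`foreground_points`, `k_medoids`), the random initialization, the Euclidean distance, and the toy mask are all assumptions made for illustration.

```python
import math
import random


def foreground_points(mask):
    """Return (row, col) coordinates of every truthy pixel in a 2D mask."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def k_medoids(points, k, max_iters=50, seed=0):
    """Pick k medoids from distinct 2D points by alternating assign/update steps.

    Hypothetical helper for illustration; not taken from the SAM-PT code base.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    medoids = rng.sample(points, k)  # random initial medoids (an assumption)
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest medoid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, medoids[i]))
            clusters[nearest].append(p)
        # Update step: the new medoid of a cluster is the member with the
        # smallest total distance to all members of that cluster.
        new_medoids = [
            min(members, key=lambda cand: sum(math.dist(cand, q) for q in members))
            for members in clusters
        ]
        if new_medoids == medoids:  # converged: medoids no longer change
            break
        medoids = new_medoids
    return medoids


# Toy mask with two separate 2x2 foreground regions (illustrative data only).
mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
positive_points = k_medoids(foreground_points(mask), k=2)
print(positive_points)
```

Unlike K-Means centroids, medoids are always actual data points, so every selected query point is guaranteed to lie on the object mask. Under the same assumptions, negative points could be drawn by running the same routine on background pixel coordinates.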

Summary (original)

The Segment Anything Model (SAM) has established itself as a powerful zero-shot image segmentation model, employing interactive prompts such as points to generate masks. This paper presents SAM-PT, a method extending SAM’s capability to tracking and segmenting anything in dynamic videos. SAM-PT leverages robust and sparse point selection and propagation techniques for mask generation, demonstrating that a SAM-based segmentation tracker can yield strong zero-shot performance across popular video object segmentation benchmarks, including DAVIS, YouTube-VOS, and MOSE. Compared to traditional object-centric mask propagation strategies, we uniquely use point propagation to exploit local structure information that is agnostic to object semantics. We highlight the merits of point-based tracking through direct evaluation on the zero-shot open-world Unidentified Video Objects (UVO) benchmark. To further enhance our approach, we utilize K-Medoids clustering for point initialization and track both positive and negative points to clearly distinguish the target object. We also employ multiple mask decoding passes for mask refinement and devise a point re-initialization strategy to improve tracking accuracy. Our code integrates different point trackers and video segmentation benchmarks and will be released at https://github.com/SysCV/sam-pt.

arXiv information

Authors	Frano Rajič, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu
Published	2023-07-03 17:58:01+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
