RAP-SAM: Towards Real-Time All-Purpose Segment Anything

Summary

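Vision foundation models (VFMs), advanced by the transformer architecture, have made remarkable progress in performance and generalization ability.
The Segment Anything Model (SAM) is one notable example that achieves generalized segmentation.
However, most VFMs cannot run in real time, which makes them difficult to transfer into products.
Meanwhile, current real-time segmentation methods mainly serve a single purpose, such as semantic segmentation of driving scenes.
We argue that real applications need diverse outputs.
This work therefore explores a new real-time segmentation setting, named all-purpose segmentation in real time, for transferring VFMs to real-time deployment.
It covers three different tasks: interactive segmentation, panoptic segmentation, and video segmentation.
We aim to accomplish all of these tasks in real time with a single model.
We first benchmark several strong baselines.
We then present Real-Time All-Purpose SAM (RAP-SAM).
It contains an efficient encoder and an efficient decoupled decoder that performs prompt-driven decoding.
We also explore different training strategies and tuning methods to further improve co-training performance.
The code and model are available at https://github.com/xushilin1/RAP-SAM/.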

Summary (original)

Advanced by transformer architecture, vision foundation models (VFMs) achieve remarkable progress in performance and generalization ability. Segment Anything Model (SAM) is one remarkable model that can achieve generalized segmentation. However, most VFMs cannot run in realtime, which makes it difficult to transfer them into several products. On the other hand, current real-time segmentation mainly has one purpose, such as semantic segmentation on the driving scene. We argue that diverse outputs are needed for real applications. Thus, this work explores a new real-time segmentation setting, named all-purpose segmentation in real-time, to transfer VFMs in real-time deployment. It contains three different tasks, including interactive segmentation, panoptic segmentation, and video segmentation. We aim to use one model to achieve the above tasks in real-time. We first benchmark several strong baselines. Then, we present Real-Time All Purpose SAM (RAP-SAM). It contains an efficient encoder and an efficient decoupled decoder to perform prompt-driven decoding. Moreover, we further explore different training strategies and tuning methods to boost co-training performance further. Our code and model are available at https://github.com/xushilin1/RAP-SAM/.

arXiv information

Authors	Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, Yibo Yang, Yining Li, Kai Chen, Yunhai Tong, Bernard Ghanem, Xiangtai Li, Ming-Hsuan Yang
Published	2024-01-18 18:59:30+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
