Coarse-to-Fine Amodal Segmentation with Shape Prior

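Summary

Amodal object segmentation is a challenging task that requires segmenting both the visible and the occluded parts of an object.
This paper proposes a new approach, Coarse-to-Fine Segmentation (C2F-Seg), which addresses the problem by modeling the amodal segmentation progressively.
C2F-Seg first reduces the learning space from the pixel-level image space to a vector-quantized latent space.
This makes long-range dependencies easier to handle and allows a coarse-grained amodal segment to be learned from visual features and visible segments.
However, this latent space lacks detailed information about the object, which makes it difficult to produce a precise segmentation directly.
To address this, the authors propose a convolutional refine module that injects fine-grained information and produces a more precise amodal object segmentation from the visual features and the coarse predicted segmentation.
To support research on amodal object segmentation, they also create a synthetic amodal dataset, MOViD-Amodal (MOViD-A), which can be used for both image and video amodal object segmentation.
The model is evaluated extensively on two benchmark datasets, KINS and COCO-A.
The empirical results demonstrate the superiority of C2F-Seg.
In addition, the authors show the potential of the approach for video amodal object segmentation on FISHBOWL and on the proposed MOViD-A.
Project page: http://jianxgao.github.io/C2F-Seg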
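
To make the task definition concrete, the following minimal sketch builds a toy one-dimensional example of the three masks involved: the amodal mask (full object extent), the visible mask, and the occluded part that an amodal model has to infer. The array names and the toy geometry are illustrative assumptions, not taken from the paper or its datasets.

```python
import numpy as np

# Toy 1x8 "image row" (hypothetical geometry, for illustration only):
# the object spans columns 2..6 and an occluder covers columns 5..7.
object_extent = np.zeros(8, dtype=bool)
object_extent[2:7] = True
occluder = np.zeros(8, dtype=bool)
occluder[5:8] = True

# Amodal mask: the full extent of the object, hidden parts included.
amodal_mask = object_extent
# Visible mask: the part of the object not covered by the occluder.
visible_mask = object_extent & ~occluder
# Occluded mask: what has to be inferred beyond the visible part.
occluded_mask = amodal_mask & ~visible_mask

print("amodal:  ", amodal_mask.astype(int))
print("visible: ", visible_mask.astype(int))
print("occluded:", occluded_mask.astype(int))
```

The amodal mask is the union of the visible and occluded masks, so a method that only segments what it sees recovers columns 2..4 here and misses columns 5..6.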

Summary (original)

Amodal object segmentation is a challenging task that involves segmenting both visible and occluded parts of an object. In this paper, we propose a novel approach, called Coarse-to-Fine Segmentation (C2F-Seg), that addresses this problem by progressively modeling the amodal segmentation. C2F-Seg initially reduces the learning space from the pixel-level image space to the vector-quantized latent space. This enables us to better handle long-range dependencies and learn a coarse-grained amodal segment from visual features and visible segments. However, this latent space lacks detailed information about the object, which makes it difficult to provide a precise segmentation directly. To address this issue, we propose a convolution refine module to inject fine-grained information and provide a more precise amodal object segmentation based on visual features and coarse-predicted segmentation. To help the studies of amodal object segmentation, we create a synthetic amodal dataset, named as MOViD-Amodal (MOViD-A), which can be used for both image and video amodal object segmentation. We extensively evaluate our model on two benchmark datasets: KINS and COCO-A. Our empirical results demonstrate the superiority of C2F-Seg. Moreover, we exhibit the potential of our approach for video amodal object segmentation tasks on FISHBOWL and our proposed MOViD-A. Project page at: http://jianxgao.github.io/C2F-Seg.
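
The abstract mentions moving from pixel space to a vector-quantized latent space. As a rough illustration of what vector quantization means in general, the sketch below maps each vector of a small feature grid to its nearest entry in a codebook. It is not the C2F-Seg implementation: the function name `vector_quantize`, the codebook size, the feature dimension, and the random codebook are all assumptions made for this example (in practice the codebook would be learned).

```python
import numpy as np

def vector_quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry.

    features: (H, W, D) array of continuous latent vectors.
    codebook: (K, D) array of code vectors (random here, learned in practice).
    Returns (indices of shape (H, W), quantized features of shape (H, W, D)).
    """
    h, w, d = features.shape
    flat = features.reshape(-1, d)  # (H*W, D)
    # Squared Euclidean distance between every feature and every code: (H*W, K).
    dists = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)  # nearest code per position
    quantized = codebook[indices].reshape(h, w, d)
    return indices.reshape(h, w), quantized

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))       # hypothetical: 8 codes of dimension 4
features = rng.normal(size=(16, 16, 4))  # hypothetical 16x16 latent grid
indices, quantized = vector_quantize(features, codebook)
print(indices.shape, quantized.shape)
```

After quantization, the grid is described by one discrete index per position instead of a continuous value per pixel, which is the sense in which the learning space is reduced. The same discreteness is why fine detail is lost and a later refinement stage is useful.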

arXiv information

Authors	Jianxiong Gao, Xuelin Qian, Yikai Wang, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu
Published	2023-08-31 15:56:29+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
