X-Paste: Revisit Copy-Paste at Scale with CLIP and StableDiffusion

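Summary

Copy-Paste is a simple and effective data augmentation strategy for instance segmentation.
By randomly pasting object instances onto new background images, it creates new training data at no extra annotation cost and significantly improves segmentation performance, especially for rare object categories.
More diverse, higher-quality object instances lead to larger gains, but previous work took its instances either from human-annotated instance segmentation datasets or from renderings of 3D object models, and both sources are too expensive to scale up far enough to obtain good diversity.
This paper revisits Copy-Paste at scale using newly emerged zero-shot recognition models (e.g., CLIP) and text2image models (e.g., StableDiffusion).
The authors show for the first time that generating images with a text2image model, or filtering noisily crawled images for different object categories with a zero-shot recognition model, is a feasible way to make Copy-Paste truly scalable.
To achieve this, they design a data acquisition and processing framework called "X-Paste" and conduct a systematic study on top of it.
On the LVIS dataset, X-Paste substantially improves the strong CenterNet2 baseline with a Swin-L backbone.
Specifically, it gains +2.6 box AP and +2.1 mask AP across all classes, with even larger gains of +6.8 box AP and +6.5 mask AP on long-tail classes.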

Abstract (original)

Copy-Paste is a simple and effective data augmentation strategy for instance segmentation. By randomly pasting object instances onto new background images, it creates new training data for free and significantly boosts the segmentation performance, especially for rare object categories. Although diverse, high-quality object instances used in Copy-Paste result in more performance gain, previous works utilize object instances either from human-annotated instance segmentation datasets or rendered from 3D object models, and both approaches are too expensive to scale up to obtain good diversity. In this paper, we revisit Copy-Paste at scale with the power of newly emerged zero-shot recognition models (e.g., CLIP) and text2image models (e.g., StableDiffusion). We demonstrate for the first time that using a text2image model to generate images or a zero-shot recognition model to filter noisily crawled images for different object categories is a feasible way to make Copy-Paste truly scalable. To make such success happen, we design a data acquisition and processing framework, dubbed ‘X-Paste’, upon which a systematic study is conducted. On the LVIS dataset, X-Paste provides impressive improvements over the strong baseline CenterNet2 with Swin-L as the backbone. Specifically, it achieves +2.6 box AP and +2.1 mask AP gains on all classes and even more significant gains with +6.8 box AP and +6.5 mask AP on long-tail classes.
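
To make the basic Copy-Paste step concrete, here is a minimal NumPy sketch of pasting one masked object instance at a random position on a background image. It is an illustration only and not the X-Paste pipeline: the function name `paste_instance` and its parameters are hypothetical, and it assumes the instance already fits inside the background, leaving out any scaling, blending, or filtering of instances.

```python
import numpy as np

def paste_instance(background, instance, mask, rng):
    """Paste one object instance onto a background at a random location.

    background: (H, W, 3) uint8 image
    instance:   (h, w, 3) uint8 crop of the object, with h <= H and w <= W
    mask:       (h, w) boolean array, True where the object is
    Returns the augmented image and the full-size mask of the pasted object.
    """
    H, W = background.shape[:2]
    h, w = mask.shape
    # Pick a random top-left corner so the instance fits inside the background.
    top = int(rng.integers(0, H - h + 1))
    left = int(rng.integers(0, W - w + 1))

    out = background.copy()
    region = out[top:top + h, left:left + w]  # a view into `out`
    region[mask] = instance[mask]             # overwrite only object pixels

    # The pasted mask doubles as the new instance annotation.
    full_mask = np.zeros((H, W), dtype=bool)
    full_mask[top:top + h, left:left + w] = mask
    return out, full_mask

# Toy usage: a white 8x8 square pasted onto a black 64x64 background.
rng = np.random.default_rng(0)
background = np.zeros((64, 64, 3), dtype=np.uint8)
instance = np.full((16, 16, 3), 255, dtype=np.uint8)
mask = np.zeros((16, 16), dtype=bool)
mask[4:12, 4:12] = True
augmented, pasted_mask = paste_instance(background, instance, mask, rng)
```

In a real training pipeline, masks of objects already present in the background would also need to be updated where the pasted instance occludes them, for example with `existing_mask & ~pasted_mask`.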

arXiv information

Authors	Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu
Published	2022-12-07 18:59:59+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
