Neural Congealing: Aligning Images to a Joint Semantic Atlas

Abstract

We present Neural Congealing — a zero-shot self-supervised framework for detecting and jointly aligning semantically-common content across a given set of images. Our approach harnesses the power of pre-trained DINO-ViT features to learn: (i) a joint semantic atlas — a 2D grid that captures the mode of DINO-ViT features in the input set, and (ii) dense mappings from the unified atlas to each of the input images. We derive a new robust self-supervised framework that optimizes the atlas representation and mappings per image set, requiring only a few real-world images as input without any additional input information (e.g., segmentation masks). Notably, we design our losses and training paradigm to account only for the shared content under severe variations in appearance, pose, background clutter or other distracting objects. We demonstrate results on a plethora of challenging image sets including sets of mixed domains (e.g., aligning images depicting sculpture and artwork of cats), sets depicting related yet different object categories (e.g., dogs and tigers), or domains for which large-scale training data is scarce (e.g., coffee mugs). We thoroughly evaluate our method and show that our test-time optimization approach performs favorably compared to a state-of-the-art method that requires extensive training on large-scale datasets.

arXiv information

Authors	Dolev Ofri-Amar, Michal Geyer, Yoni Kasten, Tali Dekel
Published	2023-02-08 09:26:22+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
