SemAlign3D: Semantic Correspondence between RGB-Images through Aligning 3D Object-Class Representations

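Summary

Semantic correspondence has advanced considerably thanks to recent progress in large vision models (LVMs).
These LVMs have been shown to capture local semantics reliably, but the same cannot yet be said for global geometric relationships between semantic object regions.
This shortcoming leads to unreliable semantic correspondence between images with extreme viewpoint variation.
In this work, we aim to leverage monocular depth estimates to capture these geometric relationships for more robust and data-efficient semantic correspondence.
First, we introduce a simple but effective method for building 3D object-class representations from monocular depth estimates and LVM features, using a sparsely annotated image correspondence dataset.
Second, we formulate an alignment energy that can be minimized with gradient descent to align the 3D object-class representation with the object-class instance in the input RGB image.
Our method achieves state-of-the-art matching accuracy in multiple categories on the challenging SPair-71k dataset, raising the PCK@0.1 score by more than 10 points in three categories and by 3.3 points overall, from 85.6% to 88.9%.
Additional resources and code are available at https://dub.sh/semalign3d.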
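
For readers unfamiliar with the PCK@0.1 metric quoted above, the following is a minimal sketch of how such a score is typically computed. It assumes the common convention in which a predicted keypoint counts as correct when it lies within alpha times the larger side of the object bounding box from the ground-truth keypoint; the abstract itself does not spell out the threshold, and the function name `pck`, its parameters, and the sample coordinates are hypothetical, not taken from the authors' code.

```python
import math


def pck(pred, gt, bbox_hw, alpha=0.1):
    """Fraction of predicted keypoints within alpha * max(h, w) of ground truth.

    Assumption: the threshold is scaled by the larger bounding-box side,
    which is a common convention but is not stated in the abstract.
    """
    threshold = alpha * max(bbox_hw)
    hits = [
        math.hypot(px - gx, py - gy) <= threshold
        for (px, py), (gx, gy) in zip(pred, gt)
    ]
    return sum(hits) / len(hits)


# Hypothetical keypoints (x, y) for one image pair; bounding box is 100 x 80.
gt = [(10, 10), (50, 40), (80, 90)]
pred = [(12, 11), (50, 55), (79, 92)]
score = pck(pred, gt, bbox_hw=(100, 80))
print(f"PCK@0.1 = {score:.3f}")
```

With a threshold of 10 pixels, the first and third predictions count as correct and the second (15 pixels off) does not, so two of three keypoints are matched. Benchmark scores such as the 88.9% reported here average this kind of per-pair ratio over the whole test set.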

Summary (original)

Semantic correspondence made tremendous progress through the recent advancements of large vision models (LVM). While these LVMs have been shown to reliably capture local semantics, the same can currently not be said for capturing global geometric relationships between semantic object regions. This problem leads to unreliable performance for semantic correspondence between images with extreme view variation. In this work, we aim to leverage monocular depth estimates to capture these geometric relationships for more robust and data-efficient semantic correspondence. First, we introduce a simple but effective method to build 3D object-class representations from monocular depth estimates and LVM features using a sparsely annotated image correspondence dataset. Second, we formulate an alignment energy that can be minimized using gradient descent to obtain an alignment between the 3D object-class representation and the object-class instance in the input RGB-image. Our method achieves state-of-the-art matching accuracy in multiple categories on the challenging SPair-71k dataset, increasing the PCK@0.1 score by more than 10 points on three categories and overall by 3.3 points from 85.6% to 88.9%. Additional resources and code are available at https://dub.sh/semalign3d.

arXiv information

Authors	Krispin Wandel, Hesheng Wang
Published	2025-03-28 14:14:19+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
