Self-supervised co-salient object detection via feature correspondence at multiple scales

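Summary

This paper introduces a novel two-stage self-supervised approach for detecting co-occurring salient objects (CoSOD) in image groups without requiring segmentation annotations.
Unlike existing unsupervised methods, which rely solely on patch-level information (e.g., clustering patch descriptors) or on computation-heavy off-the-shelf CoSOD components, the lightweight model leverages feature correspondences at both the patch level and the region level, significantly improving prediction performance.
In the first stage, a self-supervised network is trained to detect co-salient regions by computing local patch-level feature correspondences across images.
Segmentation predictions are obtained using confidence-based adaptive thresholding.
In the next stage, these intermediate segmentations are refined by eliminating the detected regions (within each image) whose averaged feature representations are dissimilar to the foreground feature representation averaged across all the cross-attention maps (from the previous stage).
Extensive experiments on three CoSOD benchmark datasets show that the self-supervised model outperforms the corresponding state-of-the-art models by a large margin (e.g., on the CoCA dataset, the model has a 13.7% F-measure gain over the SOTA unsupervised CoSOD model).
Notably, the self-supervised model also outperforms several recent fully supervised CoSOD models on the three test datasets (e.g., on the CoCA dataset, the model has a 4.6% F-measure gain over a recent supervised CoSOD model).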
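
The first stage can be illustrated with a small sketch. It is not the authors' network: it assumes patch descriptors are already available as plain vectors, scores each patch by its best cosine match in every other image of the group, and stands in for "confidence-based adaptive thresholding" with a hypothetical rule (threshold at the mean score, with a fixed floor). The names `patch_scores`, `adaptive_mask`, and `floor` are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def patch_scores(group, index):
    """Score each patch of group[index] by its best match in every other image."""
    others = [img for i, img in enumerate(group) if i != index]
    scores = []
    for patch in group[index]:
        # Best correspondence per other image, then averaged over the group.
        best = [max(cosine(patch, q) for q in other) for other in others]
        scores.append(sum(best) / len(best))
    return scores

def adaptive_mask(scores, floor=0.5):
    """Assumed rule: threshold at the mean score, but never below `floor`."""
    threshold = max(floor, sum(scores) / len(scores))
    return [s >= threshold for s in scores]

# Toy group: three images, two 2-D patch descriptors each.
# Only the first patch of each image points in a shared direction.
group = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.9, 0.1], [0.0, -1.0]],
    [[1.0, 0.2], [-1.0, 0.0]],
]
scores = patch_scores(group, 0)
mask = adaptive_mask(scores)
print(mask)  # -> [True, False]
```

The floor is there so that a group with no shared object does not still mark half of its patches as co-salient; the actual thresholding rule used in the paper is not specified in this summary.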

Summary (original)

Our paper introduces a novel two-stage self-supervised approach for detecting co-occurring salient objects (CoSOD) in image groups without requiring segmentation annotations. Unlike existing unsupervised methods that rely solely on patch-level information (e.g., clustering patch descriptors) or on computation-heavy off-the-shelf components for CoSOD, our lightweight model leverages feature correspondences at both patch and region levels, significantly improving prediction performance. In the first stage, we train a self-supervised network that detects co-salient regions by computing local patch-level feature correspondences across images. We obtain the segmentation predictions using confidence-based adaptive thresholding. In the next stage, we refine these intermediate segmentations by eliminating the detected regions (within each image) whose averaged feature representations are dissimilar to the foreground feature representation averaged across all the cross-attention maps (from the previous stage). Extensive experiments on three CoSOD benchmark datasets show that our self-supervised model outperforms the corresponding state-of-the-art models by a large margin (e.g., on the CoCA dataset, our model has a 13.7% F-measure gain over the SOTA unsupervised CoSOD model). Notably, our self-supervised model also outperforms several recent fully supervised CoSOD models on the three test datasets (e.g., on the CoCA dataset, our model has a 4.6% F-measure gain over a recent supervised CoSOD model).
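
The region-level refinement in the second stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: detected regions are given as lists of patch feature vectors, the foreground prototype is taken to be the plain mean of all detected foreground patch features in the group, and `refine_regions` and `min_similarity` are hypothetical names with an arbitrary cutoff.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def refine_regions(regions_per_image, min_similarity=0.5):
    """Return, per image, a keep/drop flag for each detected region.

    A region is kept when its averaged feature is similar enough to the
    foreground prototype averaged over the whole group (assumed cutoff).
    """
    all_patches = [p for regions in regions_per_image for r in regions for p in r]
    prototype = mean_vector(all_patches)
    return [
        [cosine(mean_vector(region), prototype) >= min_similarity for region in regions]
        for regions in regions_per_image
    ]

# Toy input: image 0 has two detected regions, image 1 has one.
# The second region of image 0 points away from the shared direction.
regions_per_image = [
    [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0]]],
    [[[1.0, 0.1], [0.8, 0.0]]],
]
keep = refine_regions(regions_per_image)
print(keep)  # -> [[True, False], [True]]
```

Averaging within a region before comparing makes the decision depend on the region as a whole rather than on single patches, which is the distinction the abstract draws between patch-level and region-level correspondence.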

arXiv information

Authors	Souradeep Chakraborty, Dimitris Samaras
Published	2024-03-27 16:48:34+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
