Boosting Salient Object Detection with Knowledge Distillated from Large Foundation Models

Abstract

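Salient Object Detection (SOD) aims to identify and segment the most prominent regions in a scene.
Traditional models rely on manually annotated labels with pixel-level accuracy, which are time-consuming to produce.
To address this challenge, we developed a low-cost, high-precision annotation method that leverages large foundation models.
Specifically, we use a weakly supervised approach in which textual prompts guide large models to generate pseudo-labels.
Because large models do not focus effectively on the salient regions of an image, we manually annotate a subset of text and use it to fine-tune the model.
Building on this approach, which enables accurate and rapid pseudo-label generation, we introduce a new dataset, BDS-TR.
Compared with the earlier DUTS-TR dataset, BDS-TR is larger in scale and covers a wider variety of categories and scenes.
This expansion broadens the model's applicability across scenarios and provides a more comprehensive foundational dataset for future SOD research.
In addition, we present an edge decoder based on dynamic upsampling, which focuses on object edges while gradually recovering the resolution of image features.
Comprehensive experiments on five benchmark datasets show that our method significantly outperforms state-of-the-art approaches and also surpasses several existing fully supervised SOD methods.
The code and results will be made publicly available.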
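
To make the idea of a pseudo-label concrete, the sketch below shows one simple way a soft saliency map could be turned into a binary mask and compared with a manual annotation. It is an illustration only and not the authors' pipeline: the 0.5 threshold, the helper names (`binarize`, `iou`, `mean_absolute_error`), and the toy 2x2 data are all assumptions introduced here.

```python
# Illustrative sketch only; not the method described in the paper.
# All names, the threshold, and the toy data are hypothetical.

def binarize(saliency, threshold=0.5):
    """Turn a soft saliency map (values in [0, 1]) into a binary mask."""
    return [[1 if value >= threshold else 0 for value in row] for row in saliency]


def iou(mask_a, mask_b):
    """Intersection over union of two binary masks with the same shape."""
    pairs = [(a, b) for row_a, row_b in zip(mask_a, mask_b) for a, b in zip(row_a, row_b)]
    intersection = sum(a & b for a, b in pairs)
    union = sum(a | b for a, b in pairs)
    # Two empty masks agree perfectly by convention.
    return intersection / union if union else 1.0


def mean_absolute_error(pred, target):
    """Average per-pixel absolute difference between two maps."""
    pairs = [(p, t) for row_p, row_t in zip(pred, target) for p, t in zip(row_p, row_t)]
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


# Toy example: a 2x2 soft saliency map and a hand-drawn mask.
soft_map = [[0.9, 0.2], [0.6, 0.1]]
manual_mask = [[1, 0], [1, 1]]

pseudo_label = binarize(soft_map)
agreement = iou(pseudo_label, manual_mask)
error = mean_absolute_error(pseudo_label, manual_mask)
```

In practice, a check of this kind on a small manually annotated subset is one way to judge whether automatically generated pseudo-labels are accurate enough to train on.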

Abstract (original)

Salient Object Detection (SOD) aims to identify and segment prominent regions within a scene. Traditional models rely on manually annotated pseudo labels with precise pixel-level accuracy, which is time-consuming. We developed a low-cost, high-precision annotation method by leveraging large foundation models to address the challenges. Specifically, we use a weakly supervised approach to guide large models in generating pseudo-labels through textual prompts. Since large models do not effectively focus on the salient regions of images, we manually annotate a subset of text to fine-tune the model. Based on this approach, which enables precise and rapid generation of pseudo-labels, we introduce a new dataset, BDS-TR. Compared to the previous DUTS-TR dataset, BDS-TR is more prominent in scale and encompasses a wider variety of categories and scenes. This expansion will enhance our model’s applicability across a broader range of scenarios and provide a more comprehensive foundational dataset for future SOD research. Additionally, we present an edge decoder based on dynamic upsampling, which focuses on object edges while gradually recovering image feature resolution. Comprehensive experiments on five benchmark datasets demonstrate that our method significantly outperforms state-of-the-art approaches and also surpasses several existing fully-supervised SOD methods. The code and results will be made available.

arXiv information

Authors	Miaoyang He,Shuyong Gao,Tsui Qin Mok,Weifeng Ge,Wengqiang Zhang
Published	2025-01-08 15:56:21+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
