Multi-Perspective Data Augmentation for Few-shot Object Detection

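Abstract

Recent few-shot object detection (FSOD) methods have focused on augmenting synthetic samples for novel classes and have shown promising results with the rise of diffusion models.
However, the diversity of such datasets is often limited in representativeness because they lack awareness of typical and hard samples, especially in the context of foreground and background relationships.
To tackle this issue, we propose a Multi-Perspective Data Augmentation (MPAD) framework.
For foreground-foreground relationships, we propose in-context learning for object synthesis (ICOS) with bounding box adjustments to enhance the detail and spatial information of synthetic samples.
Under the large margin principle, support samples play a vital role in defining class boundaries.
We therefore design a Harmonic Prompt Aggregation Scheduler (HPAS) that mixes prompt embeddings at each time step of the diffusion model's generation process to produce hard novel samples.
For foreground-background relationships, we introduce a Background Proposal method (BAP) to sample typical and hard backgrounds.
Extensive experiments on multiple FSOD benchmarks demonstrate the effectiveness of our approach.
Our framework significantly outperforms traditional methods, achieving an average increase of $17.5\%$ in nAP50 over the baseline on PASCAL VOC.
Code is available at https://github.com/nvakhoa/MPAD.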

Abstract (original)

Recent few-shot object detection (FSOD) methods have focused on augmenting synthetic samples for novel classes, show promising results to the rise of diffusion models. However, the diversity of such datasets is often limited in representativeness because they lack awareness of typical and hard samples, especially in the context of foreground and background relationships. To tackle this issue, we propose a Multi-Perspective Data Augmentation (MPAD) framework. In terms of foreground-foreground relationships, we propose in-context learning for object synthesis (ICOS) with bounding box adjustments to enhance the detail and spatial information of synthetic samples. Inspired by the large margin principle, support samples play a vital role in defining class boundaries. Therefore, we design a Harmonic Prompt Aggregation Scheduler (HPAS) to mix prompt embeddings at each time step of the generation process in diffusion models, producing hard novel samples. For foreground-background relationships, we introduce a Background Proposal method (BAP) to sample typical and hard backgrounds. Extensive experiments on multiple FSOD benchmarks demonstrate the effectiveness of our approach. Our framework significantly outperforms traditional methods, achieving an average increase of $17.5\%$ in nAP50 over the baseline on PASCAL VOC. Code is available at https://github.com/nvakhoa/MPAD.

arXiv information

Authors	Anh-Khoa Nguyen Vu, Quoc-Truong Truong, Vinh-Tiep Nguyen, Thanh Duc Ngo, Thanh-Toan Do, Tam V. Nguyen
Published	2025-02-25 13:34:52+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
