Few-shot target-driven instance detection based on open-vocabulary object detection models

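Abstract

Current large open vision models could be useful for one-shot and few-shot object recognition.
However, gradient-based retraining solutions are costly.
Open-vocabulary object detection models, on the other hand, bring visual and textual concepts closer together in the same latent space, enabling zero-shot detection via prompting at low computational cost.
We propose a lightweight method that turns the latter into one-shot or few-shot object recognition models without requiring textual descriptions.
Experiments on the TEgO dataset, using the YOLO-World model as a base, show that performance improves with model size, the number of examples, and the use of image augmentation.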
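
The abstract does not spell out how text prompts are replaced. Since open-vocabulary detectors place visual and textual concepts in the same latent space, one plausible reading is that embeddings computed from a few example images can stand in where text embeddings would normally go. The sketch below illustrates that general idea with class prototypes and cosine similarity. It is an assumption for illustration, not the authors' implementation: the function names are hypothetical, random vectors stand in for real region embeddings, and no YOLO-World API is called.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def build_prototypes(support: dict[str, np.ndarray]) -> tuple[list[str], np.ndarray]:
    """Average the normalized example embeddings of each class into one prototype.

    `support` maps a class name to an array of shape (n_examples, dim).
    (Hypothetical helper; the paper may aggregate examples differently.)
    """
    names = sorted(support)
    protos = np.stack([l2_normalize(support[n]).mean(axis=0) for n in names])
    return names, l2_normalize(protos)


def classify(query: np.ndarray, names: list[str], protos: np.ndarray) -> list[str]:
    """Assign each query embedding to the prototype with the highest cosine similarity."""
    sims = l2_normalize(query) @ protos.T
    return [names[i] for i in sims.argmax(axis=1)]


# Toy data: random vectors stand in for region embeddings from a detector.
rng = np.random.default_rng(0)
centers = {"mug": rng.normal(size=16), "keys": rng.normal(size=16)}

# Three noisy "shots" per class play the role of the few labeled examples.
support = {n: c + 0.1 * rng.normal(size=(3, 16)) for n, c in centers.items()}

names, protos = build_prototypes(support)
queries = np.stack([centers["mug"], centers["keys"]])
predictions = classify(queries, names, protos)
```

In this toy setup, adding more shots per class averages out the noise in each prototype, which is at least consistent with the reported trend that performance improves with the number of examples.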

Abstract (original)

Current large open vision models could be useful for one and few-shot object recognition. Nevertheless, gradient-based re-training solutions are costly. On the other hand, open-vocabulary object detection models bring closer visual and textual concepts in the same latent space, allowing zero-shot detection via prompting at small computational cost. We propose a lightweight method to turn the latter into a one-shot or few-shot object recognition models without requiring textual descriptions. Our experiments on the TEgO dataset using the YOLO-World model as a base show that performance increases with the model size, the number of examples and the use of image augmentation.

arXiv information

Authors	Ben Crulis, Barthelemy Serres, Cyril De Runz, Gilles Venturini
Published	2024-10-21 14:03:15+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
