How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection

Summary

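Object detection (OD) in computer vision has made major progress in recent years, moving from closed-set labels to open-vocabulary detection (OVD) based on large-scale vision-language pre-training (VLP).
However, current evaluation methods and datasets only test generalization over object types and referring expressions, and do not provide a systematic, fine-grained, and accurate benchmark of OVD models' capabilities.
In this paper, we propose a new benchmark, OVDEval, which contains 9 sub-tasks and introduces evaluations of commonsense knowledge, attribute understanding, position understanding, object relation comprehension, and more.
The dataset is carefully constructed to provide hard negatives that challenge a model's true understanding of its visual and linguistic input.
We also identify a problem with the popular Average Precision (AP) metric when it is used to benchmark models on these fine-grained label datasets, and we propose a new metric, Non-Maximum Suppression Average Precision (NMS-AP), to address it.
Extensive experiments show that existing top OVD models fail on every new task except simple object types, which demonstrates the value of the proposed dataset for pinpointing the weaknesses of current OVD models and guiding future research.
Experiments further verify that the proposed NMS-AP metric gives a more accurate evaluation of OVD models, whereas the conventional AP metric yields deceptive results.
The data is available at https://github.com/om-ai-lab/OVDEval.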

Summary (original)

Object detection (OD) in computer vision has made significant progress in recent years, transitioning from closed-set labels to open-vocabulary detection (OVD) based on large-scale vision-language pre-training (VLP). However, current evaluation methods and datasets are limited to testing generalization over object types and referral expressions, which do not provide a systematic, fine-grained, and accurate benchmark of OVD models’ abilities. In this paper, we propose a new benchmark named OVDEval, which includes 9 sub-tasks and introduces evaluations on commonsense knowledge, attribute understanding, position understanding, object relation comprehension, and more. The dataset is meticulously created to provide hard negatives that challenge models’ true understanding of visual and linguistic input. Additionally, we identify a problem with the popular Average Precision (AP) metric when benchmarking models on these fine-grained label datasets and propose a new metric called Non-Maximum Suppression Average Precision (NMS-AP) to address this issue. Extensive experimental results show that existing top OVD models all fail on the new tasks except for simple object types, demonstrating the value of the proposed dataset in pinpointing the weakness of current OVD models and guiding future research. Furthermore, the proposed NMS-AP metric is verified by experiments to provide a much more truthful evaluation of OVD models, whereas traditional AP metrics yield deceptive results. Data is available at https://github.com/om-ai-lab/OVDEval.
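The abstract names NMS-AP but does not describe how it works. The sketch below is one illustration of the AP failure mode it targets and is not the authors' implementation. It assumes that NMS-AP means running class-agnostic NMS over the predictions for all queried labels, including hard negatives, before computing ordinary per-label AP. Under that assumption, a box claimed by both a positive label and a hard-negative label survives only under the higher-scoring label. The function names, the 0.5 IoU threshold, and the toy "red car" / "blue car" data are all hypothetical. For brevity, AP is also computed on a single image.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Det = Tuple[str, float, Box]             # (label, score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[float, Box]], gts: List[Box],
                      iou_thr: float = 0.5) -> Optional[float]:
    """All-point-interpolated AP for one label; None if the label has no ground truth."""
    if not gts:
        return None  # AP is undefined, so this label is skipped in the mean
    matched = [False] * len(gts)
    tp, precisions, recalls = 0, [], []
    for rank, (_, box) in enumerate(sorted(preds, key=lambda p: p[0], reverse=True), 1):
        # Greedily match to the unmatched ground-truth box with the highest IoU >= threshold.
        best_j, best_iou = -1, iou_thr
        for j, g in enumerate(gts):
            o = iou(box, g)
            if not matched[j] and o >= best_iou:
                best_j, best_iou = j, o
        if best_j >= 0:
            matched[best_j] = True
            tp += 1
        precisions.append(tp / rank)
        recalls.append(tp / len(gts))
    ap, prev_r = 0.0, 0.0
    for k, r in enumerate(recalls):
        ap += (r - prev_r) * max(precisions[k:])  # precision envelope
        prev_r = r
    return ap


def mean_ap(dets: List[Det], gts_by_label: Dict[str, List[Box]], iou_thr: float = 0.5) -> float:
    """Mean of per-label AP over labels that have ground truth."""
    aps = []
    for label, gts in gts_by_label.items():
        ap = average_precision([(s, b) for (l, s, b) in dets if l == label], gts, iou_thr)
        if ap is not None:
            aps.append(ap)
    return sum(aps) / len(aps) if aps else 0.0


def class_agnostic_nms(dets: List[Det], iou_thr: float = 0.5) -> List[Det]:
    """Assumed NMS-AP step: among overlapping boxes keep only the top score, ignoring labels."""
    kept: List[Det] = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(det[2], k[2]) < iou_thr for k in kept):
            kept.append(det)
    return kept


# Hypothetical toy image: one red car; "blue car" is a hard-negative query with no ground truth.
toy_gts = {"red car": [(10.0, 10.0, 50.0, 50.0)], "blue car": []}
# The model cannot tell the two apart and returns the same box for both labels.
toy_dets = [("red car", 0.8, (10.0, 10.0, 50.0, 50.0)),
            ("blue car", 0.9, (10.0, 10.0, 50.0, 50.0))]

print(mean_ap(toy_dets, toy_gts))                      # 1.0
print(mean_ap(class_agnostic_nms(toy_dets), toy_gts))  # 0.0
```

Plain per-label AP gives a perfect score here because the false "blue car" box is only counted against a label that has no ground truth. After class-agnostic suppression, the wrong label wins the shared box, and the score drops to 0.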

arXiv information

Authors	Yiyang Yao, Peng Liu, Tiancheng Zhao, Qianqian Zhang, Jiajia Liao, Chunxin Fang, Kyusong Lee, Qing Wang
Published	2023-08-25 04:54:32+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google