SNAP: A Benchmark for Testing the Effects of Capture Conditions on Fundamental Vision Tasks

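Summary

The generalization of deep-learning-based (DL) computer vision algorithms to various image perturbations is hard to establish and remains an active area of research.
Most past analyses focused on images that had already been captured, whereas the effects of the image formation pipeline and the environment are less studied.
This paper addresses the issue by analyzing the impact of capture conditions, such as camera parameters and lighting, on DL model performance in three vision tasks: image classification, object detection, and visual question answering (VQA).
To this end, the authors assess capture bias in common vision datasets and create a new benchmark, SNAP (for $\textbf{S}$hutter speed, ISO se$\textbf{N}$sitivity, and $\textbf{AP}$erture), consisting of images of objects taken under controlled lighting conditions and with densely sampled camera settings.
They then evaluate a large number of DL vision models and show the effects of capture conditions on each selected vision task.
Lastly, they conduct an experiment to establish a human baseline for the VQA task.
The results show that computer vision datasets are significantly biased, and that models trained on this data do not reach human accuracy even on well-exposed images and are susceptible to both major exposure changes and minute variations of camera settings.
Code and data can be found at https://github.com/ykotseruba/SNAP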

Summary (original)

Generalization of deep-learning-based (DL) computer vision algorithms to various image perturbations is hard to establish and remains an active area of research. The majority of past analyses focused on the images already captured, whereas effects of the image formation pipeline and environment are less studied. In this paper, we address this issue by analyzing the impact of capture conditions, such as camera parameters and lighting, on DL model performance on 3 vision tasks — image classification, object detection, and visual question answering (VQA). To this end, we assess capture bias in common vision datasets and create a new benchmark, SNAP (for $\textbf{S}$hutter speed, ISO se$\textbf{N}$sitivity, and $\textbf{AP}$erture), consisting of images of objects taken under controlled lighting conditions and with densely sampled camera settings. We then evaluate a large number of DL vision models and show the effects of capture conditions on each selected vision task. Lastly, we conduct an experiment to establish a human baseline for the VQA task. Our results show that computer vision datasets are significantly biased, the models trained on this data do not reach human accuracy even on the well-exposed images, and are susceptible to both major exposure changes and minute variations of camera settings. Code and data can be found at https://github.com/ykotseruba/SNAP
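The three camera settings that give SNAP its name jointly determine exposure. As a minimal sketch, and not the authors' code, the standard photographic exposure-value relation can be used to see how different settings trade off against one another. The function name `exposure_value` and the example settings are assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math


def exposure_value(f_number: float, shutter_s: float, iso: float) -> float:
    """ISO-100-referenced exposure value: log2(N^2 / t) - log2(ISO / 100).

    f_number  -- aperture as an f-number N (e.g. 4.0 for f/4)
    shutter_s -- shutter speed t in seconds (e.g. 1/100)
    iso       -- ISO sensitivity (e.g. 100)
    """
    return math.log2(f_number ** 2 / shutter_s) - math.log2(iso / 100)


# Hypothetical settings chosen for illustration only.
settings = [
    (4.0, 1 / 100, 100),  # baseline
    (4.0, 1 / 200, 200),  # half the exposure time, double the ISO
    (8.0, 1 / 25, 100),   # two stops smaller aperture, four times the exposure time
]

for f_number, shutter_s, iso in settings:
    ev = exposure_value(f_number, shutter_s, iso)
    print(f"f/{f_number:g}, {shutter_s:.4f} s, ISO {iso}: EV100 = {ev:.2f}")
```

All three settings yield the same value, so nominally they expose the same scene equally, yet they differ in depth of field, motion blur, and sensor noise. Densely sampling such combinations is one way to separate major exposure changes from minute variations of camera settings.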

arXiv information

Authors	Iuliia Kotseruba, John K. Tsotsos
Published	2025-05-21 15:14:34+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
