Larger is not Better: A Survey on the Robustness of Computer Vision Models against Common Corruptions

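Summary

The performance of computer vision models is susceptible to unexpected changes in input images, known as common corruptions (e.g. noise, blur, illumination changes), which can undermine their reliability when deployed in real-world scenarios.
These corruptions are not always considered when testing model generalization and robustness.
This survey presents a comprehensive overview of methods that improve the robustness of computer vision models against common corruptions.
We categorize the methods into four groups based on the model part and training method addressed: data augmentation, representation learning, knowledge distillation, and network components.
We also cover indirect methods for generalization and for mitigating shortcut learning, which may be useful for corruption robustness.
We release a unified benchmark framework to compare robustness performance on several datasets and to address inconsistencies of evaluation in the literature.
We provide an experimental overview of the baseline corruption robustness of popular vision backbones, and show that corruption robustness does not necessarily scale with model size.
Very large models (above 100M parameters) gain negligible robustness, considering their increased computational requirements.
To achieve generalizable and robust computer vision models, we foresee the need to develop new learning strategies that exploit limited data efficiently and mitigate unwanted or unreliable learning behaviors.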

Summary (original)

The performance of computer vision models is susceptible to unexpected changes in input images, known as common corruptions (e.g. noise, blur, illumination changes), that can hinder their reliability when deployed in real scenarios. These corruptions are not always considered to test model generalization and robustness. In this survey, we present a comprehensive overview of methods that improve the robustness of computer vision models against common corruptions. We categorize methods into four groups based on the model part and training method addressed: data augmentation, representation learning, knowledge distillation, and network components. We also cover indirect methods for generalization and mitigation of shortcut learning, potentially useful for corruption robustness. We release a unified benchmark framework to compare robustness performance on several datasets, and address the inconsistencies of evaluation in the literature. We provide an experimental overview of the base corruption robustness of popular vision backbones, and show that corruption robustness does not necessarily scale with model size. The very large models (above 100M parameters) gain negligible robustness, considering the increased computational requirements. To achieve generalizable and robust computer vision models, we foresee the need to develop new learning strategies to efficiently exploit limited data and mitigate unwanted or unreliable learning behaviors.
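
To make the term "common corruptions" concrete, the following is a minimal sketch of the three examples the abstract names (noise, blur, illumination changes), applied to a tiny grayscale image stored as nested lists of floats in [0, 1]. It is an illustration only and is not taken from the paper or its benchmark framework; the function names (`gaussian_noise`, `box_blur`, `brightness_shift`) and the parameter values (`sigma`, `delta`) are assumptions chosen for readability.

```python
import random


def clip01(value):
    """Clamp a pixel value to the valid range [0, 1]."""
    return min(1.0, max(0.0, value))


def gaussian_noise(img, sigma=0.1, seed=0):
    """Add zero-mean Gaussian noise to every pixel (hypothetical sigma)."""
    rng = random.Random(seed)
    return [[clip01(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def box_blur(img):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def brightness_shift(img, delta=0.3):
    """Simulate an illumination change by adding a constant offset."""
    return [[clip01(p + delta) for p in row] for row in img]


# A 4x4 checkerboard stands in for a real input image.
image = [[float((x + y) % 2) for x in range(4)] for y in range(4)]

corrupted = {
    "noise": gaussian_noise(image),
    "blur": box_blur(image),
    "illumination": brightness_shift(image),
}
```

A robustness evaluation in this spirit would compare a model's accuracy on `image`-like clean inputs against its accuracy on each entry of `corrupted`; the model and dataset are left out here because the summary does not specify them.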

arXiv information

Authors	Shunxin Wang, Raymond Veldhuis, Christoph Brune, Nicola Strisciuglio
Published	2023-08-11 15:23:21+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
