Indoor scene recognition from images under visual corruptions

Summary

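Indoor scene classification is a key component of many applications, such as intelligent robotics for assisted living.
Although deep learning has advanced this field substantially, models often lose performance when images are corrupted.
This paper introduces an innovative approach to indoor scene recognition that uses multimodal data fusion, integrating caption-based semantic features with visual data to improve both accuracy and robustness to corruption.
We examine two multimodal networks that combine visual features from CNN models with semantic captions through a Graph Convolutional Network (GCN).
Our study shows that this fusion markedly improves model performance, with notable gains in Top-1 accuracy when evaluated on a corrupted subset of the Places365 dataset.
Furthermore, while standalone visual models achieved high accuracy on uncorrupted images, their performance dropped sharply as corruption severity increased.
In contrast, the multimodal models showed improved accuracy under clean conditions and substantial robustness to a variety of image corruptions.
These results highlight the effectiveness of incorporating high-level contextual information through captions and suggest a promising direction for making classification systems more resilient.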
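The summary does not describe how the two networks wire captions and CNN features into the GCN, so the following is only a minimal sketch under stated assumptions. It assumes a single graph whose node 0 holds one CNN image feature and whose remaining nodes hold caption-token embeddings of the same size, a hypothetical edge rule (image node linked to every token, consecutive tokens linked), and the standard symmetric-normalized GCN propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W). All function names, weights, and the graph layout are illustrative, not the authors' architecture.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the symmetric normalization used by GCNs."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # self-loops guarantee deg > 0
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(adj_norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]


def build_fusion_graph(num_caption_tokens):
    """Hypothetical graph: node 0 is the image, nodes 1..k are caption tokens.

    The image node is linked to every token, and consecutive tokens are linked.
    """
    n = num_caption_tokens + 1
    adj = [[0.0] * n for _ in range(n)]
    for t in range(1, n):
        adj[0][t] = adj[t][0] = 1.0
        if t + 1 < n:
            adj[t][t + 1] = adj[t + 1][t] = 1.0
    return adj


def fuse_and_classify(visual_feat, caption_feats, w_gcn, w_cls):
    """Fuse one visual vector with caption-token vectors through one GCN layer,
    mean-pool the node embeddings, and return one score per class."""
    h = [visual_feat] + caption_feats  # assumes all vectors share one dimensionality
    adj_norm = normalized_adjacency(build_fusion_graph(len(caption_feats)))
    h1 = gcn_layer(adj_norm, h, w_gcn)
    pooled = [sum(col) / len(h1) for col in zip(*h1)]
    return matmul([pooled], w_cls)[0]


# Toy usage with 2-d features and 2 classes (all numbers are made up).
scores = fuse_and_classify(
    visual_feat=[0.9, 0.1],                     # stand-in for a CNN image feature
    caption_feats=[[0.2, 0.8], [0.5, 0.5]],     # stand-ins for caption-token embeddings
    w_gcn=[[1.0, 0.0], [0.0, 1.0]],             # identity weights for readability
    w_cls=[[1.0, -1.0], [0.0, 1.0]],            # linear classifier: 2 features -> 2 classes
)
print(scores)  # class 0 gets the higher score
```

Because every node mixes in its neighbours' features, the pooled vector carries caption context even when the visual feature alone is degraded; a real system would learn `w_gcn` and `w_cls` and would likely project image and text features into a shared space first.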

Summary (original)

The classification of indoor scenes is a critical component in various applications, such as intelligent robotics for assistive living. While deep learning has significantly advanced this field, models often suffer from reduced performance due to image corruption. This paper presents an innovative approach to indoor scene recognition that leverages multimodal data fusion, integrating caption-based semantic features with visual data to enhance both accuracy and robustness against corruption. We examine two multimodal networks that synergize visual features from CNN models with semantic captions via a Graph Convolutional Network (GCN). Our study shows that this fusion markedly improves model performance, with notable gains in Top-1 accuracy when evaluated against a corrupted subset of the Places365 dataset. Moreover, while standalone visual models displayed high accuracy on uncorrupted images, their performance deteriorated significantly with increased corruption severity. Conversely, the multimodal models demonstrated improved accuracy in clean conditions and substantial robustness to a range of image corruptions. These results highlight the efficacy of incorporating high-level contextual information through captions, suggesting a promising direction for enhancing the resilience of classification systems.
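The evaluation described above compares Top-1 accuracy on clean images with accuracy at increasing corruption severity. A minimal sketch of that bookkeeping follows, assuming each test prediction is tagged with an integer severity level (0 standing for clean images). The record layout, the severity values, and the relative-drop measure are hypothetical illustrations, not the paper's evaluation protocol or any Places365 tooling.

```python
from collections import defaultdict


def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the true label.

    Ties resolve to the lowest class index, because max() returns the first maximum.
    """
    correct = sum(
        1 for s, y in zip(scores, labels) if max(range(len(s)), key=lambda k: s[k]) == y
    )
    return correct / len(labels)


def accuracy_by_severity(records):
    """records: iterable of (severity, class_scores, true_label).

    Returns {severity: top-1 accuracy}, ordered by severity.
    """
    grouped = defaultdict(lambda: ([], []))
    for severity, s, y in records:
        grouped[severity][0].append(s)
        grouped[severity][1].append(y)
    return {sev: top1_accuracy(sc, ys) for sev, (sc, ys) in sorted(grouped.items())}


def relative_drop(acc_by_sev, clean_key=0):
    """Share of the clean accuracy lost at each corrupted severity level (hypothetical metric)."""
    clean = acc_by_sev[clean_key]
    return {sev: (clean - acc) / clean for sev, acc in acc_by_sev.items() if sev != clean_key}


# Toy records: (severity, per-class scores, true label). Values are invented.
records = [
    (0, [0.9, 0.1], 0),
    (0, [0.2, 0.8], 1),
    (3, [0.6, 0.4], 1),
    (3, [0.7, 0.3], 0),
]
acc = accuracy_by_severity(records)
print(acc)                 # {0: 1.0, 3: 0.5}
print(relative_drop(acc))  # {3: 0.5}
```

Running the same function over a visual-only model and a multimodal model would expose the pattern the abstract reports: a model can score well at severity 0 yet show a large relative drop at higher severities.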

arXiv information

Authors	Willams de Lima Costa, Raul Ismayilov, Nicola Strisciuglio, Estefania Talavera Martinez
Published	2024-08-23 12:35:45+00:00
arXiv page	arxiv_id(pdf)

Provider and services used

arxiv.jp, Google
