Multi-scale Feature Aggregation for Crowd Counting

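Summary

Crowd counting methods based on convolutional neural networks (CNNs) have achieved promising results in recent years, but scale variation remains a major obstacle to accurate count estimation. This paper proposes a multi-scale feature aggregation network (MSFANet) that alleviates the problem to some extent. The approach consists of two feature aggregation modules, short aggregation (ShortAgg) and skip aggregation (SkipAgg). The ShortAgg module aggregates the features of adjacent convolution blocks so that features with different receptive fields are fused gradually from the bottom of the network to the top. The SkipAgg module propagates features with small receptive fields directly to features with much larger receptive fields, which promotes the fusion of the two. In particular, the SkipAgg module introduces local self-attention features from Swin Transformer blocks to incorporate rich spatial information. The paper also presents a local-and-global counting loss that accounts for non-uniform crowd distribution. Extensive experiments on four challenging datasets (ShanghaiTech, UCF_CC_50, UCF-QNRF, and WorldExpo’10) show that the easy-to-implement MSFANet achieves promising results compared with previous state-of-the-art approaches.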

Abstract (original)

Convolutional Neural Network (CNN) based crowd counting methods have achieved promising results in the past few years. However, the scale variation problem is still a huge challenge for accurate count estimation. In this paper, we propose a multi-scale feature aggregation network (MSFANet) that can alleviate this problem to some extent. Specifically, our approach consists of two feature aggregation modules: the short aggregation (ShortAgg) and the skip aggregation (SkipAgg). The ShortAgg module aggregates the features of the adjacent convolution blocks. Its purpose is to make features with different receptive fields fused gradually from the bottom to the top of the network. The SkipAgg module directly propagates features with small receptive fields to features with much larger receptive fields. Its purpose is to promote the fusion of features with small and large receptive fields. Especially, the SkipAgg module introduces the local self-attention features from the Swin Transformer blocks to incorporate rich spatial information. Furthermore, we present a local-and-global based counting loss by considering the non-uniform crowd distribution. Extensive experiments on four challenging datasets (ShanghaiTech dataset, UCF_CC_50 dataset, UCF-QNRF Dataset, WorldExpo’10 dataset) demonstrate the proposed easy-to-implement MSFANet can achieve promising results when compared with the previous state-of-the-art approaches.
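The abstract names a local-and-global counting loss but does not give its formula. The sketch below is one possible reading, not the authors' definition: it assumes the global term is the absolute error of the total count, and the local term is the mean absolute count error over non-overlapping square patches. The function names and the `patch` and `local_weight` parameters are hypothetical, and plain Python lists stand in for density maps.

```python
def region_sum(density, top, left, height, width):
    """Sum a density map (a list of rows) over one rectangular patch."""
    return sum(
        density[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )


def local_global_count_loss(pred, target, patch=2, local_weight=1.0):
    """Hypothetical loss: global count error plus mean per-patch count error."""
    rows, cols = len(target), len(target[0])
    if len(pred) != rows or any(len(row) != cols for row in pred):
        raise ValueError("pred and target must have the same shape")
    if rows % patch or cols % patch:
        raise ValueError("map size must be divisible by patch")

    # Global term (assumed): absolute error of the total count.
    global_err = abs(
        region_sum(pred, 0, 0, rows, cols) - region_sum(target, 0, 0, rows, cols)
    )

    # Local term (assumed): absolute count error in each non-overlapping patch.
    local_errs = [
        abs(
            region_sum(pred, r, c, patch, patch)
            - region_sum(target, r, c, patch, patch)
        )
        for r in range(0, rows, patch)
        for c in range(0, cols, patch)
    ]
    local_err = sum(local_errs) / len(local_errs)
    return global_err + local_weight * local_err


# Two people in the top-left patch of the ground truth...
target = [[1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
# ...but the prediction places them in the bottom-right patch.
pred = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1]]

print(local_global_count_loss(pred, target))    # total count matches, location does not
print(local_global_count_loss(target, target))  # perfect prediction
```

In this toy case the total counts agree, so a purely global loss would be zero. Under the assumptions above, the per-patch term is what penalizes density placed in the wrong region, which is the kind of behavior a loss aimed at non-uniform crowd distribution would plausibly need.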

arXiv information

Authors	Xiaoheng Jiang, Xinyi Wu, Hisham Cholakkal, Rao Muhammad Anwer, Jiale Cao, Mingliang Xu, Bing Zhou, Yanwei Pang, Fahad Shahbaz Khan
Published	2022-08-10 10:23:12+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
