[1] MENG Yuebo, JI Tuo, LIU Guanghui, et al. Encoding-Decoding Multi-Scale Convolutional Neural Network for Crowd Counting [J]. Journal of Xi'an Jiaotong University, 2020, 54(05): 149-157. [doi:10.7652/xjtuxb202005020]

Encoding-Decoding Multi-Scale Convolutional Neural Network for Crowd Counting

Journal of Xi'an Jiaotong University [ISSN: 0253-987X / CN: 61-1069/T]

Volume:
54
Issue:
2020, No. 05
Pages:
149-157
Section:
Publication date:
2020-05-10

Article Info

Title:
Encoding-Decoding Multi-Scale Convolutional Neural Network for Crowd Counting
Article ID:
0253-987X(2020)05-0149-09
Author(s):
MENG Yuebo, JI Tuo, LIU Guanghui, XU Shengjun, LI Tongyue
School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China
Keywords:
crowd counting; encoding-decoding structure; multi-scale; atrous spatial pyramid pooling; counting error; loss function
CLC number:
TP391
DOI:
10.7652/xjtuxb202005020
Document code:
A
Abstract:
Crowd counting methods based on multi-column convolutional neural networks suffer from loss of multi-scale feature information, poor feature fusion, and low-quality density maps. To address these problems, a crowd counting method based on an encoding-decoding multi-scale convolutional neural network is proposed. The encoder uses multi-column convolution to capture multi-scale features and applies atrous spatial pyramid pooling to enlarge the receptive field while reducing the number of parameters, preserving both the scale features and the context information of the image. The decoder upsamples the encoder output and fuses the high-level semantic information with the low-level features from the front end of the encoder, which improves the quality of the output density map. To make the network more sensitive to the count, a new loss function is proposed that adds a counting error term to the conventional pixel-space loss. Comparative experiments were conducted on the Shanghai Tech, Mall, and self-built datasets. Compared with the previous best method, the proposed method reduces the mean absolute error and mean square error by 8.3% and 21.3% on Part_A of the Shanghai Tech dataset, by 12.9% and 12.0% on Part_B, by 15.1% and 23.8% on the Mall dataset, and by 13.5% and 7.1% on the self-built dataset. Across different crowd scenes, the proposed method is more accurate and more robust than the other compared methods.
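The abstract describes a loss that takes the counting error into account alongside the pixel-space loss, but it does not give the formula. The following is a minimal sketch of that general idea, not the authors' definition: the per-pixel squared error, the relative count error, the weight `lam`, and all function names are assumptions chosen for illustration.

```python
from typing import List

# A density map is a 2-D grid whose values sum to the number of people.
DensityMap = List[List[float]]


def pixel_loss(pred: DensityMap, truth: DensityMap) -> float:
    """Mean squared per-pixel difference between two density maps."""
    total = 0.0
    n = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += (p - t) ** 2
            n += 1
    return total / n


def count(density: DensityMap) -> float:
    """Estimated head count: the sum of all density values."""
    return sum(sum(row) for row in density)


def count_loss(pred: DensityMap, truth: DensityMap) -> float:
    """Relative absolute count error (assumed form).

    The +1 in the denominator avoids division by zero for empty scenes.
    """
    return abs(count(pred) - count(truth)) / (count(truth) + 1.0)


def combined_loss(pred: DensityMap, truth: DensityMap, lam: float = 0.5) -> float:
    """Pixel-space loss plus a weighted count term; lam is a hypothetical weight."""
    return pixel_loss(pred, truth) + lam * count_loss(pred, truth)


if __name__ == "__main__":
    truth = [[0.0, 1.0], [1.0, 0.0]]   # two people
    pred = [[0.0, 0.5], [1.0, 0.0]]    # one person under-estimated
    print("pixel loss:", pixel_loss(pred, truth))
    print("count loss:", count_loss(pred, truth))
    print("combined  :", combined_loss(pred, truth))
```

A pixel-only loss averages errors over the whole map, so a small but systematic under-estimate per pixel can leave the total count noticeably wrong. Adding a count term penalizes that directly, which is one plausible reading of "enhancing the sensitivity of the network to counting".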

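Memo

Received: 2019-11-20. Author biographies: MENG Yuebo (born 1979), female, associate professor; LIU Guanghui (corresponding author), male, associate professor. Funding: National Natural Science Foundation of China (51678470); Special Scientific Research Program of the Shaanxi Provincial Department of Education (18JK0477); Basic Research Fund of Xi'an University of Architecture and Technology (JC1703).
Last Update: 2020-05-10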