Crowd Counting: CP-CNN — Generating High-Quality Crowd Density Maps using Contextual Pyramid CNNs

**The goal of this paper:** generating high-quality crowd density maps with lower count error.
**
The reason for doing this work:

  1. Many existing works do not explicitly incorporate contextual information, which is essential for achieving further improvements.
  2. Though existing approaches regress on density maps, they focus more on reducing count error than on improving the quality of the density maps.
  3. Existing CNN-based approaches are trained using a pixel-wise Euclidean loss, which results in blurred density maps.

**Contributions:**

  1. CP-CNN, a Contextual Pyramid CNN for crowd counting
  2. high-quality density maps
  3. a combination of adversarial loss and Euclidean loss
  4. an analysis of the contribution of contextual information and adversarial loss
**What is CP-CNN?**

Answer: CP-CNN consists of four modules: a Global Context Estimator (GCE), a Local Context Estimator (LCE), a Density Map Estimator (DME), and a Fusion-CNN (F-CNN).

The overall architecture is shown below:

[Figure: CP-CNN architecture — GCE, LCE, DME and F-CNN]

**The function of each part:**

1. GCE is a VGG-16-based CNN that encodes global context; it is trained to classify input images into different density classes.

Detail: a VGG-16 network is fine-tuned on the crowd training data, and its last three fully connected layers are replaced with a different configuration of fully connected layers suited to the task of classification into five density categories, as shown below.

[Figure: GCE network configuration]
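Training this five-way classifier requires density-class labels. A minimal sketch of how ground-truth head counts could be binned into five classes follows; the thresholds here are illustrative assumptions, not values from the paper:

```python
def density_class(head_count, boundaries=(50, 200, 500, 1000)):
    """Map a ground-truth head count to one of five density classes (0-4).

    The boundary values are hypothetical; the paper partitions training
    images into five density levels, but the thresholds depend on the dataset.
    """
    for cls, upper in enumerate(boundaries):
        if head_count < upper:
            return cls
    return len(boundaries)  # class 4: the densest crowds

print(density_class(30))    # sparse image -> class 0
print(density_class(1500))  # very dense image -> class 4
```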

2. LCE is another CNN that encodes local context information; it is trained to perform patch-wise classification of input images into different density classes.

Detail: local contextual information can help produce better-quality maps. The network learns an image's local context by classifying its local patches into one of the five density classes, as shown below.

[Figure: LCE network configuration]
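Patch-wise classification starts from tiling the image into local patches. A minimal numpy sketch, assuming non-overlapping square patches (the patch size is an illustrative choice):

```python
import numpy as np

def extract_patches(image, patch_size):
    """Split an image into non-overlapping square patches.

    Each patch would then be fed to the LCE classifier; overlap and
    patch size are assumptions for illustration.
    """
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            patches.append(image[top:top + patch_size, left:left + patch_size])
    return patches

img = np.zeros((128, 256))
print(len(extract_patches(img, 64)))  # 2 rows x 4 cols = 8 patches
```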

3. DME is a multi-column CNN that aims to generate high-dimensional feature maps.

Detail: the DME transforms the input image into a set of high-dimensional feature maps, which are then concatenated with the contextual information provided by GCE and LCE, as shown below.

[Figure: DME network configuration]
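The concatenation step can be sketched with numpy. All channel counts below are hypothetical stand-ins, since the exact dimensionalities depend on the chosen DME, GCE and LCE configurations:

```python
import numpy as np

# Hypothetical channel counts: DME feature maps plus global and local
# context maps, concatenated along the channel axis before the F-CNN.
h, w = 120, 160
dme_features = np.random.rand(30, h, w)  # DME output maps (count assumed)
global_ctx   = np.random.rand(5, h, w)   # e.g. one map per density class (assumed)
local_ctx    = np.random.rand(5, h, w)   # e.g. one map per density class (assumed)

fused_input = np.concatenate([dme_features, global_ctx, local_ctx], axis=0)
print(fused_input.shape)  # (40, 120, 160)
```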

4. F-CNN fuses the contextual information estimated by GCE and LCE with the feature maps produced by DME, and uses a set of convolutional and fractionally-strided convolutional layers to generate high-resolution, high-quality density maps.

Detail: CR(64,9)-CR(32,7)-TR(32)-CR(16,5)-TR(16)-C(1,1), where C is a convolutional layer, R is a ReLU layer, T is a fractionally-strided convolutional layer, and the first number in each parenthesis is the number of filters while the second is the filter size. Each fractionally-strided convolutional layer increases the input resolution by a factor of 2, thereby ensuring that the output resolution is the same as that of the input.
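Since only the fractionally-strided (T) layers change the spatial size, the output resolution of this configuration can be verified with a small tracker. This sketch assumes 'same'-padded stride-1 convolutions and a DME output at 1/4 of the input resolution (i.e. two pooling stages upstream):

```python
def output_resolution(input_hw, layer_spec):
    """Track spatial resolution through the F-CNN layer string.

    Convolutions are assumed to preserve resolution ('same' padding,
    stride 1); each fractionally-strided layer (T) doubles it.
    """
    h, w = input_hw
    for token in layer_spec.split("-"):
        if token.startswith("T"):
            h, w = h * 2, w * 2
    return h, w

# DME features for a 480x640 input, assumed to be at 1/4 resolution:
dme_hw = (480 // 4, 640 // 4)
fcnn = "CR(64,9)-CR(32,7)-TR(32)-CR(16,5)-TR(16)-C(1,1)"
print(output_resolution(dme_hw, fcnn))  # two T layers -> back to (480, 640)
```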

**How are these parts connected?**

Answer: the contextual information obtained by LCE and GCE is combined with the output of DME using the Fusion-CNN (F-CNN).

**How is the network trained and evaluated?**

The loss for training DME and F-CNN is defined as follows:

L_T = L_E + λa · L_A

L_E = (1 / (W·H)) · Σ_{w=1..W} Σ_{h=1..H} ‖φ(X)_{w,h} − Y_{w,h}‖₂

L_A = −log(φD(φ(X)))
Here L_T is the overall loss, L_E is the pixel-wise Euclidean loss between the estimated density map and its corresponding ground truth, λa is a weighting factor, L_A is the adversarial loss, X is the input image of dimensions W × H, Y is the ground-truth density map, φ is the network consisting of DME and F-CNN, and φD is the discriminator sub-network used to compute the adversarial loss. The discriminator sub-network has the structure CP(64)-CP(128)-M-CP(256)-M-CP(256)-CP(256)-M-C(1)-Sigmoid, where C is a convolutional layer, P is a PReLU layer and M is a max-pooling layer.
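The combined objective can be sketched in numpy. The λa value and discriminator score below are illustrative, and L_E is instantiated here as the mean squared pixel difference (one common form of the pixel-wise Euclidean loss):

```python
import numpy as np

def euclidean_loss(pred, gt):
    """Pixel-wise Euclidean loss L_E, averaged over the W x H map
    (instantiated as mean squared pixel error)."""
    return np.mean((pred - gt) ** 2)

def adversarial_loss(disc_score):
    """Generator-side adversarial term L_A = -log(phi_D(phi(X))),
    where disc_score is the discriminator's output in (0, 1]."""
    return -np.log(disc_score)

def total_loss(pred, gt, disc_score, lambda_a=1e-3):
    """Overall loss L_T = L_E + lambda_a * L_A (lambda_a is illustrative)."""
    return euclidean_loss(pred, gt) + lambda_a * adversarial_loss(disc_score)

pred = np.ones((4, 4)) * 0.5
gt   = np.ones((4, 4))
print(total_loss(pred, gt, disc_score=0.8))
```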

**Keep on fighting!**
