Crowd Counting: SFCN -- Learning from Synthetic Data for Crowd Counting in the Wild

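This post introduces a novel crowd counting method that collects and annotates data automatically to build GCC, the first large-scale synthetic crowd counting dataset. A pretraining scheme improves performance on real data, and a domain adaptation strategy narrows the gap between synthetic and real data, achieving state-of-the-art results.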

A serious problem remains: data is scarce, which causes many methods to over-fit to varying degrees.

Contributions:

We are the first to develop a data collector and labeler for crowd counting, which can automatically collect and annotate images without any labor costs. By using them, we create the first large-scale, synthetic and diverse crowd counting dataset.
We present a pretraining scheme that improves the original method's performance on real data and reduces estimation errors more effectively than random initialization or an ImageNet-pretrained model. Further, with this strategy, our proposed SFCN achieves state-of-the-art results.
We are the first to propose a crowd counting method via domain adaptation, which does not use any label of the real data. By our designed SE Cycle GAN, the domain gap between the synthetic and real data can be significantly reduced. Finally, the proposed method outperforms the two baselines.

GCC dataset:

The full name of GCC is GTA5 Crowd Counting. It has four highlights:

free collection and annotation
larger data volume and higher resolution
more diversified scenes
more accurate annotations

The process of generating a training image:
Set up the scene:
a) select a location and set up the cameras
b) segment the region of interest (ROI) for the crowd
c) set weather and time.
Place persons:
a) create persons in the ROI and get the head positions
b) obtain the person mask from stencil
c) integrate multiple images into one image
d) remove the positions of occluded heads.
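
The last step, removing occluded heads, can be illustrated with a minimal sketch. It assumes (this is not stated in the source) that the composited stencil stores, at each pixel, the ID of the front-most person, so a head counts as visible only if the stencil at its position still carries its owner's ID. The function name and data layout are hypothetical.

```python
def visible_heads(stencil, heads):
    """Keep only head positions that are not covered by another person.

    stencil: 2D list of person IDs, 0 meaning background (assumed layout).
    heads:   dict mapping person ID -> (row, col) of that person's head.
    """
    kept = {}
    for person_id, (row, col) in heads.items():
        in_bounds = 0 <= row < len(stencil) and 0 <= col < len(stencil[0])
        # Hypothetical visibility rule: the head pixel must still belong
        # to its own person in the composited stencil.
        if in_bounds and stencil[row][col] == person_id:
            kept[person_id] = (row, col)
    return kept


# Toy example: person 3's head pixel is covered by person 2.
toy_stencil = [
    [1, 1, 0],
    [2, 2, 0],
    [0, 0, 0],
]
toy_heads = {1: (0, 0), 2: (1, 0), 3: (1, 1)}
remaining = visible_heads(toy_stencil, toy_heads)
```

The real collector works inside the game engine, so its actual occlusion test may differ from this pixel lookup.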

How to use GCC?

Random splitting: the dataset is randomly divided into a training set and a testing set.
Cross-camera splitting: as for a specific location, one surveillance camera is randomly selected for testing and the others for training.
Cross-location splitting: we randomly choose 75/25 locations for training/testing.
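
The three splitting strategies can be sketched as follows. This is only an illustration: the record layout (dicts with `location` and `camera` keys), the function names, and the 0.75 ratio used for the random split are assumptions, not part of the GCC tooling.

```python
import random
from collections import defaultdict


def random_split(samples, train_ratio=0.75, seed=0):
    """Shuffle all samples and cut once (the ratio is an assumed default)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def cross_camera_split(samples, seed=0):
    """Per location, hold out one randomly chosen camera for testing."""
    rng = random.Random(seed)
    cameras = defaultdict(set)
    for s in samples:
        cameras[s["location"]].add(s["camera"])
    held_out = {loc: rng.choice(sorted(cams)) for loc, cams in sorted(cameras.items())}
    train = [s for s in samples if s["camera"] != held_out[s["location"]]]
    test = [s for s in samples if s["camera"] == held_out[s["location"]]]
    return train, test


def cross_location_split(samples, n_train_locations, seed=0):
    """Pick whole locations for training; the remaining ones go to testing."""
    rng = random.Random(seed)
    locations = sorted({s["location"] for s in samples})
    rng.shuffle(locations)
    train_locations = set(locations[:n_train_locations])
    train = [s for s in samples if s["location"] in train_locations]
    test = [s for s in samples if s["location"] not in train_locations]
    return train, test


# Toy data: 4 locations x 3 cameras x 2 images each.
toy = [{"location": l, "camera": c, "image": i}
       for l in range(4) for c in range(3) for i in range(2)]
train_set, test_set = cross_location_split(toy, n_train_locations=3)
```

The cross-camera and cross-location splits are the harder settings, since the test images come from viewpoints or scenes never seen during training.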

This table shows the advantage of pretraining a model on GCC:

[Image: results table]

Generating the density map:

There are two ways to estimate the density map:
1. Supervised crowd counting: pretrain the model on GCC, then fine-tune it on the real dataset.
2. Crowd counting via domain adaptation: learn a mapping between the synthetic domain S and the real-world domain R, then train the SFCN on GCC data only.

The relationship between them is shown below:
[Image: relationship between the two approaches]

Supervised crowd counting:

The network contains a spatial encoder that applies a sequence of convolutions in four directions (down, up, left-to-right and right-to-left). After the spatial encoder, a regression layer is added, which directly outputs a density map at 1/8 of the input size.

We design a spatial FCN (SFCN) to produce the density map, which adopts VGG-16 or ResNet-101 as the backbone. We set the stride to 1 in conv4_x of the ResNet-101 backbone, so that conv4_x outputs feature maps at 1/8 the size of the input image.
[Image]
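
To make the four-direction idea concrete, here is a heavily simplified sketch in plain Python. It assumes a single-channel feature map and one fixed 1-D kernel shared by all directions, with each slice receiving a ReLU-activated message from the previous slice. The real SFCN uses learned multi-channel convolutions, so every name and value below is illustrative only.

```python
def conv1d_same(values, kernel):
    """1-D convolution with zero padding so the output keeps the input length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(values):
                acc += w * values[j]
        out.append(acc)
    return out


def sweep_down(fmap, kernel):
    """Top-to-bottom pass: each row adds a ReLU-ed message from the row above."""
    out = [row[:] for row in fmap]
    for i in range(1, len(out)):
        message = conv1d_same(out[i - 1], kernel)
        out[i] = [v + max(m, 0.0) for v, m in zip(out[i], message)]
    return out


def transpose(fmap):
    return [list(col) for col in zip(*fmap)]


def spatial_encoder(fmap, kernel):
    """Run the sweep in four directions: down, up, left-to-right, right-to-left."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    out = sweep_down(fmap, kernel)                               # down
    out = sweep_down(out[::-1], kernel)[::-1]                    # up
    out = transpose(sweep_down(transpose(out), kernel))          # left-to-right
    out = transpose(sweep_down(transpose(out)[::-1], kernel)[::-1])  # right-to-left
    return out


# A single activation in the top-left corner of a 3x3 map.
toy_map = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
encoded = spatial_encoder(toy_map, kernel=[0.0, 1.0, 0.0])
```

After the four sweeps, every position in the toy map has received signal from the corner, which is the point of the design: each location can draw on context from the whole feature map before the regression layer produces the density map.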

Crowd Counting via Domain Adaptation:

The paper proposes a crowd counting method via domain adaptation that learns specific patterns or features from the synthetic data and transfers them to the real world.
To be specific, we present an SSIM Embedding (SE) Cycle GAN to transform the synthetic image into a photo-realistic image. We then train an SFCN on the translated data, with no fine-tuning on the real dataset. The result is satisfactory:
[Image: results]
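
For readers unfamiliar with SSIM, the sketch below computes the standard structural similarity index over a whole image at once (a single global window) on flattened pixel lists. Practical implementations use local sliding windows, and turning the score into a loss as `1 - SSIM` is a common convention assumed here, not a detail confirmed by these notes. The function names are hypothetical.

```python
def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized, flattened images."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants from the standard SSIM definition.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_loss(x, y):
    """Assumed loss form: 0 for structurally identical images, larger otherwise."""
    return 1.0 - ssim_global(x, y)


# Toy pixels in [0, 1]: an image compared with itself and with a distorted copy.
original = [0.1, 0.5, 0.9, 0.3]
distorted = [0.9, 0.5, 0.1, 0.7]
loss_same = ssim_loss(original, original)
loss_diff = ssim_loss(original, distorted)
```

In a cycle-consistency setting, a term like this would compare an image with its reconstruction after a round trip through both generators, penalizing translations that destroy local structure such as crowd texture.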