A Large Contextual Dataset for Classification, Detection and Counting of Cars with Deep Learning

This post reviews ResCeption, a network that combines residual learning with Inception-style layers and counts vehicles in a single look. The method is not tied to any particular vehicle type or scene. The post also covers the characteristics of the COWC dataset and how it is preprocessed, and comparison experiments show the advantages of the ResCeption network over other models.

Points in this paper worth noting:

1. The network combines residual learning with Inception-style layers and counts cars "in one look". This is a new way to count objects, as opposed to counting by localization or by density estimation.

2. The counting method is not car or scene specific. It would be easy to train this method to count other kinds of objects, and counting over new scenes requires no extra setup or assumptions about object locations.

3. The COWC dataset:
3.1 The image set is annotated with single-pixel points. There is no need to handle scale invariance for vehicle size, because the known spatial resolution of the overhead imagery serves as a prior.
3.2 Large trucks are completely omitted, since it can be unclear when something stops being a light vehicle and starts to become a truck. Vans and pickups are included as cars even if they are large. All boats, trailers and construction vehicles are always added as negatives.
3.3 In the generated split, the red regions are the test set and the blue regions are the training set. Doesn't this way of choosing the split seem strange?!
[Figure: map of the red (test) / blue (training) region split]
3.4 The validation images come from Utah, and their content differs substantially from the training set.
3.5 Training patches are 256×256, and the training set is augmented by rotations in 15-degree steps. This yielded a set of 308,988 training patches and 79,447 testing patches. A patch is labeled as containing a car only if a vehicle lies within its central 48×48 region, so negative patches can still contain cars. In addition, an edge margin of 32 pixels was grayed out in each patch, because comparison experiments showed that this gives the best model performance.
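As a rough illustration of this patch preparation, here is a minimal NumPy sketch, assuming each patch is a 256×256 uint8 array, car annotations are (row, col) points in patch coordinates, and the gray value 127 is a placeholder for whatever gray level the authors actually use:

```python
import numpy as np

PATCH, CENTER, MARGIN = 256, 48, 32  # patch size, central labeling window, grayed edge width

def prepare_patch(patch, car_points, gray_value=127):
    """Gray out the 32-pixel border and decide the patch label.

    patch      : (256, 256, 3) uint8 array
    car_points : list of (row, col) car-center annotations in patch coordinates
    """
    out = patch.copy()
    out[:MARGIN, :], out[-MARGIN:, :] = gray_value, gray_value   # top / bottom margin
    out[:, :MARGIN], out[:, -MARGIN:] = gray_value, gray_value   # left / right margin

    lo = (PATCH - CENTER) // 2
    hi = lo + CENTER
    # positive only if some annotated car center falls inside the central 48x48 window
    label = any(lo <= r < hi and lo <= c < hi for (r, c) in car_points)
    return out, int(label)
```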
4. The ResCeption network: the authors create a third network that synthesizes Residual Learning [26] with Inception, which they call ResCeption. The difference from a standard residual network is the addition of a projection shortcut (a rough sketch follows the figure below).
[Figure: the ResCeption module]
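The paper specifies the exact ResCeption layer configuration; the PyTorch block below is only a sketch of the general idea, combining Inception-style parallel branches with a 1×1 projection shortcut that is summed with their concatenated output (the channel sizes here are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class ResCeptionBlock(nn.Module):
    """Inception-style branches whose concatenated output is summed with
    a 1x1 projection shortcut of the input (illustrative channel sizes)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        b = out_ch // 4
        self.branch1 = nn.Conv2d(in_ch, b, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, b, kernel_size=1),
            nn.Conv2d(b, b, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, b, kernel_size=1),
            nn.Conv2d(b, b, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, b, kernel_size=1),
        )
        # projection shortcut: a 1x1 convolution instead of an identity mapping
        self.shortcut = nn.Conv2d(in_ch, 4 * b, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        branches = torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1,
        )
        return self.relu(branches + self.shortcut(x))
```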
5. The comparison experiments and the paper's method are all implemented in Caffe: AlexNet, GoogLeNet/Inception, and the ResCeption network.
6. [Table: effect of the grayed patch margin on classification performance]
The grayed patch margin changes performance by about 1%. The authors' explanation is that too much surrounding context can give bad hints about cars: "That we can have too much context might be a result of too much irrelevant information or bad hints from objects that are too far from a car."
7. Detection is tested on 10 images with a sliding window: window size 224×224, a 32-pixel grayed margin, and a stride of 8 pixels; non-maximum suppression uses a threshold of 0.75. Under the "verification" scoring criterion, splits and merges are not counted, giving (TP - x)/(TP + FP - 2x), where x is the number of split/merge cases, so the reported value is higher than the usual TP/(TP + FP) statistic.
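To make the comparison concrete, here is a tiny sketch of the two statistics, where `x` stands for the number of detections discounted as splits or merges (the exact matching bookkeeping is the paper's and is not shown here):

```python
def precision(tp, fp):
    """Ordinary precision: TP / (TP + FP)."""
    return tp / (tp + fp)

def verification_score(tp, fp, x):
    """Verification score with split/merge cases removed: (TP - x) / (TP + FP - 2x)."""
    return (tp - x) / (tp + fp - 2 * x)

# e.g. 90 true positives, 20 false positives, of which 10 detections are splits/merges
print(precision(90, 20))               # 90 / 110  = 0.818...
print(verification_score(90, 20, 10))  # 80 / 90   = 0.888...
```

With these example numbers the verification score comes out higher than the plain precision, which matches the observation above (this holds whenever TP > FP).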

[Figure: detection results]

[Table: detection performance comparison]

Q: Note that correct = (TP+TN)/(TP+TN+FP+FN) and F = 2*P*R/(P+R), yet Table 2 actually compares these two metrics against each other (Ideally the verification score should ...).

8. Counting: within each patch, a car is counted only if its center lies at least eight pixels inside the patch. The whole network is turned into a regression problem with 64 outputs, meaning a patch can report at most 64 cars. The experiments are based on GoogLeNet (22 layers); results are in the figure below, followed by a rough sketch of such a counting head.
[Figure: counting results]
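A minimal sketch of what such a 64-output counting head might look like, assuming a GoogLeNet-style backbone wrapped as `backbone` that returns a 1024-dimensional feature vector (the feature size and the arg-max readout are assumptions for illustration, not details from the paper):

```python
import torch
import torch.nn as nn

class CountingHead(nn.Module):
    """Patch-level counting head with 64 outputs, one per possible count (0..63 cars)."""

    def __init__(self, backbone, feat_dim=1024, max_cars=64):
        super().__init__()
        self.backbone = backbone            # e.g. a GoogLeNet-style feature extractor
        self.fc = nn.Linear(feat_dim, max_cars)

    def forward(self, x):
        feats = self.backbone(x)            # (N, feat_dim)
        return self.fc(feats)               # (N, 64) scores over possible counts

    @torch.no_grad()
    def predict_count(self, x):
        # read out the predicted number of cars as the arg-max over the 64 count bins
        return self.forward(x).argmax(dim=1)
```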

### Improved YOLOv7 with InceptionNeXt and Attention Mechanism for Vehicle Detection in Low-Light or Adverse Lighting Conditions

The combination of the InceptionNeXt architecture and attention mechanisms can significantly enhance the performance of YOLOv7, especially in challenging conditions such as low-light or adverse lighting scenarios. Below is a detailed explanation of how this integration works and its potential benefits.

#### 1. **InceptionNeXt Architecture Integration**

The InceptionNeXt architecture introduces an efficient way to process feature maps by leveraging multi-scale convolutional layers. This allows the network to capture richer contextual information across different scales[^3]. By integrating InceptionNeXt into YOLOv7, the model becomes more adept at handling complex scenes where vehicles may appear in various sizes and under varying illumination conditions. The multi-scale processing capability ensures that small vehicle details are not overlooked, even in low-light environments.

```python
# Example of incorporating InceptionNeXt-style blocks into the YOLOv7 backbone
import torch
import torch.nn as nn

class InceptionNeXtBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(InceptionNeXtBlock, self).__init__()
        # four parallel branches, each producing out_channels // 4 feature maps
        self.branch1x1 = nn.Conv2d(in_channels, out_channels // 4, kernel_size=1)
        self.branch3x3 = nn.Conv2d(in_channels, out_channels // 4, kernel_size=3, padding=1)
        self.branch5x5 = nn.Conv2d(in_channels, out_channels // 4, kernel_size=5, padding=2)
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, out_channels // 4, kernel_size=1)
        )

    def forward(self, x):
        branch1x1 = self.branch1x1(x)
        branch3x3 = self.branch3x3(x)
        branch5x5 = self.branch5x5(x)
        branch_pool = self.branch_pool(x)
        # concatenate the branches along the channel dimension
        outputs = [branch1x1, branch3x3, branch5x5, branch_pool]
        return torch.cat(outputs, 1)
```

#### 2. **Attention Mechanisms**

Attention mechanisms play a critical role in focusing on relevant features while suppressing irrelevant ones. For instance, channel-wise and spatial attention can be integrated into the YOLOv7 architecture to improve its robustness against poor lighting conditions. Channel attention helps prioritize feature maps that carry more discriminative information about vehicles, while spatial attention highlights regions within the image where vehicles are likely to appear[^4].

```python
# Example of integrating spatial and channel attention (CBAM-style)
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # pool across the channel dimension, then learn a per-pixel attention map
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv(x)
        return torch.sigmoid(x)

class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.fc1 = nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False)

    def forward(self, x):
        # squeeze spatially with both average and max pooling, then re-weight channels
        avg_out = self.fc2(self.relu1(self.fc1(self.avg_pool(x))))
        max_out = self.fc2(self.relu1(self.fc1(self.max_pool(x))))
        out = avg_out + max_out
        return torch.sigmoid(out)
```

#### 3. **Low-Light Adaptation**

In low-light conditions, the contrast and brightness of images decrease, making it harder for object detectors to identify vehicles accurately. To address this issue, preprocessing techniques such as histogram equalization or Retinex-based enhancement methods can be applied before feeding the data into the model[^5]. Additionally, training the model with augmented datasets that simulate low-light conditions ensures better generalization during inference.

```python
# Example of Retinex-style enhancement for low-light images (single-channel input)
import numpy as np

def generate_lowpass_filter(shape, sigma=10.0):
    # Gaussian low-pass filter in the frequency domain, centered at DC
    rows, cols = shape[:2]
    u = np.fft.fftfreq(rows)[:, None]
    v = np.fft.fftfreq(cols)[None, :]
    return np.exp(-(u ** 2 + v ** 2) / (2 * (sigma / max(rows, cols)) ** 2))

def retinex_enhancement(image):
    # estimate the low-frequency illumination in the log domain and remove it
    img_log = np.log1p(image.astype(np.float64))
    lowpass_filter = generate_lowpass_filter(image.shape)
    illumination_log = np.real(np.fft.ifft2(np.fft.fft2(img_log) * lowpass_filter))
    enhanced_img = np.expm1(img_log - illumination_log)
    return enhanced_img
```

#### 4. **Experimental Results**

Preliminary experiments demonstrate that the improved YOLOv7 model with InceptionNeXt and attention mechanisms achieves higher precision and recall rates compared to the standard YOLOv7 when tested on datasets with low-light or adverse lighting conditions[^6]. These improvements are attributed to the enhanced feature extraction capabilities provided by the InceptionNeXt architecture and the effective utilization of attention mechanisms.