Street View Character Recognition Competition 3: Convolutional Neural Network

This article explores the concept of transfer learning, explaining how pretrained models can be applied to new tasks to address the difficulties of labeling and acquiring data. It illustrates applications of transfer learning in image classification and other domains, and discusses layer transfer and multitask learning in detail.


import torch.nn as nn
from torchvision import models

class SVHN_Model2(nn.Module):
    def __init__(self):
        super(SVHN_Model2, self).__init__()

        # ImageNet-pretrained ResNet-18, with its final fc layer removed,
        # serves as the shared feature extractor
        model_conv = models.resnet18(pretrained=True)
        model_conv.avgpool = nn.AdaptiveAvgPool2d(1)
        model_conv = nn.Sequential(*list(model_conv.children())[:-1])
        self.cnn = model_conv

        # five parallel classifiers, one per character position;
        # 11 classes per position (e.g. digits 0-9 plus a filler class)
        self.fc1 = nn.Linear(512, 11)
        self.fc2 = nn.Linear(512, 11)
        self.fc3 = nn.Linear(512, 11)
        self.fc4 = nn.Linear(512, 11)
        self.fc5 = nn.Linear(512, 11)

    def forward(self, img):
        feat = self.cnn(img)                 # (batch, 512, 1, 1)
        feat = feat.view(feat.shape[0], -1)  # flatten to (batch, 512)
        c1 = self.fc1(feat)
        c2 = self.fc2(feat)
        c3 = self.fc3(feat)
        c4 = self.fc4(feat)
        c5 = self.fc5(feat)
        return c1, c2, c3, c4, c5
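For context, here is a minimal training sketch (my own illustration, not from the original post). It assumes each label is padded to length 5, with class 10 as the "no character" filler, and simply sums the cross-entropy losses of the five heads; the dummy input shapes are also assumptions.

import torch

model = SVHN_Model2()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

img = torch.randn(4, 3, 64, 128)      # dummy batch of street-view crops
label = torch.randint(0, 11, (4, 5))  # dummy labels, one per character position

c1, c2, c3, c4, c5 = model(img)
# total loss = sum of the five per-position cross-entropy losses
loss = criterion(c1, label[:, 0]) + criterion(c2, label[:, 1]) \
     + criterion(c3, label[:, 2]) + criterion(c4, label[:, 3]) \
     + criterion(c5, label[:, 4])
optimizer.zero_grad()
loss.backward()
optimizer.step()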

This model uses a pretrained backbone, so let's talk about how to use pretrained models.

Transfer Learning

Transfer learning means transferring a trained model and its parameters to a new model, so that we do not need to retrain a new model from scratch.

For example, we can train a CNN on ImageNet and then apply the trained model to other image-classification data, or even use the model purely as a feature extractor whose outputs are fed to a traditional SVM.
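As a sketch of the feature-extractor idea (the data arrays are placeholders, and LinearSVC is just one possible choice of classical classifier):

import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import LinearSVC

backbone = models.resnet18(pretrained=True)
backbone.fc = nn.Identity()   # keep the 512-d pooled features, drop the classifier
backbone.eval()

with torch.no_grad():
    train_imgs = torch.randn(32, 3, 224, 224)  # placeholder images
    feats = backbone(train_imgs).numpy()        # (32, 512) feature matrix

train_labels = [i % 2 for i in range(32)]       # placeholder binary labels
clf = LinearSVC().fit(feats, train_labels)      # classical classifier on CNN features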

In short, the main purpose of transfer learning is to get around the difficulty of labeling and acquiring enough data.

Why is it called transfer learning? Because we transfer the architecture, parameters, and other knowledge learned by another model to the current target problem, to help us solve it better.

What training set was this other model built on? As one would expect, if that training set is very different from the training set of our target task, the transfer may not work well.

Layer Transfer

So, we can use just several layers: we freeze some layers and use them to extract representations, and train the remaining layers with labelled data, as sketched below.
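A minimal PyTorch sketch of layer transfer (the layer split, head size, and learning rate are illustrative assumptions): freeze every pretrained parameter and optimize only a newly added head.

import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False   # freeze all pretrained layers

model.fc = nn.Linear(512, 11)     # new head; its parameters require grad by default

# only the new head's parameters are handed to the optimizer
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01)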

Speech-processing tasks generally fix the parameters of the last few layers and optimize the first few layers. This is because individuals produce different sounds from the same articulation, owing to differences in vocal-tract structure and the like. The first few layers extract the articulation from the raw sound, so their parameters differ across individuals; the later layers turn that articulation into the recognition result, and this part is universal, unchanged from person to person.

In contrast, in image recognition we usually fix the parameters of the first few layers and tune the later layers. The early layers of an image-recognition network extract simple features, such as lines and contours, which suit almost every kind of image; the later layers combine these low-level features into high-level features, and the way of combining them differs from one recognition task to another.

Multitask Transfer

The idea of multitask learning is to let multiple tasks train the first few layers of the network together, while training the last few layers of the network separately. If Task A and Task B are similar, they can be trained at the same time: for example, if both are image-recognition tasks, one classifying cats vs. dogs and the other elephants vs. tigers, we can let them share the first few layers of the network, as in the sketch below.
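A minimal sketch of this hard parameter sharing, assuming two binary classification tasks (the class counts and the ResNet-18 backbone are my assumptions, not the post's):

import torch.nn as nn
from torchvision import models

class TwoTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        trunk = models.resnet18(pretrained=True)
        trunk.fc = nn.Identity()
        self.shared = trunk              # layers trained jointly by both tasks
        self.head_a = nn.Linear(512, 2)  # e.g. cat vs. dog
        self.head_b = nn.Linear(512, 2)  # e.g. elephant vs. tiger

    def forward(self, img):
        feat = self.shared(img)          # shared low-level features
        return self.head_a(feat), self.head_b(feat)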

What are the benefits of doing this? It can prevent overfitting to some extent: multitask training requires that the feature extraction in the first few layers work for several tasks at once, which improves generalization. Another way to see it is that when training Task A, the training data of Task B acts as noise, and noise improves the robustness of the network, giving a better generalization effect; the same holds when training Task B.

For speech recognition, recognizing different languages can be mutually beneficial. According to Prof. Li Hongyi, pairwise combinations of dozens of languages have been tried, and the recognition performance improved in every case.

[Figure: recognition performance of Chinese (Mandarin) improves with the assistance of European-language training data.]
