Street View Character Recognition Competition 3: Convolutional Neural Network

This article explores the concept of transfer learning, explaining how pretrained models can be applied to new tasks to address the difficulties of labeling and acquiring data. It illustrates applications of transfer learning in image classification and other domains, and discusses layer transfer and multitask learning in detail.


import torch.nn as nn
from torchvision import models

class SVHN_Model2(nn.Module):
    def __init__(self):
        super(SVHN_Model2, self).__init__()

        # ImageNet-pretrained ResNet-18, with its final fc layer removed,
        # serves as the shared feature extractor
        model_conv = models.resnet18(pretrained=True)
        model_conv.avgpool = nn.AdaptiveAvgPool2d(1)
        model_conv = nn.Sequential(*list(model_conv.children())[:-1])
        self.cnn = model_conv

        # five parallel classifiers, one per character position;
        # 11 classes per position (e.g. digits 0-9 plus a filler class)
        self.fc1 = nn.Linear(512, 11)
        self.fc2 = nn.Linear(512, 11)
        self.fc3 = nn.Linear(512, 11)
        self.fc4 = nn.Linear(512, 11)
        self.fc5 = nn.Linear(512, 11)

    def forward(self, img):
        feat = self.cnn(img)                 # (batch, 512, 1, 1)
        feat = feat.view(feat.shape[0], -1)  # flatten to (batch, 512)
        c1 = self.fc1(feat)
        c2 = self.fc2(feat)
        c3 = self.fc3(feat)
        c4 = self.fc4(feat)
        c5 = self.fc5(feat)
        return c1, c2, c3, c4, c5
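For context, here is a minimal training sketch (my own illustration, not from the original post). It assumes each label is padded to length 5, with class 10 as the "no character" filler, and simply sums the cross-entropy losses of the five heads; the dummy input shapes are also assumptions.

import torch

model = SVHN_Model2()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

img = torch.randn(4, 3, 64, 128)      # dummy batch of street-view crops
label = torch.randint(0, 11, (4, 5))  # dummy labels, one per character position

c1, c2, c3, c4, c5 = model(img)
# total loss = sum of the five per-position cross-entropy losses
loss = criterion(c1, label[:, 0]) + criterion(c2, label[:, 1]) \
     + criterion(c3, label[:, 2]) + criterion(c4, label[:, 3]) \
     + criterion(c5, label[:, 4])
optimizer.zero_grad()
loss.backward()
optimizer.step()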

This model uses a pretrained backbone, so let's talk about how to use pretrained models.

Transfer Learning

Transfer learning means transferring a trained model and its parameters to a new model, so that we do not need to retrain a new model from scratch.

For example, we can train a CNN on ImageNet and then apply the trained model to other image-classification data, or even use the model purely as a feature extractor whose outputs are fed to a traditional SVM.
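As a sketch of the feature-extractor idea (the data arrays are placeholders, and LinearSVC is just one possible choice of classical classifier):

import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import LinearSVC

backbone = models.resnet18(pretrained=True)
backbone.fc = nn.Identity()   # keep the 512-d pooled features, drop the classifier
backbone.eval()

with torch.no_grad():
    train_imgs = torch.randn(32, 3, 224, 224)  # placeholder images
    feats = backbone(train_imgs).numpy()        # (32, 512) feature matrix

train_labels = [i % 2 for i in range(32)]       # placeholder binary labels
clf = LinearSVC().fit(feats, train_labels)      # classical classifier on CNN features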

In short, the main purpose of transfer learning is to get around the difficulty of labeling and acquiring enough data.

Why is it called transfer learning? Because we transfer the architecture, parameters, and other knowledge learned by another model to the current target problem, to help us solve it better.

What training set was this other model built on? As one would expect, if that training set is very different from the training set of our target task, the transfer may not work well.

Layer Transfer

So, we can use just several layers: we freeze some layers and use them to extract representations, and train the remaining layers with labelled data, as sketched below.
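A minimal PyTorch sketch of layer transfer (the layer split, head size, and learning rate are illustrative assumptions): freeze every pretrained parameter and optimize only a newly added head.

import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False   # freeze all pretrained layers

model.fc = nn.Linear(512, 11)     # new head; its parameters require grad by default

# only the new head's parameters are handed to the optimizer
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01)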

Speech-processing tasks generally fix the parameters of the last few layers and optimize the first few layers. This is because individuals produce different sounds from the same articulation, owing to differences in vocal-tract structure and the like. The first few layers extract the articulation from the raw sound, so their parameters differ across individuals; the later layers turn that articulation into the recognition result, and this part is universal, unchanged from person to person.

In contrast, in image recognition we usually fix the parameters of the first few layers and tune the later layers. The early layers of an image-recognition network extract simple features, such as lines and contours, which suit almost every kind of image; the later layers combine these low-level features into high-level features, and the way of combining them differs from one recognition task to another.

Multitask Transfer

The idea of multitask learning is to let multiple tasks train the first few layers of the network together, while training the last few layers of the network separately. If Task A and Task B are similar, they can be trained at the same time: for example, if both are image-recognition tasks, one classifying cats vs. dogs and the other elephants vs. tigers, we can let them share the first few layers of the network, as in the sketch below.
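A minimal sketch of this hard parameter sharing, assuming two binary classification tasks (the class counts and the ResNet-18 backbone are my assumptions, not the post's):

import torch.nn as nn
from torchvision import models

class TwoTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        trunk = models.resnet18(pretrained=True)
        trunk.fc = nn.Identity()
        self.shared = trunk              # layers trained jointly by both tasks
        self.head_a = nn.Linear(512, 2)  # e.g. cat vs. dog
        self.head_b = nn.Linear(512, 2)  # e.g. elephant vs. tiger

    def forward(self, img):
        feat = self.shared(img)          # shared low-level features
        return self.head_a(feat), self.head_b(feat)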

What are the benefits of doing this? It can prevent overfitting to some extent: multitask training requires that the feature extraction in the first few layers work for several tasks at once, which improves generalization. Another way to see it is that when training Task A, the training data of Task B acts as noise, and noise improves the robustness of the network, giving a better generalization effect; the same holds when training Task B.

For speech recognition, recognizing different languages can be mutually beneficial. According to Prof. Li Hongyi, pairwise combinations of dozens of languages have been tried, and the recognition performance improved in every case.

[Figure: recognition performance of Chinese (Mandarin) improves with the assistance of European-language training data.]
