Implementation and Understanding of CNN

This post centers on CNN, a classic machine learning algorithm, and records what was learned while studying it. It describes how the training set and labels are set up using the MNIST dataset, explains the role and types of activation functions, walks through forward propagation, the loss function, and backpropagation with their underlying calculations, and closes with model training, testing, and concrete examples.


0 Foreword

As one of the classic machine learning algorithms, the CNN is a foundation of deep learning. This post records what I learned while studying it.

1 Setting Up the Training Set and Labels

This post assumes training data of shape [None, X], where None is the number of samples and X is the number of features per sample, and labels that are one-hot encoded. The MNIST dataset is used. Loading the data:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data',one_hot = True)
# Raw data: use the MNIST test split as the working data and labels
data = mnist.test.images
labels = mnist.test.labels
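
Note: the `tensorflow.examples.tutorials` module only exists in TensorFlow 1.x. If you are on TensorFlow 2.x, a minimal alternative (my own sketch, not part of the original post) is to load MNIST through `tf.keras.datasets` and one-hot encode the labels yourself:

import numpy as np
import tensorflow as tf

(train_x, train_y), (test_x, test_y) = tf.keras.datasets.mnist.load_data()
# Flatten the 28x28 images into 784-dimensional vectors and scale to [0, 1]
data = test_x.reshape(-1, 784).astype(np.float32) / 255.0
# One-hot encode the integer labels into shape [None, 10]
labels = np.eye(10)[test_y]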

2 Activation Functions

Activation functions are essential for this algorithm: they turn the linear transformation into a non-linear one, which gives the learned model its ability to generalize. This section covers the relu, sigmoid, softmax, and tanh functions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))

def relu(x):
    return (np.abs(x) + x) / 2

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def softmax(X):
    orig_shape = X.shape
    # Subtract the max for numerical stability before exponentiating
    if len(X.shape) > 1:
        X = X - np.max(X, axis=1, keepdims=True)
        exp_X = np.exp(X)
        X = exp_X / np.sum(exp_X, axis=1, keepdims=True)
    else:
        X = X - np.max(X)
        X = np.exp(X) / np.sum(np.exp(X))
    assert X.shape == orig_shape
    return X
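
The backpropagation code later in the post needs the derivative of the sigmoid. Because sigmoid'(x) = sigmoid(x)(1 - sigmoid(x)), it can be computed directly from the stored activation H. A quick numerical check of that identity (my own illustrative snippet):

x = 0.3
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = sigmoid(x) * (1 - sigmoid(x))
print(numeric, analytic)   # the two values should nearly match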
     

3 Forward Propagation

$$Z_1 = W_1 X + b_1,\qquad H_1 = f(Z_1)$$
$$Z_2 = W_2 H_1 + b_2,\qquad H_2 = f(Z_2)$$
$$Z_3 = W_3 H_2 + b_3,\qquad H_3 = f(Z_3)$$
To set the parameters, first fix the network architecture, then initialize the weights. The code is as follows:

def weight_bias(layerdims):
    # Initialize weights and biases for every layer from a Gaussian distribution
    W = {}
    b = {}
    for i in range(1, len(layerdims)):
        W['W' + str(i)] = np.random.randn(layerdims[i-1], layerdims[i])
        b['b' + str(i)] = np.random.randn(layerdims[i],)
    return W, b

# Forward propagation
def forword(data, Weight, bias, layerdims, activation):
    # H holds the non-linear (activated) outputs; H0 is the input data
    H = {}
    H['H0'] = data
    # Z holds the linear outputs of each layer
    Z = {}
    for i in range(1, len(layerdims)):
        Z['Z' + str(i)] = np.dot(H['H' + str(i-1)], Weight['W' + str(i)]) + bias['b' + str(i)]
        # Look up the activation function by name (e.g. 'relu', 'sigmoid') and apply it
        H['H' + str(i)] = globals()[activation[i-1]](Z['Z' + str(i)])
    return Z, H
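
As a quick sanity check (my own illustrative snippet, with made-up layer sizes), you can run the forward pass on random data and verify the shape that comes out of each layer:

layerdims = [4, 8, 3]                        # toy architecture: 4 inputs, one hidden layer, 3 outputs
activation = ['relu', 'softmax']
Weight, bias = weight_bias(layerdims)
X_toy = np.random.randn(5, 4)                # 5 samples with 4 features each
Z, H = forword(X_toy, Weight, bias, layerdims, activation)
print(H['H1'].shape, H['H2'].shape)          # expected: (5, 8) (5, 3)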

4 Loss Function

$$Loss = -\frac{1}{mN}\sum_{j=1}^{N}\sum_{i=1}^{m}\left[y_i^{(j)}\log\hat{y}_i^{(j)} + \bigl(1 - y_i^{(j)}\bigr)\log\bigl(1 - \hat{y}_i^{(j)}\bigr)\right]$$
where $m$ is the number of output dimensions per sample and $N$ is the number of samples.
Its implementation is as follows:

def loss_function(H, labels):
    # H contains H0..HL, so the final activation is at index len(H) - 1
    lens = len(H)
    n = labels.shape[0]          # number of samples (N)
    m = labels.shape[1]          # output dimensions per sample (m)
    y_ = H['H' + str(lens-1)]
    loss = -np.sum(labels * np.log(y_) + (1 - labels) * np.log(1 - y_)) / (m * n)
    return loss
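
A small usage example (values invented for illustration): for two samples with one-hot labels and reasonably confident predictions, the loss should be a small positive number:

labels_toy = np.array([[1.0, 0.0],
                       [0.0, 1.0]])
H_toy = {'H0': None,                              # input, unused by loss_function
         'H1': np.array([[0.9, 0.1],
                         [0.2, 0.8]])}            # predicted probabilities
print(loss_function(H_toy, labels_toy))           # roughly 0.16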

5 Backpropagation

For easy reference, here is the entire forward pass again:
$$Z_1 = W_1 X + b_1,\qquad H_1 = relu(Z_1)$$
$$Z_2 = W_2 H_1 + b_2,\qquad H_2 = relu(Z_2)$$
$$Z_3 = W_3 H_2 + b_3,\qquad \hat y = sigmoid(Z_3)$$

And the loss function as well:
$$J(w,b) = -\frac{1}{mN}\sum_{j=1}^{N}\sum_{i=1}^{m}\left[y_i^{(j)}\log\hat{y}_i^{(j)} + \bigl(1 - y_i^{(j)}\bigr)\log\bigl(1 - \hat{y}_i^{(j)}\bigr)\right]$$
Note: for readability, some of the expressions below omit the transposes required for matrix differentiation; keep this in mind when writing the code.
The first step is to differentiate with respect to $z_3$:
$$\frac{\partial J}{\partial z_3}=\frac{\partial J}{\partial \hat y}\frac{\partial \hat y}{\partial z_3}=\hat y - y=\delta_3$$
(the simplification to $\hat y - y$ comes from combining the cross-entropy derivative with $sigmoid'(z_3)=\hat y(1-\hat y)$).
We compute this term because the chain rule reuses it; next we differentiate with respect to the parameters $w$ and $b$:
$$\frac{\partial J}{\partial w_3}=\frac{\partial J}{\partial z_3}\frac{\partial z_3}{\partial w_3}=\delta_3 H_2$$
$$\frac{\partial J}{\partial b_3}=\frac{\partial J}{\partial z_3}\frac{\partial z_3}{\partial b_3}=\delta_3$$
This completes the derivatives for $w_3$ and $b_3$. The remaining layers work the same way: apply the chain rule layer by layer towards the input. Continuing:
$$\frac{\partial J}{\partial z_2}=\frac{\partial J}{\partial z_3}\frac{\partial z_3}{\partial H_2}\frac{\partial H_2}{\partial z_2}=\delta_3 w_3\, relu'(z_2)=\delta_2$$
$$\frac{\partial J}{\partial w_2}=\frac{\partial J}{\partial z_2}\frac{\partial z_2}{\partial w_2}=\delta_2 H_1$$
$$\frac{\partial J}{\partial b_2}=\frac{\partial J}{\partial z_2}\frac{\partial z_2}{\partial b_2}=\delta_2$$
The same pattern gives $w_1$ and $b_1$:
$$\frac{\partial J}{\partial z_1}=\frac{\partial J}{\partial z_2}\frac{\partial z_2}{\partial H_1}\frac{\partial H_1}{\partial z_1}=\delta_2 w_2\, relu'(z_1)=\delta_1$$
$$\frac{\partial J}{\partial w_1}=\frac{\partial J}{\partial z_1}\frac{\partial z_1}{\partial w_1}=\delta_1 x$$
$$\frac{\partial J}{\partial b_1}=\frac{\partial J}{\partial z_1}\frac{\partial z_1}{\partial b_1}=\delta_1$$

With all of the backward derivatives in hand, we can finally write the code:

# Backpropagation
def backward_propagation(X, labels, weight, bias, H, activation):
    m = X.shape[0]                      # number of samples
    gradients = {}
    L = len(weight)
    # Gradient at the output layer: dZ_L = y_hat - y
    gradients['dZ' + str(L)] = H['H' + str(L)] - labels
    gradients['dW' + str(L)] = 1./m * np.dot(H['H' + str(L-1)].T, gradients['dZ' + str(L)])
    gradients['db' + str(L)] = 1./m * np.sum(gradients['dZ' + str(L)], axis=0)
    # Propagate the gradient backwards through the hidden layers
    for l in range(L-1, 0, -1):
        gradients['dH' + str(l)] = np.dot(gradients['dZ' + str(l+1)], weight['W' + str(l+1)].T)
        if activation[l-1] == 'relu':
            gradients['dZ' + str(l)] = np.multiply(gradients['dH' + str(l)], np.int64(H['H' + str(l)] > 0))
        elif activation[l-1] == 'tanh':
            gradients['dZ' + str(l)] = np.multiply(gradients['dH' + str(l)], 1 - np.power(H['H' + str(l)], 2))
        elif activation[l-1] == 'sigmoid':
            # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) = H * (1 - H)
            gradients['dZ' + str(l)] = np.multiply(gradients['dH' + str(l)], H['H' + str(l)] * (1 - H['H' + str(l)]))
        gradients['dW' + str(l)] = 1./m * np.dot(H['H' + str(l-1)].T, gradients['dZ' + str(l)])
        gradients['db' + str(l)] = 1./m * np.sum(gradients['dZ' + str(l)], axis=0)
    return gradients
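
To build confidence in the derivation, a numerical gradient check is handy: perturb one weight, measure how the loss changes, and compare with the analytic gradient. This is my own sketch on a toy setup (layer sizes, data, and the seed are invented for illustration):

np.random.seed(0)
layerdims = [3, 4, 2]
activation = ['relu', 'sigmoid']
Weight, bias = weight_bias(layerdims)
X_chk = np.random.randn(6, 3)
Y_chk = np.eye(2)[np.random.randint(0, 2, size=6)]        # random one-hot labels

Z, H = forword(X_chk, Weight, bias, layerdims, activation)
grads = backward_propagation(X_chk, Y_chk, Weight, bias, H, activation)

# Perturb a single entry of W2 and compare the numeric slope with the analytic gradient
eps = 1e-5
Weight['W2'][0, 0] += eps
_, H_plus = forword(X_chk, Weight, bias, layerdims, activation)
Weight['W2'][0, 0] -= 2 * eps
_, H_minus = forword(X_chk, Weight, bias, layerdims, activation)
Weight['W2'][0, 0] += eps                                   # restore the original weight
numeric = (loss_function(H_plus, Y_chk) - loss_function(H_minus, Y_chk)) / (2 * eps)
# backward_propagation omits the 1/m output-dimension factor that loss_function keeps,
# so the analytic gradient should equal the numeric slope times the number of outputs.
print(numeric * layerdims[-1], grads['dW2'][0, 0])          # the two numbers should be close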

The backward_propagation code first computes the gradient at the last layer, then recursively obtains the parameter gradients of each earlier layer. Finally come the parameter updates.

# Parameter update
def updata_parameters(weight, bias, gradients, lr):
    # lr is the learning rate: too small and the network converges slowly,
    # too large and it may oscillate around the minimum without converging
    for i in range(1, len(weight) + 1):
        weight['W' + str(i)] -= lr * gradients['dW' + str(i)]
        bias['b' + str(i)] -= lr * gradients['db' + str(i)]

    return weight, bias

6 Model Training

def nn_fit(X, Y, Weight, bias, activation, lr, lambd=0.7, num_iterations=5000, print_cost=[True, 100]):
    # num_iterations is the number of training iterations; print_cost controls
    # whether and how often the cost is printed.
    # Each iteration runs: forward propagation -> compute cost -> compute gradients -> update parameters.
    # Note: layerdims is taken from the enclosing scope, and lambd is currently unused.
    for i in range(num_iterations):
        Z, H = forword(X, Weight, bias, layerdims, activation)
        cost = loss_function(H, Y)
        grads = backward_propagation(X, Y, Weight, bias, H, activation)
        Weight, bias = updata_parameters(Weight, bias, grads, lr)

        if print_cost[0] and i % print_cost[1] == 0:
            print("Cost after iteration %i: %f" % (i, cost))
    return Weight, bias

7 Model Testing

def cls_predict(X, Weight, bias, activation):
    ## Outputs greater than 0.5 are treated as class 1
    Z,H = forword(X,Weight, bias,layerdims,activation)
    prediction = (H['H'+str(len(H)-1)] > 0.5)
    return prediction

Example 1:

import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
import keras

# Prepare the data
def load_data():
    # 6600 training samples and 1200 test samples, drawn as two concentric circles
    train_X, train_Y = make_circles(n_samples=6600, noise=.02)
    test_X, test_Y = make_circles(n_samples=1200, noise=.02)
    # Visualize the data
    plt.scatter(train_X[:, 0], train_X[:, 1], c=train_Y, s=40, cmap=plt.cm.Spectral);
    train_X = train_X.T
    train_Y = train_Y.reshape((1, train_Y.shape[0]))
    test_X = test_X.T
    test_Y = test_Y.reshape((1, test_Y.shape[0]))
    return train_X, train_Y, test_X, test_Y

train_X, train_Y, test_X, test_Y = load_data()
X = train_X.T
Y = keras.utils.to_categorical(train_Y, 2)[0]
activation = ['relu','relu','sigmoid']
layerdims = [2,18,7,2]
Weight, bias = weight_bias(layerdims)
Weight, bias = nn_fit(X,Y, Weight, bias, activation, lr=0.1, lambd=0.7, num_iterations=5000, print_cost=[True, 100])
X1 = test_X.T
Y1 = keras.utils.to_categorical(test_Y, 2)[0]
prediction = cls_predict(X1, Weight, bias, activation)
# Element-wise agreement between the thresholded outputs and the one-hot labels
accuracy = np.mean((prediction == Y1), dtype=np.float64)
print(accuracy)

Example 2:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data',one_hot = True)
train_X, train_Y, test_X, test_Y = mnist.train.images,mnist.train.labels,mnist.validation.images,mnist.validation.labels
activation = ['sigmoid','sigmoid','sigmoid']
layerdims = [784,256,64,10]
Weight, bias = weight_bias(layerdims)
Weight, bias = nn_fit(train_X, train_Y, Weight, bias, activation, lr=0.05, lambd=0.2, num_iterations=2000, print_cost=[True, 50])
prediction = cls_predict(test_X, Weight, bias, activation)
# Element-wise agreement between the thresholded outputs and the one-hot labels
accuracy = np.mean((prediction == test_Y), dtype=np.float64)
print(accuracy)