neural network

Concept:

A neural network is an approach to artificial intelligence that tries to learn how to be intelligent by modeling the way the human brain works.


Model:

(figure: neural network model)

We call the first part the input layer, the second part the hidden layer, and the third part the output layer.

Each line connecting one layer to the next represents a single computation step.

We can therefore write:
$$
a_1^{(2)} = g(\theta_{10}^{(1)}x_0+\theta_{11}^{(1)}x_1+\theta_{12}^{(1)}x_2+\theta_{13}^{(1)}x_3) \\
a_2^{(2)} = g(\theta_{20}^{(1)}x_0+\theta_{21}^{(1)}x_1+\theta_{22}^{(1)}x_2+\theta_{23}^{(1)}x_3) \\
a_3^{(2)} = g(\theta_{30}^{(1)}x_0+\theta_{31}^{(1)}x_1+\theta_{32}^{(1)}x_2+\theta_{33}^{(1)}x_3) \\
h_\theta(x) = a_1^{(3)} = g(\theta_{10}^{(2)}a_0^{(2)}+\theta_{11}^{(2)}a_1^{(2)}+\theta_{12}^{(2)}a_2^{(2)}+\theta_{13}^{(2)}a_3^{(2)})
$$
That is, every line corresponds to one weight/parameter, and if we look at just the hidden layer and the output layer on their own, each unit is essentially a logistic regression.

We adopt the following notation:
$$
x = [x_0,x_1,x_2,x_3] \\
\theta = [\theta_0,\theta_1,\theta_2,\theta_3] \\
z = \theta^T x \\
a = [a_1,a_2,a_3] \\
a_1 = g(z)
$$
We call this process forward propagation: layer 1 passes its data to layer 2, and layer 2 passes its data to layer 3.
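Below is a minimal NumPy sketch of one forward pass through the 3-input, 3-hidden-unit, 1-output network described above; the parameter values and the input `x` are made-up placeholders, not values from the text.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# made-up placeholder parameters for a 3-input, 3-hidden-unit, 1-output network
theta1 = np.random.randn(3, 4)   # maps [x0, x1, x2, x3] (x0 = 1 is the bias) to the hidden layer
theta2 = np.random.randn(1, 4)   # maps [a0, a1, a2, a3] (a0 = 1 is the bias) to the output

x = np.array([0.5, -1.2, 3.0])   # one example with three features

# forward propagation, layer by layer
a1 = np.r_[1.0, x]               # input layer with bias unit x0 = 1
z2 = theta1 @ a1                 # z^(2) = theta^(1) x
a2 = np.r_[1.0, sigmoid(z2)]     # a^(2) = g(z^(2)), plus bias unit a0 = 1
z3 = theta2 @ a2                 # z^(3) = theta^(2) a^(2)
h = sigmoid(z3)                  # h_theta(x) = a^(3)
```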


Simplified model:

g(x) is simply the sigmoid function, which was covered in an earlier article, so it is not repeated here.

Suppose a model with two inputs:
$$
h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)
$$
We list the following truth table:

| $x_1$ | $x_2$ |
| --- | --- |
| 0 | 0 |
| 0 | 1 |
| 1 | 0 |
| 1 | 1 |

Substituting each row gives
$$
g(\theta_0),\quad g(\theta_0+\theta_2),\quad g(\theta_0+\theta_1),\quad g(\theta_0+\theta_1+\theta_2)
$$
By choosing concrete parameter values, we can make this unit compute a specific logical expression.
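For instance, with the weights $\theta = [-30, 20, 20]$ (an assumed textbook-style choice, not given above) the unit computes the logical AND of $x_1$ and $x_2$, since $g(-30)\approx 0$, $g(-10)\approx 0$ and $g(10)\approx 1$. A small sketch:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# assumed weights that make a single sigmoid unit behave like logical AND
theta = np.array([-30.0, 20.0, 20.0])   # [theta0, theta1, theta2]

for x1 in (0, 1):
    for x2 in (0, 1):
        h = sigmoid(theta @ np.array([1.0, x1, x2]))  # g(theta0 + theta1*x1 + theta2*x2)
        print(x1, x2, round(float(h)))                # prints the AND truth table
```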


Multi-class classification:

For example, suppose we want to recognize four different kinds of objects (images). The input could have three channels (RGB), and the output has four cases, one per class.

In binary classification we defined the output y as {0,1} or {-1,+1}; can we simply define it as {1,2,3,4} here?

Clearly not. As shown above, each output unit produces a value that is essentially 0 or 1, so a single numeric label such as 3 would be ambiguous. Instead we use a binary (one-hot) encoding, i.e. the four cases

[1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1]. This gives the following basic neural network:

(figure: four-class neural network)

We define the network's output as $h_\theta(x) \in \mathbb{R}^{K}$, where K = 4 (the total number of classes to distinguish).

The label y is then one of the vectors
$$
[1,0,0,0],\ [0,1,0,0],\ [0,0,1,0],\ [0,0,0,1]
$$
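As an illustration, one way to build this one-hot encoding in NumPy; the integer labels used here are made up:

```python
import numpy as np

labels = np.array([1, 3, 4, 2])   # example integer labels in {1, 2, 3, 4}
K = 4                             # number of classes
Y = np.eye(K)[labels - 1]         # rows: [1,0,0,0], [0,0,1,0], [0,0,0,1], [0,1,0,0]
```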
From this we can start constructing the cost function.

From the previous section, we have
$$
h_\theta(x) = g(\theta_{10}^{(3)} a_1^{(3)} + \theta_{20}^{(3)} a_2^{(3)} + \theta_{30}^{(3)} a_3^{(3)} + \theta_{40}^{(3)} a_4^{(3)} + \theta_{50}^{(3)} a_5^{(3)})
$$

Following the result from logistic regression, we have
$$
\mathrm{Cost}(h_\theta(x),y) = y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)})) \\
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})
$$
Each output unit acts as a separate classifier (neuron), so the final cost sums over all K output units:
$$
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}\sum_{k=1}^{K} \left[ y_k^{(i)}\log\left(h_\theta(x^{(i)})\right)_k + \left(1-y_k^{(i)}\right)\log\left(1-\left(h_\theta(x^{(i)})\right)_k\right) \right]
$$
In addition, we introduce a regularization term to prevent overfitting: $\lambda \sum \theta_j^2$.

So here we have:
$$
\mathrm{regularization} = \frac{\lambda}{2} \sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\theta_{ji}^{(l)}\right)^2 \\
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}\sum_{k=1}^{K} \left[ y_k^{(i)}\log\left(h_\theta(x^{(i)})\right)_k + \left(1-y_k^{(i)}\right)\log\left(1-\left(h_\theta(x^{(i)})\right)_k\right) \right] + \frac{\lambda}{2} \sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\theta_{ji}^{(l)}\right)^2
$$
Here $\theta_{ji}^{(l)}$ denotes the weight from unit $i$ in layer $l$ to unit $j$ in layer $l+1$.
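A minimal sketch of this regularized cost in NumPy, following the formula above; the function name and the assumption that the bias weights sit in column 0 of each weight matrix are mine, not from the original:

```python
import numpy as np

def nn_cost(h, Y, thetas, lam):
    """Regularized cross-entropy cost J(theta) as defined above.

    h      : (m, K) array of network outputs h_theta(x)
    Y      : (m, K) one-hot label matrix
    thetas : list of weight matrices, one per layer, bias weights in column 0
    lam    : regularization strength lambda
    """
    m = Y.shape[0]
    cost = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m
    # sum of squared weights, excluding the bias column of each layer
    reg = lam / 2 * sum(np.sum(t[:, 1:] ** 2) for t in thetas)
    return cost + reg
```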


Backpropagation algorithm:

Given the cost function $J(\theta)$, we control $J(\theta)$ by adjusting $h_\theta(x)$, and $h_\theta(x)$ in turn depends on the weight matrices $\theta$ and the bias vectors $b$. Following the gradient-descent rule, these two quantities are updated as
$$
\theta^{(l)} = \theta^{(l)} - \alpha \frac{\partial E}{\partial \theta^{(l)}} \\
b^{(l)} = b^{(l)} - \alpha \frac{\partial E}{\partial b^{(l)}}
$$

Taking the network in the figure above as the example again, we know the label $y$ and the output activations $a^{(4)}$, so we can define
$$
\delta_i^{(l)} = \frac{\partial E}{\partial z_i^{(l)}} \\
E = \frac{1}{2} \lVert y-a \rVert^2 = \frac{1}{2}\sum_k (y_k-a_k)^2 \\
\frac{\partial E}{\partial \theta^{(4)}_{11}} = -(y_1 - a_1^{(4)}) \frac{\partial a_1^{(4)}}{\partial \theta^{(4)}_{11}} = -(y_1 - a_1^{(4)})\, g'(z_1^{(4)})\, a_1^{(3)} = \frac{\partial E}{\partial z_1^{(4)}} \frac{\partial z_1^{(4)}}{\partial \theta_{11}^{(4)}} = \delta_1^{(4)} a_1^{(3)}
$$
Next we derive the formula for a general layer $l$ in terms of layer $l+1$:
$$
\delta^{(l)}_i = \frac{\partial E}{\partial z_i^{(l)}} = \sum_{j=1}^{n_{l+1}} \frac{\partial E}{\partial z_j^{(l+1)}} \frac{\partial z_j^{(l+1)}}{\partial z_i^{(l)}} = \sum_{j=1}^{n_{l+1}} \delta_j^{(l+1)}\frac{\partial z_j^{(l+1)}}{\partial z_i^{(l)}} = \sum_{j=1}^{n_{l+1}} \delta_j^{(l+1)} \theta_{ji}^{(l+1)} g'(z_i^{(l)})
$$
For the bias parameters b, we have:
$$
\frac{\partial E}{\partial b_i^{(l)}} = \frac{\partial E}{\partial z_i^{(l)}} \frac{\partial z_i^{(l)}}{\partial b_i^{(l)}} = \delta_i^{(l)}
$$
In summary:
$$
\delta^{(L)} = -(y-a^{(L)}) \;.\!*\; g'(z^{(L)}) \\
\delta^{(l)} = \left(\theta^{(l+1)}\right)^T \delta^{(l+1)} \;.\!*\; g'(z^{(l)})
$$
Strictly speaking, $\delta$ is not really an "error"; it measures how strongly this intermediate quantity influences the overall cost.
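Below is a minimal sketch of these backpropagation equations for a single training example, using the squared-error E defined above, made-up layer sizes, and one gradient-descent step; it illustrates the formulas rather than reproducing any particular implementation.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)                      # g'(z) = g(z)(1 - g(z))

# made-up 3-5-4 network (weight matrices theta, bias vectors b) and one example
rng = np.random.default_rng(0)
theta = [rng.standard_normal((5, 3)), rng.standard_normal((4, 5))]
b = [rng.standard_normal(5), rng.standard_normal(4)]
x = rng.standard_normal(3)
y = np.array([0.0, 1.0, 0.0, 0.0])
alpha = 0.1                                 # learning rate

# forward pass, caching z and a for every layer
a, z = [x], []
for l in range(2):
    z.append(theta[l] @ a[l] + b[l])
    a.append(sigmoid(z[l]))

# backward pass:
#   delta^(L) = -(y - a^(L)) .* g'(z^(L))
#   delta^(l) = (theta^(l+1))^T delta^(l+1) .* g'(z^(l))
delta = [None, None]
delta[1] = -(y - a[2]) * sigmoid_grad(z[1])
delta[0] = (theta[1].T @ delta[1]) * sigmoid_grad(z[0])

# gradient-descent update: dE/dtheta^(l) = delta^(l) (a^(l-1))^T, dE/db^(l) = delta^(l)
for l in range(2):
    theta[l] -= alpha * np.outer(delta[l], a[l])
    b[l] -= alpha * delta[l]
```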


Implementing multi-class classification with a neural network
Load the data:
import numpy as np
import scipy.io as sio
import matplotlib.pyplot as plt

input_layer_size = 400   # 20x20 pixel input images
hidden_layer_size = 45
num_labels = 10          # 10 labels, from 1 to 10

# Part 1: Loading and Visualizing data

data = sio.loadmat('ex3data1.mat')
X = data["X"]
y = data["y"]
m = X.shape[0]
Randomly sample and visualize the data:
def displayData(x):
    # display each example in x as a tile on a 2-D grid
    example_width = int(np.round(np.sqrt(np.size(x, 1))))
    m, n = x.shape
    example_height = int(n / example_width)
    display_rows = int(np.floor(np.sqrt(m)))
    display_cols = int(np.ceil(m / display_rows))
    pad = 1
    display_array = -np.ones((pad + display_rows * (example_height + pad),
                              pad + display_cols * (example_width + pad)))
    curr_ex = 0
    for j in range(display_rows):
        for i in range(display_cols):
            if curr_ex >= m:
                break
            # scale each example by its largest absolute value
            max_val = np.max(np.abs(x[curr_ex, :]))
            display_array[pad + j * (example_height + pad):pad + j * (example_height + pad) + example_height,
                          pad + i * (example_width + pad):pad + i * (example_width + pad) + example_width] \
                = x[curr_ex, :].reshape((example_height, example_width)) / max_val
            curr_ex += 1
        if curr_ex >= m:
            break
    plt.figure()
    plt.imshow(display_array.T, cmap='gray', extent=[-1, 1, -1, 1])
    plt.axis('off')
    plt.show()


# randomly pick 100 examples to visualize
rand_indices = np.random.permutation(m)
sel = X[rand_indices[0:100], :]
displayData(sel)
_ = input('Press [enter] to continue')
Load the pre-trained parameters:
print('Loading saved neural network parameters')
para = sio.loadmat('ex3weights.mat')
theta1 = para['Theta1']
theta2 = para['Theta2']
Prediction:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def predict(theta1, theta2, X):
    # forward-propagate X through the two-layer network and return the predicted class index
    m = X.shape[0]
    X = np.c_[np.ones(m), X]              # add bias column
    a2 = sigmoid(X.dot(theta1.T))         # hidden-layer activations
    a2 = np.c_[np.ones(a2.shape[0]), a2]  # add bias unit
    a3 = sigmoid(a2.dot(theta2.T))        # output-layer activations
    p = np.argmax(a3, axis=1)             # 0-based index of the most probable class
    return p


p = predict(theta1, theta2, X)
# argmax is 0-based while the labels run from 1 to 10, hence the +1
print('Training Set Accuracy: ', np.mean(np.double(p + 1 == y.flatten())) * 100)
Result:
Training Set Accuracy:  97.52