PyTorch Deep Learning Practice (2): Gradient Descent

Gradient Descent Algorithm

  Exhaustive search and simple inspection are not feasible: once the number of weights grows large, the time cost explodes, and even then we may only land on a local optimum.
  Gradient descent can find a local optimum, but it is not guaranteed to find the global optimum (the point that no local optimum beats).
  Why, then, does deep learning still mostly rely on gradient descent to find the optimum? Because experience from many learning problems shows that deep learning optimization problems do not actually contain many bad local optima.
What they do contain are saddle points, points where the partial derivatives are 0 but which are neither local minima nor maxima.
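  A standard illustration of a saddle point (not from the lecture, just for intuition): $f(w_1, w_2) = w_1^2 - w_2^2$ has gradient $\nabla f(0,0) = (0,0)$, yet the origin is a minimum along the $w_1$ axis and a maximum along the $w_2$ axis, so it is neither a local minimum nor a local maximum.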
  $y = wx + b$
  MSE cost:
$cost(w) = \frac{1}{N}\sum_{n=1}^{N}(\hat{y}_n - y_n)^2$
  Optimization objective: $w^* = \arg\min_{w} cost(w)$
  Take the derivative: $\frac{\partial cost}{\partial w}$
  Update: $w = w - \alpha \frac{\partial cost}{\partial w}$, where $\alpha$ is the learning rate (step size).
  Training is then just the process of iterating this update toward the optimum. Expanding the derivative:
$\frac{\partial cost}{\partial w} = \frac{\partial}{\partial w}\frac{1}{N}\sum_{n=1}^{N}(x_n \cdot w - y_n)^2$
$= \frac{1}{N}\sum_{n=1}^{N}\frac{\partial (x_n \cdot w - y_n)^2}{\partial w}$
$= \frac{1}{N}\sum_{n=1}^{N} 2 \cdot (x_n \cdot w - y_n) \cdot \frac{\partial (x_n \cdot w - y_n)}{\partial w}$
$= \frac{1}{N}\sum_{n=1}^{N} 2 \cdot x_n \cdot (x_n \cdot w - y_n)$
Update:
$w = w - \alpha \frac{1}{N}\sum_{n=1}^{N} 2 \cdot x_n \cdot (x_n \cdot w - y_n)$
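  As a quick sanity check, here is a single hand-checkable update step, assuming an initial weight w = 1.0 and learning rate α = 0.01 (both values chosen only for illustration):

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]  # generated by y = 2x, so the optimum is w = 2

w, alpha = 1.0, 0.01  # assumed starting point and step size, for illustration only
grad = sum(2 * x * (x * w - y) for x, y in zip(x_data, y_data)) / len(x_data)
w = w - alpha * grad  # grad = -28/3 ≈ -9.33, so w moves from 1.0 to ≈ 1.0933
print(w)  # 1.0933...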
  
  
  Code:

import random

import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = random.random()  # randomly initialize the weight


def forward(x):
    """
    The linear model: y_hat = x * w
    """
    return x * w


def cost(x, y):
    """
    MSE cost over the whole training set
    """
    cost = 0
    for x_i, y_i in zip(x, y):
        y_pred = forward(x_i)
        cost += (y_pred - y_i) ** 2
    return cost / len(x)


def gradient(x, y):
    """
    Gradient of the MSE cost with respect to w over the whole training set
    """
    grad = 0
    for x_i, y_i in zip(x, y):
        grad += 2 * x_i * (x_i * w - y_i)
    return grad / len(x)


epoch_list = []
cost_list = []
for epoch in range(100):
    cost_val = cost(x_data, y_data)
    grad_val = gradient(x_data, y_data)
    w -= 0.01 * grad_val
    epoch_list.append(epoch)
    cost_list.append(cost_val)
    print("epoch:", epoch, "w:", w, "loss(MSE):", cost_val)
print("Predict:", forward(4))

plt.plot(epoch_list, cost_list)
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.show()

  The loss decreases gradually as the number of training epochs increases:
[Figure: loss (MSE) vs. training epoch]
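  For reference, the same batch gradient descent can also be written in vectorized form with NumPy. This is only a sketch of an equivalent implementation, not part of the original lecture code:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

w = np.random.rand()                      # random initial weight
alpha = 0.01                              # learning rate (step size)

for epoch in range(100):
    y_pred = w * x                        # forward pass for the whole batch
    mse = np.mean((y_pred - y) ** 2)      # cost(w)
    grad = np.mean(2 * x * (y_pred - y))  # d(cost)/dw, same formula as above
    w -= alpha * grad                     # gradient descent update

print(w)      # converges close to 2.0
print(w * 4)  # prediction for x = 4, close to 8.0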
