PySpark

Original article · last modified 2022-12-13 18:43:31 · 559 views

License: CC 4.0 BY-SA

Tags: #python

First published 2022-09-19 10:45:04

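This article walks through a concrete example of parameter optimization with stochastic gradient descent (SGD). The parameters θ are adjusted iteratively over a small given dataset so that the model fits the data more and more closely. A complete Python implementation is included.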
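As a reading aid, the update performed by the article's code can be written out as follows. This is a sketch in notation chosen here, not taken from the article: B is the pair of sample indices drawn in one iteration (so |B| = 2) and α stands for the `step_size` variable.

```latex
\begin{align*}
\hat{y}^{(k)} &= \theta_1 x_1^{(k)} + \theta_2 x_2^{(k)} \\
J_B(\theta) &= \frac{1}{2|B|} \sum_{k \in B} \left( \hat{y}^{(k)} - y^{(k)} \right)^2 \\
\theta_d &\leftarrow \theta_d - \alpha \, \frac{1}{|B|} \sum_{k \in B} \left( \hat{y}^{(k)} - y^{(k)} \right) x_d^{(k)}, \qquad d = 1, 2
\end{align*}
```

The last line is the gradient of J_B with respect to θ_d, scaled by the step size. Both components are updated from predictions computed before either component changes.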

Plotting

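# SGD (stochastic gradient descent): each iteration updates theta from a small random group of samples
# -*- coding: utf-8 -*-
import random
# Fit y = theta1*x1 + theta2*x2 to the inputs and outputs below
# input1  1   2   5   4
# input2  4   5   1   2
# output  19  26  19  20
input_x = [[1, 4], [2, 5], [5, 1], [4, 2]]  # inputs, one [x1, x2] pair per sample
y = [19, 26, 19, 20]    # outputs
theta = [1, 1]          # initial values of the parameters theta
loss = 10               # any value above eps, so that the loop is entered
step_size = 0.01        # step size (learning rate)
eps = 0.0001            # required precision: stop once loss <= eps
max_iters = 10000       # maximum number of iterations
error = 0               # loss contribution of a single sample
iter_count = 0          # current iteration count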
   
   
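while loss > eps and iter_count < max_iters:  # loop condition
    loss = 0
    # Each update uses a mini-batch of two samples: a random one and its cyclic neighbor
    i = random.randint(0, 3)     # index of a randomly drawn sample (randint includes both ends)
    j = (i + 1) % 4              # index of the second sample, j = i + 1 with wrap-around
    pred_y0 = theta[0]*input_x[i][0] + theta[1]*input_x[i][1]  # prediction for sample i
    pred_y1 = theta[0]*input_x[j][0] + theta[1]*input_x[j][1]  # prediction for sample j
    theta[0] = theta[0] - step_size * (1/2) * ((pred_y0 - y[i]) * input_x[i][0] + (pred_y1 - y[j]) * input_x[j][0])  # corresponds to formula (5)
    theta[1] = theta[1] - step_size * (1/2) * ((pred_y0 - y[i]) * input_x[i][1] + (pred_y1 - y[j]) * input_x[j][1])  # corresponds to formula (5)
    for k in range(len(input_x)):  # evaluate the loss over all four samples
        pred_y = theta[0]*input_x[k][0] + theta[1]*input_x[k][1]  # prediction with the updated theta
        error = (1/(2*2)) * (pred_y - y[k])**2                    # loss contribution of sample k
        loss = loss + error       # accumulated total loss
    iter_count += 1
    print('iters_count', iter_count)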
   
print ('theta: ',theta )  
print ('final loss: ', loss)  
print ('iters: ', iter_count)
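For experimenting with the same idea, the loop above can be wrapped in a function. The following is a minimal sketch rather than the author's code: the name `minibatch_sgd` and the `seed` parameter are hypothetical additions for reproducibility, and the loss is normalized by `2 * m` over all `m` samples, which is an assumption that differs from the constant `1/(2*2)` used in the listing.

```python
import random


def minibatch_sgd(xs, ys, step_size=0.01, eps=1e-4, max_iters=10000, seed=None):
    """Fit y = theta[0]*x1 + theta[1]*x2 using mini-batches of two adjacent samples.

    Hypothetical helper: name, signature and loss scaling are illustrative choices.
    """
    rng = random.Random(seed)   # private generator, so a fixed seed reproduces a run
    m = len(xs)
    theta = [1.0, 1.0]          # same starting point as the listing above
    loss = float("inf")
    iters = 0
    while loss > eps and iters < max_iters:
        i = rng.randrange(m)    # random sample index in [0, m)
        batch = (i, (i + 1) % m)  # the sample and its cyclic neighbor
        # Residuals are computed once, before either component of theta changes.
        residuals = [theta[0] * xs[k][0] + theta[1] * xs[k][1] - ys[k] for k in batch]
        for d in range(2):
            grad = sum(r * xs[k][d] for r, k in zip(residuals, batch)) / len(batch)
            theta[d] -= step_size * grad
        # Mean squared error over the full dataset (assumed 1/(2m) scaling).
        loss = sum((theta[0] * x[0] + theta[1] * x[1] - t) ** 2
                   for x, t in zip(xs, ys)) / (2 * m)
        iters += 1
    return theta, loss, iters


if __name__ == "__main__":
    data_x = [[1, 4], [2, 5], [5, 1], [4, 2]]
    data_y = [19, 26, 19, 20]
    print(minibatch_sgd(data_x, data_y, seed=0))
```

The four samples are fitted exactly by theta = [3, 4] (for example 1*3 + 4*4 = 19), so the returned parameters should end up close to those values, with the exact iteration count depending on the random draws.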
