Stochastic Gradient Descent
Batch: uses every sample at each step. Stable, and always moves in the direction of steepest descent.
Stochastic: uses only one sample at each step. Fast, and the noise can help it jump out of local optima, but it is unstable: the descent direction of any single step is not guaranteed.
Mini-batch gradient descent: a compromise between the two that uses a small random batch of samples at each step.
As the number of iterations increases, decay the learning rate eta: eta = 1 / i_iters.
To keep eta from changing too drastically in the early iterations, use eta = a / (i_iters + b) (typically a = 5, b = 50); see the sketch below.
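A minimal sketch of stochastic gradient descent for linear regression with this annealed learning rate (the names sgd and dJ_sgd are illustrative; X_b is assumed to be X with a leading column of ones for the intercept):

import numpy as np

def dJ_sgd(theta, X_b_i, y_i):
    # Gradient of the squared error on a single sample (X_b_i, y_i).
    return 2.0 * X_b_i * (X_b_i.dot(theta) - y_i)

def sgd(X_b, y, initial_theta, n_iters, a=5, b=50):
    def eta(t):
        # Annealed learning rate: larger early on, decaying as t grows.
        return a / (t + b)

    theta = initial_theta
    for cur_iter in range(n_iters):
        # Look at one randomly chosen sample per step -- the "stochastic" part.
        rand_i = np.random.randint(len(X_b))
        gradient = dJ_sgd(theta, X_b[rand_i], y[rand_i])
        theta = theta - eta(cur_iter) * gradient
    return theta

Because each step sees only one sample, n_iters here counts individual steps, not passes over the dataset.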
Stochastic gradient descent in scikit-learn
# Boston housing data
In [55]: from sklearn import datasets
In [56]: boston = datasets.load_boston()
In [57]: X = boston.data
In [58]: y = boston.target
In [59]: X = X[y<50.0]
In [60]: y = y[y<50.0]
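Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2 (over ethical concerns about the dataset), so on a current install you would need to substitute another regression dataset, e.g. fetch_california_housing, to follow along.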
# Train/test split
In [62]: from sklearn.model_selection import train_test_split
In [63]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=666)
# Standardize the data
In [65]: from sklearn.preprocessing import StandardScaler
In [66]: standardScaler = StandardScaler()
...: standardScaler.fit(X_train)
...: X_train_standard = standardScaler.transform(X_train)
...: X_test_standard = standardScaler.transform(X_test)
# scikit-learn's SGDRegressor
In [72]: from sklearn.linear_model import SGDRegressor
In [73]: sgd_reg = SGDRegressor()
...: %time sgd_reg.fit(X_train_standard,y_train)
...: sgd_reg.score(X_test_standard, y_test)
Out[74]: 0.8044183916164827
# n_iter defaults to 5; it sets how many passes to make over the dataset
In [75]: sgd_reg = SGDRegressor(n_iter=100)
...: %time sgd_reg.fit(X_train_standard,y_train)
...: sgd_reg.score(X_test_standard, y_test)
Out[75]: 0.8128815495743619
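Note: newer scikit-learn versions replaced n_iter with max_iter (plus a tol early-stopping criterion), so on a current install the equivalent call is roughly:

from sklearn.linear_model import SGDRegressor

# max_iter caps the number of passes over the data; training may stop
# earlier once the loss improves by less than tol between epochs.
sgd_reg = SGDRegressor(max_iter=100, tol=1e-3)
sgd_reg.fit(X_train_standard, y_train)
sgd_reg.score(X_test_standard, y_test)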
Debugging the gradient
(Computing theta this way is much slower, so it is useful only for debugging, not for training.)
Approximate the slope at the red point by the slope of the secant line through two nearby blue points: dJ/dtheta ≈ (J(theta + epsilon) - J(theta - epsilon)) / (2 * epsilon).
The same trick works for multi-dimensional functions, one coordinate at a time:
In [86]: import numpy as np
    ...: def dJ_debug(theta, X_b, y, epsilon=0.01):
    ...:     # Numerical gradient via central differences: perturb one
    ...:     # coordinate of theta at a time and take the secant slope.
    ...:     # J(theta, X_b, y) is the loss function, defined elsewhere.
    ...:     res = np.empty(len(theta))
    ...:     for i in range(len(theta)):
    ...:         theta_1 = theta.copy()
    ...:         theta_1[i] += epsilon
    ...:         theta_2 = theta.copy()
    ...:         theta_2[i] -= epsilon
    ...:         res[i] = (J(theta_1, X_b, y) - J(theta_2, X_b, y)) / (2 * epsilon)
    ...:     return res
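A quick sketch of how dJ_debug can be used to validate an analytic gradient (the loss J, the analytic gradient dJ_math, and the synthetic data here are illustrative assumptions, not part of the original notes):

import numpy as np

np.random.seed(666)
X = np.random.random(size=(1000, 10))
X_b = np.hstack([np.ones((len(X), 1)), X])      # prepend the intercept column
true_theta = np.arange(1.0, 12.0)               # 11 parameters: intercept + 10 weights
y = X_b.dot(true_theta) + np.random.normal(size=1000)

def J(theta, X_b, y):
    # Mean squared error loss for linear regression.
    return np.sum((y - X_b.dot(theta)) ** 2) / len(y)

def dJ_math(theta, X_b, y):
    # Analytic gradient of J.
    return X_b.T.dot(X_b.dot(theta) - y) * 2.0 / len(y)

theta = np.zeros(X_b.shape[1])
# If dJ_math is correct, the two gradients agree to within ~epsilon**2.
print(np.max(np.abs(dJ_math(theta, X_b, y) - dJ_debug(theta, X_b, y))))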