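Feature scaling: when a problem has multiple features, we want all of them on a similar scale, which helps gradient descent converge faster. The remedy is to rescale every feature into roughly the same range, for example -1 to +1, -0.5 to +0.5, or 0 to 1. One of the most common methods is to replace each xi with xi minus the mean of that feature, divided by that feature's standard deviation.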
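The mean-and-standard-deviation scaling described above can be sketched on a small example. The matrix values and the name X_scaled are made up for illustration and are not taken from the exercise data; repmat is used so the snippet also runs on older MATLAB versions and on Octave.

```matlab
% Hypothetical toy features: column 1 = area, column 2 = number of bedrooms
X = [2104 3; 1600 3; 2400 3; 1416 2; 3000 4];
n_rows = size(X, 1);

mu = mean(X);     % per-column mean
sigma = std(X);   % per-column sample standard deviation

% Subtract each column's mean, then divide by its standard deviation
X_scaled = (X - repmat(mu, n_rows, 1)) ./ repmat(sigma, n_rows, 1);

disp(mean(X_scaled));   % each entry is approximately 0
disp(std(X_scaled));    % each entry is approximately 1
```

Note that this kind of scaling centers each feature at zero with unit standard deviation; it does not force the values into a fixed interval such as -1 to +1.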
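Learning rate versus number of iterations: if the learning rate is too small, more iterations are needed; a larger learning rate needs fewer iterations, but if it is too large the cost function J may fail to converge. To choose a learning rate, plot the loss (that is, the value of the cost function J) against the number of iterations for each candidate rate and compare the curves.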
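A minimal sketch of this effect, assuming a tiny made-up data set (y = 2 + x exactly, so the minimum of J is 0) and three hypothetical learning rates chosen only to show the three regimes:

```matlab
% Hypothetical toy problem: intercept column plus one feature
X = [1 -1; 1 0; 1 1];
y = [1; 2; 3];
m = length(y);

alphas = [0.1, 1, 2.5];   % small, well chosen, too large (illustrative values)
num_iters = 50;
J_final = zeros(size(alphas));

for k = 1:length(alphas)
    theta = zeros(2, 1);
    for it = 1:num_iters
        grad = (1/m) * X' * (X * theta - y);   % gradient of J
        theta = theta - alphas(k) * grad;      % gradient descent update
    end
    % Cost after the last update
    J_final(k) = (0.5/m) * (X * theta - y)' * (X * theta - y);
end

% alpha = 0.1 is still short of the minimum after 50 iterations,
% alpha = 1 is essentially at the minimum, alpha = 2.5 has diverged
disp(J_final);
```

For this particular data set the iteration diverges once the learning rate exceeds 2; the threshold depends on the data, which is why the curves have to be inspected for each model.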
Cost function J(theta): it can be written in two forms; the vectorized form is the more convenient one to program in MATLAB.
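As a sketch of the two forms, assuming the linear hypothesis h_theta(x) = theta' * x and the same 1/(2m) factor that the program uses when it computes J:

```latex
% Summation form over the m training examples
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,
\qquad h_\theta(x) = \theta^{T} x

% Equivalent vectorized form, where X is the m-by-(n+1) design matrix
% (one row per example, first column all ones) and y is the m-by-1 target vector
J(\theta) = \frac{1}{2m} (X\theta - y)^{T} (X\theta - y)
```

The vectorized form maps directly onto a single MATLAB expression, which avoids an explicit loop over the training examples.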
For a detailed introduction, see the original English source.
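The main purposes of this program are:
1 Demonstrate, for a given model, how the learning rate, the cost function value, and the number of iterations relate to one another
2 Compare gradient descent with the normal equation
The source program, with added comments, follows: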
% Exercise 3 -- Multivariate Linear Regression
clear all; close all; clc
x = load('ex3x.dat');
y = load('ex3y.dat');
m = length(y);
% Add intercept term to x
x = [ones(m, 1), x];
% Save a copy of the unscaled features for later (used by the normal equation)
x_unscaled = x;
% Scale features and set them to zero mean: after scaling, columns 2 and 3
% (x2, x3) have zero sample mean. Note that MATLAB indices start at 1.
mu = mean(x);
sigma = std(x);
x(:,2) = (x(:,2) - mu(2))./ sigma(2);
x(:,3) = (x(:,3) - mu(3))./ sigma(3);
% Prepare for plotting
figure;
% plot each alpha's data points in a different style
% braces indicate a cell, not just a regular array.
plotstyle = {'b', 'r', 'g', 'k', 'b--', 'r--'};
% Gradient Descent: only these 6 learning rates are tested
alpha = [0.01, 0.03, 0.1, 0.3, 1, 1.3];
MAX_ITR = 100; % number of iterations
% this will contain my final values of theta
% after I've found the best learning rate
theta_grad_descent = zeros(size(x(1,:)));
for i = 1:length(alpha)
theta = zeros(size(x(1,:)))'; % initialize fitting parameters
J = zeros(MAX_ITR, 1);
for num_iterations = 1:MAX_ITR
% Calculate the J term (the cost function, vectorized form)
J(num_iterations) = (0.5/m) .* (x * theta - y)' * (x * theta - y);
% The gradient
grad = (1/m) .* x' * ((x * theta) - y);
% Here is the actual update (the update rule)
theta = theta - alpha(i) .* grad;
end
% Now plot the first 50 J terms (the cost function curve)
plot(0:49, J(1:50), plotstyle{i}, 'LineWidth', 2)
hold on
% After some trial and error, I find alpha=1
% is the best learning rate and converges
% before the 100th iteration
%
% so I save the theta for alpha=1 as the result of
% gradient descent (J is observed to fall fastest when alpha=1)
if (alpha(i) == 1)
theta_grad_descent = theta;
end
end
legend('0.01','0.03','0.1', '0.3', '1', '1.3')
xlabel('Number of iterations')
ylabel('Cost J')
% force Matlab to display more than 4 decimal places
% formatting persists for rest of this session
format long
% Display gradient descent's result
theta_grad_descent
% Estimate the price of a 1650 sq-ft, 3 br house
% (dot only requires that both vectors have the same number of elements)
price_grad_desc = dot(theta_grad_descent, [1, (1650 - mu(2))/sigma(2),...
(3 - mu(3))/sigma(3)])
% Calculate the parameters from the normal equation, which gives theta directly:
% no learning rate and no repeated iterations are needed, just one computation
theta_normal = (x_unscaled' * x_unscaled)\x_unscaled' * y
% Estimate the house price again; no feature scaling is needed either
price_normal = dot(theta_normal, [1, 1650, 3])
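The comparison between the two methods can also be checked numerically. The sketch below is self-contained and uses made-up data (not ex3x.dat / ex3y.dat); the learning rate 0.1, the 2000 iterations, and the names x_raw, query, and rel_diff are assumptions chosen for illustration.

```matlab
% Hypothetical toy data: [area, bedrooms] and price
x_raw = [2104 3; 1600 3; 2400 3; 1416 2; 3000 4; 1985 4];
y = [399900; 329900; 369000; 232000; 539900; 299900];
m = length(y);

% Normal equation on the unscaled features (no scaling, no iterations)
X_unscaled = [ones(m, 1), x_raw];
theta_normal = (X_unscaled' * X_unscaled) \ (X_unscaled' * y);

% Gradient descent on the scaled features
mu = mean(x_raw);
sigma = std(x_raw);
X = [ones(m, 1), (x_raw - repmat(mu, m, 1)) ./ repmat(sigma, m, 1)];
theta = zeros(3, 1);
for it = 1:2000
    theta = theta - 0.1 * (1/m) * X' * (X * theta - y);
end

% Predict the same house with both models; the query must be scaled
% with the same mu and sigma before using the gradient descent theta
query = [1650, 3];
price_normal = [1, query] * theta_normal;
price_gd = [1, (query - mu) ./ sigma] * theta;

rel_diff = abs(price_gd - price_normal) / abs(price_normal);
disp(rel_diff);   % close to zero once gradient descent has converged
```

The two theta vectors differ because they are expressed in different feature scales, but the predictions should agree once gradient descent has converged.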