Multivariate Linear Regression

woneil

License: CC 4.0 BY-SA

Category: Deep Learning

Article link: https://blog.youkuaiyun.com/ahbbshenfeng/article/details/40215157

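This article discusses why feature scaling matters and how it affects gradient descent, compares gradient descent under different learning rates, and walks through solving a linear regression problem with the normal equation.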

English original

Chinese exercise

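Feature scaling: When a problem has multiple features, we want them all on similar scales, because this helps gradient descent converge faster. The fix is to rescale every feature into roughly the same range, for example -1 to +1, -0.5 to +0.5, or 0 to 1. The most common approach is to subtract the feature's mean from each xi and then divide by the feature's standard deviation.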
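As a minimal sketch of the mean-and-standard-deviation scaling described above, the snippet below standardizes a small made-up feature matrix. The values in `X` are illustrative assumptions and are not taken from `ex3x.dat`.

```matlab
% Minimal sketch of z-score feature scaling on made-up data
% (hypothetical values, not from ex3x.dat).
X = [2104 3; 1600 3; 2400 3; 1416 2; 3000 4];   % columns: [area, bedrooms]

mu    = mean(X);   % 1-by-2 row vector of column means
sigma = std(X);    % 1-by-2 row vector of column standard deviations

% Subtract each column's mean and divide by its standard deviation.
% repmat is used so the code also runs on releases without implicit expansion.
m = size(X, 1);
X_scaled = (X - repmat(mu, m, 1)) ./ repmat(sigma, m, 1);

disp(mean(X_scaled));   % each column mean is 0 up to rounding error
disp(std(X_scaled));    % each column standard deviation is 1 up to rounding error
```

Note that `mu` and `sigma` have to be kept, because any new input must be scaled with the same values before a prediction is made.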

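Learning rate and number of iterations: If the learning rate is too small, more iterations are needed to converge. A larger learning rate reduces the number of iterations, but if it is too large the cost function J may fail to converge. To choose a learning rate, plot the loss (the value of the cost function J) against the number of iterations for several candidate rates and compare the curves.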
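The effect can be seen on a toy problem. The sketch below is not the housing model: it assumes a one-dimensional cost J(theta) = 0.5 * theta^2, whose gradient is theta, and three hypothetical learning rates chosen to be too small, well suited, and too large for that cost.

```matlab
% Toy illustration (assumed cost, not the housing data):
% gradient descent on J(theta) = 0.5 * theta^2, whose gradient is theta.
alphas  = [0.1, 1.0, 2.1];   % hypothetical: too small, well suited, too large
n_iters = 20;
J_hist  = zeros(n_iters, numel(alphas));

for k = 1:numel(alphas)
    theta = 5;                               % arbitrary starting point
    for it = 1:n_iters
        J_hist(it, k) = 0.5 * theta^2;       % record the cost before the update
        theta = theta - alphas(k) * theta;   % gradient descent step
    end
end

% Cost at the last recorded iteration for each learning rate
disp(J_hist(end, :));
```

With the small rate the cost is still shrinking slowly at the end, with the middle rate it reaches zero after a single step, and with the large rate each step overshoots so the cost grows instead of falling.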

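Cost function J(theta): It can be written in two forms, a summation form and a vectorized form. The vectorized form is the more convenient one for MATLAB programming.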
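For reference, the two forms are usually written as below. The notation (m training examples, design matrix X with one example per row, hypothesis h_theta) is the standard convention and is assumed here; the vectorized line matches the expression the program uses to compute J, and the last line is the corresponding gradient used in the update step.

```latex
% Summation form, with h_\theta(x) = \theta^T x
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

% Vectorized form
J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)

% Gradient of the vectorized form
\nabla_\theta J(\theta) = \frac{1}{m} X^T (X\theta - y)
```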

See the English original for a detailed explanation.

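The main purposes of this program:

1 Demonstrate, for a given model, the relationship between the learning rate, the cost function value, and the number of iterations

2 Compare gradient descent with the normal equation

The annotated source program follows: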

% Exercise 3 -- Multivariate Linear Regression

clear all; close all; clc

x = load('ex3x.dat'); 
y = load('ex3y.dat');

m = length(y);

% Add intercept term to x
x = [ones(m, 1), x];

% Save a copy of the unscaled features for later (used by the normal equation)
x_unscaled = x;

% Scale features and set them to zero mean: after scaling, x2 and x3
% (columns 2 and 3) have zero sample mean. MATLAB indices start at 1.
mu = mean(x);
sigma = std(x);
x(:,2) = (x(:,2) - mu(2))./ sigma(2);
x(:,3) = (x(:,3) - mu(3))./ sigma(3);

% Prepare for plotting
figure;
% plot each alpha's data points in a different style
% braces indicate a cell, not just a regular array.
plotstyle = {'b', 'r', 'g', 'k', 'b--', 'r--'};


% Gradient Descent: only these 6 learning rates are tested
alpha = [0.01, 0.03, 0.1, 0.3, 1, 1.3];
MAX_ITR = 100;    % number of iterations
% this will contain my final values of theta
% after I've found the best learning rate
theta_grad_descent = zeros(size(x(1,:))); 

for i = 1:length(alpha)
    theta = zeros(size(x(1,:)))'; % initialize fitting parameters
    J = zeros(MAX_ITR, 1);
    for num_iterations = 1:MAX_ITR
        % Calculate the J term (the cost function)
        J(num_iterations) = (0.5/m) .* (x * theta - y)' * (x * theta - y);
        
        % The gradient
        grad = (1/m) .* x' * ((x * theta) - y);
        
        % Here is the actual update (the update rule)
        theta = theta - alpha(i) .* grad;
    end
    % Now plot the first 50 J terms (the curve of the cost function J)
    plot(0:49, J(1:50), char(plotstyle(i)), 'LineWidth', 2)
    hold on
    
    % After some trial and error, I find alpha=1
    % is the best learning rate and converges
    % before the 100th iteration
    %
    % so I save the theta for alpha=1 as the result of 
    % gradient descent (J is observed to fall fastest at alpha=1)
    if (alpha(i) == 1)
        theta_grad_descent = theta;
    end
end
legend('0.01','0.03','0.1', '0.3', '1', '1.3')
xlabel('Number of iterations')
ylabel('Cost J')

% force Matlab to display more than 4 decimal places
% formatting persists for rest of this session
format long

% Display gradient descent's result
theta_grad_descent

% Estimate the price of a 1650 sq-ft, 3 br house
% dot() only requires the two vectors to have the same number of elements
price_grad_desc = dot(theta_grad_descent, [1, (1650 - mu(2))/sigma(2),...
                    (3 - mu(3))/sigma(3)])

% Calculate the parameters from the normal equation (computes theta directly)
% No learning rate and no repeated iterations are needed; one computation suffices
theta_normal = (x_unscaled' * x_unscaled) \ (x_unscaled' * y)

% Estimate the house price again; no feature scaling is needed either
price_normal = dot(theta_normal, [1, 1650, 3])
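
The two methods should agree on the prediction when gradient descent has converged. The sketch below checks this on synthetic, noise-free data rather than on `ex3x.dat` and `ex3y.dat`. The data values, the linear price rule, and the iteration count are all made-up assumptions for illustration.

```matlab
% Sketch on synthetic, noise-free data (hypothetical values, not ex3x.dat / ex3y.dat).
area     = [1000; 1500; 2000; 2500; 3000; 1200];
bedrooms = [2; 3; 3; 4; 5; 2];
y        = 50000 + 100 * area + 20000 * bedrooms;   % made-up linear price rule
m        = length(y);

% Normal equation on the unscaled features (intercept column added)
X_raw        = [ones(m, 1), area, bedrooms];
theta_normal = (X_raw' * X_raw) \ (X_raw' * y);

% Gradient descent on standardized features
mu    = [mean(area), mean(bedrooms)];
sigma = [std(area),  std(bedrooms)];
X     = [ones(m, 1), (area - mu(1)) / sigma(1), (bedrooms - mu(2)) / sigma(2)];
theta = zeros(3, 1);
alpha = 1;                         % assumed learning rate
for it = 1:2000                    % assumed iteration count
    theta = theta - alpha * (1/m) * X' * (X * theta - y);
end

% Predict the price of a 1650 sq-ft, 3 br house with both parameter sets.
% The gradient descent query must be scaled with the same mu and sigma.
query_raw    = [1, 1650, 3];
query_scaled = [1, (1650 - mu(1)) / sigma(1), (3 - mu(2)) / sigma(2)];
price_normal = query_raw * theta_normal;
price_gd     = query_scaled * theta;
fprintf('normal equation:  %.2f\ngradient descent: %.2f\n', price_normal, price_gd);
```

Because the synthetic prices follow the linear rule exactly, both estimates should land very close to 275000, the value the rule gives for this house. On real data the two only match to the extent that gradient descent has converged.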