Machine Learning - Gradient Descent in Practice

This article covers practical techniques for implementing gradient descent, including feature scaling, mean normalization, and choosing the learning rate α.




1. Feature Scaling



Idea: Make sure features are on a similar scale, so that gradient descent converges much faster.

  E.g.  x1 = size (0-2000 feet²)
        x2 = number of bedrooms (1-5)



Get every feature into approximately a -1 ≤ xi ≤ 1 range.

These ranges are OK:

    0 ≤ x1 ≤ 3,  -2 ≤ x2 ≤ 0.5

These ranges are not good:

    -100 ≤ x3 ≤ 100,  -0.0001 ≤ x4 ≤ 0.0001

Mean normalization 

Replace xi with xi - μi to make features have approximately zero mean (do not apply to x0 = 1), where μi is the mean value of feature i. In practice, mean normalization is combined with scaling:

    xi := (xi - μi) / si

where si is the range (max - min) or the standard deviation of feature i.

E.g.  x1 := (size - 1000) / 2000,  x2 := (#bedrooms - 2) / 5
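The scaling and mean-normalization step above can be sketched in Python/NumPy (the article's own code is MATLAB; the helper name `normalize_features` and the choice of standard deviation for si are illustrative):

```python
import numpy as np

def normalize_features(X):
    """Mean-normalize and scale each feature column of X.

    Returns the transformed matrix along with mu and sigma so the
    same transformation can be applied to new examples.
    """
    mu = X.mean(axis=0)      # mean of each feature (the mu_i above)
    sigma = X.std(axis=0)    # spread of each feature (std here; range also works)
    return (X - mu) / sigma, mu, sigma

# e.g. house sizes (0-2000 ft^2) and bedroom counts (1-5)
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])
X_norm, mu, sigma = normalize_features(X)
# each column of X_norm now has mean ~0 and unit spread
```

Keeping `mu` and `sigma` matters: a prediction on a new house must use the training-set statistics, not statistics recomputed from the new example.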




2. Learning rate


Gradient descent 

  • “Debugging”: How to make sure gradient descent is working correctly.
  • How to choose the learning rate α.

Making sure gradient descent is working correctly.

  • J(θ) should decrease every iteration.


Example automatic convergence test:

Declare convergence if J(θ) decreases by less than 10⁻³ in one iteration.

  • If gradient descent is not working (J(θ) increasing or oscillating): use a smaller α.


  • For sufficiently small α , J(θ) should decrease on every iteration.
  • But if α is too small, gradient descent can be slow to converge.

Summary: 

  • If α is too small: slow convergence.
  • If α is too large: J(θ) may not decrease on every iteration; may not converge.

To choose α, try a range of values roughly 3× apart:  ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ...
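A sweep over candidate learning rates, together with the automatic convergence test above, can be sketched as follows (Python/NumPy; the function `run_gd` and the toy data are illustrative, not from the article):

```python
import numpy as np

def run_gd(X, y, alpha, num_iters=500, tol=1e-3):
    """Gradient descent for linear regression, stopping early once
    J(theta) decreases by less than tol in one iteration
    (the automatic convergence test described above)."""
    m, n = X.shape
    theta = np.zeros(n)
    J_prev = np.inf
    history = []
    for _ in range(num_iters):
        grad = X.T @ (X @ theta - y) / m   # gradient of J(theta)
        theta -= alpha * grad
        J = 0.5 * np.sum((X @ theta - y) ** 2) / m
        history.append(J)
        if J_prev - J < tol:               # also fires if J increased
            break
        J_prev = J
    return theta, history

# toy data: y = 2*x with an intercept column
X = np.c_[np.ones(4), [1.0, 2.0, 3.0, 4.0]]
y = np.array([2.0, 4.0, 6.0, 8.0])

for alpha in (0.001, 0.003, 0.01, 0.03, 0.1):
    theta, history = run_gd(X, y, alpha)
    # for a good alpha, history decreases steadily; for an alpha that is
    # too large, the early-stop check fires as soon as J increases
```

Plotting `history` against the iteration number for each α is the usual way to pick the largest α that still gives a steadily decreasing J(θ).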


3. MATLAB Code for Gradient Descent with Multiple Variables

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
% X is the training data (design matrix)
% y is the vector of training labels
% theta is the parameter vector
% alpha is the learning rate
% num_iters is the number of iterations

% Initialize some values
m = length(y);                   % number of training examples
J_history = zeros(num_iters, 1);
theta_tmp = zeros(size(theta));  % preallocate the update buffer

    for iter = 1:num_iters

        % Perform a single gradient step on the parameter vector theta;
        % all components must be updated simultaneously, hence theta_tmp
        for i = 1:size(X, 2)
            theta_tmp(i) = theta(i) - alpha * sum((X*theta - y) .* X(:, i)) / m;
        end
        theta = theta_tmp;

        % Save the cost J in every iteration
        J_history(iter) = computeCostMulti(X, y, theta);  % compute J(theta)

    end
end


function J = computeCostMulti(X, y, theta)
% COMPUTECOSTMULTI Compute cost for linear regression with multiple variables

% Initialize some useful values
m = length(y);  % number of training examples

% Compute the cost for a particular choice of theta
J = 0.5 * (X*theta - y)' * (X*theta - y) / m;

end
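For readers working in Python, the two MATLAB functions above translate to NumPy roughly as follows (a sketch; the snake_case names mirror the MATLAB ones, and the inner per-feature loop is replaced by the equivalent vectorized update):

```python
import numpy as np

def compute_cost_multi(X, y, theta):
    """J(theta) = (1/(2m)) * (X*theta - y)' * (X*theta - y)."""
    m = len(y)
    err = X @ theta - y
    return 0.5 * (err @ err) / m

def gradient_descent_multi(X, y, theta, alpha, num_iters):
    """Vectorized form of the per-feature loop in the MATLAB code:
    theta := theta - (alpha/m) * X' * (X*theta - y)."""
    m = len(y)
    J_history = np.zeros(num_iters)
    for it in range(num_iters):
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
        J_history[it] = compute_cost_multi(X, y, theta)
    return theta, J_history
```

The vectorized update computes all components of θ from the same old θ, which is exactly what the `theta_tmp` buffer achieves in the MATLAB loop.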

