Machine Learning - Gradient Descent in Practice

This article covers practical techniques for implementing gradient descent, including feature scaling, mean normalization, and choosing the learning rate α.




1. Feature Scaling



Idea: Make sure features are on a similar scale, so that gradient descent converges much faster.

  E.g.  x1 = size (0-2000 feet²)
        x2 = number of bedrooms (1-5)



Get every feature into approximately a -1 ≤ xi ≤ 1 range.

These ranges are OK:

    0 ≤ x1 ≤ 3,  -2 ≤ x2 ≤ 0.5

These ranges are not good:

    -100 ≤ x3 ≤ 100,  -0.0001 ≤ x4 ≤ 0.0001

Mean normalization 

Replace xi with xi - μi to make features have approximately zero mean (do not apply to x0 = 1), where μi is the mean value of feature i. In practice, mean normalization is combined with scaling:

    xi := (xi - μi) / si

where si is the range (max - min) or the standard deviation of feature i.

E.g.  x1 := (size - 1000) / 2000,  x2 := (#bedrooms - 2) / 5
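The scaling and mean-normalization step above can be sketched in Python/NumPy (the article's own code is MATLAB; the helper name `normalize_features` and the choice of standard deviation for si are illustrative):

```python
import numpy as np

def normalize_features(X):
    """Mean-normalize and scale each feature column of X.

    Returns the transformed matrix along with mu and sigma so the
    same transformation can be applied to new examples.
    """
    mu = X.mean(axis=0)      # mean of each feature (the mu_i above)
    sigma = X.std(axis=0)    # spread of each feature (std here; range also works)
    return (X - mu) / sigma, mu, sigma

# e.g. house sizes (0-2000 ft^2) and bedroom counts (1-5)
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])
X_norm, mu, sigma = normalize_features(X)
# each column of X_norm now has mean ~0 and unit spread
```

Keeping `mu` and `sigma` matters: a prediction on a new house must use the training-set statistics, not statistics recomputed from the new example.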




2. Learning rate


Gradient descent 

  • “Debugging”: How to make sure gradient descent is working correctly.
  • How to choose the learning rate α.

Making sure gradient descent is working correctly.

  • J(θ) should decrease every iteration.


Example automatic convergence test:

Declare convergence if J(θ) decreases by less than 10⁻³ in one iteration.

  • If gradient descent is not working (J(θ) increasing or oscillating): use a smaller α.


  • For sufficiently small α , J(θ) should decrease on every iteration.
  • But if α is too small, gradient descent can be slow to converge.

Summary: 

  • If α is too small: slow convergence.
  • If α is too large: J(θ) may not decrease on every iteration; may not converge.

To choose α, try a range of values roughly 3× apart:  ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ...
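A sweep over candidate learning rates, together with the automatic convergence test above, can be sketched as follows (Python/NumPy; the function `run_gd` and the toy data are illustrative, not from the article):

```python
import numpy as np

def run_gd(X, y, alpha, num_iters=500, tol=1e-3):
    """Gradient descent for linear regression, stopping early once
    J(theta) decreases by less than tol in one iteration
    (the automatic convergence test described above)."""
    m, n = X.shape
    theta = np.zeros(n)
    J_prev = np.inf
    history = []
    for _ in range(num_iters):
        grad = X.T @ (X @ theta - y) / m   # gradient of J(theta)
        theta -= alpha * grad
        J = 0.5 * np.sum((X @ theta - y) ** 2) / m
        history.append(J)
        if J_prev - J < tol:               # also fires if J increased
            break
        J_prev = J
    return theta, history

# toy data: y = 2*x with an intercept column
X = np.c_[np.ones(4), [1.0, 2.0, 3.0, 4.0]]
y = np.array([2.0, 4.0, 6.0, 8.0])

for alpha in (0.001, 0.003, 0.01, 0.03, 0.1):
    theta, history = run_gd(X, y, alpha)
    # for a good alpha, history decreases steadily; for an alpha that is
    # too large, the early-stop check fires as soon as J increases
```

Plotting `history` against the iteration number for each α is the usual way to pick the largest α that still gives a steadily decreasing J(θ).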


3. MATLAB Code for Gradient Descent with Multiple Variables

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
% X is the training data (design matrix)
% y is the vector of training labels
% theta is the parameter vector
% alpha is the learning rate
% num_iters is the number of iterations

% Initialize some values
m = length(y);                   % number of training examples
J_history = zeros(num_iters, 1);
theta_tmp = zeros(size(theta));  % preallocate the update buffer

    for iter = 1:num_iters

        % Perform a single gradient step on the parameter vector theta;
        % all components must be updated simultaneously, hence theta_tmp
        for i = 1:size(X, 2)
            theta_tmp(i) = theta(i) - alpha * sum((X*theta - y) .* X(:, i)) / m;
        end
        theta = theta_tmp;

        % Save the cost J in every iteration
        J_history(iter) = computeCostMulti(X, y, theta);  % compute J(theta)

    end
end


function J = computeCostMulti(X, y, theta)
% COMPUTECOSTMULTI Compute cost for linear regression with multiple variables

% Initialize some useful values
m = length(y);  % number of training examples

% Compute the cost for a particular choice of theta
J = 0.5 * (X*theta - y)' * (X*theta - y) / m;

end
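For readers working in Python, the two MATLAB functions above translate to NumPy roughly as follows (a sketch; the snake_case names mirror the MATLAB ones, and the inner per-feature loop is replaced by the equivalent vectorized update):

```python
import numpy as np

def compute_cost_multi(X, y, theta):
    """J(theta) = (1/(2m)) * (X*theta - y)' * (X*theta - y)."""
    m = len(y)
    err = X @ theta - y
    return 0.5 * (err @ err) / m

def gradient_descent_multi(X, y, theta, alpha, num_iters):
    """Vectorized form of the per-feature loop in the MATLAB code:
    theta := theta - (alpha/m) * X' * (X*theta - y)."""
    m = len(y)
    J_history = np.zeros(num_iters)
    for it in range(num_iters):
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
        J_history[it] = compute_cost_multi(X, y, theta)
    return theta, J_history
```

The vectorized update computes all components of θ from the same old θ, which is exactly what the `theta_tmp` buffer achieves in the MATLAB loop.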

