Machine Learning 03 - Logistic Regression and Regularization

This post explores the classification problem in machine learning, introduces the logistic regression form of the hypothesis function, and discusses the concept of the decision boundary. It also explains the cost function and its gradient descent algorithm in detail, and presents regularization as a solution to overfitting and underfitting.


I am working through Stanford's machine learning course taught by Andrew Ng and taking notes along the way for review and consolidation.
If you spot any errors or omissions, or have suggestions, please bear with me and point them out.

Week 03

3.1 Classification Problem

3.1.1 Hypothesis representation

The hypothesis function for the classification problem takes the form:

$h_\theta(x) = g(\theta^T x) = \dfrac{1}{1 + e^{-\theta^T x}}$

The function is called “Logistic Function” or “Sigmoid Function”.

For an intuitive feel for the logistic function, consider $y = \dfrac{1}{1 + e^{-x}}$ :

[Figure: the logistic (sigmoid) function]
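
As a quick sketch, the logistic function can be written in Octave as a vectorized helper; the file name sigmoid.m and the variable names are my own, not part of the course notes:

function g = sigmoid(z)
    % Logistic (sigmoid) function, applied element-wise to z.
    g = 1 ./ (1 + exp(-z));
end

With this convention the hypothesis is sigmoid(X * theta), where X is a design matrix whose first column is all ones.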

3.1.2 Decision boundary

To get a discrete 0/1 classification, we threshold the hypothesis at some value $b$:

$h_\theta(x) \ge b \;\rightarrow\; y = 1$
$h_\theta(x) < b \;\rightarrow\; y = 0$

then we have

$h_\theta(x) = g(\theta^T x) = b \;\Leftrightarrow\; \theta^T x = a, \quad \text{where } a = g^{-1}(b)$

(in particular, $b = 0.5$ gives $a = 0$, since $g(0) = 0.5$).

When the parameter vector $\theta$ has been selected, we get the decision boundary:

$\theta^T x = a$

For a better understanding, see the following examples:

  • Case 1 : Linear boundary

We have the dataset below. The hypothesis function is $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$, with $b = 0.5$ and $a = 0$. Assume the learned parameters are $\theta_0 = -3$, $\theta_1 = 1$, and $\theta_2 = 1$.

Then the line $-3 + x_1 + x_2 = 0$, i.e. $x_1 + x_2 = 3$, is the decision boundary, shown in the image below as the pink straight line; the region where $-3 + x_1 + x_2 \ge 0$ is predicted as $y = 1$.

[Figure: linear decision boundary]
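
As a quick numerical check (my own example point, not from the notes), we can evaluate the hypothesis at $(x_1, x_2) = (2, 2)$:

theta = [-3; 1; 1];                % theta_0 = -3, theta_1 = 1, theta_2 = 1
x = [1; 2; 2];                     % x_0 = 1 (bias term), x_1 = 2, x_2 = 2
h = 1 / (1 + exp(-theta' * x));    % h_theta(x) = g(1), about 0.73
prediction = (h >= 0.5);           % 1, because -3 + x_1 + x_2 = 1 >= 0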

  • Case 2 : Non-linear boundary

We have the dataset below. The hypothesis function is $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2)$, with $b = 0.5$ and $a = 0$. Assume the learned parameters are $\theta_0 = -1$, $\theta_1 = 0$, $\theta_2 = 0$, $\theta_3 = 1$, and $\theta_4 = 1$.

Then $x_1^2 + x_2^2 = 1$ is the decision boundary, which is shown in the image below (the pink circle); points outside the circle are predicted as $y = 1$.

[Figure: non-linear (circular) decision boundary]
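
The same kind of check works for the circular boundary by building the polynomial features explicitly (again, my own test point):

theta = [-1; 0; 0; 1; 1];                  % parameters from Case 2
x1 = 0.5;  x2 = 0.5;                       % a point inside the unit circle
features = [1; x1; x2; x1^2; x2^2];        % [x_0, x_1, x_2, x_1^2, x_2^2]
h = 1 / (1 + exp(-theta' * features));     % about 0.38
prediction = (h >= 0.5);                   % 0, since x_1^2 + x_2^2 = 0.5 < 1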

3.1.3 Cost function

For logistic regression, the cost of a single training example is:

$\mathrm{Cost}(h_\theta(x), y) = -y \log h_\theta(x) - (1 - y) \log(1 - h_\theta(x))$

For an intuitive understanding, note that the cost reduces to $-\log h_\theta(x)$ when $y = 1$ and to $-\log(1 - h_\theta(x))$ when $y = 0$; see the following graphs of these two cases:

[Figure: cost curves for the cases y = 1 and y = 0]

And the cost function is :

$J(\theta) = -\dfrac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$
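
In vectorized Octave form this cost takes two lines, using the sigmoid helper sketched earlier; here X is assumed to be the m-by-(n+1) design matrix, y the m-by-1 label vector, and theta the parameter vector:

m = length(y);                                         % number of training examples
h = sigmoid(X * theta);                                % h_theta for every example at once
J = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));    % the cost J(theta)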

3.1.4 Gradient Descent

  • Gradient descent for logistic regression - Algorithm 2

Repeat {

$\theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j} J(\theta) = \theta_j - \dfrac{\alpha}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$

(simultaneously update all $\theta_j$)
}
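
A minimal Octave sketch of this loop, using the vectorized gradient so that all $\theta_j$ really are updated simultaneously (alpha, numIters, X, and y are assumed to be defined):

m = length(y);
theta = zeros(size(X, 2), 1);                  % start from all-zero parameters
for iter = 1:numIters
    h = sigmoid(X * theta);                    % predictions for every training example
    theta = theta - (alpha/m) * X' * (h - y);  % simultaneous update of every theta_j
end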

3.1.5 Other Methods: Advanced Optimization

  • Conjugate gradient
  • BFGS
  • L-BFGS

They are more sophisticated, faster ways to optimize $\theta$.

Usage in Octave/MATLAB:

  • (1) Provide a function that evaluates $J(\theta)$ and $\dfrac{\partial}{\partial \theta_j} J(\theta)$ for a given input $\theta$.

For example :

function [jVal, gradient] = costFunction(theta)
    % jVal: the value of J(theta); gradient: the vector of partial derivatives
    jVal = [... code to compute J(theta) ...];
    gradient = [... code to compute derivative of J(theta) ...];
end
  • (2) Use the optimization function fminunc() together with optimset().

For example :

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
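
As a concrete sketch of the costFunction template for unregularized logistic regression: the file name logisticCost.m, the extra X and y arguments, and the anonymous function that fixes them are my own choices, not part of the course template.

function [jVal, gradient] = logisticCost(theta, X, y)
    % Cost J(theta) and its gradient for (unregularized) logistic regression.
    m = length(y);
    h = 1 ./ (1 + exp(-X * theta));                         % sigmoid hypothesis for all examples
    jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));  % J(theta)
    gradient = (1/m) * X' * (h - y);                        % partial derivatives w.r.t. each theta_j
end

% Usage, fixing X and y through an anonymous function:
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(size(X, 2), 1);
[optTheta, functionVal] = fminunc(@(t) logisticCost(t, X, y), initialTheta, options);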

3.2 Multiclass Classification

3.2.1 Multiclass classification problem

A classification problem with more than two categories is called a multiclass classification problem.

[Figure: comparison of binary and multiclass classification]

3.2.2 One-vs-all method

Since $y \in \{0, 1, \dots, n\}$, we can divide the problem into $n + 1$ binary classification problems.

$y \in \{0, 1, \dots, n\}$

$h_\theta^{(i)}(x) = P(y = i \mid x; \theta), \quad i = 0, 1, \dots, n$

$\text{prediction} = \max_i \left( h_\theta^{(i)}(x) \right)$

For example, when $n = 2$, we get three binary classification problems:

[Figure: one-vs-all classification]
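
A sketch of one-vs-all in Octave. It assumes a helper trainLogistic(X, y) (hypothetical, e.g. the fminunc call above wrapped in a function) that returns a fitted parameter vector for binary 0/1 labels:

numLabels = 3;                                     % n = 2, so classes 0, 1, 2
allTheta = zeros(numLabels, size(X, 2));           % one row of parameters per class
for c = 0:(numLabels - 1)
    binaryY = double(y == c);                      % current class -> 1, every other class -> 0
    allTheta(c + 1, :) = trainLogistic(X, binaryY)';   % hypothetical training helper
end

% Prediction: pick the class whose classifier outputs the largest h_theta(x).
[~, idx] = max(sigmoid(X * allTheta'), [], 2);
prediction = idx - 1;                              % map row index 1..numLabels back to label 0..n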

3.3 Problems and solutions in fitting

3.3.1 Underfitting and overfitting

Underfitting, or high bias, occurs when the form of the hypothesis function maps poorly to the trend of the data.

It is usually caused by a function that is too simple or uses too few features.

Overfitting, or high variance, is caused by a hypothesis function that fits the available data but does not generalize well to new data.

It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the underlying data.

For example (left: underfitting, middle: a good fit, right: overfitting):

[Figure: underfitting, good fit, and overfitting for logistic regression]

3.3.2 Solution

Solutions to underfitting:

(1) Use more features.
(2) Increase the number of training iterations.

Solutions to overfitting:

(1) Reduce the number of features

  • Manually select which features to keep.
  • Use a model selection algorithm.

(2) Regularization

  • Keep all the features, but reduce the magnitudes of the parameters $\theta_j$.

3.3.3 Details of Regularization

Key point :

  • Regularization works well when we have a lot of slightly useful features.

  • We add a suitable penalty term to the cost function for the parameters that we want to keep small.

A classical form :

$\min_\theta \; \dfrac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2$

Here $\lambda$ is the regularization parameter. With this formulation, we can smooth the output of our hypothesis function to reduce overfitting.

3.3.4 Application

Regularized Linear Regression :

(1) Gradient Descent

The cost function of multivariate linear regression is

$J(\theta) = \dfrac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

And the regularized cost function is

$J(\theta) = \dfrac{1}{2m} \left[ \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$

So we get the regularized gradient descent algorithm:

Repeat {
$\theta_0 := \theta_0 - \dfrac{\alpha}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_0^{(i)}$
$\theta_j := \theta_j \left(1 - \dfrac{\alpha\lambda}{m}\right) - \dfrac{\alpha}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$
$j = 1, 2, \dots, n$
}
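
A sketch of these updates in Octave; X, y, alpha, lambda, and numIters are assumed to be defined, and theta(1) plays the role of $\theta_0$, which is not shrunk:

m = length(y);
theta = zeros(size(X, 2), 1);
for iter = 1:numIters
    h = X * theta;                            % linear hypothesis
    grad = (1/m) * X' * (h - y);              % unregularized gradient
    reg = (lambda/m) * [0; theta(2:end)];     % regularization term, skipping theta_0
    theta = theta - alpha * (grad + reg);     % simultaneous update
end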

(2) Normal Equation

The regularized form of the normal equation is:

$\theta = \left(X^T X + \lambda \cdot L\right)^{-1} X^T y$

where $L = \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}$ is an $(n+1) \times (n+1)$ matrix.

It can be proved that $X^T X + \lambda \cdot L$ is invertible for $\lambda > 0$.
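
In Octave the regularized normal equation is a one-liner once $L$ is built (a sketch, assuming X has its leading column of ones and lambda is set):

p = size(X, 2);                             % p = n + 1 columns: bias plus n features
L = eye(p);
L(1, 1) = 0;                                % do not regularize theta_0
theta = (X' * X + lambda * L) \ (X' * y);   % backslash instead of an explicit inverse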

Regularized Logistic Regression :

The cost function of logistic regression is

$J(\theta) = -\dfrac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$

And the regularized cost function is

$J(\theta) = -\dfrac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \dfrac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$

So we get the regularized gradient descent algorithm, identical in form to the linear-regression version except that $h_\theta(x)$ is now the sigmoid hypothesis:

Repeat {
$\theta_0 := \theta_0 - \dfrac{\alpha}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_0^{(i)}$
$\theta_j := \theta_j \left(1 - \dfrac{\alpha\lambda}{m}\right) - \dfrac{\alpha}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$
$j = 1, 2, \dots, n$
}

Similarly, when using an advanced optimization algorithm, supply the regularized cost function $J(\theta)$ and its partial derivatives $\dfrac{\partial}{\partial \theta_j} J(\theta)$.
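
A sketch of that regularized cost-and-gradient function, in the same format as costFunction above (the name regLogisticCost and passing lambda explicitly are my own choices):

function [jVal, gradient] = regLogisticCost(theta, X, y, lambda)
    % Regularized logistic regression cost J(theta) and gradient; theta(1) is not penalized.
    m = length(y);
    h = 1 ./ (1 + exp(-X * theta));
    jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h)) ...
           + (lambda/(2*m)) * sum(theta(2:end).^2);
    gradient = (1/m) * X' * (h - y) + (lambda/m) * [0; theta(2:end)];
end

% Usage: optTheta = fminunc(@(t) regLogisticCost(t, X, y, lambda), initialTheta, options);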
