Study Notes - GreedyAI - DeepLearningCV - Lesson1 Introduction

These notes cover the theory of logistic regression in depth: the logistic function, the loss function and its gradient, and an analysis of the Hessian matrix. A worked example illustrates what the parameters mean, and gradient descent and a practical application are introduced. Useful for understanding the mathematical foundations of logistic regression for binary classification.


Chapter 3: Logistic Regression

Task 12: The Binary Classification Problem

Task 13: The Logistic Function

$$f(x) = \frac{1}{1 + e^{-x}}$$

$$\lim_{x \to -\infty} f(x) = \frac{1}{1 + e^{\infty}} = \frac{1}{1 + \infty} = 0$$

$$\lim_{x \to +\infty} f(x) = \frac{1}{1 + e^{-\infty}} = \frac{1}{1 + 0} = 1$$

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

def sigmoid(x):
    # Logistic (sigmoid) function.
    return 1 / (1 + np.exp(-x))

x = np.linspace(-6, 6, 100)
y = sigmoid(x)
mark = 0.5 * np.ones(x.shape)  # horizontal reference line at f(x) = 0.5

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y)
ax.plot(x, mark, ":")
ax.set_xlabel("$x$")
ax.set_ylabel("$f(x)$")
ax.grid()
plt.show()

[Figure: the sigmoid curve, with a dotted reference line at f(x) = 0.5]

Task 14: Exponentials, Logarithms, and Logistic Regression

  • Exponentials and logarithms
def exp(x):
    return np.exp(x)

def ln(x):
    return np.log(x)

def lin(x):
    return x

x = np.linspace(-4, 4, 100)
x_pos = x[x > 0]  # ln(x) is only defined for x > 0

y_exp = exp(x)
y_ln = ln(x_pos)
y_lin = lin(x)

fig = plt.figure(figsize = (5, 5))
ax = fig.add_subplot(111)
ax.plot(x, y_exp, label="$y = e^{x}$")
ax.plot(x_pos, y_ln, label=r"$y = \ln(x)$")
ax.plot(x, y_lin, label="$y = x$")
ax.set_xlabel("$x$")
ax.set_ylabel("$f(x)$")
ax.set_ylim(-4, 4)
ax.grid()
ax.legend()
plt.show()

[Figure: y = e^x, y = ln(x), and y = x plotted on the same axes]

  • Logistic regression

Logistic regression solves binary (0/1) classification problems.

$$P(y = 1 \mid \mathbf{x}; \mathbf{\theta}) = f(\mathbf{x}; \mathbf{\theta}) = \frac{1}{1 + e^{-\mathbf{\theta}^{\mathrm{T}} \mathbf{x}}}$$

$$\mathbf{\theta}^{\mathrm{T}} \mathbf{x} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots$$

$$\mathbf{\theta} = \left[ \theta_0, \theta_1, \theta_2, \cdots \right]$$

$$\mathbf{x} = \left[ 1, x_1, x_2, \cdots \right]$$

If $P(y = 1 \mid \mathbf{x}) > 0.5$, predict class 1; otherwise predict class 0.

  • Key logistic regression identities

Probability of class 1: $P = \frac{1}{1 + e^{-\mathbf{\theta}^{\mathrm{T}} \mathbf{x}}}$

Probability of class 0: $1 - P = \frac{e^{-\mathbf{\theta}^{\mathrm{T}} \mathbf{x}}}{1 + e^{-\mathbf{\theta}^{\mathrm{T}} \mathbf{x}}} = \frac{1}{1 + e^{\mathbf{\theta}^{\mathrm{T}} \mathbf{x}}}$

Odds of class 1 versus class 0: $\frac{P}{1 - P} = e^{\mathbf{\theta}^{\mathrm{T}} \mathbf{x}}$

Natural logarithm of the odds: $\ln \frac{P}{1 - P} = \mathbf{\theta}^{\mathrm{T}} \mathbf{x}$
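These identities are easy to verify numerically. A quick sketch, with an arbitrarily chosen $\mathbf{\theta}$ and $\mathbf{x}$ (the values are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

theta = np.array([-0.5, 0.8, 1.2])   # arbitrary parameters [theta_0, theta_1, theta_2]
x = np.array([1.0, 2.0, -1.0])       # sample with the leading 1 for the bias term

z = theta @ x                        # theta^T x
P = sigmoid(z)                       # probability of class 1

print(1 - P, 1 / (1 + np.exp(z)))    # probability of class 0, two equivalent forms
print(np.log(P / (1 - P)), z)        # log-odds equals theta^T x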

Task 15: A Logistic Regression Example

| Age ($x_1$) | Annual income ($x_2$, 10k CNY) | Buys a car (1: yes; 0: no) |
| --- | --- | --- |
| 20 | 3 | 0 |
| 23 | 7 | 1 |
| 31 | 10 | 1 |
| 42 | 13 | 1 |
| 50 | 7 | 0 |
| 60 | 5 | 0 |
| 28 | 8 | ? |

The last row (age 28, income 8) is the test sample whose label we want to predict.
from sklearn import linear_model

X = [[20, 3],
     [23, 7],
     [31, 10],
     [42, 13],
     [50, 7],
     [60, 5]]

y = [0,
     1,
     1,
     1,
     0,
     0]

lr = linear_model.LogisticRegression()
lr.fit(X, y)

testX = [[28, 8]]

label = lr.predict(testX)
print("predicted label = {}".format(label))

prob = lr.predict_proba(testX)
print("probability = {}".format(prob))

print("theta_0 = {0[0]}, theta_1 = {1[0][0]}, theta_0 = {1[0][1]}".format(lr.intercept_, lr.coef_))
predicted label = [1]
probability = [[0.14694811 0.85305189]]
theta_0 = -0.04131837596993478, theta_1 = -0.1973000136829152, theta_2 = 0.915557452347983

Task 16: The Loss Function

Probability of class 1:

$$P(y = 1 \mid \mathbf{x}; \mathbf{\theta}) = f(\mathbf{x}; \mathbf{\theta}) = \frac{1}{1 + e^{-\mathbf{\theta}^{\mathrm{T}} \mathbf{x}}}$$

Loss function:

$$J(\mathbf{\theta}) = - \sum_{i=1}^{N} \left[ y^{(i)} \ln P(Y = 1 \mid \mathbf{X} = \mathbf{x}^{(i)}; \mathbf{\theta}) + \left( 1 - y^{(i)} \right) \ln \left( 1 - P(Y = 1 \mid \mathbf{X} = \mathbf{x}^{(i)}; \mathbf{\theta}) \right) \right]$$

Gradient of the loss function:

$$\nabla_{\mathbf{\theta}} J(\mathbf{\theta}) = \sum_{i=1}^{N} \left( P(Y = 1 \mid \mathbf{X} = \mathbf{x}^{(i)}; \mathbf{\theta}) - y^{(i)} \right) \mathbf{x}^{(i)} = \sum_{i=1}^{N} \mathbf{x}^{(i)} \left( f(\mathbf{x}^{(i)}; \mathbf{\theta}) - y^{(i)} \right)$$
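A minimal NumPy sketch of these two formulas, applied to the car-purchase data from Task 15 (the function names `loss` and `gradient` are my own):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Car-purchase data from Task 15, with a leading column of ones for theta_0.
X = np.array([[1, 20, 3], [1, 23, 7], [1, 31, 10],
              [1, 42, 13], [1, 50, 7], [1, 60, 5]], dtype=float)
y = np.array([0, 1, 1, 1, 0, 0], dtype=float)

def loss(theta):
    p = sigmoid(X @ theta)  # P(Y = 1 | X = x_i; theta) for every sample
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(theta):
    p = sigmoid(X @ theta)
    return X.T @ (p - y)    # sum_i x_i (f(x_i; theta) - y_i)

theta = np.zeros(3)
print(loss(theta))          # N * ln(2) at theta = 0
print(gradient(theta))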

Task 17: Deriving the Loss Function

  1. Product rule for derivatives

$$\left( f(x)g(x) \right)^{\prime} = f^{\prime}(x)g(x) + f(x)g^{\prime}(x)$$

  2. Logarithms

$$\log(xy) = \log(x) + \log(y)$$

$$\log^{\prime}(x) = \frac{1}{x}$$

  3. Chain rule

$$z = f(y), \quad y = g(x) \quad \Longrightarrow \quad \frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx}$$

  4. Derivative of the sigmoid

$$\begin{aligned} f(x) = & \frac{1}{1 + e^{-x}} \\ f^{\prime}(x) = & (-1) \frac{(-1) e^{-x}}{\left( 1 + e^{-x} \right)^2} = \frac{e^{-x}}{1 + e^{-x}} \cdot \frac{1}{1 + e^{-x}} = f(x) \left( 1 - f(x) \right) \end{aligned}$$

For $f(z) = \frac{1}{1 + e^{-z}}$ with $z = \theta x$, the chain rule gives:

$$\frac{df}{dx} = f(z) \left( 1 - f(z) \right) \theta$$
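A quick finite-difference check of $f^{\prime}(x) = f(x)\left(1 - f(x)\right)$; the evaluation point and step size below are arbitrary:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x, h = 0.7, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central difference
analytic = sigmoid(x) * (1 - sigmoid(x))               # f(x) * (1 - f(x))
print(numeric, analytic)                               # agree to ~10 decimal places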

  5. Loss function

Training set $\{ (\mathbf{x}_i, y_i) \}$, $i \in \{1, 2, \cdots, N\}$, $\mathbf{x}_i \in \mathbb{R}^m$, $y_i \in \{0, 1\}$.

The logistic function gives the probability that the classifier predicts $y_i = 1$ for a given sample $\mathbf{x}_i$:

$$P_i = P\left( y_i = 1 \mid \mathbf{x}_i; \mathbf{\theta} \right) = f(\mathbf{\theta}^{\mathrm{T}} \mathbf{x}_i)$$

Likelihood function:

$$L(\mathbf{\theta}) = \prod_{i \mid y_i = 1} P_i \cdot \prod_{i \mid y_i = 0} \left( 1 - P_i \right)$$

The goal is to find the $\mathbf{\theta}$ that maximizes $L(\mathbf{\theta})$:

$$\mathbf{\theta} = \arg \max_{\mathbf{\theta}} L(\mathbf{\theta})$$

Log-likelihood function (the logarithm turns the products into sums):

$$\begin{aligned} l(\mathbf{\theta}) = \log L(\mathbf{\theta}) = & \log \left[ \prod_{i \mid y_i = 1} P_i \cdot \prod_{i \mid y_i = 0} \left( 1 - P_i \right) \right] \\ = & \sum_{i \mid y_i = 1} \log P_i + \sum_{i \mid y_i = 0} \log \left( 1 - P_i \right) \\ = & \sum_{i=1}^{N} \left[ y_i \log P_i + \left( 1 - y_i \right) \log \left( 1 - P_i \right) \right] \end{aligned}$$

Differentiating, and using $\frac{d P_i}{d \mathbf{\theta}} = P_i \left( 1 - P_i \right) \mathbf{x}_i$ from step 4:

$$\begin{aligned} \frac{d l(\mathbf{\theta})}{d \mathbf{\theta}} = & \sum_{i=1}^{N} \left[ y_i \frac{d \log P_i}{d \mathbf{\theta}} + \left( 1 - y_i \right) \frac{d \log \left( 1 - P_i \right)}{d \mathbf{\theta}} \right] \\ = & \sum_{i=1}^{N} \left[ y_i \frac{P_i \left( 1 - P_i \right)}{P_i} \mathbf{x}_i + \left( 1 - y_i \right) \frac{(-1) P_i \left( 1 - P_i \right)}{1 - P_i} \mathbf{x}_i \right] \\ = & \sum_{i=1}^{N} \left[ y_i \left( 1 - P_i \right) \mathbf{x}_i - \left( 1 - y_i \right) P_i \mathbf{x}_i \right] \\ = & \sum_{i=1}^{N} \left( y_i - P_i \right) \mathbf{x}_i \end{aligned}$$

Maximizing $l(\mathbf{\theta}) = \log L(\mathbf{\theta})$ is equivalent to maximizing $L(\mathbf{\theta})$, so define the loss function as its negative:

$$\mathrm{loss}(\mathbf{\theta}) = - l(\mathbf{\theta})$$

Then:

$$\frac{d\, \mathrm{loss}(\mathbf{\theta})}{d \mathbf{\theta}} = \sum_{i=1}^{N} \left( P_i - y_i \right) \mathbf{x}_i$$
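To confirm this result, the sketch below compares the analytic gradient $\sum_{i} \left( P_i - y_i \right) \mathbf{x}_i$ with central finite differences of $\mathrm{loss}(\mathbf{\theta})$ on a small random problem (the problem size and seed are arbitrary):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Tiny random problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.integers(0, 2, size=8).astype(float)
theta = rng.normal(size=3)

def loss(t):
    p = sigmoid(X @ t)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

analytic = X.T @ (sigmoid(X @ theta) - y)  # sum_i (P_i - y_i) x_i

# Central finite differences of the loss, one coordinate at a time.
h = 1e-6
numeric = np.array([(loss(theta + h * e) - loss(theta - h * e)) / (2 * h)
                    for e in np.eye(3)])

print(np.max(np.abs(analytic - numeric)))  # tiny (~1e-8): derivation confirmed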

Task 18: Gradient Descent

$$f(\mathbf{x}; \mathbf{\theta}) = \frac{1}{1 + e^{-\mathbf{\theta}^{\mathrm{T}} \mathbf{x}}}$$

$$\mathbf{\theta} = \mathbf{\theta} - \alpha \nabla_{\mathbf{\theta}} J(\mathbf{\theta}) = \mathbf{\theta} - \alpha \sum_{i=1}^{N} \mathbf{x}^{(i)} \left( f(\mathbf{x}^{(i)}; \mathbf{\theta}) - y^{(i)} \right)$$
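A minimal sketch of this update rule on the Task 15 car data. Two caveats: the features are unscaled, so the step size $\alpha$ must be small, and this toy dataset is linearly separable, so without regularization $\mathbf{\theta}$ keeps growing slowly with more iterations; sklearn's LogisticRegression adds L2 regularization by default, so its coefficients will not match exactly. The learning rate and iteration count are arbitrary choices:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[1, 20, 3], [1, 23, 7], [1, 31, 10],
              [1, 42, 13], [1, 50, 7], [1, 60, 5]], dtype=float)
y = np.array([0, 1, 1, 1, 0, 0], dtype=float)

theta = np.zeros(3)
alpha = 1e-4                               # small step: the features are unscaled
for _ in range(100_000):
    grad = X.T @ (sigmoid(X @ theta) - y)  # batch gradient of J(theta)
    theta -= alpha * grad

print(theta)
print(sigmoid(theta @ [1.0, 28.0, 8.0]))   # P(buys a car | age 28, income 8)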

  • The meaning of the coefficients

Odds: $\mathrm{odds} = \frac{P}{1 - P} = e^{\mathbf{\theta}^{\mathrm{T}} \mathbf{x}}$

The coefficient $\theta_j$ means: if the original odds are $\lambda_1$, and the corresponding feature $x_j$ increases by 1 so that the new odds are $\lambda_2$, then $\frac{\lambda_2}{\lambda_1} = e^{\theta_j}$.

# lr and prob are reused from the Task 15 example above.
theta_0 = lr.intercept_
theta_1 = lr.coef_[0][0]
theta_2 = lr.coef_[0][1]

print("theta_0 = {0[0]}, theta_1 = {1}, theta_2 = {2}".format(theta_0, theta_1, theta_2))

# Odds for the original test sample [[28, 8]] (prob was computed in Task 15).
ratio = prob[0][1] / prob[0][0]

testX = [[28, 9]]  # same age, annual income increased by 1
prob_new = lr.predict_proba(testX)
ratio_new = prob_new[0][1] / prob_new[0][0]

ratio_of_ratio = ratio_new / ratio
print("ratio of ratio = {0}".format(ratio_of_ratio))

import math
theta2_e = math.exp(theta_2)
print("theta2 e = {}".format(theta2_e))
theta_0 = -0.04131837596993478, theta_1 = -0.1973000136829152, theta_2 = 0.915557452347983
ratio of ratio = 2.4981674731438943
theta2 e = 2.4981674731438948

$\theta_2 = 0.92$ means: if annual income increases by 10k CNY, the odds of buying a car (the ratio of the probability of buying to the probability of not buying) are multiplied by $e^{0.92} \approx 2.5$ relative to before.

$\theta_1 = -0.20$ means: if age increases by one year, those odds are multiplied by $e^{-0.20} \approx 0.82$ relative to before, i.e. they decrease.

Task 19: Application

import pandas as pd
from sklearn import linear_model
from sklearn.feature_extraction.text import TfidfVectorizer

# Column 0 is the label ("spam"/"ham"); column 1 is the SMS text.
df = pd.read_csv("./data/SMSSpamCollection.csv", delimiter=',', header=None)
y, X_train = df[0], df[1]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(X_train)

lr = linear_model.LogisticRegression()
lr.fit(X, y)

testX = vectorizer.transform(["URGENT! Your mobile No. 1234 was awarded a Prize.",
                              "Hey honey, what's up?"])

predictions = lr.predict(testX)
print(predictions)

['spam' 'ham']

PS: the Hessian matrix of the loss function $J(\mathbf{\theta})$ with respect to $\mathbf{\theta}$:

  • Loss function:

$$J(\mathbf{\theta}) = - \sum_{i=1}^{N} \left[ y^{(i)} \ln f(\mathbf{x}^{(i)}; \mathbf{\theta}) + \left( 1 - y^{(i)} \right) \ln \left( 1 - f(\mathbf{x}^{(i)}; \mathbf{\theta}) \right) \right]$$

where

$$f(\mathbf{x}; \mathbf{\theta}) = \frac{1}{1 + e^{-\mathbf{\theta}^{\mathrm{T}} \mathbf{x}}}, \quad \mathbf{x} = \left[ 1, x_1, x_2, \cdots, x_n \right]^{\mathrm{T}}, \quad \mathbf{\theta} = \left[ \theta_0, \theta_1, \theta_2, \cdots, \theta_n \right]^{\mathrm{T}}$$

and $\mathbf{x}^{(i)}$ is the column vector representing the $i$-th sample.

  • Gradient of $J(\mathbf{\theta})$ with respect to $\mathbf{\theta}$:

$$\nabla_{\mathbf{\theta}} J(\mathbf{\theta}) = \sum_{i=1}^{N} \mathbf{x}^{(i)} \left( f(\mathbf{x}^{(i)}; \mathbf{\theta}) - y^{(i)} \right)$$

  • Hessian of $J(\mathbf{\theta})$ with respect to $\mathbf{\theta}$:

The first-order partial derivative of $J(\mathbf{\theta})$ with respect to $\theta_p$ is:

$$\frac{\partial J(\mathbf{\theta})}{\partial \theta_p} = \sum_{i=1}^{N} x^{(i)}_p \left( f(\mathbf{x}^{(i)}; \mathbf{\theta}) - y^{(i)} \right)$$

The second-order partial derivative of $J(\mathbf{\theta})$ with respect to $\theta_p$ and $\theta_q$ is:

$$\begin{aligned} \frac{\partial^2 J(\mathbf{\theta})}{\partial \theta_p \partial \theta_q} = & \sum_{i=1}^{N} x^{(i)}_p \frac{\partial f(\mathbf{x}^{(i)}; \mathbf{\theta})}{\partial \theta_q} \\ = & \sum_{i=1}^{N} x^{(i)}_p f(\mathbf{x}^{(i)}; \mathbf{\theta}) \left( 1 - f(\mathbf{x}^{(i)}; \mathbf{\theta}) \right) x^{(i)}_q \\ = & \sum_{i=1}^{N} f(\mathbf{x}^{(i)}; \mathbf{\theta}) \left( 1 - f(\mathbf{x}^{(i)}; \mathbf{\theta}) \right) x^{(i)}_p x^{(i)}_q \end{aligned}$$

Note that $f(\mathbf{x}^{(i)}; \mathbf{\theta}) \left( 1 - f(\mathbf{x}^{(i)}; \mathbf{\theta}) \right)$ is a scalar, and it is strictly positive.

$$\begin{aligned} H \left( J(\mathbf{\theta}) \right) = & \begin{bmatrix} \frac{\partial^2 J(\mathbf{\theta})}{\partial \theta_1 \partial \theta_1} & \frac{\partial^2 J(\mathbf{\theta})}{\partial \theta_1 \partial \theta_2} & \cdots & \frac{\partial^2 J(\mathbf{\theta})}{\partial \theta_1 \partial \theta_n} \\ \frac{\partial^2 J(\mathbf{\theta})}{\partial \theta_2 \partial \theta_1} & \frac{\partial^2 J(\mathbf{\theta})}{\partial \theta_2 \partial \theta_2} & \cdots & \frac{\partial^2 J(\mathbf{\theta})}{\partial \theta_2 \partial \theta_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 J(\mathbf{\theta})}{\partial \theta_n \partial \theta_1} & \frac{\partial^2 J(\mathbf{\theta})}{\partial \theta_n \partial \theta_2} & \cdots & \frac{\partial^2 J(\mathbf{\theta})}{\partial \theta_n \partial \theta_n} \end{bmatrix} \\ = & \sum_{i=1}^{N} f(\mathbf{x}^{(i)}; \mathbf{\theta}) \left( 1 - f(\mathbf{x}^{(i)}; \mathbf{\theta}) \right) \begin{bmatrix} x^{(i)}_1 x^{(i)}_1 & x^{(i)}_1 x^{(i)}_2 & \cdots & x^{(i)}_1 x^{(i)}_n \\ x^{(i)}_2 x^{(i)}_1 & x^{(i)}_2 x^{(i)}_2 & \cdots & x^{(i)}_2 x^{(i)}_n \\ \vdots & \vdots & \ddots & \vdots \\ x^{(i)}_n x^{(i)}_1 & x^{(i)}_n x^{(i)}_2 & \cdots & x^{(i)}_n x^{(i)}_n \end{bmatrix} \\ = & \sum_{i=1}^{N} f(\mathbf{x}^{(i)}; \mathbf{\theta}) \left( 1 - f(\mathbf{x}^{(i)}; \mathbf{\theta}) \right) \mathbf{x}^{(i)} (\mathbf{x}^{(i)})^{\mathrm{T}} \end{aligned}$$

  • Positive-definiteness of the Hessian

$$H \left( J(\mathbf{\theta}) \right) = \sum_{i=1}^{m} f(\mathbf{x}^{(i)}; \mathbf{\theta}) \left( 1 - f(\mathbf{x}^{(i)}; \mathbf{\theta}) \right) \mathbf{x}^{(i)} (\mathbf{x}^{(i)})^{\mathrm{T}}$$

(1) $f(\mathbf{x}^{(i)}; \mathbf{\theta}) \left( 1 - f(\mathbf{x}^{(i)}; \mathbf{\theta}) \right) > 0$

(2) $H \left( J(\mathbf{\theta}) \right)$ has the same form as the autocorrelation matrix of a random vector.

When $m \gg 0$:

$$\mathrm{E}\left[ x_j x_k \right] \approx \frac{1}{m} \sum_{i=1}^{m} x^{(i)}_j x^{(i)}_k$$

When the components $x_j$ of $\mathbf{x}^{(i)}$ are mutually independent:

$$\mathrm{E}\left[ x_j x_k \right] \begin{cases} = 0, & \text{if } j \neq k \\ > 0, & \text{if } j = k \end{cases}$$

When $m \gg n$, $\mathrm{E}\left[ \mathbf{x} \mathbf{x}^{\mathrm{T}} \right]$ is a full-rank diagonal matrix whose diagonal entries are all positive, so $H \left( J(\mathbf{\theta}) \right)$ is positive definite; otherwise $H \left( J(\mathbf{\theta}) \right)$ is positive semi-definite.

H(J(θ))H \left(J(\mathbf{\theta}) \right)H(J(θ))满足正定条件时(m≫nm \gg nmn),J(θ)J(\mathbf{\theta})J(θ)为凸优函数,有全局最优解,即批量梯度下降(batch gradient descent)能够保证J(θ)J(\mathbf{\theta})J(θ)收敛到全局最小值;当H(J(θ))H \left(J(\mathbf{\theta}) \right)H(J(θ))满足半正定条件时(m&lt;nm \lt nm<n),即小批量梯度下降(batch gradient descent)或随机梯度下降(stochastic gradient descent)可能使J(θ)J(\mathbf{\theta})J(θ)陷入局部最小值。
