Hypothesis Representation
Since $y \in \{0, 1\}$, the dependent variable $y$ takes only the two values 0 and 1. We therefore change the form of the hypothesis so that $h_\theta(x)$ satisfies $0 \le h_\theta(x) \le 1$:

$$h_\theta(x) = g(\theta^T x), \qquad z = \theta^T x, \qquad g(z) = \frac{1}{1 + e^{-z}}$$
This gives the hypothesis:

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$
$g$ is called the logistic function, or sigmoid function.
For a sample $x$, $h_\theta(x)$ gives the probability that the output is 1, i.e. $P(y = 1 \mid x; \theta)$.
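As a minimal sketch in NumPy (the function name is illustrative):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

# The sigmoid maps any real number into (0, 1), centered at g(0) = 0.5:
print(sigmoid(0))    # 0.5
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```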
Decision Boundary
To obtain the two discrete classes 0 and 1, we threshold the hypothesis as follows:

$$h_\theta(x) \ge 0.5 \rightarrow y = 1, \qquad h_\theta(x) < 0.5 \rightarrow y = 0$$

That is, when the hypothesis value is at least 0.5 we predict $y = 1$; when it is below 0.5 we predict $y = 0$.
Since

$$h_\theta(x) = g(\theta^T x) \ge 0.5 \quad \text{when} \quad \theta^T x \ge 0,$$

we have

$$\theta^T x \ge 0 \Rightarrow y = 1, \qquad \theta^T x < 0 \Rightarrow y = 0$$
The decision boundary is the curve separating the region where we predict $y = 1$ from the region where we predict $y = 0$. It is a property of the hypothesis function itself, not of the data set.
The curve $\theta^T x = 0$ is the decision boundary.
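Because $g(\theta^T x) \ge 0.5$ exactly when $\theta^T x \ge 0$, prediction never needs to evaluate the sigmoid. A sketch in NumPy (the parameter values are hypothetical, chosen to put the boundary at $x_1 + x_2 = 3$):

```python
import numpy as np

def predict(theta, X):
    """Predict y = 1 where theta^T x >= 0, i.e. where g(theta^T x) >= 0.5."""
    return (X @ theta >= 0).astype(int)

# Hypothetical theta = [-3, 1, 1]: boundary is -3 + x1 + x2 = 0.
theta = np.array([-3.0, 1.0, 1.0])
X = np.array([[1.0, 1.0, 1.0],   # x1 + x2 = 2 < 3  -> predict 0
              [1.0, 2.0, 2.0]])  # x1 + x2 = 4 >= 3 -> predict 1
print(predict(theta, X))  # [0 1]
```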
Cost Function
The cost function for logistic regression:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$$

$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$
The cost curves for $y = 1$ and $y = 0$ show the following properties:

$$\begin{aligned}
\mathrm{Cost}(h_\theta(x), y) &= 0 && \text{if } h_\theta(x) = y \\
\mathrm{Cost}(h_\theta(x), y) &\to \infty && \text{if } y = 0 \text{ and } h_\theta(x) \to 1 \\
\mathrm{Cost}(h_\theta(x), y) &\to \infty && \text{if } y = 1 \text{ and } h_\theta(x) \to 0
\end{aligned}$$
The two cases combine into a single expression:

$$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$$
The complete cost function:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$$
Vectorized form:

$$h = g(X\theta), \qquad J(\theta) = \frac{1}{m} \cdot \left( -y^T \log(h) - (1 - y)^T \log(1 - h) \right)$$
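The vectorized form translates directly into NumPy. A sketch, assuming `X` already includes the intercept column of ones:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = (1/m) * (-y^T log(h) - (1-y)^T log(1-h)), with h = g(X theta)."""
    m = len(y)
    h = sigmoid(X @ theta)
    return (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m

# With theta = 0, h = 0.5 for every sample, so the cost is log(2) ~ 0.693.
X = np.array([[1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0])
print(cost(np.zeros(2), X, y))
```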
Gradient Descent
$$\text{Repeat} \; \left\{ \; \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \; \right\}$$

$$\text{Repeat} \; \left\{ \; \theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \; \right\}$$
Vectorized implementation:

$$\theta := \theta - \frac{\alpha}{m} X^T \left( g(X\theta) - \vec{y} \right)$$
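The vectorized update can be sketched as follows (the learning rate and iteration count are illustrative choices, not tuned values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Repeat theta := theta - (alpha/m) * X^T (g(X theta) - y)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        theta -= (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta

# Toy 1-D data with an intercept column: negative x -> class 0, positive -> 1.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print((sigmoid(X @ theta) >= 0.5).astype(int))  # [0 0 1 1]
```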
Derivation of the gradient:

$$\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta)
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \frac{1}{h_\theta(x^{(i)})} \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j} - (1 - y^{(i)}) \frac{1}{1 - h_\theta(x^{(i)})} \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j} \right) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \frac{1}{g(\theta^T x^{(i)})} - (1 - y^{(i)}) \frac{1}{1 - g(\theta^T x^{(i)})} \right) \frac{\partial g(\theta^T x^{(i)})}{\partial \theta_j} \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \frac{1}{g(\theta^T x^{(i)})} - (1 - y^{(i)}) \frac{1}{1 - g(\theta^T x^{(i)})} \right) g(\theta^T x^{(i)}) \left( 1 - g(\theta^T x^{(i)}) \right) x_j^{(i)} \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \left( 1 - g(\theta^T x^{(i)}) \right) - (1 - y^{(i)}) \, g(\theta^T x^{(i)}) \right) x_j^{(i)} \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - g(\theta^T x^{(i)}) \right) x_j^{(i)} \\
&= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
\end{aligned}$$
From step 2 to step 3 (the derivative of the sigmoid), letting $z = \theta^T x^{(i)}$:

$$\begin{aligned}
\frac{\partial g(\theta^T x^{(i)})}{\partial \theta_j}
&= \frac{d}{dz} \left( \frac{1}{1 + e^{-z}} \right) \frac{\partial (\theta^T x^{(i)})}{\partial \theta_j} \\
&= \frac{e^{-z}}{(1 + e^{-z})^2} \, \frac{\partial (\theta_0 + \theta_1 x_1 + \cdots + \theta_j x_j + \cdots + \theta_n x_n)}{\partial \theta_j} \\
&= \frac{e^{-z} + 1 - 1}{(1 + e^{-z})^2} \, x_j^{(i)} \\
&= \left[ \frac{1}{1 + e^{-z}} - \left( \frac{1}{1 + e^{-z}} \right)^2 \right] x_j^{(i)} \\
&= \left( g(z) - g^2(z) \right) x_j^{(i)} \\
&= g(\theta^T x^{(i)}) \left( 1 - g(\theta^T x^{(i)}) \right) x_j^{(i)}
\end{aligned}$$
Multiclass Classification
Pick one class and lump all the remaining classes together as the second class; this yields one binary classifier. Repeating this for each class gives one classifier per class.
$$\begin{aligned}
& y \in \{0, 1, \ldots, n\} \\
& h_\theta^{(0)}(x) = P(y = 0 \mid x; \theta) \\
& h_\theta^{(1)}(x) = P(y = 1 \mid x; \theta) \\
& \qquad \cdots \\
& h_\theta^{(n)}(x) = P(y = n \mid x; \theta) \\
& \text{prediction} = \max_i \left( h_\theta^{(i)}(x) \right)
\end{aligned}$$
The predicted class is the one whose classifier outputs the largest value among all the classifiers.
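The one-vs-all prediction step can be sketched in NumPy, assuming each trained classifier's parameters are stored as one row of a hypothetical `all_theta` matrix:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(all_theta, X):
    """all_theta: (n_classes, n_features), one row per classifier.
    Returns, for each sample, the class i maximizing h_theta^{(i)}(x)."""
    probs = sigmoid(X @ all_theta.T)   # shape (m, n_classes)
    return np.argmax(probs, axis=1)

# Hypothetical parameters for three classes over [intercept, x1]:
all_theta = np.array([[1.0, -2.0],    # class 0: favors small x1
                      [0.0,  0.0],    # class 1: always outputs 0.5
                      [-1.0, 2.0]])   # class 2: favors large x1
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0]])
print(predict_one_vs_all(all_theta, X))  # [0 0 2]
```

Because the sigmoid is monotonic, taking the arg-max of the probabilities is equivalent to taking the arg-max of $\theta^T x$ across the classifiers.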