$$P(y^{(i)}=k \mid x^{(i)};\theta)=\frac{\exp\left(\theta^{(k)\top}x^{(i)}\right)}{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}$$
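The softmax probability above can be sketched in a few lines of plain Python. This is only an illustration: the function name `softmax_probs` and the layout of `theta` as a list of K weight vectors are assumptions made here, not part of the original article. Subtracting the largest score before exponentiating is a common numerical safeguard that leaves the ratio unchanged.

```python
import math

def softmax_probs(theta, x):
    """Return [P(y=k | x; theta) for k in range(K)].

    theta: list of K weight vectors (assumed layout: theta[k] is theta^(k)).
    x: feature vector with the same length as each theta[k].
    """
    # Scores theta^(k)^T x for every class k.
    scores = [sum(t * xi for t, xi in zip(theta_k, x)) for theta_k in theta]
    # Shifting by the max score cancels in the ratio but avoids overflow.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three classes, two features (illustrative numbers only).
theta = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
probs = softmax_probs(theta, [2.0, 1.0])
print(probs, sum(probs))
```

The probabilities sum to 1, and the class with the largest score gets the largest probability.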
Likelihood function:
$$L=\prod_{i=1}^{m}\prod_{k=1}^{K}P(y^{(i)}=k \mid x^{(i)};\theta)^{1\{y^{(i)}=k\}}$$
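The log loss (the negative log-likelihood) is:
$$J(\theta)=-\left[\sum_{i=1}^{m}\sum_{k=1}^{K}1\{y^{(i)}=k\}\log\frac{\exp\left(\theta^{(k)\top}x^{(i)}\right)}{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}\right]$$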
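As a rough sketch of how this loss could be evaluated, the snippet below sums the negative log-probability of each sample's true class. The name `log_loss`, the list-based data layout, and the integer class labels `0..K-1` are assumptions for illustration. Because the indicator keeps only the term with k equal to the true label, the inner sum over k reduces to a single lookup.

```python
import math

def log_loss(theta, xs, ys):
    """J(theta) = -sum_i log P(y_i | x_i; theta), labels assumed in 0..K-1."""
    total = 0.0
    for x, y in zip(xs, ys):
        scores = [sum(t * xi for t, xi in zip(theta_k, x)) for theta_k in theta]
        # log of the softmax denominator, computed stably (log-sum-exp).
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        # The indicator 1{y_i = k} selects only the true-class term.
        total -= scores[y] - log_norm
    return total

xs = [[1.0, 2.0], [0.5, -1.0]]
ys = [0, 2]
zero_theta = [[0.0, 0.0] for _ in range(3)]
loss_at_zero = log_loss(zero_theta, xs, ys)
print(loss_at_zero)
```

With all parameters at zero every class has probability 1/3, so the loss for two samples equals 2·log 3, which makes a convenient sanity check.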
Here $1\{\cdot\}$ is the "indicator function", so that $1\{\text{a true statement}\}=1$ and $1\{\text{a false statement}\}=0$.
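Now take the partial derivative of the log loss with respect to $\theta^{(n)}$, splitting the sum over classes into the term $k=n$ and the terms $k\neq n$:
$$\nabla_{\theta^{(n)}}J(\theta)=-\sum_{i=1}^{m}\left[1\{y^{(i)}=n\}\frac{\partial \log P(y^{(i)}=n\mid x^{(i)};\theta)}{\partial\theta^{(n)}}+\sum_{k=1,k\neq n}^{K}1\{y^{(i)}=k\}\frac{\partial \log P(y^{(i)}=k\mid x^{(i)};\theta)}{\partial\theta^{(n)}}\right]$$
where
$$\log P(y^{(i)}=n\mid x^{(i)};\theta)=\log\frac{\exp\left(\theta^{(n)\top}x^{(i)}\right)}{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}$$
$$\log P(y^{(i)}=k\mid x^{(i)};\theta)=\log\frac{\exp\left(\theta^{(k)\top}x^{(i)}\right)}{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}$$
For the first term ($k=n$),
$$\frac{\partial \log P(y^{(i)}=n\mid x^{(i)};\theta)}{\partial\theta^{(n)}}=\frac{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}{\exp\left(\theta^{(n)\top}x^{(i)}\right)}\cdot\left(\frac{\exp\left(\theta^{(n)\top}x^{(i)}\right)x^{(i)}}{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}-\frac{\exp\left(\theta^{(n)\top}x^{(i)}\right)\exp\left(\theta^{(n)\top}x^{(i)}\right)x^{(i)}}{\left[\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)\right]^{2}}\right)=x^{(i)}-\frac{\exp\left(\theta^{(n)\top}x^{(i)}\right)x^{(i)}}{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}=x^{(i)}\left(1-P(y^{(i)}=n\mid x^{(i)};\theta)\right)$$
For the other terms ($k\neq n$),
$$\frac{\partial \log P(y^{(i)}=k\mid x^{(i)};\theta)}{\partial\theta^{(n)}}=\frac{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}{\exp\left(\theta^{(k)\top}x^{(i)}\right)}\left(-\frac{\exp\left(\theta^{(k)\top}x^{(i)}\right)\exp\left(\theta^{(n)\top}x^{(i)}\right)x^{(i)}}{\left[\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)\right]^{2}}\right)=-\frac{\exp\left(\theta^{(n)\top}x^{(i)}\right)x^{(i)}}{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}=-P(y^{(i)}=n\mid x^{(i)};\theta)\,x^{(i)}$$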
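The two per-sample derivatives of the log-probability can be checked numerically. The sketch below compares the closed forms, $x(1-P_n)$ for $k=n$ and $-P_n x$ for $k\neq n$, against a central finite difference. The helper names (`log_prob`, `d_log_prob`, `d_log_prob_numeric`) and the test values are hypothetical and chosen only for illustration.

```python
import math

def log_prob(theta, x, k):
    """log P(y=k | x; theta) for a softmax model (theta[k] is theta^(k))."""
    scores = [sum(t * xi for t, xi in zip(theta_j, x)) for theta_j in theta]
    top = max(scores)
    return scores[k] - top - math.log(sum(math.exp(s - top) for s in scores))

def d_log_prob(theta, x, k, n):
    """Closed form of d log P(y=k|x) / d theta^(n): x * (1{k=n} - P_n)."""
    p_n = math.exp(log_prob(theta, x, n))
    indicator = 1.0 if k == n else 0.0
    return [xj * (indicator - p_n) for xj in x]

def d_log_prob_numeric(theta, x, k, n, eps=1e-6):
    """Central finite difference with respect to each entry of theta^(n)."""
    out = []
    for j in range(len(x)):
        up = [row[:] for row in theta]
        down = [row[:] for row in theta]
        up[n][j] += eps
        down[n][j] -= eps
        out.append((log_prob(up, x, k) - log_prob(down, x, k)) / (2 * eps))
    return out

theta = [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1]]
x = [1.0, 2.0]
same_class = (d_log_prob(theta, x, 1, 1), d_log_prob_numeric(theta, x, 1, 1))
other_class = (d_log_prob(theta, x, 0, 1), d_log_prob_numeric(theta, x, 0, 1))
print(same_class)
print(other_class)
```

In both cases the analytic and numeric vectors should agree to within the finite-difference error.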
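Substituting both results and using $\sum_{k=1}^{K}1\{y^{(i)}=k\}=1$, the contribution of each sample combines to $x^{(i)}\left(1\{y^{(i)}=n\}-P(y^{(i)}=n\mid x^{(i)};\theta)\right)$. Renaming the index $n$ to $k$ gives the gradient:
$$\nabla_{\theta^{(k)}}J(\theta)=-\sum_{i=1}^{m}\left[x^{(i)}\left(1\{y^{(i)}=k\}-P(y^{(i)}=k\mid x^{(i)};\theta)\right)\right]$$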
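A minimal sketch of this gradient, together with a finite-difference check against the loss, might look like the following. The function names, the list-of-lists parameter layout, the integer labels `0..K-1`, and the sample data are all assumptions for illustration rather than the article's own code.

```python
import math

def softmax_probs(theta, x):
    """[P(y=k | x; theta) for each class k], computed with a max shift."""
    scores = [sum(t * xi for t, xi in zip(theta_k, x)) for theta_k in theta]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def log_loss(theta, xs, ys):
    """J(theta): negative log-probability of the true class, summed over samples."""
    return -sum(math.log(softmax_probs(theta, x)[y]) for x, y in zip(xs, ys))

def gradient(theta, xs, ys):
    """grad[k] = -sum_i x_i * (1{y_i = k} - P(y_i = k | x_i; theta))."""
    grad = [[0.0] * len(theta[0]) for _ in theta]
    for x, y in zip(xs, ys):
        probs = softmax_probs(theta, x)
        for k in range(len(theta)):
            coeff = (1.0 if y == k else 0.0) - probs[k]
            for j, xj in enumerate(x):
                grad[k][j] -= xj * coeff
    return grad

def numeric_gradient(theta, xs, ys, eps=1e-6):
    """Central finite differences of log_loss, one parameter at a time."""
    grad = [[0.0] * len(row) for row in theta]
    for k in range(len(theta)):
        for j in range(len(theta[k])):
            original = theta[k][j]
            theta[k][j] = original + eps
            up = log_loss(theta, xs, ys)
            theta[k][j] = original - eps
            down = log_loss(theta, xs, ys)
            theta[k][j] = original  # restore the parameter
            grad[k][j] = (up - down) / (2 * eps)
    return grad

theta = [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1]]
xs = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]]
ys = [0, 2, 1]
analytic = gradient(theta, xs, ys)
numeric = numeric_gradient(theta, xs, ys)
max_gap = max(abs(a - b) for ra, rn in zip(analytic, numeric) for a, b in zip(ra, rn))
print(max_gap)
```

A small `max_gap` suggests the closed-form gradient matches the loss. The rows of the gradient also sum to zero across classes, since the indicators and the probabilities each sum to 1.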
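This article walks through the definition of the log loss for multi-class classification and the derivation of its gradient. Starting from the probability distribution, it uses the formulas above to compute each sample's contribution with respect to the parameters θ, and then derives the partial derivative of the loss with respect to θ.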



