Homework 1: Maximum Likelihood Estimation for the Linear Regression Model
1. $E\{\textbf w_{MLE}\}$

$$E\{\widehat{\textbf w}\}=\int \widehat{\textbf w}\,p(\textbf y\mid\textbf X,\textbf w,\delta^2)\,d\textbf y$$

where $\widehat{\textbf w}=(\textbf X^T\textbf X)^{-1}\textbf X^T\textbf y$. Substituting:

$$\begin{aligned}
E\{\widehat{\textbf w}\}&=(\textbf X^T\textbf X)^{-1}\textbf X^T\int \textbf y\,p(\textbf y\mid\textbf X,\textbf w,\delta^2)\,d\textbf y\\
&=(\textbf X^T\textbf X)^{-1}\textbf X^T E\{\textbf y\}\\
&=(\textbf X^T\textbf X)^{-1}\textbf X^T\textbf X\textbf w\\
&=\textbf w
\end{aligned}$$

so the maximum likelihood estimator is unbiased.
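The unbiasedness result can be checked numerically. The sketch below (with made-up data; all names and dimensions are illustrative assumptions, not part of the original derivation) averages $\widehat{\textbf w}$ over many independent noise draws and compares the average to the true $\textbf w$:

```python
import numpy as np

# Assumed toy setup: fixed design X, true weights w_true, Gaussian noise
# with standard deviation delta (the delta in the derivation above).
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
delta = 0.3

# Average the MLE w_hat = (X^T X)^{-1} X^T y over many noise realizations.
trials = 20000
w_hats = np.empty((trials, d))
XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)  # (X^T X)^{-1} X^T, computed once
for t in range(trials):
    y = X @ w_true + delta * rng.normal(size=n)
    w_hats[t] = XtX_inv_Xt @ y

# The Monte Carlo mean should be close to w_true.
print(np.abs(w_hats.mean(axis=0) - w_true).max())
```

The remaining gap shrinks like $1/\sqrt{\text{trials}}$, as expected for a Monte Carlo estimate of an unbiased quantity.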
2. $Cov\{\textbf w_{MLE}\}$

$$cov\{\widehat{\textbf w}\}=E\{\widehat{\textbf w}\widehat{\textbf w}^T\}-E\{\widehat{\textbf w}\}E\{\widehat{\textbf w}\}^T=E\{\widehat{\textbf w}\widehat{\textbf w}^T\}-\textbf w\textbf w^T$$

where

$$\begin{aligned}
E\{\widehat{\textbf w}\widehat{\textbf w}^T\}&=E\{((\textbf X^T\textbf X)^{-1}\textbf X^T\textbf y)((\textbf X^T\textbf X)^{-1}\textbf X^T\textbf y)^T\}\\
&=(\textbf X^T\textbf X)^{-1}\textbf X^T E\{\textbf y\textbf y^T\}\textbf X(\textbf X^T\textbf X)^{-1}
\end{aligned}$$

and, since $cov\{\textbf y\}=\delta^2 I=E\{\textbf y\textbf y^T\}-E\{\textbf y\}E\{\textbf y\}^T$,

$$\begin{aligned}
E\{\textbf y\textbf y^T\}&=E\{\textbf y\}E\{\textbf y\}^T+\delta^2 I\\
&=\textbf X\textbf w(\textbf X\textbf w)^T+\delta^2 I\\
&=\textbf X\textbf w\textbf w^T\textbf X^T+\delta^2 I
\end{aligned}$$

Therefore

$$\begin{aligned}
E\{\widehat{\textbf w}\widehat{\textbf w}^T\}&=(\textbf X^T\textbf X)^{-1}\textbf X^T\textbf X\textbf w\textbf w^T\textbf X^T\textbf X(\textbf X^T\textbf X)^{-1}+\delta^2(\textbf X^T\textbf X)^{-1}\textbf X^T\textbf X(\textbf X^T\textbf X)^{-1}\\
&=\textbf w\textbf w^T+\delta^2(\textbf X^T\textbf X)^{-1}
\end{aligned}$$

$$cov\{\widehat{\textbf w}\}=\textbf w\textbf w^T+\delta^2(\textbf X^T\textbf X)^{-1}-\textbf w\textbf w^T=\delta^2(\textbf X^T\textbf X)^{-1}$$
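This covariance can also be verified by simulation. The sketch below (again with assumed toy data) compares the empirical covariance of $\widehat{\textbf w}$ across repeated noise draws against $\delta^2(\textbf X^T\textbf X)^{-1}$:

```python
import numpy as np

# Assumed toy setup: the sample covariance of w_hat over repeated noise
# draws should match delta^2 (X^T X)^{-1}.
rng = np.random.default_rng(1)
n, d = 40, 2
X = rng.normal(size=(n, d))
w_true = np.array([0.7, -1.3])
delta = 0.5

trials = 50000
W = np.empty((trials, d))
for t in range(trials):
    y = X @ w_true + delta * rng.normal(size=n)
    W[t] = np.linalg.solve(X.T @ X, X.T @ y)

empirical = np.cov(W, rowvar=False)
theoretical = delta**2 * np.linalg.inv(X.T @ X)
print(np.abs(empirical - theoretical).max())  # small sampling error only
```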
Homework 2: Ridge Regression and Regularization
1. Find the mean and covariance of $p(\textbf w\mid D)$

$$\begin{aligned}
p(\textbf w\mid D)&\propto p(D\mid\textbf w)\,p(\textbf w)\\
&=\Big(\frac{1}{\sqrt{2\pi}\,\delta}\Big)^n e^{-\frac{(\textbf y-\textbf X\textbf w)^T(\textbf y-\textbf X\textbf w)}{2\delta^2}}\times \frac{1}{\sqrt{(2\pi)^d|\Sigma_0|}}\, e^{-\frac{1}{2}(\textbf w-\mu_0)^T\Sigma_0^{-1}(\textbf w-\mu_0)}\\
&\propto e^{-\frac{1}{2}\big(\frac{(\textbf y-\textbf X\textbf w)^T(\textbf y-\textbf X\textbf w)}{\delta^2}+(\textbf w-\mu_0)^T\Sigma_0^{-1}(\textbf w-\mu_0)\big)}\\
&\propto e^{-\frac{1}{2}\big(\frac{-2\textbf y^T\textbf X\textbf w+\textbf w^T\textbf X^T\textbf X\textbf w}{\delta^2}+\textbf w^T\Sigma_0^{-1}\textbf w-2\mu_0^T\Sigma_0^{-1}\textbf w\big)}
\end{aligned}$$
The posterior should itself be Gaussian:

$$\begin{aligned}
p(\textbf w\mid D)&=N(\mu_w,\Sigma_w)\\
&\propto e^{-\frac{1}{2}(\textbf w-\mu_w)^T\Sigma_w^{-1}(\textbf w-\mu_w)}\\
&\propto e^{-\frac{1}{2}(\textbf w^T\Sigma_w^{-1}\textbf w-2\mu_w^T\Sigma_w^{-1}\textbf w)}
\end{aligned}$$
Matching the quadratic terms in $\textbf w$ between the two expressions yields the covariance:

$$\begin{aligned}
\textbf w^T\Sigma_w^{-1}\textbf w&=\frac{\textbf w^T\textbf X^T\textbf X\textbf w}{\delta^2}+\textbf w^T\Sigma_0^{-1}\textbf w\\
&=\textbf w^T\Big(\frac{\textbf X^T\textbf X}{\delta^2}+\Sigma_0^{-1}\Big)\textbf w\\
\Sigma_w^{-1}&=\frac{\textbf X^T\textbf X}{\delta^2}+\Sigma_0^{-1}\\
\Sigma_w&=\Big(\frac{1}{\delta^2}\textbf X^T\textbf X+\Sigma_0^{-1}\Big)^{-1}
\end{aligned}$$
Likewise, matching the linear terms yields the mean $\mu_w$, so both the mean and the covariance of $\textbf w$ are obtained:

$$\begin{aligned}
-2\mu_w^T\Sigma_w^{-1}\textbf w&=\frac{-2\textbf y^T\textbf X\textbf w}{\delta^2}-2\mu_0^T\Sigma_0^{-1}\textbf w\\
\mu_w^T\Sigma_w^{-1}&=\frac{\textbf y^T\textbf X}{\delta^2}+\mu_0^T\Sigma_0^{-1}\\
\mu_w&=\Sigma_w\Big(\frac{1}{\delta^2}\textbf X^T\textbf y+\Sigma_0^{-1}\mu_0\Big)\\
&=\Big(\frac{1}{\delta^2}\textbf X^T\textbf X+\Sigma_0^{-1}\Big)^{-1}\Big(\frac{1}{\delta^2}\textbf X^T\textbf y+\Sigma_0^{-1}\mu_0\Big)
\end{aligned}$$
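The completing-the-square step can be sanity-checked numerically: if $(\mu_w,\Sigma_w)$ are correct, the unnormalized log-posterior and the $N(\mu_w,\Sigma_w)$ log-density should differ only by a constant, at every $\textbf w$. A sketch with assumed toy data:

```python
import numpy as np

# Assumed toy setup: compute the posterior mean/covariance from the
# derived formulas, then check that the unnormalized log-posterior and
# the N(mu_w, Sigma_w) log-density differ by a w-independent constant.
rng = np.random.default_rng(2)
n, d = 30, 3
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
delta = 0.4
y = X @ w_true + delta * rng.normal(size=n)
mu0 = np.zeros(d)
Sigma0 = 2.0 * np.eye(d)

Sigma0_inv = np.linalg.inv(Sigma0)
Sigma_w = np.linalg.inv(X.T @ X / delta**2 + Sigma0_inv)
mu_w = Sigma_w @ (X.T @ y / delta**2 + Sigma0_inv @ mu0)

def log_post_unnorm(w):
    r = y - X @ w
    p = w - mu0
    return -0.5 * (r @ r / delta**2 + p @ Sigma0_inv @ p)

def log_gauss_unnorm(w):
    q = w - mu_w
    return -0.5 * q @ np.linalg.inv(Sigma_w) @ q

# Evaluate the difference at several random points; it should be constant.
ws = rng.normal(size=(5, d))
diffs = [log_post_unnorm(w) - log_gauss_unnorm(w) for w in ws]
print(np.ptp(diffs))  # ~0: the two differ only by a constant
```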
2. Why is $w_{MAP}=(\textbf X^T\textbf X+\lambda I)^{-1}\textbf X^T\textbf y$?

Since the posterior is Gaussian, the most probable $\textbf w$ is its mean. If $\mu_0=[0,0,\dots,0]^T$ and the prior covariance is isotropic, $\Sigma_0=\tau^2 I$, then

$$w_{MAP}=\mu_w=\Big(\frac{1}{\delta^2}\textbf X^T\textbf X+\frac{1}{\tau^2}I\Big)^{-1}\frac{1}{\delta^2}\textbf X^T\textbf y=(\textbf X^T\textbf X+\lambda I)^{-1}\textbf X^T\textbf y,\qquad \lambda=\frac{\delta^2}{\tau^2}$$

which is exactly the ridge regression solution.
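The equivalence is easy to confirm in code. A minimal sketch, assuming the isotropic prior $\Sigma_0=\tau^2 I$, $\mu_0=0$ and arbitrary toy data:

```python
import numpy as np

# Assumed setup: with Sigma0 = tau^2 I and mu0 = 0, the posterior mean
# should coincide with the ridge solution for lambda = delta^2 / tau^2.
rng = np.random.default_rng(3)
n, d = 25, 4
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
delta, tau = 0.5, 1.5
lam = delta**2 / tau**2

# Posterior mean from the Bayesian derivation above.
mu_w = np.linalg.solve(X.T @ X / delta**2 + np.eye(d) / tau**2,
                       X.T @ y / delta**2)
# Ridge regression solution.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print(np.allclose(mu_w, w_ridge))  # True
```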
3. Why is $\textbf X^T\textbf X+\lambda I$ invertible?

Since $(\textbf X^T\textbf X)^T=\textbf X^T\textbf X$, the matrix $\textbf X^T\textbf X$ is symmetric; moreover, for any vector $\textbf v$ we have $\textbf v^T\textbf X^T\textbf X\textbf v=\|\textbf X\textbf v\|^2\ge 0$, so it is positive semi-definite and all of its eigenvalues are $\ge 0$. Adding $\lambda I$ with $\lambda>0$ shifts every eigenvalue up by $\lambda$, so every eigenvalue of $\textbf X^T\textbf X+\lambda I$ is strictly positive. The matrix is therefore positive definite, and a positive definite matrix is invertible.
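The eigenvalue-shift argument can be illustrated numerically. In the sketch below (assumed data), $\textbf X$ has more columns than rows, so $\textbf X^T\textbf X$ is rank-deficient and singular, yet adding $\lambda I$ makes every eigenvalue positive:

```python
import numpy as np

# Assumed example: n < d makes X^T X rank-deficient (singular), but
# X^T X + lambda*I has all eigenvalues >= lambda > 0.
rng = np.random.default_rng(4)
X = rng.normal(size=(3, 5))   # 3 samples, 5 features
lam = 0.1
A = X.T @ X

eigs = np.linalg.eigvalsh(A)              # smallest eigenvalues ~ 0
eigs_reg = np.linalg.eigvalsh(A + lam * np.eye(5))  # all >= lam

print(eigs.min())      # ~0: A itself is singular
print(eigs_reg.min())  # positive: the regularized matrix is invertible
```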