Goal: for a linear regression problem, find the parameters that minimize the loss function.
I. Defining the loss function
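- Linear model: $y = ax + b$
- For each sample point $x^{(i)}$, the predicted value is $\hat y^{(i)} = ax^{(i)} + b$
- For each sample point $x^{(i)}$, the true value is $y^{(i)}$
- The loss for a single sample is then $loss = \left(y^{(i)} - \hat y^{(i)}\right)^2$ (the squared difference is used because, unlike the absolute difference, it is differentiable everywhere, which makes taking derivatives convenient)
- The loss over all sample points is then: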
$$\color{red}{loss=\sum_{i=1}^n \left(y^{(i)}-\hat y^{(i)}\right)^2}$$
- Substituting $\hat y^{(i)}=ax^{(i)}+b$ into the formula above gives:
$$\color{green}{loss=\sum_{i=1}^n \left(y^{(i)}-ax^{(i)}-b\right)^2}$$
- Since $x^{(i)}$ and $y^{(i)}$ are known quantities, the formula above is a function of only the unknowns $\color{red}{a}$ and $\color{red}{b}$:
$$J\left(a,b\right)=\sum_{i=1}^n \left(y^{(i)}-ax^{(i)}-b\right)^2$$
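As a minimal sketch of the loss defined above, the function below evaluates $J(a,b)$ on a small made-up dataset. The name `squared_loss` and the sample points are illustrative assumptions, not part of the original derivation.

```python
def squared_loss(a, b, xs, ys):
    """Sum of squared residuals J(a, b) = sum_i (y_i - a*x_i - b)^2."""
    return sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample points lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

print(squared_loss(2.0, 1.0, xs, ys))  # 0.0: the line passes through every point
print(squared_loss(1.0, 0.0, xs, ys))  # 30.0: residuals are 1, 2, 3, 4
```

A worse choice of $a$ and $b$ gives a larger value of $J$, which is what the minimization in the next section exploits.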
II. Least squares (equation form)
- Find the $a$ and $b$ that make $J\left(a,b\right)$ as small as possible. To locate the extremum, set the partial derivatives of $J$ with respect to $a$ and $b$ to zero and solve the resulting equations:
$$\frac{\partial J\left(a,b\right)}{\partial a}=0 \qquad \frac{\partial J\left(a,b\right)}{\partial b}=0$$
- Taking the partial derivative with respect to $b$ gives $b=\bar y-a\bar x$:
$$
\begin{aligned}
\frac{\partial J\left(a,b\right)}{\partial b} &=\sum_{i=1}^n 2\left(y^{(i)}-ax^{(i)}-b\right)\left(-1\right)=0\\
\Rightarrow\quad 0&=\sum_{i=1}^n\left(y^{(i)}-ax^{(i)}-b\right)&&\text{divide both sides by } -2\\
&=\sum_{i=1}^ny^{(i)}-a\sum_{i=1}^nx^{(i)}-\color{red}{\sum_{i=1}^nb}\\
&=\sum_{i=1}^ny^{(i)}-a\sum_{i=1}^nx^{(i)}-\color{red}{nb}\\
\\
b&=\frac{\sum_{i=1}^ny^{(i)}-a\sum_{i=1}^nx^{(i)}}{n}&&\text{solve the previous line for } b\\
&=\color{green}{\bar y-a\bar x}&&\text{final result for } b
\end{aligned}
$$
- Taking the partial derivative with respect to $a$ gives $a=\dfrac{\sum_{i=1}^n\left(x^{(i)}-\bar x\right)\left(y^{(i)}-\bar y\right)}{\sum_{i=1}^n\left(x^{(i)}-\bar x\right)^2}$:
$$
\begin{aligned}
\frac{\partial J\left(a,b\right)}{\partial a} &=\sum_{i=1}^n 2\left(y^{(i)}-ax^{(i)}-b\right)\left(-x^{(i)}\right)=0\\
\Rightarrow\quad 0&=\sum_{i=1}^n\left(y^{(i)}-ax^{(i)}-\color{red}{b}\right)x^{(i)}&&\text{divide both sides by } -2\\
&=\sum_{i=1}^n\left(y^{(i)}-ax^{(i)}-\color{red}{\bar y+a\bar x}\right)x^{(i)}&&\text{substitute } b=\bar y-a\bar x\\
&=\sum_{i=1}^n\left(x^{(i)}y^{(i)}-a\left(x^{(i)}\right)^2-x^{(i)}\bar y+a\bar xx^{(i)}\right)&&\text{expand}
\end{aligned}
$$
Grouping the terms without $a$ and the terms with $a$:
$$\sum_{i=1}^n\left(x^{(i)}y^{(i)}-x^{(i)}\bar y\right)-a\sum_{i=1}^n\left(\left(x^{(i)}\right)^2-\bar xx^{(i)}\right)=0\tag{3.5}$$
Solving (3.5) for $a$:
$$
\begin{aligned}
a&=\frac{\sum_{i=1}^n\left(x^{(i)}y^{(i)}-x^{(i)}\bar y\right)}{\sum_{i=1}^n\left(\left(x^{(i)}\right)^2-\bar xx^{(i)}\right)}\\
&=\frac{\sum_{i=1}^n\left(x^{(i)}y^{(i)}-x^{(i)}\bar y-\bar xy^{(i)}+\bar x\cdot\bar y\right)}{\sum_{i=1}^n\left(\left(x^{(i)}\right)^2-\bar xx^{(i)}-\bar xx^{(i)}+\bar x^2\right)}&&\text{add terms whose sum over } i \text{ is zero}\\
&=\color{green}{\frac{\sum_{i=1}^n\left(x^{(i)}-\bar x\right)\left(y^{(i)}-\bar y\right)}{\sum_{i=1}^n\left(x^{(i)}-\bar x\right)^2}}&&\text{final result for } a
\end{aligned}
$$
The transformation of the numerator and denominator relies on $\sum_{i=1}^n x^{(i)}=n\bar x$ and $\sum_{i=1}^n y^{(i)}=n\bar y$, which make the following expressions interchangeable:
$$\bar y \sum_{i=1}^nx^{(i)} = n\bar y\cdot\bar x = \bar x \sum_{i=1}^ny^{(i)} = \sum_{i=1}^nx^{(i)}\bar y = \sum_{i=1}^n\bar x\cdot\bar y = \sum_{i=1}^ny^{(i)}\bar x$$
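The closed-form results for $a$ and $b$ can be checked numerically. The sketch below is one possible plain-Python rendering of those two formulas; the function names `fit_line` and `gradient` and the data points are hypothetical, chosen only for illustration. It also evaluates both partial derivatives at the fitted parameters to confirm they vanish.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # a = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = sxy / sxx
    # b = y_mean - a * x_mean
    b = y_mean - a * x_mean
    return a, b


def gradient(a, b, xs, ys):
    """Partial derivatives of J(a, b) with respect to a and b."""
    dJ_da = sum(-2 * x * (y - a * x - b) for x, y in zip(xs, ys))
    dJ_db = sum(-2 * (y - a * x - b) for x, y in zip(xs, ys))
    return dJ_da, dJ_db


# Hypothetical noisy data, roughly following y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

a, b = fit_line(xs, ys)
print(a, b)                    # approximately 1.97 and 1.09
print(gradient(a, b, xs, ys))  # both components are zero up to rounding error
```

The guard on `sxx` reflects that the formula for $a$ divides by $\sum_{i=1}^n\left(x^{(i)}-\bar x\right)^2$, which is zero when every $x^{(i)}$ is the same.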
III. Least squares (matrix form)
###### To be continued