Devil Training for Mathematical Expressions
Homework
- Sum the components of a vector whose indices are even ($x_2, x_4, \dots$) and write the corresponding expression (checked numerically below).

$$\sum_{i \bmod 2 = 0} x_i$$
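A minimal sketch (assuming 1-based indexing, so the even-indexed components are $x_2, x_4, \dots$; the vector values are made up):

```python
# Sum the even-indexed components of a vector, using 1-based indexing.
x = [3, 1, 4, 1, 5, 9]  # x_1 .. x_6 (arbitrary example values)

total = sum(v for i, v in enumerate(x, start=1) if i % 2 == 0)
print(total)  # x_2 + x_4 + x_6 = 1 + 1 + 9 = 11
```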
- Write one summation exercise, one product exercise, and one integral exercise, each with a standard answer.
(1) Sum the numbers up to 100 that satisfy $i \bmod 3 = 0$.

$$\sum_{1 \leq i \leq 100,\ i \bmod 3 = 0} i$$

(2) Write the product of the reciprocals of $1, 2, \dots, 10$.

$$\prod_{i=1}^{10} \frac{1}{i}$$

(3) Find the area of the circle of radius $R$ centered at the origin, by integrating the circumference over the radius.

$$\int_{0}^{R} 2\pi r\,\mathrm{d}r = \pi R^2$$

All three answers are verified programmatically below.
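A sympy sketch checking all three answers:

```python
from sympy import Sum, Product, integrate, symbols, pi

i, k = symbols('i k', integer=True, positive=True)
r, R = symbols('r R', positive=True)

# (1) The multiples of 3 up to 100 are 3k for k = 1..33.
print(Sum(3 * k, (k, 1, 33)).doit())      # 1683

# (2) The product of the reciprocals of 1..10 equals 1/10!.
print(Product(1 / i, (i, 1, 10)).doit())  # 1/3628800

# (3) Circle area: integrate the circumference 2*pi*r for r in [0, R].
print(integrate(2 * pi * r, (r, 0, R)))   # pi*R**2
```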
- Have you used a triple summation? Describe an application.

One application is accumulating every entry of a three-dimensional array $x_{ijk}$, e.g. a $100 \times 100 \times 100$ grid (a sketch follows):

$$\sum_{1 \leq i \leq 100} \sum_{1 \leq j \leq 100} \sum_{1 \leq k \leq 100} x_{ijk}$$
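A minimal sketch, with `x` as a small made-up 3-D array standing in for the $100^3$ case (the nested loops mirror the triple summation; in practice `x.sum()` does the same thing):

```python
import numpy as np

x = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # small stand-in 3-D array

# The triple summation written as three nested loops.
total = 0
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        for k in range(x.shape[2]):
            total += x[i, j, k]

print(total == x.sum())  # True
```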
- Give a common definite integral and compare the hand-computed result with the program's result.

Definite integral: $\int_{1}^{2} x\,\mathrm{d}x$

By hand: $\int_{1}^{2} x\,\mathrm{d}x = \frac{1}{2}x^2 \Big|_{1}^{2} = \frac{3}{2}$
Program:

```python
from sympy import symbols, integrate

x = symbols('x')
print(integrate(x, (x, 1, 2)))  # 3/2, matching the hand computation
```
- Write a small example of your own to verify the least squares method.
$$\begin{bmatrix}\alpha \\ \beta\end{bmatrix}=\left(\begin{bmatrix}1 & x_{1} \\ 1 & x_{2} \\ \vdots & \vdots \\ 1 & x_{n}\end{bmatrix}^{\mathrm{T}}\begin{bmatrix}1 & x_{1} \\ 1 & x_{2} \\ \vdots & \vdots \\ 1 & x_{n}\end{bmatrix}\right)^{-1}\begin{bmatrix}1 & x_{1}\\ 1 & x_{2}\\ \vdots & \vdots \\ 1 & x_{n}\end{bmatrix}^{\mathrm{T}}\begin{bmatrix}y_{1} \\ y_{2} \\ \vdots \\ y_{n}\end{bmatrix}$$
Take $\mathbf{X} = [1,2,3]$ and $\mathbf{Y} = [2,3,7]$. Then
$$\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} -1 \\ 2.5 \end{bmatrix},$$

which gives the fitted line $y = 2.5x - 1$ (here $\alpha$ is the intercept and $\beta$ the slope, matching the column order of the design matrix above). A numerical check follows.
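A small numpy sketch of the normal equation on this data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 3.0, 7.0])

# Design matrix with an intercept column: each row is [1, x_i].
A = np.column_stack([np.ones_like(x), x])

# Normal equation: [alpha, beta] = (A^T A)^{-1} A^T Y.
w = np.linalg.inv(A.T @ A) @ A.T @ Y
print(w)  # [-1.   2.5]  ->  y = 2.5 x - 1
```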
6. Derivation of the linear regression formula
The derivation follows Prof. Min's lectures from the 2020 Devil Training.
Loss function:

$$\sum_{i=1}^{m}\left(\mathbf{x}_{i} \mathbf{w}-y_{i}\right)^{2}$$
In matrix form:

$$\|\mathbf{X} \mathbf{w}-\mathbf{Y}\|^{2}$$
Written out as a matrix product:

$$L(\mathbf{X}, \mathbf{Y}, \mathbf{w})=(\mathbf{X} \mathbf{w}-\mathbf{Y})^{\mathrm{T}}(\mathbf{X} \mathbf{w}-\mathbf{Y})$$
Derivation of the solution:

$$\begin{aligned} L(\mathbf{X}, \mathbf{Y}, \mathbf{w}) &=(\mathbf{X} \mathbf{w}-\mathbf{Y})^{\mathrm{T}}(\mathbf{X} \mathbf{w}-\mathbf{Y}) \\ &=\left(\mathbf{w}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}}-\mathbf{Y}^{\mathrm{T}}\right)(\mathbf{X} \mathbf{w}-\mathbf{Y}) \\ &=\mathbf{w}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{X} \mathbf{w}-\mathbf{w}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{Y}-\mathbf{Y}^{\mathrm{T}} \mathbf{X} \mathbf{w}+\mathbf{Y}^{\mathrm{T}} \mathbf{Y} \end{aligned}$$
Differentiate with respect to $\mathbf{w}$ and set the result to 0. By the matrix differentiation rules (the third rule assumes $A$ is symmetric, which holds here since $A = \mathbf{X}^{\mathrm{T}}\mathbf{X}$; a quick numerical check follows):

$$\begin{aligned} &\frac{\partial A \mathbf{w}}{\partial \mathbf{w}}=A \\ &\frac{\partial \mathbf{w}^{\mathrm{T}} A}{\partial \mathbf{w}}=A^{\mathrm{T}} \\ &\frac{\partial \mathbf{w}^{\mathrm{T}} A \mathbf{w}}{\partial \mathbf{w}}=2 \mathbf{w}^{\mathrm{T}} A \end{aligned}$$
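A finite-difference sketch of the third rule, $\partial(\mathbf{w}^{\mathrm{T}} A \mathbf{w})/\partial \mathbf{w} = 2\mathbf{w}^{\mathrm{T}} A$ for symmetric $A$ (random data, made up for the check):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
A = (A + A.T) / 2          # symmetrize; the rule 2 w^T A assumes symmetric A
w = rng.normal(size=3)

f = lambda w: w @ A @ w    # the quadratic form w^T A w

# Central finite differences along each coordinate approximate the gradient.
eps = 1e-6
num_grad = np.array([(f(w + eps * e) - f(w - eps * e)) / (2 * eps)
                     for e in np.eye(3)])
print(np.allclose(num_grad, 2 * w @ A))  # True
```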
Hence:

$$\begin{aligned} \frac{\partial L(\mathbf{X}, \mathbf{Y}, \mathbf{w})}{\partial \mathbf{w}} &=\frac{\partial \mathbf{w}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{X} \mathbf{w}}{\partial \mathbf{w}}-\frac{\partial \mathbf{w}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{Y}}{\partial \mathbf{w}}-\frac{\partial \mathbf{Y}^{\mathrm{T}} \mathbf{X} \mathbf{w}}{\partial \mathbf{w}}+\frac{\partial \mathbf{Y}^{\mathrm{T}} \mathbf{Y}}{\partial \mathbf{w}} \\ &=2 \mathbf{w}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{X}-\mathbf{Y}^{\mathrm{T}} \mathbf{X}-\mathbf{Y}^{\mathrm{T}} \mathbf{X}+0 \\ &=2 \mathbf{w}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{X}-2 \mathbf{Y}^{\mathrm{T}} \mathbf{X} \end{aligned}$$
Setting

$$2 \hat{\mathbf{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{X}-2 \mathbf{Y}^{\mathrm{T}} \mathbf{X}=0$$
gives

$$\hat{\mathbf{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{X}=\mathbf{Y}^{\mathrm{T}} \mathbf{X}$$
Transposing both sides:

$$\mathbf{X}^{\mathrm{T}} \mathbf{X} \hat{\mathbf{w}}=\mathbf{X}^{\mathrm{T}} \mathbf{Y}$$
Finally:

$$\hat{\mathbf{w}}=\left(\mathbf{X}^{\mathrm{T}} \mathbf{X}\right)^{-1} \mathbf{X}^{\mathrm{T}} \mathbf{Y}$$
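A sketch checking this closed form against numpy's least squares solver on synthetic data (all values made up):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))                  # design matrix
w_true = np.array([1.5, -2.0, 0.5])
Y = X @ w_true + 0.01 * rng.normal(size=100)   # targets with small noise

# Closed form derived above: w_hat = (X^T X)^{-1} X^T Y.
w_hat = np.linalg.inv(X.T @ X) @ X.T @ Y

# Cross-check with numpy's solver.
w_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(w_hat, w_lstsq))  # True
```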
7. Derive logistic regression yourself and describe at least 5 characteristics of the method.
View the loss as a probability problem: the larger the expression below, the better, where $\sigma(z)=\frac{1}{1+e^{-z}}$ is the sigmoid function.
$$P\left(y_{i} \mid \mathbf{x}_{i} ; \mathbf{w}\right)=\left(\sigma\left(\mathbf{x}_{i} \mathbf{w}\right)\right)^{y_{i}}\left(1-\sigma\left(\mathbf{x}_{i} \mathbf{w}\right)\right)^{1-y_{i}}$$
Likelihood: assume the training samples are independent and equally important.
To obtain a global optimum, multiply the per-sample probabilities to form the likelihood:
$$\begin{aligned} L(\mathbf{w}) &=P(\mathbf{Y} \mid \mathbf{X} ; \mathbf{w}) \\ &=\prod_{i=1}^{m} P\left(y_{i} \mid \mathbf{x}_{i} ; \mathbf{w}\right) \\ &=\prod_{i=1}^{m}\left(\sigma\left(\mathbf{x}_{i} \mathbf{w}\right)\right)^{y_{i}}\left(1-\sigma\left(\mathbf{x}_{i} \mathbf{w}\right)\right)^{1-y_{i}} \end{aligned}$$
Since the logarithm is monotonic, maximizing $L(\mathbf{w})$ is equivalent to maximizing its log:

$$\begin{aligned} l(\mathbf{w}) &=\log L(\mathbf{w}) \\ &=\log \prod_{i=1}^{m} P\left(y_{i} \mid \mathbf{x}_{i} ; \mathbf{w}\right) \\ &=\sum_{i=1}^{m} y_{i} \log \sigma\left(\mathbf{x}_{i} \mathbf{w}\right)+\left(1-y_{i}\right) \log \left(1-\sigma\left(\mathbf{x}_{i} \mathbf{w}\right)\right) \end{aligned}$$
Loss function (average loss), which is also the optimization objective:

$$\min _{\mathbf{w}} \frac{1}{m} \sum_{i=1}^{m}-y_{i} \log \sigma\left(\mathbf{x}_{i} \mathbf{w}\right)-\left(1-y_{i}\right) \log \left(1-\sigma\left(\mathbf{x}_{i} \mathbf{w}\right)\right)$$
Gradient descent, derivation of the iteration formula. Since

$$l(\mathbf{w})=\sum_{i=1}^{m} y_{i} \log \sigma\left(\mathbf{x}_{i} \mathbf{w}\right)+\left(1-y_{i}\right) \log \left(1-\sigma\left(\mathbf{x}_{i} \mathbf{w}\right)\right),$$

using $\sigma'(z)=\sigma(z)(1-\sigma(z))$, the partial derivative with respect to $w_j$ is

$$\begin{aligned} \frac{\partial l(\mathbf{w})}{\partial w_{j}} &=\sum_{i=1}^{m}\left(\frac{y_{i}}{\sigma\left(\mathbf{x}_{i} \mathbf{w}\right)}-\frac{1-y_{i}}{1-\sigma\left(\mathbf{x}_{i} \mathbf{w}\right)}\right) \frac{\partial \sigma\left(\mathbf{x}_{i} \mathbf{w}\right)}{\partial w_{j}} \\ &=\sum_{i=1}^{m}\left(\frac{y_{i}}{\sigma\left(\mathbf{x}_{i} \mathbf{w}\right)}-\frac{1-y_{i}}{1-\sigma\left(\mathbf{x}_{i} \mathbf{w}\right)}\right) \sigma\left(\mathbf{x}_{i} \mathbf{w}\right)\left(1-\sigma\left(\mathbf{x}_{i} \mathbf{w}\right)\right) \frac{\partial \mathbf{x}_{i} \mathbf{w}}{\partial w_{j}} \\ &=\sum_{i=1}^{m}\left(\frac{y_{i}}{\sigma\left(\mathbf{x}_{i} \mathbf{w}\right)}-\frac{1-y_{i}}{1-\sigma\left(\mathbf{x}_{i} \mathbf{w}\right)}\right) \sigma\left(\mathbf{x}_{i} \mathbf{w}\right)\left(1-\sigma\left(\mathbf{x}_{i} \mathbf{w}\right)\right) x_{i j} \\ &=\sum_{i=1}^{m}\left(y_{i}-\sigma\left(\mathbf{x}_{i} \mathbf{w}\right)\right) x_{i j} \end{aligned}$$

so the gradient-ascent update on $l(\mathbf{w})$ (equivalently, gradient descent on the loss) with learning rate $\eta$ is

$$w_{j} \leftarrow w_{j}+\eta \sum_{i=1}^{m}\left(y_{i}-\sigma\left(\mathbf{x}_{i} \mathbf{w}\right)\right) x_{i j}$$
Logistic regression can be implemented by hand or by calling a library; a hand-rolled sketch follows, then the library version.
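A minimal hand-rolled sketch using the gradient derived above (toy 1-D data, made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: two separable 1-D clusters.
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Prepend a bias column so w carries an intercept.
Xb = np.hstack([np.ones((len(X), 1)), X])

w = np.zeros(Xb.shape[1])
eta = 0.1
for _ in range(5000):
    # Gradient ascent on l(w): w_j += eta * sum_i (y_i - sigma(x_i w)) x_ij
    w += eta * Xb.T @ (y - sigmoid(Xb @ w))

print(sigmoid(Xb @ w).round(3))  # near 0 for the first three, near 1 for the rest
```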
Library version (sklearn):

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X, y)  # X, y as in the sketch above
```
Characteristics of logistic regression:
(1) Because of the sigmoid function, each prediction is a probability in $[0, 1]$.
(2) Predictions fall into two classes, i.e. it is a binary classifier.
(3) It is easy to understand: the whole method can be derived by hand, so it is highly interpretable.
(4) Its accuracy is often modest; among machine learning algorithms its performance is average.
(5) The model is simple: a linear score passed through a sigmoid, with few parameters to learn.