P12
Second-order condition
Affine function: $f(x)=Ax+b$, $\nabla^2 f(x)=0$
Exponential function: $f(x)=e^{ax},\ x\in R$
$$f'(x)=ae^{ax}$$
$$f''(x)=a^2e^{ax}$$
This is nonnegative for every $x$ (and strictly positive when $a \neq 0$), so the exponential function is convex.
Power function: $f(x)=x^a,\ x \in R_{++}$
$$f'(x)=ax^{a-1}$$
$$f''(x)=a(a-1)x^{a-2}$$
$$f''(x) \begin{cases} \geq 0, & \text{if $a \geq 1$ or $a \leq 0$} \\ \leq 0, & \text{if $0 \leq a \leq 1$} \end{cases}$$
So the power function is convex for $a \geq 1$ or $a \leq 0$, and concave for $0 \leq a \leq 1$.
When $a=1$ it is affine, hence both convex and concave.
When $a=0$ it is constant.
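The sign of $f''(x)=a(a-1)x^{a-2}$ can be checked numerically. The sketch below is only an illustration: the function names and the sample points are assumptions made here, and sampling a few points is a sanity check, not a proof.

```python
def power_second_derivative(x: float, a: float) -> float:
    """Return f''(x) = a (a - 1) x^(a - 2) for f(x) = x^a with x > 0."""
    if x <= 0:
        raise ValueError("x must be strictly positive")
    return a * (a - 1) * x ** (a - 2)


def classify_power(a: float, xs=(0.1, 0.5, 1.0, 2.0, 10.0)) -> str:
    """Classify x^a on the sample points xs by the sign of f''."""
    values = [power_second_derivative(x, a) for x in xs]
    if all(v == 0 for v in values):
        return "affine"      # f'' vanishes: both convex and concave
    if all(v >= 0 for v in values):
        return "convex"
    if all(v <= 0 for v in values):
        return "concave"
    return "neither"


if __name__ == "__main__":
    for a in (-1.0, 0.0, 0.5, 1.0, 2.0):
        print(a, classify_power(a))
```

The exponents $a=0$ and $a=1$ land in the "affine" branch because $a(a-1)=0$ there.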
Power of the absolute value: $f(x)=|x|^p,\ x \in R$
$$f'(x)= \begin{cases} p x^{p-1} & \text{if $x \geq 0$} \\ -p(-x)^{p-1} & \text{if $x <0$} \end{cases}$$
$$f''(x)= \begin{cases} p(p-1) x^{p-2} & \text{if $x \geq 0$} \\ p(p-1)(-x)^{p-2} & \text{if $x <0$} \end{cases}$$
When $p=1$, $f(x)$ is not differentiable at $x=0$, so the second-order condition cannot be used.
However, from the properties of convex functions we know that $f(x)$ is convex when $1 \leq p < 2$ (in this range the second derivative does not exist at $x=0$).
When $p \geq 2$, the second-order condition shows that $f(x)$ is convex.
Therefore, $f(x)$ is convex whenever $p \geq 1$.
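Since the second-order condition is unavailable for $1 \leq p < 2$, one way to gain confidence is to test the defining inequality $f(\theta x+(1-\theta)y) \leq \theta f(x)+(1-\theta)f(y)$ directly on a grid. This is a minimal sketch: the grid, the tolerance and the helper names are assumptions chosen for illustration, and a passing grid check does not prove convexity.

```python
import itertools


def abs_power(x: float, p: float) -> float:
    """f(x) = |x|^p."""
    return abs(x) ** p


def violates_convexity(p: float, points, thetas, tol: float = 1e-12) -> bool:
    """Return True if some grid triple (x, y, theta) breaks the convexity inequality."""
    for x, y in itertools.product(points, repeat=2):
        for theta in thetas:
            lhs = abs_power(theta * x + (1.0 - theta) * y, p)
            rhs = theta * abs_power(x, p) + (1.0 - theta) * abs_power(y, p)
            if lhs > rhs + tol:
                return True
    return False


POINTS = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
THETAS = [0.0, 0.25, 0.5, 0.75, 1.0]

if __name__ == "__main__":
    for p in (0.5, 1.0, 1.5, 2.0, 3.0):
        print(p, "violation found" if violates_convexity(p, POINTS, THETAS) else "no violation")
```

For $p=0.5$ the pair $x=0$, $y=1$, $\theta=0.5$ already breaks the inequality, which is consistent with the restriction $p \geq 1$.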
Logarithm: $f(x)=\log(x),\ x\in R_{++}$
$$f'(x)=\frac{1}{x}$$
$$f''(x)=-\frac{1}{x^2}$$
Since $f''(x)<0$ on $R_{++}$, the logarithm is concave.
Negative entropy: $f(x)=x\log(x),\ x\in R_{++}$
$$f'(x)=\log x+1$$
$$f''(x)=\frac{1}{x}$$
Since $f''(x)>0$ on $R_{++}$, the negative entropy is convex.
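The two second derivatives, $-1/x^2$ for the logarithm and $1/x$ for the negative entropy, can be cross-checked with a central finite difference. This is a rough sketch: the step size `h` and the helper names are assumptions, and the approximation is only accurate to several digits.

```python
import math


def second_difference(f, x: float, h: float = 1e-4) -> float:
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def neg_entropy(x: float) -> float:
    """f(x) = x log(x) for x > 0."""
    return x * math.log(x)


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0):
        print(
            x,
            second_difference(math.log, x), -1.0 / x**2,   # logarithm: negative
            second_difference(neg_entropy, x), 1.0 / x,    # negative entropy: positive
        )
```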
Norms
A norm on $R^n$: $p(x),\ x\in R^n$
A function is a norm if it satisfies three properties:
1. $p(ax)=|a|p(x)$
2. $p(x+y)\leq p(x)+p(y)$
3. $p(x)=0 \Leftrightarrow x=0$
$\forall x,y \in R^n,\ \forall\, 0 \leq \theta \leq 1$:
$$p(\theta x + (1-\theta)y) \leq p(\theta x)+ p((1-\theta)y) = \theta p(x) + (1-\theta)p(y)$$
The inequality is property 2, and the equality is property 1 applied with $\theta \geq 0$ and $1-\theta \geq 0$.
Therefore, every norm is convex.
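The convexity inequality for norms can be spot-checked on random vectors. The sketch below uses the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms as examples. The choice of norms, the dimension, the sampling range and the function names are all assumptions made for illustration.

```python
import math
import random


def norm_1(x):
    return sum(abs(v) for v in x)


def norm_2(x):
    return math.sqrt(sum(v * v for v in x))


def norm_inf(x):
    return max(abs(v) for v in x)


def max_convexity_gap(norm, n: int = 5, trials: int = 1000, seed: int = 0) -> float:
    """Largest observed p(theta x + (1 - theta) y) - (theta p(x) + (1 - theta) p(y))."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        x = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        y = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        theta = rng.random()
        z = [theta * a + (1.0 - theta) * b for a, b in zip(x, y)]
        gap = norm(z) - (theta * norm(x) + (1.0 - theta) * norm(y))
        worst = max(worst, gap)
    return worst


if __name__ == "__main__":
    for name, norm in (("l1", norm_1), ("l2", norm_2), ("linf", norm_inf)):
        print(name, max_convexity_gap(norm))
```

A convex function never produces a positive gap, so every printed value should be at most zero up to rounding.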
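Zero norm: $||x||_0$ is the number of nonzero entries of $x$.
The zero norm is neither a norm nor a convex function.
Even for $x \in R$, it does not satisfy the first property in the definition of a norm.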
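A concrete counterexample makes both failures visible. The values below are chosen here for illustration and do not come from the original notes: scaling $x=1$ by $2$ leaves the count unchanged, and the midpoint of $0$ and $1$ breaks the convexity inequality.

```python
def zero_norm(x) -> int:
    """Number of nonzero entries of the vector x."""
    return sum(1 for v in x if v != 0)


# Property 1 (absolute homogeneity) fails: ||2x||_0 = 1 but 2 * ||x||_0 = 2.
x = [1.0]
scaled = [2.0 * v for v in x]
homogeneity_holds = zero_norm(scaled) == 2 * zero_norm(x)

# Convexity fails: with x = 0, y = 1, theta = 0.5 the left side is 1, the right side 0.5.
theta = 0.5
left = zero_norm([theta * 0.0 + (1.0 - theta) * 1.0])
right = theta * zero_norm([0.0]) + (1.0 - theta) * zero_norm([1.0])
convexity_holds = left <= right

if __name__ == "__main__":
    print("homogeneity holds:", homogeneity_holds)
    print("convexity inequality holds:", convexity_holds)
```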
Max function: $f(x)=\max \lbrace x_1,\dots,x_n \rbrace,\ x \in R^n$
$\forall x,y \in R^n,\ \forall\, 0 \leq \theta \leq 1$:
$$\begin{aligned} f(\theta x + (1-\theta)y) &= \max \lbrace \theta x_i + (1-\theta)y_i,\ i=1,\dots,n \rbrace \\ &\leq\theta \max \lbrace x_i ,\ i=1,\dots,n \rbrace + (1- \theta) \max \lbrace y_i ,\ i=1,\dots,n \rbrace \\ &=\theta f(x) + (1-\theta) f(y) \end{aligned}$$
Therefore, the max function is convex.
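As a small worked instance of this inequality (the numbers are picked here only for illustration), take $x=(1,0)$, $y=(0,1)$ and $\theta=\frac{1}{2}$:
$$f(\theta x+(1-\theta)y)=\max \lbrace \tfrac{1}{2},\tfrac{1}{2} \rbrace=\tfrac{1}{2} \leq \tfrac{1}{2}\cdot 1+\tfrac{1}{2}\cdot 1=\theta f(x)+(1-\theta)f(y)$$
The inequality is strict here because $x$ and $y$ attain their maxima at different coordinates.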
Analytic approximation: approximate a non-differentiable function by a differentiable one.
Analytic approximation of the max function:
log-sum-exp: $f(x)=\log(e^{x_1}+\dots+e^{x_n}),\ x \in R^n$
$$\max \lbrace x_1,\dots,x_n \rbrace \leq f(x) \leq \max \lbrace x_1,\dots,x_n \rbrace + \log n$$
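The two-sided bound can be checked numerically. The sketch below subtracts the maximum before exponentiating, a common trick to avoid overflow. That detail is an addition made here, not part of the original notes, and the function names are illustrative.

```python
import math


def log_sum_exp(x) -> float:
    """log(exp(x_1) + ... + exp(x_n)), computed with the maximum factored out."""
    m = max(x)
    return m + math.log(sum(math.exp(v - m) for v in x))


def within_bounds(x, tol: float = 1e-12) -> bool:
    """Check max(x) <= log_sum_exp(x) <= max(x) + log(n)."""
    value = log_sum_exp(x)
    return max(x) - tol <= value <= max(x) + math.log(len(x)) + tol


if __name__ == "__main__":
    for x in ([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [1000.0, 1000.0], [-5.0, 20.0]):
        print(x, log_sum_exp(x), within_bounds(x))
```

The upper bound is attained when all entries are equal, and the lower bound is approached when one entry dominates the others.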
$$\frac{\partial f}{\partial x_i} = \frac{e^{x_i}}{e^{x_1}+\dots+e^{x_n}}$$
When $i \neq j$:
$$\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{- e^{x_i}e^{x_j}}{(e^{x_1}+\dots+e^{x_n})^2}$$
When $i = j$:
$$\frac{\partial^2 f}{\partial x_i \partial x_i} = \frac{- e^{x_i}e^{x_i} + e^{x_i}(e^{x_1}+\dots+e^{x_n})}{(e^{x_1}+\dots+e^{x_n})^2}$$
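The first-derivative formula can be compared against central finite differences. This is a sketch under assumptions: the step size `h`, the test point and the helper names are chosen here for illustration.

```python
import math


def log_sum_exp(x) -> float:
    m = max(x)
    return m + math.log(sum(math.exp(v - m) for v in x))


def gradient(x):
    """Analytic gradient: exp(x_i) / (exp(x_1) + ... + exp(x_n))."""
    m = max(x)
    weights = [math.exp(v - m) for v in x]   # common factor exp(-m) cancels in the ratio
    total = sum(weights)
    return [w / total for w in weights]


def numerical_gradient(f, x, h: float = 1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad


if __name__ == "__main__":
    point = [0.5, -1.0, 2.0]
    print(gradient(point))
    print(numerical_gradient(log_sum_exp, point))
```

The gradient entries are positive and sum to one, which is a convenient extra sanity check.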
Define $Z=[e^{x_1},\dots,e^{x_n}]$ as a column vector. The Hessian is then
$$H=\frac{1}{(1^T Z)^2}\left((1^TZ)\,diag\lbrace Z \rbrace - ZZ^T\right)$$
Define $K=(1^TZ)\,diag\lbrace Z \rbrace - ZZ^T$.
Test for positive semidefiniteness: $\forall V \in R^n,\ V^TKV \geq 0$.
$$V^TKV=(1^TZ)\, V^T diag\lbrace Z \rbrace V - V^T ZZ^T V =\Big(\sum_i Z_i\Big)\Big(\sum_i V_i^2Z_i\Big)-\Big(\sum_i V_iZ_i\Big)^2$$
Define $a_i=V_i \sqrt{Z_i}$ and $b_i=\sqrt{Z_i}$. Then
$$V^TKV=(b^Tb)(a^Ta)-(a^Tb)^2 \geq 0$$
by the Cauchy-Schwarz inequality. Hence $H$ is positive semidefinite, and log-sum-exp is convex.
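The quadratic form $V^TKV=(\sum_i Z_i)(\sum_i V_i^2Z_i)-(\sum_i V_iZ_i)^2$ can be evaluated on random inputs to see the Cauchy-Schwarz bound in action. This is a sketch only: the sampling ranges, the seed and the function names are assumptions made here for illustration.

```python
import math
import random


def quadratic_form_K(x, v) -> float:
    """V^T K V with Z_i = exp(x_i), written as sums so that no matrix is formed."""
    z = [math.exp(xi) for xi in x]
    sum_z = sum(z)
    sum_v2z = sum(vi * vi * zi for vi, zi in zip(v, z))
    sum_vz = sum(vi * zi for vi, zi in zip(v, z))
    return sum_z * sum_v2z - sum_vz ** 2


def smallest_quadratic_form(n: int = 4, trials: int = 1000, seed: int = 0) -> float:
    """Smallest value of V^T K V seen over random x and V."""
    rng = random.Random(seed)
    smallest = math.inf
    for _ in range(trials):
        x = [rng.uniform(-3.0, 3.0) for _ in range(n)]
        v = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        smallest = min(smallest, quadratic_form_K(x, v))
    return smallest


if __name__ == "__main__":
    print(smallest_quadratic_form())
    # V = (1, ..., 1) makes a and b parallel, so the Cauchy-Schwarz bound is tight.
    print(quadratic_form_K([0.0, 1.0, 2.0], [1.0, 1.0, 1.0]))
```

The all-ones vector gives $a=b$, so the quadratic form is zero there up to rounding: $K$ is positive semidefinite but not positive definite.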