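- Proposition: Let $f:\mathbb{R}\to\mathbb{R}$ with $\operatorname{dom} f$ convex and $f$ twice differentiable. Then $f$ is convex if $f''(x)\ge 0$ for all $x\in\operatorname{dom} f$.
- Proof: Let $z,x\in\operatorname{dom} f$. Then

$$
\begin{aligned}
f(z) &= f(x)+\int_x^z f'(t)\,dt\\
&= f(x)+\int_x^z\Big(f'(x)+\int_x^t f''(s)\,ds\Big)dt\\
&= f(x)+f'(x)(z-x)+\int_x^z\!\int_x^t f''(s)\,ds\,dt\\
&\ge f(x)+f'(x)(z-x).
\end{aligned}
$$

There are two cases to consider for the inequality. If $z\ge x$, both integrals run forward over a nonnegative integrand. If $z<x$, both run backward, and the two sign changes cancel. In either case the double integral is nonnegative.
QED by "First order conditions".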
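As a quick numerical sanity check of the inequality $f(z)\ge f(x)+f'(x)(z-x)$ used in the proof, the sketch below evaluates the gap on a grid. The helper names (`first_order_gap`, `min_gap_on_grid`) and the two test functions are illustrative choices, not part of the notes.

```python
import math

def first_order_gap(f, df, x, z):
    """Return f(z) - [f(x) + f'(x) (z - x)], which is >= 0 when f is convex."""
    return f(z) - (f(x) + df(x) * (z - x))

def min_gap_on_grid(f, df, lo, hi, n=41):
    """Smallest first-order gap over all pairs (x, z) on a uniform grid of [lo, hi]."""
    pts = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(first_order_gap(f, df, x, z) for x in pts for z in pts)

# f(x) = exp(x) has f''(x) = exp(x) >= 0, so the gap should never be negative.
convex_gap = min_gap_on_grid(math.exp, math.exp, -2.0, 2.0)

# f(x) = sin(x) has f''(x) = -sin(x) < 0 on (0, pi), so the gap becomes negative.
concave_gap = min_gap_on_grid(math.sin, math.cos, 0.1, 3.0)
```

A grid check like this can only refute convexity, never prove it; the proposition is what gives the guarantee.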
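Chain Rule: Let $f:\mathbb{R}^n\to\mathbb{R}^m$ be differentiable at $x\in\operatorname{dom} f$, and let $g:\mathbb{R}^m\to\mathbb{R}^k$ be differentiable at $f(x)\in\operatorname{dom} g$. If $h:\mathbb{R}^n\to\mathbb{R}^k$ is defined by $h(y)=g(f(y))$ for all $y\in\mathbb{R}^n$, then $h$ is differentiable at $x$ and

$$
Dh(x)=Dg(f(x))\,Df(x)
$$

($Df(x)$: $m\times n$ matrix, $Dg(f(x))$: $k\times m$ matrix).
Writing $h=g\circ f$, this reads $D(g\circ f)=(Dg\circ f)\cdot Df$.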
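The chain rule can be checked numerically by comparing the matrix product $Dg(f(x))\,Df(x)$ with a finite-difference Jacobian of $g\circ f$. The sketch below does this in plain Python; the particular maps `f` and `g`, the test point, and the helper names are assumptions made for illustration.

```python
import math

def f(x):
    """Example map R^2 -> R^2."""
    return [x[0] * x[1], math.sin(x[0])]

def Df(x):
    """Jacobian of f, a 2 x 2 matrix."""
    return [[x[1], x[0]], [math.cos(x[0]), 0.0]]

def g(y):
    """Example map R^2 -> R^1."""
    return [y[0] ** 2 + 3.0 * y[1]]

def Dg(y):
    """Jacobian of g, a 1 x 2 matrix."""
    return [[2.0 * y[0], 3.0]]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def numeric_jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian of func at x, as a list of rows."""
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        cols.append([(a - b) / (2 * eps) for a, b in zip(func(xp), func(xm))])
    # cols[j][i] holds d func_i / d x_j; transpose into rows.
    return [[cols[j][i] for j in range(len(x))] for i in range(len(cols[0]))]

x0 = [0.7, -1.3]
chain = matmul(Dg(f(x0)), Df(x0))                   # (Dg o f) . Df, a 1 x 2 matrix
numeric = numeric_jacobian(lambda x: g(f(x)), x0)   # D(g o f) by finite differences
max_diff = max(abs(a - b) for ra, rb in zip(chain, numeric) for a, b in zip(ra, rb))
```

The shapes also match the dimension count in the statement: a $1\times 2$ matrix times a $2\times 2$ matrix gives the $1\times 2$ Jacobian of $g\circ f$.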
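Example: Let $f:\mathbb{R}^m\to\mathbb{R}$, $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^m$, $l(x)=Ax+b$. Then

$$
D(f\circ l)(x)=[(Df\circ l)\cdot Dl](x)=Df(Ax+b)\cdot A=\nabla f(Ax+b)^TA.
$$

Example: Let $f:\mathbb{R}^n\to\mathbb{R}$, $g:\mathbb{R}\to\mathbb{R}$. Then

$$
D(g\circ f)(x)=Dg(f(x))\,Df(x)=g'(f(x))\cdot\nabla f(x)^T.
$$

Example: Let $f:\mathbb{R}^n\to\mathbb{R}$, and let $g:\mathbb{R}\to\mathbb{R}$ be defined by $g(t)=f(x+tu)$ for some vectors $x,u\in\mathbb{R}^n$.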
To compute $g'(t)$, let $h(t)=x+tu$, so $h:\mathbb{R}\to\mathbb{R}^n$ and $g=f\circ h$. So

$$
g'(t)=((Df\circ h)\cdot Dh)(t)=\nabla f(h(t))^T\cdot Dh(t)=\nabla f(x+tu)^T\cdot u=u^T\nabla f(x+tu).
$$
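The formula $g'(t)=u^T\nabla f(x+tu)$ can be compared against a central difference of $g$. This is a minimal sketch; the function `f`, its gradient, and the vectors `x`, `u` are hypothetical examples chosen only so that the gradient is easy to write down.

```python
import math

def f(x):
    """Example function R^2 -> R."""
    return x[0] ** 2 * x[1] + math.exp(x[1])

def grad_f(x):
    """Gradient of f, computed by hand."""
    return [2.0 * x[0] * x[1], x[0] ** 2 + math.exp(x[1])]

def line_point(x, u, t):
    """h(t) = x + t u."""
    return [xi + t * ui for xi, ui in zip(x, u)]

def g(x, u, t):
    """g(t) = f(x + t u), the restriction of f to a line."""
    return f(line_point(x, u, t))

def g_prime_formula(x, u, t):
    """g'(t) = u^T grad f(x + t u), as given by the chain rule."""
    return sum(ui * gi for ui, gi in zip(u, grad_f(line_point(x, u, t))))

def g_prime_numeric(x, u, t, eps=1e-6):
    """Central-difference approximation of g'(t)."""
    return (g(x, u, t + eps) - g(x, u, t - eps)) / (2 * eps)

x, u, t = [1.0, 0.5], [0.3, -0.4], 0.7
diff = abs(g_prime_formula(x, u, t) - g_prime_numeric(x, u, t))
```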
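To compute $g''(t)$, apply the chain rule again to $t\mapsto u^T\nabla f(h(t))$:

$$
g''(t)=(D[(u^T\nabla f)\circ h])(t)=([D(u^T\nabla f)\circ h]\cdot Dh)(t)=(((u^TD\nabla f)\circ h)\cdot u)(t)=u^T\nabla^2f(h(t))\,u.
$$

Corollary: Let $f:\mathbb{R}^n\to\mathbb{R}$ be twice differentiable, $\operatorname{dom} f$ convex.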
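Then $f$ is convex if $\nabla^2f(x)\succeq 0$ for all $x\in\operatorname{dom} f$. (Apply the one-dimensional proposition to each restriction $g(t)=f(x+tu)$, whose second derivative is $u^T\nabla^2f(x+tu)\,u\ge 0$.)

Example: "log-sum-exp" $f(x)=\log(e^{x_1}+\cdots+e^{x_n})$, $f:\mathbb{R}^n\to\mathbb{R}$, $\operatorname{dom} f=\mathbb{R}^n$.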
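$$
\nabla f(x)=\begin{bmatrix}\dfrac{\partial f}{\partial x_1}(x)\\ \vdots\\ \dfrac{\partial f}{\partial x_n}(x)\end{bmatrix}
=\frac{1}{e^{x_1}+\cdots+e^{x_n}}\begin{bmatrix}e^{x_1}\\ \vdots\\ e^{x_n}\end{bmatrix}
$$

$$
\frac{\partial}{\partial x_i}\,\frac{1}{e^{x_1}+\cdots+e^{x_n}}=-\Big(\frac{1}{e^{x_1}+\cdots+e^{x_n}}\Big)^2\cdot e^{x_i}
$$

$$
(\nabla^2f)_{ij,\;i\ne j}=\frac{\partial}{\partial x_i}\,\frac{e^{x_j}}{e^{x_1}+\cdots+e^{x_n}}=-\Big(\frac{1}{e^{x_1}+\cdots+e^{x_n}}\Big)^2\cdot e^{x_i}e^{x_j}
$$

$$
(\nabla^2f)_{ii}=\frac{\partial}{\partial x_i}\,\frac{e^{x_i}}{e^{x_1}+\cdots+e^{x_n}}=-\Big(\frac{1}{e^{x_1}+\cdots+e^{x_n}}\Big)^2\cdot(e^{x_i})^2+\frac{e^{x_i}}{e^{x_1}+\cdots+e^{x_n}}
$$

Put $z_i=e^{x_i}$, then $e^{x_1}+\cdots+e^{x_n}=\mathbf{1}^Tz$ and

$$
\nabla^2f=-\Big(\frac{1}{\mathbf{1}^Tz}\Big)^2zz^T+\frac{1}{\mathbf{1}^Tz}\operatorname{diag}(z)=\frac{1}{\mathbf{1}^Tz}\Big(\operatorname{diag}(z)-\frac{1}{\mathbf{1}^Tz}zz^T\Big)
$$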
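The closed form for the log-sum-exp Hessian can be compared with a finite-difference Hessian. The sketch below is one way to do that in plain Python; the helper names, the test point, and the step size `eps` are assumptions for illustration.

```python
import math

def logsumexp(x):
    """f(x) = log(sum_i exp(x_i)), shifted by max(x) for numerical stability."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

def hessian_formula(x):
    """(1 / 1^T z) * (diag(z) - z z^T / 1^T z) with z_i = exp(x_i)."""
    z = [math.exp(xi) for xi in x]
    s = sum(z)  # 1^T z
    n = len(x)
    return [[(z[i] / s if i == j else 0.0) - z[i] * z[j] / s ** 2
             for j in range(n)] for i in range(n)]

def hessian_numeric(func, x, eps=1e-4):
    """Central second differences of func at x."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                y = list(x)
                y[i] += di
                y[j] += dj  # for i == j both shifts land on the same coordinate
                return func(y)
            H[i][j] = (shifted(eps, eps) - shifted(eps, -eps)
                       - shifted(-eps, eps) + shifted(-eps, -eps)) / (4 * eps * eps)
    return H

def quad_form(H, v):
    """v^T H v."""
    return sum(v[i] * H[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))

x0 = [0.2, -1.0, 1.5]
H = hessian_formula(x0)
H_num = hessian_numeric(logsumexp, x0)
max_diff = max(abs(H[i][j] - H_num[i][j]) for i in range(3) for j in range(3))
```

Each row of `H` sums to zero, because $\operatorname{diag}(z)\mathbf{1}=z$ and $zz^T\mathbf{1}=(\mathbf{1}^Tz)\,z$. So the Hessian is singular along the all-ones direction.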
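Since $\mathbf{1}^Tz>0$, it suffices to show $v^T\big(\mathbf{1}^Tz\cdot\operatorname{diag}(z)-zz^T\big)v\ge 0$ for every $v\in\mathbb{R}^n$:

$$
\begin{aligned}
v^T\big(\mathbf{1}^Tz\cdot\operatorname{diag}(z)-zz^T\big)v\ge 0
&\;\Longleftarrow\; \mathbf{1}^Tz\cdot\sum_{i=1}^n v_i^2z_i-(z^Tv)^2\ge 0\\
&\;\Longleftarrow\; (z^Tv)^2\le \mathbf{1}^Tz\cdot\sum_{i=1}^n v_i^2z_i
=\big\|(\sqrt{z_1},\cdots,\sqrt{z_n})\big\|_2^2\cdot\big\|(v_1\sqrt{z_1},\cdots,v_n\sqrt{z_n})\big\|_2^2 ,
\end{aligned}
$$

which is the Cauchy-Schwarz inequality for the vectors $(\sqrt{z_i})_i$ and $(v_i\sqrt{z_i})_i$, whose inner product is $z^Tv$.

Exercise: Prove that $f(x,y)=y^2/x$ is convex, $\operatorname{dom} f=\mathbb{R}_{++}\times\mathbb{R}$.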
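$$
\nabla f=\begin{bmatrix}-\dfrac{y^2}{x^2}\\[6pt] \dfrac{2y}{x}\end{bmatrix},\qquad
\nabla^2f=\begin{bmatrix}\dfrac{2y^2}{x^3}&-\dfrac{2y}{x^2}\\[6pt] -\dfrac{2y}{x^2}&\dfrac{2}{x}\end{bmatrix}
=\frac{1}{x^3}\begin{bmatrix}2y^2&-2xy\\ -2xy&2x^2\end{bmatrix}
=\frac{2}{x^3}\begin{bmatrix}y\\ -x\end{bmatrix}\begin{bmatrix}y&-x\end{bmatrix}\succeq 0 \quad\text{since } x>0.
$$

Proposition: Let $f:\mathbb{R}^n\to\mathbb{R}$ be twice differentiable at $x$. Then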
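$$
f(x+z)=f(x)+\nabla f(x)^Tz+\tfrac12z^T\nabla^2f(x)z+\mathrm{err}_x(z)
$$

where $\lim_{z\to 0}\dfrac{|\mathrm{err}_x(z)|}{\|z\|_2^2}=0$.
Equivalent to: $\forall\varepsilon>0,\ \exists r>0$ s.t. $\big(\|z\|_2\le r\implies|\mathrm{err}_x(z)|\le\varepsilon\cdot\|z\|_2^2\big)$.
- Proof: Let $\varepsilon>0$. Since $\nabla f$ is differentiable at $x$, $\exists r>0$ s.t.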
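$$
\nabla f(x+z)=\nabla f(x)+\nabla^2f(x)z+\mathrm{err}_x(z)
$$

where $\|\mathrm{err}_x(z)\|_2\le\varepsilon\cdot\|z\|_2$ for all $z$ s.t. $\|z\|_2\le r$ (within the proof, $\mathrm{err}_x$ denotes the vector-valued remainder of the gradient expansion), that is,

$$
\|\nabla f(x+z)-\nabla f(x)-\nabla^2f(x)z\|_2\le\varepsilon\|z\|_2\quad\text{for all } z \text{ s.t. } \|z\|_2\le r.
$$

Let $z\ne 0$ s.t. $\|z\|_2\le r$ and let $u=z/\|z\|_2$, $g(t)=f(x+tu)$, $t\in\mathbb{R}$.
Then $g'(t)=\nabla f(x+tu)^Tu$.
So, for $0\le t\le r$,

$$
\begin{aligned}
f(x+tu)&=f(x)+\int_0^tg'(s)\,ds\\
&=f(x)+\int_0^tu^T\nabla f(x+su)\,ds\\
&=f(x)+u^T\int_0^t\big(\nabla f(x)+\nabla^2f(x)su+\mathrm{err}_x(su)\big)\,ds\\
&=f(x)+u^T\nabla f(x)(t-0)+u^T\nabla^2f(x)u\int_0^ts\,ds+u^T\int_0^t\mathrm{err}_x(su)\,ds\\
&\le f(x)+\nabla f(x)^Ttu+\tfrac12(tu)^T\nabla^2f(x)(tu)+\|u\|_2\int_0^t\|\mathrm{err}_x(su)\|_2\,ds\\
&\le f(x)+\nabla f(x)^Ttu+\tfrac12(tu)^T\nabla^2f(x)(tu)+\|u\|_2\int_0^t\varepsilon\|su\|_2\,ds\\
&\le f(x)+\nabla f(x)^Ttu+\tfrac12(tu)^T\nabla^2f(x)(tu)+\|u\|_2\,\varepsilon\int_0^tt\,ds\\
&=f(x)+\nabla f(x)^Ttu+\tfrac12(tu)^T\nabla^2f(x)(tu)+\|u\|_2\,\varepsilon t^2\\
&=f(x)+\nabla f(x)^Ttu+\tfrac12(tu)^T\nabla^2f(x)(tu)+\varepsilon t^2\\
&=f(x)+\nabla f(x)^Tz+\tfrac12z^T\nabla^2f(x)z+\varepsilon\|z\|_2^2,
\end{aligned}
$$

where the last line takes $t=\|z\|_2$, so that $tu=z$. The first inequality is Cauchy-Schwarz, the second uses the bound on the gradient remainder, and the third uses $\|su\|_2=s\le t$. The matching lower bound, with $-\varepsilon\|z\|_2^2$, follows in the same way.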
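The statement $|\mathrm{err}_x(z)|/\|z\|_2^2\to 0$ can be observed numerically by shrinking $z$ along a fixed direction. The following is a minimal sketch under assumed choices: the test function $f(x)=e^{x_1}\sin x_2$, the base point, and the direction are all illustrative and not taken from the notes.

```python
import math

def f(x):
    """Example smooth function R^2 -> R."""
    return math.exp(x[0]) * math.sin(x[1])

def grad_f(x):
    """Gradient of f, computed by hand."""
    e = math.exp(x[0])
    return [e * math.sin(x[1]), e * math.cos(x[1])]

def hess_f(x):
    """Hessian of f, computed by hand."""
    e = math.exp(x[0])
    s, c = math.sin(x[1]), math.cos(x[1])
    return [[e * s, e * c], [e * c, -e * s]]

def taylor_error(x, z):
    """err_x(z) = f(x+z) - f(x) - grad f(x)^T z - 1/2 z^T hess f(x) z."""
    g, H = grad_f(x), hess_f(x)
    linear = sum(gi * zi for gi, zi in zip(g, z))
    quadratic = 0.5 * sum(z[i] * H[i][j] * z[j] for i in range(2) for j in range(2))
    shifted = [xi + zi for xi, zi in zip(x, z)]
    return f(shifted) - f(x) - linear - quadratic

x = [0.3, 0.8]
u = [0.6, -0.8]  # unit direction, ||u||_2 = 1
ratios = []
for t in (1e-1, 1e-2, 1e-3):
    z = [t * ui for ui in u]
    ratios.append(abs(taylor_error(x, z)) / t ** 2)  # ||z||_2 = t
```

For this smooth example the ratio shrinks roughly linearly in $t$, which is faster than the proposition requires; the proposition only assumes twice differentiability at $x$.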