Convex Optimization Note 1
These are notes on Chapters 2 and 3 and Appendix A of "Convex Optimization".
1. Convex Set
1.1 Affine and convex sets:
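1) Affine set: $C = V + x_0 = \{x + x_0 \mid x \in V\}$. An affine set can be viewed as a subspace $V$ shifted by a point $x_0$.
2) Affine dimension and relative interior: the affine dimension of a set is the dimension of its affine hull, and its relative interior is its interior relative to that affine hull.
3) Convex combination: the definition extends to the infinite case (infinite sums, integrals, and expectations).
4) Cones: $C$ is a cone if $x \in C$ implies $\theta x \in C$ for all $\theta \ge 0$; a cone that is also convex is a convex cone.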
1.2 Some examples
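1) Euclidean balls and ellipsoids: $B(x_c, r) = \{x \mid \|x - x_c\|_2 \le r\}$ and $\mathcal{E} = \{x \mid (x - x_c)^T P^{-1} (x - x_c) \le 1\}$ with $P \succ 0$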
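2) Norm cones: $\{(x,t) \mid \|x\| \le t\} \subseteq \mathbf{R}^{n+1}$
3) Polyhedra –> simplex: the convex hull of $k+1$ affinely independent points is a $k$-dimensional simplex,
such as the unit simplex (convex hull of $0, e_1, \dots, e_n$) and the probability simplex (convex hull of $e_1, \dots, e_n$).
A polyhedron can be represented in two ways: as a convex hull (plus a conic part when it is unbounded), or as the solution set of linear inequalities.
4) Positive semidefinite cone $\mathbf{S}^n_+$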
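As a small sanity check of the positive semidefinite cone, the sketch below works in the $2 \times 2$ case, where $X = \begin{bmatrix} x & y \\ y & z \end{bmatrix} \succeq 0$ exactly when $x \ge 0$, $z \ge 0$ and $xz \ge y^2$. The helper names (`is_psd_2x2`, `quad_form`, `conic_comb`) and the sample matrices are illustrative assumptions, not taken from the book.

```python
import math

def is_psd_2x2(x, y, z, tol=1e-12):
    """X = [[x, y], [y, z]] is PSD iff x >= 0, z >= 0 and x*z >= y**2."""
    return x >= -tol and z >= -tol and x * z - y * y >= -tol

def quad_form(x, y, z, v):
    """v^T X v for the symmetric matrix X = [[x, y], [y, z]]."""
    return x * v[0] ** 2 + 2 * y * v[0] * v[1] + z * v[1] ** 2

def conic_comb(X1, X2, t1, t2):
    """Nonnegative combination t1*X1 + t2*X2, matrices stored as (x, y, z)."""
    return tuple(t1 * a + t2 * b for a, b in zip(X1, X2))

X1 = (2.0, 1.0, 1.0)    # PSD: 2*1 - 1 = 1 >= 0
X2 = (1.0, -2.0, 4.0)   # PSD: 1*4 - 4 = 0 (on the boundary of the cone)
X3 = (1.0, 2.0, 1.0)    # not PSD: 1*1 - 4 < 0

combo = conic_comb(X1, X2, 0.5, 3.0)
directions = [(math.cos(0.1 * k), math.sin(0.1 * k)) for k in range(63)]

print(is_psd_2x2(*X1), is_psd_2x2(*X2), is_psd_2x2(*X3))
print(is_psd_2x2(*combo))
print(min(quad_form(*combo, v) for v in directions) >= 0)
```

The nonnegative combination stays inside the cone, and its quadratic form is nonnegative in every sampled direction, which matches the halfspace description $z^T X z \ge 0$.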
1.3 Operations that preserve convexity
1) Intersection
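Positive semidefinite cone: $\mathbf{S}^n_+ = \bigcap_{z \ne 0} \{X \in \mathbf{S}^n \mid z^T X z \ge 0\}$
$S = \{x \in \mathbf{R}^m \mid |p(t)| \le 1 \text{ for } |t| \le \pi/3\}$, where $p(t) = \sum_{k=1}^m x_k \cos kt$
Every closed convex set can be expressed as the intersection of (possibly infinitely many) halfspaces.
2) Affine functions: both the image and the inverse image of a convex set under an affine function are convex.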
Polyhedra
Solution set of a linear matrix inequality: $\{x \mid A(x) = x_1 A_1 + \dots + x_n A_n \preceq B\}$
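Hyperbolic cone: $\{x \mid x^T P x \le (c^T x)^2,\ c^T x \ge 0\}$ with $P \in \mathbf{S}^n_+$ is the inverse image of the second-order cone $\{(z,t) \mid z^T z \le t^2,\ t \ge 0\}$ under the affine function $f(x) = (P^{1/2} x, c^T x)$.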
3) Perspective functions
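$P(z,t) = z/t$ with $\operatorname{dom} P = \mathbf{R}^n \times \mathbf{R}_{++}$. Both the image and the inverse image of a convex set under the perspective function are convex.
Conditional probability: the joint probabilities lie in the probability simplex, and conditioning only divides by a partial sum, so it is a linear-fractional function. The set of conditional probabilities obtained from a convex set of joint distributions is therefore convex.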
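To make the conditional-probability remark concrete, here is a minimal sketch assuming a $2 \times 2$ joint table indexed as `joint[i][j]` $= \mathbf{prob}(u = i, v = j)$. The function names and the two sample tables are made up for illustration. It shows that conditioning maps a convex combination of joint distributions to a convex combination of the conditionals, with a different weight because the map is linear-fractional rather than linear.

```python
def conditional(joint, j):
    """prob(u = i | v = j): column j of the joint table divided by its sum."""
    col = [row[j] for row in joint]
    total = sum(col)  # must be > 0, i.e. inside the domain of the map
    return [c / total for c in col]

def mix(p, q, theta):
    """Entrywise convex combination theta*p + (1 - theta)*q of two tables."""
    return [[theta * a + (1 - theta) * b for a, b in zip(rp, rq)]
            for rp, rq in zip(p, q)]

p = [[0.1, 0.3], [0.2, 0.4]]   # a joint distribution, entries sum to 1
q = [[0.4, 0.1], [0.1, 0.4]]   # another joint distribution
theta = 0.25

fp, fq = conditional(p, 0), conditional(q, 0)
fm = conditional(mix(p, q, theta), 0)

# The image of the mixture is a convex combination of the images,
# with weight mu determined by the column sums sp and sq.
sp = sum(row[0] for row in p)
sq = sum(row[0] for row in q)
mu = theta * sp / (theta * sp + (1 - theta) * sq)
recombined = [mu * a + (1 - mu) * b for a, b in zip(fp, fq)]
print(mu, fm, recombined)
```

Since `mu` always lies in $[0, 1]$, line segments map into line segments, which is why convexity is preserved.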
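1.4 Separating and supporting hyperplane theorems
1) Separating hyperplane theorem: any two disjoint nonempty convex sets can be separated by a hyperplane.
Proof sketch (for the case where the distance between the sets is attained): take the pair of closest points, one in each set, and the hyperplane through the midpoint of the segment joining them, perpendicular to that segment. If this hyperplane failed to separate the sets (the affine function $a^T x - b$ has the wrong sign at some point of one set), then moving from the closest point toward that point would reduce the distance, because the derivative of the squared Euclidean distance in that direction is negative. This contradicts the choice of closest points.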
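2) Strict separation:
Two disjoint convex sets need not be strictly separable.
A closed convex set and a point outside it can be strictly separated, which shows that every closed convex set is the intersection of all halfspaces that contain it.
3) Converse: two convex sets, at least one of which is open, are disjoint if and only if there exists a separating hyperplane.
4) The supporting hyperplane theorem can be proved by separating $\operatorname{int} C$ from the boundary point $P$.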
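The closest-point construction from the separating hyperplane proof can be sketched for two disjoint closed disks in the plane, where the closest points are known in closed form. The function names, the disk parameters and the random sampling check are assumptions chosen for illustration only.

```python
import math
import random

def separating_hyperplane(c1, r1, c2, r2):
    """Hyperplane a^T x = b through the midpoint of the closest points of two
    disjoint closed disks, perpendicular to the segment joining them."""
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    dist = math.hypot(dx, dy)
    if dist <= r1 + r2:
        raise ValueError("disks are not disjoint")
    ux, uy = dx / dist, dy / dist
    c = (c1[0] + r1 * ux, c1[1] + r1 * uy)   # closest point in disk 1
    d = (c2[0] - r2 * ux, c2[1] - r2 * uy)   # closest point in disk 2
    a = (d[0] - c[0], d[1] - c[1])
    b = (d[0] ** 2 + d[1] ** 2 - c[0] ** 2 - c[1] ** 2) / 2
    return a, b

def sample_disk(center, r, rng):
    """Random point of the disk (uniform in angle, not in area)."""
    ang, rad = rng.uniform(0, 2 * math.pi), r * rng.random()
    return (center[0] + rad * math.cos(ang), center[1] + rad * math.sin(ang))

rng = random.Random(0)
c1, r1, c2, r2 = (0.0, 0.0), 1.0, (3.0, 1.0), 1.5
a, b = separating_hyperplane(c1, r1, c2, r2)

def f(x):
    """Affine function a^T x - b; its sign tells the side of the hyperplane."""
    return a[0] * x[0] + a[1] * x[1] - b

side1 = [f(sample_disk(c1, r1, rng)) for _ in range(1000)]
side2 = [f(sample_disk(c2, r2, rng)) for _ in range(1000)]
print(max(side1) <= 0, min(side2) >= 0)
```

Here $a = d - c$ and $b = (\|d\|_2^2 - \|c\|_2^2)/2$, so $f$ is nonpositive on the first disk and nonnegative on the second.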
2. Mathematical background (Appendix A)
1) norm
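Vector norm: $P$-quadratic norm $\|x\|_P = (x^T P x)^{1/2} = \|P^{1/2} x\|_2$ for $P \in \mathbf{S}^n_{++}$
Matrix norm:
sum-absolute-value / maximum-absolute-value
operator norms $\|X\|_{a,b} = \sup\{\|Xu\|_a \mid \|u\|_b \le 1\}$
Norms induced as operator norms: $\ell_2$ gives the spectral norm (the largest singular value), $\ell_1$ gives the max-column-sum norm, and $\ell_\infty$ gives the max-row-sum norm.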
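The max-column-sum and max-row-sum formulas can be checked by brute force on a small matrix. The sketch below relies on the fact that a convex function attains its maximum over a polytope at a vertex, so it only evaluates $\|Xu\|$ at the vertices of the $\ell_1$ ball (the unit vectors, up to sign) and of the $\ell_\infty$ ball (the sign vectors). The helper names and the sample matrix are illustrative assumptions.

```python
from itertools import product

def matvec(X, u):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(xij * uj for xij, uj in zip(row, u)) for row in X]

def norm1(v):
    return sum(abs(t) for t in v)

def norm_inf(v):
    return max(abs(t) for t in v)

def induced_l1(X):
    """sup{||Xu||_1 : ||u||_1 <= 1}, evaluated at the unit vectors e_j."""
    n = len(X[0])
    units = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
    return max(norm1(matvec(X, e)) for e in units)

def induced_linf(X):
    """sup{||Xu||_inf : ||u||_inf <= 1}, evaluated at all sign vectors."""
    n = len(X[0])
    return max(norm_inf(matvec(X, u)) for u in product((-1.0, 1.0), repeat=n))

X = [[1.0, -2.0, 3.0],
     [-4.0, 5.0, -6.0]]
max_col_sum = max(sum(abs(row[j]) for row in X) for j in range(len(X[0])))
max_row_sum = max(sum(abs(t) for t in row) for row in X)
print(induced_l1(X), max_col_sum)    # both 9.0
print(induced_linf(X), max_row_sum)  # both 15.0
```

Enumerating sign vectors costs $2^n$ evaluations, so this is only a check for tiny matrices, not a practical way to compute the norm.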
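2) Equivalence of norms: every norm on $\mathbf{R}^n$ is equivalent to some quadratic norm, i.e. there is a $P \in \mathbf{S}^n_{++}$ with $\|x\|_P \le \|x\| \le \sqrt{n}\,\|x\|_P$
3) Dual norm: $\|z\|_* = \sup\{z^T x \mid \|x\| \le 1\}$
$z^T x \le \|x\| \|z\|_*$
The $\ell_2$-norm is its own dual, $\ell_1$ and $\ell_\infty$ are dual to each other, and $\ell_p$ and $\ell_q$ are dual when $1/p + 1/q = 1$.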
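4) Closed and open sets, and the boundary: definitions
5) Closed function: all sublevel sets $\{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$ are closed.
If $f$ is continuous and $\operatorname{dom} f$ is closed, then $f$ is closed.
If $f$ is continuous and $\operatorname{dom} f$ is open, then $f$ is closed if and only if $f(x) \to \infty$ as $x$ approaches any boundary point of $\operatorname{dom} f$.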
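6) $\log\det(I + X^{-1/2} \Delta X X^{-1/2}) = \sum_{i=1}^n \log(1 + \lambda_i)$, where the $\lambda_i$ are the eigenvalues of $X^{-1/2} \Delta X X^{-1/2}$. To first order, $\sum_{i=1}^n \log(1 + \lambda_i) \approx \sum_{i=1}^n \lambda_i = \operatorname{tr}(X^{-1} \Delta X)$.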
$\nabla \log\det X = X^{-1}$
7) $\operatorname{cond}(A) = \|A\|_2 \|A^{-1}\|_2 = \sigma_{\max}(A)/\sigma_{\min}(A)$
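8) Pseudo-inverse:
$A^\dagger b$ is the minimum-norm solution of minimize $\|Ax - b\|_2^2$
Minimum of a general quadratic function: for $P \succeq 0$ and $q \in \mathcal{R}(P)$, the minimum of $\tfrac{1}{2} x^T P x + q^T x + r$ is $-\tfrac{1}{2} q^T P^\dagger q + r$; otherwise the function is unbounded below.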
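9) Schur complement: for $X = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}$ with $A$ nonsingular, $S = C - B^T A^{-1} B$
$\det X = \det A \det S$
The inverse of $X$ can be written in terms of $A^{-1}$ and $S^{-1}$.
$\inf_u \begin{bmatrix} u \\ v \end{bmatrix}^T \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = v^T S v$ (for $A \succ 0$)
$X \succ 0 \Leftrightarrow A \succ 0$ and $S \succ 0$; if $A \succ 0$, then $X \succeq 0 \Leftrightarrow S \succeq 0$
When $A$ is singular, the Schur complement can be expressed with the pseudo-inverse of $A$: $S = C - B^T A^\dagger B$.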
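The Schur complement identities are easy to verify by hand when all blocks are scalars. The following sketch assumes $X = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ with $a > 0$; the function names and the numbers are illustrative.

```python
def schur_scalar(a, b, c):
    """Schur complement S = c - b * a^{-1} * b of a in X = [[a, b], [b, c]]."""
    return c - b * b / a

def quad(a, b, c, u, v):
    """[u v] X [u v]^T for X = [[a, b], [b, c]]."""
    return a * u * u + 2 * b * u * v + c * v * v

a, b, c = 2.0, 1.0, 3.0
S = schur_scalar(a, b, c)          # 3 - 1/2 = 2.5
det_X = a * c - b * b              # 5.0

v = 1.5
u_star = -b * v / a                # minimizer over u when a > 0
# Brute-force minimum over a grid of u values in [-5, 5]
grid_min = min(quad(a, b, c, -5 + 0.001 * k, v) for k in range(10001))

print(det_X, a * S)                          # 5.0 5.0
print(quad(a, b, c, u_star, v), v * v * S)   # 5.625 5.625
print(grid_min >= v * v * S - 1e-12)
```

Because $a > 0$ and $S > 0$ here, the matrix is positive definite, consistent with $\det X = aS > 0$.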
3. Convex function
3.1 basics
1) Restriction to a line: $f$ is convex if and only if its restriction to every line is convex. Extended-value extension: set $f(x) = \infty$ outside $\operatorname{dom} f$.
2) 1st-order condition: $f(y) \ge f(x) + \nabla f(x)^T (y - x)$
3) 2nd-order condition: $\nabla^2 f(x) \succeq 0$
4) Sublevel sets of convex functions are convex sets; the converse is not true.
5) Epigraph is convex $\Leftrightarrow$ function is convex
The supporting hyperplane to the epigraph at $(x, f(x))$ has normal $(\nabla f(x), -1)$.
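6) $f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y)$ generalizes to $f(\mathbf{E}\,x) \le \mathbf{E}\,f(x)$, which is known as Jensen's inequality.
It can be used to prove:
$\sqrt{ab} \le (a+b)/2$ for $a, b \ge 0$
Hölder's inequality: $\sum_{i=1}^n x_i y_i \le \left(\sum_{i=1}^n |x_i|^p\right)^{1/p} \left(\sum_{i=1}^n |y_i|^q\right)^{1/q}$, where $1/p + 1/q = 1$ and $p > 1$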
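Both consequences of Jensen's inequality can be spot-checked numerically. The sketch below applies Jensen to the convex function $-\log$ with $\theta = 1/2$, which rearranges to the arithmetic-geometric mean inequality, and tests Hölder's inequality on random vectors. The helper `holder_gap`, the sampling ranges and the tolerances are assumptions for illustration; a random check is evidence, not a proof.

```python
import math
import random

def holder_gap(x, y, p):
    """Right side minus left side of Holder's inequality with 1/p + 1/q = 1."""
    q = p / (p - 1)
    lhs = sum(a * b for a, b in zip(x, y))
    rhs = (sum(abs(a) ** p for a in x) ** (1 / p)
           * sum(abs(b) ** q for b in y) ** (1 / q))
    return rhs - lhs

rng = random.Random(1)

# Jensen for the convex function -log with theta = 1/2:
# -log((a + b)/2) <= (-log a - log b)/2, i.e. sqrt(a*b) <= (a + b)/2.
amgm_ok = True
for _ in range(1000):
    a, b = rng.uniform(0.01, 10), rng.uniform(0.01, 10)
    jensen = -math.log((a + b) / 2) <= (-math.log(a) - math.log(b)) / 2 + 1e-12
    amgm_ok = amgm_ok and jensen and math.sqrt(a * b) <= (a + b) / 2 + 1e-12

holder_ok = all(
    holder_gap([rng.uniform(-1, 1) for _ in range(5)],
               [rng.uniform(-1, 1) for _ in range(5)], p) >= -1e-12
    for p in (1.5, 2.0, 3.0) for _ in range(300)
)
print(amgm_ok, holder_ok)
```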
7) examples
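Quadratic-over-linear: $f(x,y) = x^2/y$ with $y > 0$
Log-sum-exp: $f(x) = \log(e^{x_1} + \dots + e^{x_n})$; convexity follows by computing the Hessian and applying the Cauchy-Schwarz inequality.
Geometric mean: $f(x) = \left(\prod_{i=1}^n x_i\right)^{1/n}$ on $\mathbf{R}^n_{++}$; the same approach (Hessian plus Cauchy-Schwarz) shows it is concave.
Log-determinant: $f(X) = \log\det X$ with $\operatorname{dom} f = \mathbf{S}^n_{++}$; restricting to a line and differentiating shows it is concave.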
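For the log-sum-exp example, a numerical check of the defining inequality is a useful complement to the Hessian argument. This is a minimal sketch: the max-shift used for numerical stability, the sampling range and the tolerance are choices made here, not part of the notes.

```python
import math
import random

def log_sum_exp(x):
    """log(sum_i exp(x_i)), shifted by max(x) to avoid overflow."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

rng = random.Random(2)
worst = float("inf")
for _ in range(2000):
    x = [rng.uniform(-5, 5) for _ in range(4)]
    y = [rng.uniform(-5, 5) for _ in range(4)]
    theta = rng.random()
    z = [theta * a + (1 - theta) * b for a, b in zip(x, y)]
    # Convexity gap: theta*f(x) + (1 - theta)*f(y) - f(theta*x + (1 - theta)*y)
    gap = theta * log_sum_exp(x) + (1 - theta) * log_sum_exp(y) - log_sum_exp(z)
    worst = min(worst, gap)

print(worst >= -1e-12)
print(log_sum_exp([1000.0, 1000.0]))  # 1000 + log 2, no overflow
```

The function also satisfies $\max_i x_i \le f(x) \le \max_i x_i + \log n$, which is why it is often described as a smooth approximation of the max.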
3.2 operations that preserve convexity
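1) Nonnegative weighted sum –> extends to infinite sums and integrals
2) Composition with an affine mapping: $f(Ax + b)$
3) Pointwise maximum: $f(x) = \max(f_1(x), \dots, f_n(x))$ –> over an infinite set: $g(x) = \sup_y f(x,y)$, where $f(x,y)$ is convex in $x$ for each fixed $y$
sum of the $r$ largest components
support function of a set (any set, not necessarily convex): $f(x) = \sup\{x^T y \mid y \in C\}$
distance to the farthest point of a set: $f(x) = \sup_{y \in C} \|x - y\|$
maximum eigenvalue of a symmetric matrix: $f(X) = \sup\{y^T X y \mid \|y\|_2 = 1\}$
operator norm: see section 2 (Mathematical background)
Almost every convex function (for example, every closed convex function) is the pointwise supremum of all its affine global underestimators (at each point, take the supporting hyperplane of the epigraph).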
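The sum of the $r$ largest components is a good example of the pointwise-maximum rule: it equals the maximum of the linear functions $x \mapsto \sum_{i \in I} x_i$ over all index sets $I$ of size $r$. The sketch below compares the two forms and spot-checks convexity on random points; function names, dimensions and tolerances are illustrative assumptions.

```python
from itertools import combinations
import random

def sum_r_largest(x, r):
    """Sum of the r largest components of x."""
    return sum(sorted(x, reverse=True)[:r])

def max_over_subsets(x, r):
    """Pointwise maximum of the linear functions x -> sum_{i in I} x_i, |I| = r."""
    return max(sum(x[i] for i in idx) for idx in combinations(range(len(x)), r))

rng = random.Random(3)
same, convex = True, True
for _ in range(500):
    x = [rng.uniform(-3, 3) for _ in range(5)]
    y = [rng.uniform(-3, 3) for _ in range(5)]
    t = rng.random()
    z = [t * a + (1 - t) * b for a, b in zip(x, y)]
    same = same and abs(sum_r_largest(x, 2) - max_over_subsets(x, 2)) < 1e-12
    convex = convex and (
        sum_r_largest(z, 2)
        <= t * sum_r_largest(x, 2) + (1 - t) * sum_r_largest(y, 2) + 1e-12)
print(same, convex)
```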
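4) Composition $f(x) = h(g(x))$
The rules follow from the second-derivative formula $f''(x) = h''(g(x))\, g'(x)^2 + h'(g(x))\, g''(x)$.
The generalized rules do not require twice differentiability: with $h$ convex, it suffices that the extended-value extension $\tilde h$ is nondecreasing (and $g$ convex) or nonincreasing (and $g$ concave).
Requiring monotonicity of the extended-value extension constrains $\operatorname{dom} h$: it must extend to $-\infty$ when $\tilde h$ is nondecreasing and to $+\infty$ when $\tilde h$ is nonincreasing.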
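5) Minimization: if $f$ is convex in $(x,y)$ and $C$ is a convex nonempty set, then $g(x) = \inf_{y \in C} f(x,y)$ is convex (provided $g(x) > -\infty$ for all $x$).
distance to a convex set: $\operatorname{dist}(x, S) = \inf_{y \in S} \|x - y\|$
$g(x) = \inf\{h(y) \mid Ay = x\}$ is convex when $h$ is convex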
6) Perspective of a function: $g(x,t) = t f(x/t)$ with $t > 0$ is convex when $f$ is convex; this can be proved via the epigraph.
$f(x) = x^T x$ gives $g(x,t) = x^T x / t$
$f(x) = -\log x$ gives $g(x,t) = -t \log(x/t) = t \log t - t \log x$
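The two perspective examples can be reproduced with a small higher-order function. This sketch is restricted to scalar $x$ and uses a randomized test of joint convexity in $(x, t)$; the `perspective` helper, the sampling ranges and the tolerance are assumptions introduced here.

```python
import math
import random

def perspective(f):
    """Perspective g(x, t) = t * f(x / t), defined for t > 0."""
    def g(x, t):
        if t <= 0:
            raise ValueError("t must be positive")
        return t * f(x / t)
    return g

quad_over_lin = perspective(lambda x: x * x)        # g(x, t) = x**2 / t
rel_entropy = perspective(lambda x: -math.log(x))   # g(x, t) = t*log(t) - t*log(x)

rng = random.Random(4)
jointly_convex = True
for _ in range(2000):
    x1, t1 = rng.uniform(0.1, 5), rng.uniform(0.1, 5)
    x2, t2 = rng.uniform(0.1, 5), rng.uniform(0.1, 5)
    th = rng.random()
    xm, tm = th * x1 + (1 - th) * x2, th * t1 + (1 - th) * t2
    for g in (quad_over_lin, rel_entropy):
        jointly_convex = jointly_convex and (
            g(xm, tm) <= th * g(x1, t1) + (1 - th) * g(x2, t2) + 1e-9)

print(jointly_convex)
print(abs(rel_entropy(2.0, 3.0) - (3 * math.log(3) - 3 * math.log(2))) < 1e-12)
```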
3.3 conjugate function
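1) $f^*(y) = \sup_{x \in \operatorname{dom} f} (y^T x - f(x))$
2) Affine $f(x) = ax + b$: $f^*(y) = -b$ with $\operatorname{dom} f^* = \{a\}$
Negative logarithm $f(x) = -\log x$: $f^*(y) = -\log(-y) - 1$ for $y < 0$
Exponential $f(x) = e^x$: $f^*(y) = y \log y - y$ for $y \ge 0$ (with $0 \log 0 = 0$)
Negative entropy $f(x) = x \log x$: $f^*(y) = e^{y-1}$ for $y \in \mathbf{R}$
Inverse $f(x) = 1/x$ on $\mathbf{R}_{++}$: $f^*(y) = -2(-y)^{1/2}$ for $y \le 0$
Strictly convex quadratic function: $f(x) = \tfrac{1}{2} x^T Q x$ with $Q \succ 0$ has $f^*(y) = \tfrac{1}{2} y^T Q^{-1} y$
Log-determinant $f(X) = \log\det X^{-1}$ on $\mathbf{S}^n_{++}$: $f^*(Y) = \log\det(-Y)^{-1} - n$ for $Y \prec 0$
Indicator function of a set: the conjugate is the support function of the set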
3) Fenchel's inequality: $f(x) + f^*(y) \ge x^T y$
4) If $f$ is convex and closed, then $f^{**} = f$; without these assumptions the equality need not hold.
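5) Scaling and affine transformation: for $a > 0$, $g(x) = a f(x) + b$ has $g^*(y) = a f^*(y/a) - b$; for nonsingular $A$, $g(x) = f(Ax + b)$ has $g^*(y) = f^*(A^{-T} y) - b^T A^{-T} y$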
Sum of independent functions: if $f(u,v) = f_1(u) + f_2(v)$, then $f^*(w,z) = f_1^*(w) + f_2^*(z)$
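The conjugate of a scalar function can be approximated directly from its definition by maximizing $yx - f(x)$ over a grid. The sketch below does this for $f(x) = e^x$, compares against the closed form $y \log y - y$, and spot-checks Fenchel's inequality. The grid bounds, step size and helper names are assumptions; the grid maximum only approximates the supremum from below.

```python
import math

def conjugate_on_grid(f, y, lo=-20.0, hi=20.0, steps=40001):
    """Approximate f*(y) = sup_x (y*x - f(x)) by a maximum over a grid in [lo, hi]."""
    h = (hi - lo) / (steps - 1)
    return max(y * (lo + k * h) - f(lo + k * h) for k in range(steps))

def f(x):
    """f(x) = e^x."""
    return math.exp(x)

def f_star(y):
    """Closed-form conjugate of e^x for y > 0."""
    return y * math.log(y) - y

ys = (0.5, 1.0, 2.0, 5.0)
for y in ys:
    print(y, conjugate_on_grid(f, y), f_star(y))

# Fenchel's inequality f(x) + f*(y) >= x*y on a few points
fenchel_ok = all(f(x) + f_star(y) >= x * y - 1e-12
                 for x in (-2.0, 0.0, 0.7, 3.0) for y in ys)
print(fenchel_ok)
```

Equality in Fenchel's inequality holds when $y = f'(x)$; for this example that is $y = e^x$, e.g. $x = 0$, $y = 1$.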