

KL divergence: the larger the KL value, the greater the difference between the two distributions; the smaller the value, the smaller the difference.
KL divergence:
$$
D_{KL}(P||Q) = \sum^N_{i=1} P(x_i)\log\frac{P(x_i)}{Q(x_i)}
$$
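As a quick numerical illustration, here is a minimal NumPy sketch of the discrete KL divergence above; the distributions `p` and `q` are made-up example values.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D_KL(P||Q) = sum_i P(x_i) * log(P(x_i) / Q(x_i))."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sum(p * np.log((p + eps) / (q + eps)))

p = np.array([0.4, 0.4, 0.2])
q = np.array([0.3, 0.3, 0.4])
print(kl_divergence(p, q))  # small positive value: p and q are similar but not equal
print(kl_divergence(p, p))  # 0.0: identical distributions have zero divergence
```

Note that KL is not symmetric: `kl_divergence(p, q)` generally differs from `kl_divergence(q, p)`, which is one motivation for the symmetric JS divergence below.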
JS divergence:
$$
JSD(P||Q) = \frac{1}{2}D(P||M) + \frac{1}{2}D(Q||M), \qquad M = \frac{1}{2}(P+Q)
$$
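Reusing the `kl_divergence` sketch above, the JS divergence can be computed directly from its definition (again with the made-up `p` and `q`):

```python
def js_divergence(p, q):
    """JSD(P||Q) = 0.5 * D_KL(P||M) + 0.5 * D_KL(Q||M), with M = (P + Q) / 2."""
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

print(js_divergence(p, q))  # symmetric and bounded by log(2)
print(js_divergence(q, p))  # same value as above
```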
The generator maps prior noise to samples, $x = G(z)$, and these samples implicitly define a distribution $P_G(x)$. The optimal generator minimizes the divergence between $P_G$ and the data distribution:

$$
G^* = \arg\min_G Div(P_G, P_{data})
$$
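$P_G$ is never written down in closed form; it is only accessible through sampling. A minimal sketch with a made-up linear generator (a stand-in for a neural network) shows how $P_G$ arises implicitly:

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z):
    """Toy generator G: maps prior noise z to x = G(z).
    The induced distribution P_G is only accessible through these samples."""
    return 2.0 * z + 1.0  # hypothetical G; a real GAN uses a neural network here

z = rng.standard_normal(10_000)  # z drawn from the prior (standard normal here)
x = generator(z)                 # x ~ P_G, implicitly defined by G
print(x.mean(), x.std())         # roughly 1.0 and 2.0 for this toy G
```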

$$
\begin{aligned}
D^* &= \arg\max_D V(D,G) \\
&= \frac{P_{data}(x)}{P_{data}(x) + P_G(x)}
\end{aligned}
$$
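This closed form follows from maximizing the integrand of $V(D,G)$ pointwise: for a fixed $x$, the function $a\log D + b\log(1-D)$ with $a = P_{data}(x)$, $b = P_G(x)$ peaks at $D = a/(a+b)$. A short numerical check with made-up density values:

```python
import numpy as np

# Pointwise objective a*log(D) + b*log(1 - D), with a = P_data(x), b = P_G(x) at a fixed x.
a, b = 0.7, 0.3  # made-up density values
d_grid = np.linspace(1e-4, 1 - 1e-4, 100_001)
values = a * np.log(d_grid) + b * np.log(1 - d_grid)
print(d_grid[np.argmax(values)])  # ~0.7
print(a / (a + b))                # 0.7: the maximizer is P_data(x) / (P_data(x) + P_G(x))
```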
When G is fixed, plugging the optimal discriminator $D^*$ back into $V$ expresses the maximum in terms of the JS divergence:

$$
\begin{aligned}
\max_D V(G, D) &= V(G, D^*) \\
&= E_{x \sim P_{data}}[\log D^*(x)] + E_{x \sim P_G}[\log(1 - D^*(x))] \\
&= -2\log 2 + 2\,JSD(P_{data}||P_G)
\end{aligned}
$$
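This identity can be checked numerically on a finite support, treating the earlier made-up `p` as $P_{data}$ and `q` as $P_G$ and reusing the `js_divergence` sketch:

```python
# Treat p as P_data and q as P_G over a finite support.
d_star = p / (p + q)                             # optimal discriminator per outcome
v_max = np.sum(p * np.log(d_star)) + np.sum(q * np.log(1 - d_star))
print(v_max)                                     # matches the line below
print(-2 * np.log(2) + 2 * js_divergence(p, q))
```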
- Since the Jensen–Shannon divergence between two distributions is always non-negative and zero only when they are equal, we have shown that $C^* = -\log(4)$ is the global minimum of $C(G)$ and that the only solution is $p_g = p_{data}$, i.e., the generative model perfectly replicating the data generating process.
- This shows that $V(D,G)$ and $Div(P_{data}, P_G)$ are directly related, so the generator objective can be rewritten as a minimax game:
$$
\begin{aligned}
G^* &= \arg\min_G Div(P_G, P_{data}) \\
&= \arg\min_G \max_D V(G,D)
\end{aligned}
$$
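In practice the inner $\max_D$ and outer $\min_G$ are approximated by alternating gradient steps. A PyTorch-style skeleton of this loop is sketched below; the toy architectures, data, and hyperparameters are illustrative assumptions, and the generator step uses the common non-saturating heuristic rather than literally minimizing $\log(1-D(G(z)))$.

```python
import torch
import torch.nn as nn

# Toy illustrative networks; real architectures depend on the data.
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def sample_data(n):
    """Placeholder for samples from P_data (a made-up Gaussian)."""
    return torch.randn(n, 2) * 0.5 + 1.0

for step in range(1000):
    # Discriminator step: approximate the inner max_D V(G, D).
    x_real = sample_data(64)
    x_fake = G(torch.randn(64, 16)).detach()
    loss_d = bce(D(x_real), torch.ones(64, 1)) + bce(D(x_fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: approximate the outer min_G, pushing D(G(z)) toward 1.
    x_fake = G(torch.randn(64, 16))
    loss_g = bce(D(x_fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```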

Theoretical Results







