KSG Estimators
KSG估计器是Kraskov在2004年提出的互信息估计器,其原文过于简略,我参考大量文献对细节进行了补充。原文位置:[https://arxiv.org/pdf/cond-mat/0305641.pdf]
KSG估计的基本方法是首先对互信息进行如下分解 (1)
I(X,Y)=H(X)+H(Y)−H(X,Y)
I(X,Y)=H(X)+H(Y)-H(X,Y)
I(X,Y)=H(X)+H(Y)−H(X,Y)
因此,只需要对H(x)H(x)H(x)等三项进行逐个估计即可。
KLE Esitmator
KLE估计器用于估计微分熵。
利用蒙特卡洛估计有(2)
H[x]=−∫μ(x)logμ(x)dx≈−1Nlogμ^(x)
H[x]=-\int\mu(x)log\mu(x)dx \approx -\frac{1}{N}log\hat{\mu}(x)
H[x]=−∫μ(x)logμ(x)dx≈−N1logμ^(x)
Then the key problem is estimating μ^\hat{\mu}μ^(x).
考虑 一个 ϵ−ball\epsilon-ballϵ−ball 以 xix_ixi为中心, 则 (0,ϵ2)(0,\frac{\epsilon}{2})(0,2ϵ)的范围内有k−1k-1k−1个点,(ϵ2,ϵ2+dϵ)(\frac{\epsilon}{2},\frac{\epsilon}{2}+d\epsilon)(2ϵ,2ϵ+dϵ)中间有一个点(the kth neighbor),(ϵ2+dϵ)(\frac{\epsilon}{2}+d\epsilon)(2ϵ+dϵ)之外有剩余的N−k−1N-k-1N−k−1个点,因此首先在除了xix_ixi的N-1个点中挑选k个点,其中一个点位于边缘上,这k个点中剩余选取k-1个点位于最中心范围内,则p(ϵ)p(\epsilon)p(ϵ)是xix_ixi周围ϵ−ball\epsilon-ballϵ−ball的概率质量(mass) (3):
pi(ϵ)=(N−1k)(kk−1)(1−pi(ϵ))N−k−1pi(ϵ)k−1dpi(ϵ)1dϵ=(N−1)!(N−k−1)!1!(k−1)!(1−pi(ϵ))N−k−1pi(ϵ)k−1dpi(ϵ)1dϵ
\begin{aligned}
p_i(\epsilon)=&\tbinom{N-1}{k}\tbinom{k}{k-1}(1-p_i(\epsilon))^{N-k-1}p_i(\epsilon)^{k-1}\frac{dp_i(\epsilon)^1}{d\epsilon}\\
& = \frac{(N-1)!}{(N-k-1)!1!(k-1)!}(1-p_i(\epsilon))^{N-k-1}p_i(\epsilon)^{k-1}\frac{dp_i(\epsilon)^1}{d\epsilon}
\end{aligned}
pi(ϵ)=(kN−1)(k−1k)(1−pi(ϵ))N−k−1pi(ϵ)k−1dϵdpi(ϵ)1=(N−k−1)!1!(k−1)!(N−1)!(1−pi(ϵ))N−k−1pi(ϵ)k−1dϵdpi(ϵ)1
令u=pi(ϵ)u=p_i(\epsilon)u=pi(ϵ),则du=dpi(ϵ)dϵdu=dp_i(\epsilon)d\epsilondu=dpi(ϵ)dϵ
Beta函数定义为beta(p,q)=∫01xp−1(1−x)q−1dx=(p−1)!(q−1)!(p+q−1)!=p+qpq(p+qp)beta(p,q)=\int_0^{1}x^{p-1}(1-x)^{q-1}dx=\frac{(p-1)!(q-1)!}{(p+q-1)!=\frac{\frac{p+q}{pq}}{\tbinom{p+q}{p}}}beta(p,q)=∫01xp−1(1−x)q−1dx=(p+q−1)!=(pp+q)pqp+q(p−1)!(q−1)!
因此,验证积分为1: (4)
∫0∞pi(ϵ)dϵ=∫01k(N−1k)μk−1(1−μ)N−k−1dμ=k(N−1k)beta(k,N−k)=k(N−1k)Nk(N−k)1(Nk)=1
\begin{aligned}
\int_0^{\infin}p_i(\epsilon)d\epsilon&=\int_0^1k\tbinom{N-1}{k}\mu^{k-1}(1-\mu)^{N-k-1}d\mu=k\tbinom{N-1}{k}beta(k,N-k)\\
&=k\tbinom{N-1}{k}\frac{N}{k(N-k)}\frac{1}{\tbinom{N}{k}}=1
\end{aligned}
∫0∞pi(ϵ)dϵ=∫01k(kN−1)μk−1(1−μ)N−k−1dμ=k(kN−1)beta(k,N−k)=k(kN−1)k(N−k)N(kN)1=1
Lemma 1.
X∼Beta(α,β),E[log(X)]=ψ(α)−ψ(α+β),ψ()=Γ′()Γ()X\sim Beta(\alpha, \beta),\mathbb{E}[log(X)]=\psi(\alpha)-\psi(\alpha+\beta),\psi()=\frac{\Gamma'()}{\Gamma()}X∼Beta(α,β),E[log(X)]=ψ(α)−ψ(α+β),ψ()=Γ()Γ′()
证明: (5)
E[log(X)]=∫01B(α,β)−1log(x)xα−1(1−x)β−1dx=B(α,β)−1∫01∂∂αxα−1(1−x)β−1dx
\begin{aligned}
E[log(X)]&=\int_0^1B(\alpha,\beta)^{-1}log(x)x^{\alpha-1}(1-x)^{\beta-1}dx\\
&=B(\alpha,\beta)^{-1}\int_0^1\frac{\partial}{\partial\alpha}x^{\alpha-1}(1-x)^{\beta-1}dx
\end{aligned}
E[log(X)]=∫01B(α,β)−1log(x)xα−1(1−x)β−1dx=B(α,β)−1∫01∂α∂xα−1(1−x)β−1dx
交换微分和积分的顺序(类似于莱布尼兹法则): (6)
E[log(X)]=Beta(α,β)−1∂∂αBeta(α,β)=∂∂αlogB(α,β)
E[log(X)]=Beta(\alpha,\beta)^{-1}\frac{\partial}{\partial\alpha}Beta(\alpha,\beta)=\frac{\partial}{\partial\alpha}logB(\alpha,\beta)
E[log(X)]=Beta(α,β)−1∂α∂Beta(α,β)=∂α∂logB(α,β)
展开Beta函数: (7)
=∂∂α[logΓ(α)+logΓ(β)−logΓ(α+β)]=ψ(α)−ψ(α+β)
=\frac{\partial}{\partial\alpha}[log\Gamma(\alpha)+log\Gamma(\beta)-log\Gamma(\alpha+\beta)]=\psi(\alpha)-\psi(\alpha+\beta)
=∂α∂[logΓ(α)+logΓ(β)−logΓ(α+β)]=ψ(α)−ψ(α+β)
Proposition 2.
E[logpi(ϵ)]=ψ(k)−ψ(N)E[logp_i(\epsilon)]=\psi(k)-\psi(N)E[logpi(ϵ)]=ψ(k)−ψ(N)
使用Lemma 1可得。
熵的估计
假设在xix_ixi的邻域范围内,概率密度μ(x)\mu(x)μ(x)是均匀的,则有 (8)
pi(ϵ)≈cdϵdμ(xi)
p_i(\epsilon)\approx c_d\epsilon^d\mu(x_i)
pi(ϵ)≈cdϵdμ(xi)
其中,cdc_dcd是n维的单位球体体积,定义为∫∥s∥<1/2ds\int_{\Vert s \Vert<1/2}ds∫∥s∥<1/2ds,对于某些范数。
如果是2范数,则n维球体积公式 (9)
Vn=πn2RdΓ(1+n2)
V_n=\frac{\pi^{\frac{n}{2}}R^d}{\Gamma(1+\frac{n}{2})}
Vn=Γ(1+2n)π2nRd
R=1R=1R=1时即为cdc_dcd,再乘以RdR^dRd。
因此,该区域概率密度μ(xi)\mu(x_i)μ(xi)乘以体积cdϵdc_d\epsilon^dcdϵd,得到该区域概率质量pi(ϵ)p_i(\epsilon)pi(ϵ)
根据(8),有 (10)
E[−logμ(Xi)]≈−E[logpi(ϵ)]+logcd+dE[logϵ]=−ϕ(k)+ϕ(N)+logcd+dE[logϵ]≈−ϕ(k)+ϕ(N)+logcd+dN∑i=1Nlogϵi
\begin{aligned}
E[-log\mu(X_i)]&\approx-E[logp_i(\epsilon)]+logc_d+dE[log\epsilon]\\
&=-\phi(k)+\phi(N)+logc_d+dE[log\epsilon]\\
&\approx-\phi(k)+\phi(N)+logc_d+\frac{d}{N}\sum_{i=1}^{N}log\epsilon_i
\end{aligned}
E[−logμ(Xi)]≈−E[logpi(ϵ)]+logcd+dE[logϵ]=−ϕ(k)+ϕ(N)+logcd+dE[logϵ]≈−ϕ(k)+ϕ(N)+logcd+Ndi=1∑Nlogϵi
其中,最后的ϵi\epsilon_iϵi表示第iii个点到他第kth最近邻的距离。
KSG1估计器
首先,H(X)H(X)H(X)和H(Y)H(Y)H(Y)可以直接使用(10)进行估计。
其次,使用(10),联合熵的一个估计是 (11):
H^(X,Y)=−ψ(k)+ψ(N)+log(cdz)+dzN∑logϵi
\hat{H}(X,Y)=-\psi(k)+\psi(N)+log(c_{dz})+\frac{dz}{N}\sum log\epsilon_i
H^(X,Y)=−ψ(k)+ψ(N)+log(cdz)+Ndz∑logϵi
其中,dz=dx+dydz=dx+dydz=dx+dy,即Z的维度是x和y维度之和。且对于球的体积,KSG采用的最大范数进行的度量,也就是最大范数下N维空间球的体积,因此区域看上去是n维的立方体。
其中可以注意到: (12)
cdz=cdxcdy
c_{dz}=c_{dx}c_{dy}
cdz=cdxcdy
我思考时类比于正方体的底面积乘以高,这一点在下面(23)式中使用。
然而,使用固定的k估计的H(X)H(Y)H(X,Y)H(X)H(Y)H(X,Y)H(X)H(Y)H(X,Y)直接对I(X,Y)I(X,Y)I(X,Y)进行估计,存在问题。原因是,(10)式的偏差主要来自于对(8)式假设的均匀概率密度的违反,因此这三项估计都因此会存在偏差。而对于一个固定的k来说,联合空间上的第k近邻的距离要远于边际空间,因此实际上同样是kth近邻,空间的大小不同,因此概率密度违反(8)式假设的大小也不同,因此会导致偏差非常不同从而无法抵消(且观看下图图示,以1近邻为例)。
为了避免这一点,由于(10)对任意k都成立,因此不必使用固定的k。KSG1的方法是首先找到Joint上kth近邻,然后在X marginnal space上确定其距离ϵ(i)\epsilon(i)ϵ(i),在该距离内的点的个数作为nx(i)n_x(i)nx(i),则到joint上第kth点的距离正是ϵ(i)\epsilon(i)ϵ(i),而这个点在Xmarginal上则必然是第nx(i)+1n_x(i)+1nx(i)+1个点。对Y marginal space用同样大小的距离ϵ(i)\epsilon(i)ϵ(i)做同等操作。具体的情况如下图所示,此时Joint上是1th近邻,而此时x方向上产生5近邻,而y方向上是3近邻。
此时对于H(X)H(X)H(X)来说,则有(考虑(2)式和(10)式中第一个约等号的蒙特卡洛近似,我未必写的非常非常详细,如果这里还需过于详细的介绍,则无需再看此文,而是重新补习统计相关知识): (13)
H^(X)=−1N∑i=1Nψ(nx(i)+1)+ψ(N)+log(cdz)+dxN∑logϵi
\hat{H}(X)=-\frac{1}{N}\sum_{i=1}^{N}\psi(n_x(i)+1)+\psi(N)+log(c_{dz})+\frac{dx}{N}\sum log\epsilon_i
H^(X)=−N1i=1∑Nψ(nx(i)+1)+ψ(N)+log(cdz)+Ndx∑logϵi
对于Y,则利用同样的距离找到ny(i)+1n_y(i)+1ny(i)+1进行估计,导致Y的估计不准,因为y到其kth点的距离不是ϵ2\frac{\epsilon}{2}2ϵ。
KSG2估计器
KSG2估计器更为复杂,其考虑在两个marginal的方向上采用不同的距离,此时图示joint的kth区域不再是超立方体区域,而是超长方体区域。首先根据joint上kth近邻的位置分别确定x marginal上的距离和y marginal上的距离(已经与KSG1不同,该方法确定x marginal上的距离后,直接将该距离应用于 y marginal上,因此是超正方体),此时点落在xi±ϵx(i)/2x_i\pm\epsilon_x(i)/2xi±ϵx(i)/2和yi±ϵy(i)/2y_i\pm\epsilon_y(i)/2yi±ϵy(i)/2之内。
此时存在两种情况,即ϵx(i)\epsilon_x(i)ϵx(i)和ϵy(i)\epsilon_y(i)ϵy(i)由同一点决定或由两点决定,具体见下图。
故考虑在将原先的超正方体概率替换为超长方体的概率,该超长方体的概率是两种情况的混合,则 (14)
Pk(ϵx,ϵy)=Pkb(ϵx,ϵy)+Pkc(ϵx,ϵy)
P_k(\epsilon_x,\epsilon_y)=P_k^b(\epsilon_x,\epsilon_y)+P_k^c(\epsilon_x,\epsilon_y)
Pk(ϵx,ϵy)=Pkb(ϵx,ϵy)+Pkc(ϵx,ϵy)
其中第一种时,有k-1个点在最内,中间的微分区域(逻辑与(3)相同)有1个点,外部有N-1-k个点在最外,若不失一般性的假设ϵx<ϵy\epsilon_x<\epsilon_yϵx<ϵy,该种状况发生的概率表达为 (15)
Pkb(ϵx,ϵy)=(N−1k)(kk−1)qik−1(∂2qi∂ϵx∂ϵy)(1−pi(ϵy))N−k−1=k(N−1k)qik−1(∂2qi∂ϵx∂ϵy)(1−pi(ϵy))N−k−1
\begin{aligned}
P_k^b(\epsilon_x,\epsilon_y)&=\tbinom{N-1}{k}\tbinom{k}{k-1}q_i^{k-1}(\frac{\partial ^2 q_i}{\partial\epsilon_x\partial\epsilon_y})(1-p_i(\epsilon_y))^{N-k-1}\\
&=k\tbinom{N-1}{k}q_i^{k-1}(\frac{\partial ^2 q_i}{\partial\epsilon_x\partial\epsilon_y})(1-p_i(\epsilon_y))^{N-k-1}
\end{aligned}
Pkb(ϵx,ϵy)=(kN−1)(k−1k)qik−1(∂ϵx∂ϵy∂2qi)(1−pi(ϵy))N−k−1=k(kN−1)qik−1(∂ϵx∂ϵy∂2qi)(1−pi(ϵy))N−k−1
原文的表述将qiq_iqi吸进了偏导项中,其中qiq_iqi表示的是上述超长方体的区域,而pip_ipi则仍然是由较大的ϵy\epsilon_yϵy确定的超正方体区域。使用pip_ipi是必须的,因为使用了最大范数,即最远的点的距离,保证了如果该正方形内的点都在矩形中,反之则不行。
对于第二种情况,此时有k-2个点在最内,而2个点在微分区域,N-k-1个点在外围区域,因此此种状况发生的概率为 (16)
Pkc(ϵx,ϵy)=(N−1k)(kk−2)qik−2(∂2qi2∂ϵx∂ϵy)(1−pi(ϵy))N−k−1=(N−1k)(kk−2)2qiqik−2(∂2qi∂ϵx∂ϵy)(1−pi(ϵy))N−k−1=k(k−1)(N−1k)qik−1(∂2qi∂ϵx∂ϵy)(1−pi(ϵy))N−k−1
\begin{aligned}
P_k^c(\epsilon_x,\epsilon_y)&=\tbinom{N-1}{k}\tbinom{k}{k-2}q_i^{k-2}(\frac{\partial ^2 q_i^2}{\partial\epsilon_x\partial\epsilon_y})(1-p_i(\epsilon_y))^{N-k-1}\\
&=\tbinom{N-1}{k}\tbinom{k}{k-2}2q_iq_i^{k-2}(\frac{\partial ^2 q_i}{\partial\epsilon_x\partial\epsilon_y})(1-p_i(\epsilon_y))^{N-k-1}\\
&=k(k-1)\tbinom{N-1}{k}q_i^{k-1}(\frac{\partial ^2 q_i}{\partial\epsilon_x\partial\epsilon_y})(1-p_i(\epsilon_y))^{N-k-1}
\end{aligned}
Pkc(ϵx,ϵy)=(kN−1)(k−2k)qik−2(∂ϵx∂ϵy∂2qi2)(1−pi(ϵy))N−k−1=(kN−1)(k−2k)2qiqik−2(∂ϵx∂ϵy∂2qi)(1−pi(ϵy))N−k−1=k(k−1)(kN−1)qik−1(∂ϵx∂ϵy∂2qi)(1−pi(ϵy))N−k−1
同时注意到,由于X和Y上是独立的,根据 (17)
u(∂q∂x∂y)=(∂q∂x)(∂q∂y)
u(\frac{\partial q}{\partial x\partial y})=(\frac{\partial q}{\partial x})(\frac{\partial q}{\partial y})
u(∂x∂y∂q)=(∂x∂q)(∂y∂q)
故(16)进一步拆解为 (18)
k(k−1)(N−1k)qik−2(∂q∂x)(∂q∂y)(1−pi(ϵy))N−k−1
k(k-1)\tbinom{N-1}{k}q_i^{k-2}(\frac{\partial q}{\partial x})(\frac{\partial q}{\partial y})(1-p_i(\epsilon_y))^{N-k-1}
k(k−1)(kN−1)qik−2(∂x∂q)(∂y∂q)(1−pi(ϵy))N−k−1
验证Pk(ϵx,ϵy)=Pkb(ϵx,ϵy)+Pkc(ϵx,ϵy)P_k(\epsilon_x,\epsilon_y)=P_k^b(\epsilon_x,\epsilon_y)+P_k^c(\epsilon_x,\epsilon_y)Pk(ϵx,ϵy)=Pkb(ϵx,ϵy)+Pkc(ϵx,ϵy)的积分为1. (19)
∫0∞∫0ϵyPk(ϵx,ϵy)dϵxdϵy=(N−1k)∫0∞(1−pi(ϵy))N−k−1∫0ϵy∂∂x(kqik−1(∂qi∂ϵy))dϵxdϵy=k(N−1k)∫0∞(1−pi(ϵy))N−k−1pi(ϵy)k−1dϵy=k(N−1k)Beta(k,N−k)=1
\begin{aligned}
&\int_0^{\infin}\int_0^{\epsilon_y}P_k(\epsilon_x,\epsilon_y)d\epsilon_xd\epsilon_y\\
&=\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}\int_0^{\epsilon_y}\frac{\partial}{\partial x}({kq_i^{k-1}(\frac{\partial q_i}{\partial\epsilon_y}}))d\epsilon_xd\epsilon_y\\
&=k\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}p_i(\epsilon_y)^{k-1}d\epsilon_y\\
&=k\tbinom{N-1}{k}Beta(k,N-k)=1
\end{aligned}
∫0∞∫0ϵyPk(ϵx,ϵy)dϵxdϵy=(kN−1)∫0∞(1−pi(ϵy))N−k−1∫0ϵy∂x∂(kqik−1(∂ϵy∂qi))dϵxdϵy=k(kN−1)∫0∞(1−pi(ϵy))N−k−1pi(ϵy)k−1dϵy=k(kN−1)Beta(k,N−k)=1
下面证明E[logqi(ϵx,ϵy)]=ψ(k)−ψ(N)−1/kE[logq_i(\epsilon_x,\epsilon_y)]=\psi(k)-\psi(N)-1/kE[logqi(ϵx,ϵy)]=ψ(k)−ψ(N)−1/k,首先在第二个积分号进行分部积分: (20)
E[logqi(ϵx,ϵy)]=∫0∞∫0ϵylog(qi)Pk(ϵx,ϵy)dϵxdϵy=(N−1k)∫0∞(1−pi(ϵy))N−k−1∫0ϵy∂∂x(kqik−1(∂qi∂ϵy))dϵxdϵy=(N−1k)∫0∞(1−pi(ϵy))N−k−1[log(qi)kqik−1(∂qi∂ϵy)∣0ϵy−∫0ϵy{kqik−2∂qi∂ϵx(∂qi∂ϵy)}dϵx]dϵy
\begin{aligned}
E[logq_i(\epsilon_x,\epsilon_y)]&=\int_0^{\infin}\int_0^{\epsilon_y}log(q_i)P_k(\epsilon_x,\epsilon_y)d\epsilon_xd\epsilon_y\\
&=\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}\int_0^{\epsilon_y}\frac{\partial}{\partial x}({kq_i^{k-1}(\frac{\partial q_i}{\partial\epsilon_y}}))d\epsilon_xd\epsilon_y\\
&=\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}[log(q_i)kq_i^{k-1}(\frac{\partial q_i}{\partial \epsilon_y})|_{0}^{\epsilon_y}\\&-\int_0^{\epsilon_y}\{kq_i^{k-2}\frac{\partial q_i}{\partial \epsilon_x}(\frac{\partial q_i}{\partial \epsilon_y})\}d\epsilon_x]d\epsilon_y\\
\end{aligned}
E[logqi(ϵx,ϵy)]=∫0∞∫0ϵylog(qi)Pk(ϵx,ϵy)dϵxdϵy=(kN−1)∫0∞(1−pi(ϵy))N−k−1∫0ϵy∂x∂(kqik−1(∂ϵy∂qi))dϵxdϵy=(kN−1)∫0∞(1−pi(ϵy))N−k−1[log(qi)kqik−1(∂ϵy∂qi)∣0ϵy−∫0ϵy{kqik−2∂ϵx∂qi(∂ϵy∂qi)}dϵx]dϵy
由最开始的Lemma1可知X∼Beta(α,β),E[log(X)]=ψ(α)−ψ(α+β),ψ()=Γ′()Γ()X\sim Beta(\alpha, \beta),\mathbb{E}[log(X)]=\psi(\alpha)-\psi(\alpha+\beta),\psi()=\frac{\Gamma'()}{\Gamma()}X∼Beta(α,β),E[log(X)]=ψ(α)−ψ(α+β),ψ()=Γ()Γ′()
可以观察到第一部分: (21)
=(N−1k)∫0∞(1−pi(ϵy))N−k−1[log(qi)kqik−1(∂qi∂ϵy)∣0ϵydϵxdϵy=(N−1k)∫0∞(1−pi(ϵy))N−k−1log(pi(ϵy))kpi(ϵy)k−1(∂pi(ϵy)∂ϵy)dϵy=(N−1k)∫0∞(1−pi(ϵy))N−k−1log(pi(ϵy))kpi(ϵy)k−1(∂pi(ϵy)∂ϵy)dϵy=(N−1k)∫0∞(1−pi(ϵy))N−k−1log(pi(ϵy))kpi(ϵy)k−1dpi(ϵy)=(N−1k)∫01(1−u)N−k−1log(u)kuk−1du=ψ(k)−ψ(N)
\begin{aligned}
&=\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}[log(q_i)kq_i^{k-1}(\frac{\partial q_i}{\partial \epsilon_y})|_{0}^{\epsilon_y}d\epsilon_xd\epsilon_y\\
&=\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}log(p_i(\epsilon_y))kp_i(\epsilon_y)^{k-1}(\frac{\partial p_i(\epsilon_y)}{\partial \epsilon_y})d\epsilon_y\\
&=\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}log(p_i(\epsilon_y))kp_i(\epsilon_y)^{k-1}(\frac{\partial p_i(\epsilon_y)}{\partial \epsilon_y})d\epsilon_y\\
&=\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}log(p_i(\epsilon_y))kp_i(\epsilon_y)^{k-1}dp_i(\epsilon_y)\\
&=\tbinom{N-1}{k}\int_0^{1}(1-u)^{N-k-1}log(u)ku^{k-1}du=\psi(k)-\psi(N)
\end{aligned}
=(kN−1)∫0∞(1−pi(ϵy))N−k−1[log(qi)kqik−1(∂ϵy∂qi)∣0ϵydϵxdϵy=(kN−1)∫0∞(1−pi(ϵy))N−k−1log(pi(ϵy))kpi(ϵy)k−1(∂ϵy∂pi(ϵy))dϵy=(kN−1)∫0∞(1−pi(ϵy))N−k−1log(pi(ϵy))kpi(ϵy)k−1(∂ϵy∂pi(ϵy))dϵy=(kN−1)∫0∞(1−pi(ϵy))N−k−1log(pi(ϵy))kpi(ϵy)k−1dpi(ϵy)=(kN−1)∫01(1−u)N−k−1log(u)kuk−1du=ψ(k)−ψ(N)
其中,注意到qi(0)=0q_i(0)=0qi(0)=0和当ϵx=ϵy\epsilon_x=\epsilon_yϵx=ϵy时有q(ϵy,ϵy)q(\epsilon_y,\epsilon_y)q(ϵy,ϵy)为一个超立方体,恰是pi(ϵy)p_i(\epsilon_y)pi(ϵy),最后的部分由前面Prososition 2.的结论得出。
第二部分: (22)
−(N−1k)∫0∞(1−pi(ϵy))N−k−1(∫0ϵy{kqik−2∂qi∂ϵx(∂qi∂ϵy)}dϵx)dϵy=−(N−1k)∫0∞(1−pi(ϵy))N−k−1(∫0ϵy{kqik−2∂qi∂ϵx(∂qi∂ϵy)}dϵx)dϵy
\begin{aligned}
&-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}(\int_0^{\epsilon_y}\{kq_i^{k-2}\frac{\partial q_i}{\partial \epsilon_x}(\frac{\partial q_i}{\partial \epsilon_y})\}d\epsilon_x)d\epsilon_y\\
&=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}(\int_0^{\epsilon_y}\{kq_i^{k-2}\frac{\partial q_i}{\partial \epsilon_x}(\frac{\partial q_i}{\partial \epsilon_y})\}d\epsilon_x)d\epsilon_y
\end{aligned}
−(kN−1)∫0∞(1−pi(ϵy))N−k−1(∫0ϵy{kqik−2∂ϵx∂qi(∂ϵy∂qi)}dϵx)dϵy=−(kN−1)∫0∞(1−pi(ϵy))N−k−1(∫0ϵy{kqik−2∂ϵx∂qi(∂ϵy∂qi)}dϵx)dϵy
由局部均匀假设: (23)
qi(ϵx,ϵy)≈μ(xi,yi)cdxcdyϵxdxϵydy→∂∂ϵyqi(ϵx,ϵy)=μ(xi,yi)cdxcdyϵxdxdyϵydy−1=qi(ϵx,ϵy)dyϵy
\begin{aligned}
&q_i(\epsilon_x,\epsilon_y)\approx\mu(x_i,y_i)c_{dx}c_{dy}\epsilon_x^{dx}\epsilon_y^{dy}\\
\rightarrow&\frac{\partial}{\partial \epsilon_y}q_i(\epsilon_x,\epsilon_y)=\mu(x_i,y_i)c_{dx}c_{dy}\epsilon_x^{dx}dy\epsilon_y^{dy-1}=q_i(\epsilon_x,\epsilon_y)\frac{dy}{\epsilon_y}
\end{aligned}
→qi(ϵx,ϵy)≈μ(xi,yi)cdxcdyϵxdxϵydy∂ϵy∂qi(ϵx,ϵy)=μ(xi,yi)cdxcdyϵxdxdyϵydy−1=qi(ϵx,ϵy)ϵydy
并且由刚才分部积分时提及的当ϵx=ϵy\epsilon_x=\epsilon_yϵx=ϵy时有 q(ϵy,ϵy)q(\epsilon_y,\epsilon_y)q(ϵy,ϵy)为一个超立方体,因此 (24)
∂∂ϵyqi(ϵx,ϵy)∣ϵx=ϵy=pi(ϵy)dyϵy
\frac{\partial}{\partial \epsilon_y}q_i(\epsilon_x,\epsilon_y)|_{\epsilon_x=\epsilon_y}=p_i(\epsilon_y)\frac{dy}{\epsilon_y}
∂ϵy∂qi(ϵx,ϵy)∣ϵx=ϵy=pi(ϵy)ϵydy
则(22)继续简化有: (25)
=−(N−1k)∫0∞(1−pi(ϵy))N−k−1(∫0ϵy{kqik−2∂qi∂ϵx(∂qi∂ϵy)}dϵx)dϵy=−(N−1k)∫0∞(1−pi(ϵy))N−k−1(∫0ϵy{kqik−2∂qi∂ϵx(qidyϵy)}dϵx)dϵy=−(N−1k)∫0∞(1−pi(ϵy))N−k−1(dyϵy)(∫0ϵy{kqik−1∂qi∂ϵx}dϵx)dϵy=−(N−1k)∫0∞(1−pi(ϵy))N−k−1(dyϵy)pi(ϵy)k)dϵy=−(N−1k)∫0∞(1−pi(ϵy))N−k−1(dyϵy)pi(ϵy)k)dϵy
\begin{aligned}
&=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}(\int_0^{\epsilon_y}\{kq_i^{k-2}\frac{\partial q_i}{\partial \epsilon_x}(\frac{\partial q_i}{\partial \epsilon_y})\}d\epsilon_x)d\epsilon_y\\
&=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}(\int_0^{\epsilon_y}\{kq_i^{k-2}\frac{\partial q_i}{\partial \epsilon_x}(q_i\frac{dy}{\epsilon_y})\}d\epsilon_x)d\epsilon_y\\
&=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}(\frac{dy}{\epsilon_y})(\int_0^{\epsilon_y}\{kq_i^{k-1}\frac{\partial q_i}{\partial \epsilon_x}\}d\epsilon_x)d\epsilon_y\\
&=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}(\frac{dy}{\epsilon_y})p_i(\epsilon_y)^k)d\epsilon_y\\
&=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}(\frac{dy}{\epsilon_y})p_i(\epsilon_y)^k)d\epsilon_y
\end{aligned}
=−(kN−1)∫0∞(1−pi(ϵy))N−k−1(∫0ϵy{kqik−2∂ϵx∂qi(∂ϵy∂qi)}dϵx)dϵy=−(kN−1)∫0∞(1−pi(ϵy))N−k−1(∫0ϵy{kqik−2∂ϵx∂qi(qiϵydy)}dϵx)dϵy=−(kN−1)∫0∞(1−pi(ϵy))N−k−1(ϵydy)(∫0ϵy{kqik−1∂ϵx∂qi}dϵx)dϵy=−(kN−1)∫0∞(1−pi(ϵy))N−k−1(ϵydy)pi(ϵy)k)dϵy=−(kN−1)∫0∞(1−pi(ϵy))N−k−1(ϵydy)pi(ϵy)k)dϵy
其中第三个等号注意到了dyϵy\frac{dy}{\epsilon_y}ϵydy是常数,注意dydydy是YYY的维度不是微分。第四个等号是将kqik−1kq_i^{k-1}kqik−1吸收到微分,再由积分所得。
进一步根据(24)有:(26)
ddϵypi(ϵy)=d∂ϵyqi(ϵx,ϵy)∣ϵx=ϵy=qi(ϵx,ϵy)∣ϵx=ϵydydϵy=pi(ϵy)dyϵy
\frac{d}{d\epsilon_y}p_i(\epsilon_y)=\frac{d}{\partial \epsilon_y}q_i(\epsilon_x,\epsilon_y)|_{\epsilon_x=\epsilon_y}=q_i(\epsilon_x,\epsilon_y)|_{\epsilon_x=\epsilon_y}\frac{dy}{d\epsilon_y}=p_i(\epsilon_y)\frac{dy}{\epsilon_y}
dϵydpi(ϵy)=∂ϵydqi(ϵx,ϵy)∣ϵx=ϵy=qi(ϵx,ϵy)∣ϵx=ϵydϵydy=pi(ϵy)ϵydy
因此,(25)简化为(26)
=−(N−1k)∫0∞(1−pi(ϵy))N−k−1(dyϵy)pi(ϵy)k)dϵy=−(N−1k)∫0∞(1−pi(ϵy))N−k−1pi(ϵy)k−1∂∂ypi(ϵy)dϵy=−(N−1k)∫0∞(1−pi(ϵy))N−k−1pi(ϵy)k−1dpi=−(N−1k)Beta(k,N−k)=−1k
\begin{aligned}
&=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}(\frac{dy}{\epsilon_y})p_i(\epsilon_y)^k)d\epsilon_y\\
&=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}p_i(\epsilon_y)^{k-1}\frac{\partial}{\partial y}p_i(\epsilon_y)d\epsilon_y\\
&=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}p_i(\epsilon_y)^{k-1}dp_i=-\tbinom{N-1}{k}Beta(k,N-k)\\
&=-\frac{1}{k}
\end{aligned}
=−(kN−1)∫0∞(1−pi(ϵy))N−k−1(ϵydy)pi(ϵy)k)dϵy=−(kN−1)∫0∞(1−pi(ϵy))N−k−1pi(ϵy)k−1∂y∂pi(ϵy)dϵy=−(kN−1)∫0∞(1−pi(ϵy))N−k−1pi(ϵy)k−1dpi=−(kN−1)Beta(k,N−k)=−k1
联立(21)和(26),得到(27)
E[logqi(ϵx,ϵy)]=ψ(k)−ψ(N)−1k
E[logq_i(\epsilon_x,\epsilon_y)]=\psi(k)-\psi(N)-\frac{1}{k}
E[logqi(ϵx,ϵy)]=ψ(k)−ψ(N)−k1
进一步的,根据公式(23),自然可得(28)
logμ(xi,yi)≈logqi(ϵx,ϵy)−log(cdxcdy)−dxlogϵx−dylogϵy
\begin{aligned}
log\mu(x_i,y_i)\approx logq_i(\epsilon_x,\epsilon_y)-log(c_{dx}c_{dy})-dxlog\epsilon_x-dylog\epsilon_y
\end{aligned}
logμ(xi,yi)≈logqi(ϵx,ϵy)−log(cdxcdy)−dxlogϵx−dylogϵy
求期望有:(29)
E[logμ(xi,xy)]=ψ(k)−ψ(N)−1k−log(cdxcdy)−dx1N∑logϵx−dy1N∑logϵy
E[log\mu(x_i,x_y)]=\\\psi(k)-\psi(N)-\frac{1}{k}-log(c_{dx}c_{dy})-dx\frac{1}{N}\sum log\epsilon_x-dy\frac{1}{N}\sum log\epsilon_y
E[logμ(xi,xy)]=ψ(k)−ψ(N)−k1−log(cdxcdy)−dxN1∑logϵx−dyN1∑logϵy
再进一步利用(1)式可得KSG2估计器:(30)
I(2)(X,Y)=ψ(k)+ψ(N)−1k−1N∑ψ(nx)−1N∑ψ(ny)
I^{(2)}(X,Y)=\psi(k)+\psi(N)-\frac{1}{k}-\frac{1}{N}\sum\psi(n_x)-\frac{1}{N}\sum\psi(n_y)
I(2)(X,Y)=ψ(k)+ψ(N)−k1−N1∑ψ(nx)−N1∑ψ(ny)
注意,这里是ψ(nx)\psi(n_x)ψ(nx),这是因为nxn_xnx此时定义为距离小于等于ϵx2\frac{\epsilon_x}{2}2ϵx的点的个数。
至此,KSG估计器的形式推导完毕。