KSG互信息估计器的原理详细推导(Kraskov, 2004)

本文介绍了KSG估计器,它是Kraskov在2004年提出的互信息估计器。通过对互信息分解,对H(x)等三项逐个估计。还阐述了KLE估计器用于估计微分熵,以及KSG1、KSG2估计器的原理和推导过程,解决了固定k估计存在的偏差问题。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

KSG Estimators

KSG估计器是Kraskov在2004年提出的互信息估计器,其原文过于简略,我参考大量文献对细节进行了补充。原文位置:[https://arxiv.org/pdf/cond-mat/0305641.pdf]

KSG估计的基本方法是首先对互信息进行如下分解 (1)
I(X,Y)=H(X)+H(Y)−H(X,Y) I(X,Y)=H(X)+H(Y)-H(X,Y) I(X,Y)=H(X)+H(Y)H(X,Y)
因此,只需要对H(x)H(x)H(x)等三项进行逐个估计即可。

KLE Esitmator

KLE估计器用于估计微分熵。

利用蒙特卡洛估计有(2)
H[x]=−∫μ(x)logμ(x)dx≈−1Nlogμ^(x) H[x]=-\int\mu(x)log\mu(x)dx \approx -\frac{1}{N}log\hat{\mu}(x) H[x]=μ(x)lo(x)dxN1logμ^(x)
Then the key problem is estimating μ^\hat{\mu}μ^(x).

考虑 一个 ϵ−ball\epsilon-ballϵballxix_ixi为中心, 则 (0,ϵ2)(0,\frac{\epsilon}{2})(0,2ϵ)的范围内有k−1k-1k1个点,(ϵ2,ϵ2+dϵ)(\frac{\epsilon}{2},\frac{\epsilon}{2}+d\epsilon)(2ϵ,2ϵ+dϵ)中间有一个点(the kth neighbor),(ϵ2+dϵ)(\frac{\epsilon}{2}+d\epsilon)(2ϵ+dϵ)之外有剩余的N−k−1N-k-1Nk1​个点,因此首先在除了xix_ixi的N-1个点中挑选k个点,其中一个点位于边缘上,这k个点中剩余选取k-1个点位于最中心范围内,则p(ϵ)p(\epsilon)p(ϵ)xix_ixi周围ϵ−ball\epsilon-ballϵball的概率质量(mass) (3):
pi(ϵ)=(N−1k)(kk−1)(1−pi(ϵ))N−k−1pi(ϵ)k−1dpi(ϵ)1dϵ=(N−1)!(N−k−1)!1!(k−1)!(1−pi(ϵ))N−k−1pi(ϵ)k−1dpi(ϵ)1dϵ \begin{aligned} p_i(\epsilon)=&\tbinom{N-1}{k}\tbinom{k}{k-1}(1-p_i(\epsilon))^{N-k-1}p_i(\epsilon)^{k-1}\frac{dp_i(\epsilon)^1}{d\epsilon}\\ & = \frac{(N-1)!}{(N-k-1)!1!(k-1)!}(1-p_i(\epsilon))^{N-k-1}p_i(\epsilon)^{k-1}\frac{dp_i(\epsilon)^1}{d\epsilon} \end{aligned} pi(ϵ)=(kN1)(k1k)(1pi(ϵ))Nk1pi(ϵ)k1dϵdpi(ϵ)1=(Nk1)!1!(k1)!(N1)!(1pi(ϵ))Nk1pi(ϵ)k1dϵdpi(ϵ)1
u=pi(ϵ)u=p_i(\epsilon)u=pi(ϵ),则du=dpi(ϵ)dϵdu=dp_i(\epsilon)d\epsilondu=dpi(ϵ)dϵ
Beta函数定义为beta(p,q)=∫01xp−1(1−x)q−1dx=(p−1)!(q−1)!(p+q−1)!=p+qpq(p+qp)beta(p,q)=\int_0^{1}x^{p-1}(1-x)^{q-1}dx=\frac{(p-1)!(q-1)!}{(p+q-1)!=\frac{\frac{p+q}{pq}}{\tbinom{p+q}{p}}}beta(p,q)=01xp1(1x)q1dx=(p+q1)!=(pp+q)pqp+q(p1)!(q1)!
因此,验证积分为1: (4)
∫0∞pi(ϵ)dϵ=∫01k(N−1k)μk−1(1−μ)N−k−1dμ=k(N−1k)beta(k,N−k)=k(N−1k)Nk(N−k)1(Nk)=1 \begin{aligned} \int_0^{\infin}p_i(\epsilon)d\epsilon&=\int_0^1k\tbinom{N-1}{k}\mu^{k-1}(1-\mu)^{N-k-1}d\mu=k\tbinom{N-1}{k}beta(k,N-k)\\ &=k\tbinom{N-1}{k}\frac{N}{k(N-k)}\frac{1}{\tbinom{N}{k}}=1 \end{aligned} 0pi(ϵ)dϵ=01k(kN1)μk1(1μ)Nk1dμ=k(kN1)beta(k,Nk)=k(kN1)k(Nk)N(kN)1=1

Lemma 1.

X∼Beta(α,β),E[log(X)]=ψ(α)−ψ(α+β),ψ()=Γ′()Γ()X\sim Beta(\alpha, \beta),\mathbb{E}[log(X)]=\psi(\alpha)-\psi(\alpha+\beta),\psi()=\frac{\Gamma'()}{\Gamma()}XBeta(α,β),E[log(X)]=ψ(α)ψ(α+β),ψ()=Γ()Γ()

证明: (5)
E[log(X)]=∫01B(α,β)−1log(x)xα−1(1−x)β−1dx=B(α,β)−1∫01∂∂αxα−1(1−x)β−1dx \begin{aligned} E[log(X)]&=\int_0^1B(\alpha,\beta)^{-1}log(x)x^{\alpha-1}(1-x)^{\beta-1}dx\\ &=B(\alpha,\beta)^{-1}\int_0^1\frac{\partial}{\partial\alpha}x^{\alpha-1}(1-x)^{\beta-1}dx \end{aligned} E[log(X)]=01B(α,β)1log(x)xα1(1x)β1dx=B(α,β)101αxα1(1x)β1dx
交换微分和积分的顺序(类似于莱布尼兹法则): (6)
E[log(X)]=Beta(α,β)−1∂∂αBeta(α,β)=∂∂αlogB(α,β) E[log(X)]=Beta(\alpha,\beta)^{-1}\frac{\partial}{\partial\alpha}Beta(\alpha,\beta)=\frac{\partial}{\partial\alpha}logB(\alpha,\beta) E[log(X)]=Beta(α,β)1αBeta(α,β)=αlogB(α,β)
展开Beta函数: (7)
=∂∂α[logΓ(α)+logΓ(β)−logΓ(α+β)]=ψ(α)−ψ(α+β) =\frac{\partial}{\partial\alpha}[log\Gamma(\alpha)+log\Gamma(\beta)-log\Gamma(\alpha+\beta)]=\psi(\alpha)-\psi(\alpha+\beta) =α[logΓ(α)+logΓ(β)logΓ(α+β)]=ψ(α)ψ(α+β)

Proposition 2.

E[logpi(ϵ)]=ψ(k)−ψ(N)E[logp_i(\epsilon)]=\psi(k)-\psi(N)E[logpi(ϵ)]=ψ(k)ψ(N)

使用Lemma 1可得。

熵的估计

假设在xix_ixi的邻域范围内,概率密度μ(x)\mu(x)μ(x)是均匀的,则有 (8)
pi(ϵ)≈cdϵdμ(xi) p_i(\epsilon)\approx c_d\epsilon^d\mu(x_i) pi(ϵ)cdϵdμ(xi)
其中,cdc_dcd是n维的单位球体体积,定义为∫∥s∥<1/2ds\int_{\Vert s \Vert<1/2}dss<1/2ds,对于某些范数。

如果是2范数,则n维球体积公式 (9)
Vn=πn2RdΓ(1+n2) V_n=\frac{\pi^{\frac{n}{2}}R^d}{\Gamma(1+\frac{n}{2})} Vn=Γ(1+2n)π2nRd
R=1R=1R=1时即为cdc_dcd,再乘以RdR^dRd

因此,该区域概率密度μ(xi)\mu(x_i)μ(xi)乘以体积cdϵdc_d\epsilon^dcdϵd,得到该区域概率质量pi(ϵ)p_i(\epsilon)pi(ϵ)

根据(8),有 (10)
E[−logμ(Xi)]≈−E[logpi(ϵ)]+logcd+dE[logϵ]=−ϕ(k)+ϕ(N)+logcd+dE[logϵ]≈−ϕ(k)+ϕ(N)+logcd+dN∑i=1Nlogϵi \begin{aligned} E[-log\mu(X_i)]&\approx-E[logp_i(\epsilon)]+logc_d+dE[log\epsilon]\\ &=-\phi(k)+\phi(N)+logc_d+dE[log\epsilon]\\ &\approx-\phi(k)+\phi(N)+logc_d+\frac{d}{N}\sum_{i=1}^{N}log\epsilon_i \end{aligned} E[lo(Xi)]E[logpi(ϵ)]+logcd+dE[logϵ]=ϕ(k)+ϕ(N)+logcd+dE[logϵ]ϕ(k)+ϕ(N)+logcd+Ndi=1Nlogϵi
其中,最后的ϵi\epsilon_iϵi表示第iii个点到他第kth最近邻的距离。

KSG1估计器

首先,H(X)H(X)H(X)H(Y)H(Y)H(Y)可以直接使用(10)进行估计。

其次,使用(10),联合熵的一个估计是 (11):
H^(X,Y)=−ψ(k)+ψ(N)+log(cdz)+dzN∑logϵi \hat{H}(X,Y)=-\psi(k)+\psi(N)+log(c_{dz})+\frac{dz}{N}\sum log\epsilon_i H^(X,Y)=ψ(k)+ψ(N)+log(cdz)+Ndzlogϵi
其中,dz=dx+dydz=dx+dydz=dx+dy,即Z的维度是x和y维度之和。且对于球的体积,KSG采用的最大范数进行的度量,也就是最大范数下N维空间球的体积,因此区域看上去是n维的立方体。

其中可以注意到: (12)
cdz=cdxcdy c_{dz}=c_{dx}c_{dy} cdz=cdxcdy
我思考时类比于正方体的底面积乘以高,这一点在下面(23)式中使用。

然而,使用固定的k估计的H(X)H(Y)H(X,Y)H(X)H(Y)H(X,Y)H(X)H(Y)H(X,Y)直接对I(X,Y)I(X,Y)I(X,Y)进行估计,存在问题。原因是,(10)式的偏差主要来自于对(8)式假设的均匀概率密度的违反,因此这三项估计都因此会存在偏差。而对于一个固定的k来说,联合空间上的第k近邻的距离要远于边际空间,因此实际上同样是kth近邻,空间的大小不同,因此概率密度违反(8)式假设的大小也不同,因此会导致偏差非常不同从而无法抵消(且观看下图图示,以1近邻为例)。
在这里插入图片描述

为了避免这一点,由于(10)对任意k都成立,因此不必使用固定的k。KSG1的方法是首先找到Joint上kth近邻,然后在X marginnal space上确定其距离ϵ(i)\epsilon(i)ϵ(i),在该距离内的点的个数作为nx(i)n_x(i)nx(i),则到joint上第kth点的距离正是ϵ(i)\epsilon(i)ϵ(i),而这个点在Xmarginal上则必然是第nx(i)+1n_x(i)+1nx(i)+1个点。对Y marginal space用同样大小的距离ϵ(i)\epsilon(i)ϵ(i)做同等操作。具体的情况如下图所示,此时Joint上是1th近邻,而此时x方向上产生5近邻,而y方向上是3近邻。
在这里插入图片描述

此时对于H(X)H(X)H(X)来说,则有(考虑(2)式和(10)式中第一个约等号的蒙特卡洛近似,我未必写的非常非常详细,如果这里还需过于详细的介绍,则无需再看此文,而是重新补习统计相关知识): (13)
H^(X)=−1N∑i=1Nψ(nx(i)+1)+ψ(N)+log(cdz)+dxN∑logϵi \hat{H}(X)=-\frac{1}{N}\sum_{i=1}^{N}\psi(n_x(i)+1)+\psi(N)+log(c_{dz})+\frac{dx}{N}\sum log\epsilon_i H^(X)=N1i=1Nψ(nx(i)+1)+ψ(N)+log(cdz)+Ndxlogϵi
对于Y,则利用同样的距离找到ny(i)+1n_y(i)+1ny(i)+1进行估计,导致Y的估计不准,因为y到其kth点的距离不是ϵ2\frac{\epsilon}{2}2ϵ

KSG2估计器

KSG2估计器更为复杂,其考虑在两个marginal的方向上采用不同的距离,此时图示joint的kth区域不再是超立方体区域,而是超长方体区域。首先根据joint上kth近邻的位置分别确定x marginal上的距离和y marginal上的距离(已经与KSG1不同,该方法确定x marginal上的距离后,直接将该距离应用于 y marginal上,因此是超正方体),此时点落在xi±ϵx(i)/2x_i\pm\epsilon_x(i)/2xi±ϵx(i)/2yi±ϵy(i)/2y_i\pm\epsilon_y(i)/2yi±ϵy(i)/2之内。

此时存在两种情况,即ϵx(i)\epsilon_x(i)ϵx(i)ϵy(i)\epsilon_y(i)ϵy(i)由同一点决定由两点决定,具体见下图。
在这里插入图片描述

故考虑在将原先的超正方体概率替换为超长方体的概率,该超长方体的概率是两种情况的混合,则 (14)
Pk(ϵx,ϵy)=Pkb(ϵx,ϵy)+Pkc(ϵx,ϵy) P_k(\epsilon_x,\epsilon_y)=P_k^b(\epsilon_x,\epsilon_y)+P_k^c(\epsilon_x,\epsilon_y) Pk(ϵx,ϵy)=Pkb(ϵx,ϵy)+Pkc(ϵx,ϵy)
其中第一种时,有k-1个点在最内,中间的微分区域(逻辑与(3)相同)有1个点,外部有N-1-k个点在最外,若不失一般性的假设ϵx<ϵy\epsilon_x<\epsilon_yϵx<ϵy,该种状况发生的概率表达为 (15)
Pkb(ϵx,ϵy)=(N−1k)(kk−1)qik−1(∂2qi∂ϵx∂ϵy)(1−pi(ϵy))N−k−1=k(N−1k)qik−1(∂2qi∂ϵx∂ϵy)(1−pi(ϵy))N−k−1 \begin{aligned} P_k^b(\epsilon_x,\epsilon_y)&=\tbinom{N-1}{k}\tbinom{k}{k-1}q_i^{k-1}(\frac{\partial ^2 q_i}{\partial\epsilon_x\partial\epsilon_y})(1-p_i(\epsilon_y))^{N-k-1}\\ &=k\tbinom{N-1}{k}q_i^{k-1}(\frac{\partial ^2 q_i}{\partial\epsilon_x\partial\epsilon_y})(1-p_i(\epsilon_y))^{N-k-1} \end{aligned} Pkb(ϵx,ϵy)=(kN1)(k1k)qik1(ϵxϵy2qi)(1pi(ϵy))Nk1=k(kN1)qik1(ϵxϵy2qi)(1pi(ϵy))Nk1
原文的表述将qiq_iqi吸进了偏导项中,其中qiq_iqi表示的是上述超长方体的区域,而pip_ipi则仍然是由较大的ϵy\epsilon_yϵy确定的超正方体区域。使用pip_ipi是必须的,因为使用了最大范数,即最远的点的距离,保证了如果该正方形内的点都在矩形中,反之则不行。

对于第二种情况,此时有k-2个点在最内,而2个点在微分区域,N-k-1个点在外围区域,因此此种状况发生的概率为 (16)
Pkc(ϵx,ϵy)=(N−1k)(kk−2)qik−2(∂2qi2∂ϵx∂ϵy)(1−pi(ϵy))N−k−1=(N−1k)(kk−2)2qiqik−2(∂2qi∂ϵx∂ϵy)(1−pi(ϵy))N−k−1=k(k−1)(N−1k)qik−1(∂2qi∂ϵx∂ϵy)(1−pi(ϵy))N−k−1 \begin{aligned} P_k^c(\epsilon_x,\epsilon_y)&=\tbinom{N-1}{k}\tbinom{k}{k-2}q_i^{k-2}(\frac{\partial ^2 q_i^2}{\partial\epsilon_x\partial\epsilon_y})(1-p_i(\epsilon_y))^{N-k-1}\\ &=\tbinom{N-1}{k}\tbinom{k}{k-2}2q_iq_i^{k-2}(\frac{\partial ^2 q_i}{\partial\epsilon_x\partial\epsilon_y})(1-p_i(\epsilon_y))^{N-k-1}\\ &=k(k-1)\tbinom{N-1}{k}q_i^{k-1}(\frac{\partial ^2 q_i}{\partial\epsilon_x\partial\epsilon_y})(1-p_i(\epsilon_y))^{N-k-1} \end{aligned} Pkc(ϵx,ϵy)=(kN1)(k2k)qik2(ϵxϵy2qi2)(1pi(ϵy))Nk1=(kN1)(k2k)2qiqik2(ϵxϵy2qi)(1pi(ϵy))Nk1=k(k1)(kN1)qik1(ϵxϵy2qi)(1pi(ϵy))Nk1
同时注意到,由于X和Y上是独立的,根据 (17)
u(∂q∂x∂y)=(∂q∂x)(∂q∂y) u(\frac{\partial q}{\partial x\partial y})=(\frac{\partial q}{\partial x})(\frac{\partial q}{\partial y}) u(xyq)=(xq)(yq)
故(16)进一步拆解为 (18)
k(k−1)(N−1k)qik−2(∂q∂x)(∂q∂y)(1−pi(ϵy))N−k−1 k(k-1)\tbinom{N-1}{k}q_i^{k-2}(\frac{\partial q}{\partial x})(\frac{\partial q}{\partial y})(1-p_i(\epsilon_y))^{N-k-1} k(k1)(kN1)qik2(xq)(yq)(1pi(ϵy))Nk1
验证Pk(ϵx,ϵy)=Pkb(ϵx,ϵy)+Pkc(ϵx,ϵy)P_k(\epsilon_x,\epsilon_y)=P_k^b(\epsilon_x,\epsilon_y)+P_k^c(\epsilon_x,\epsilon_y)Pk(ϵx,ϵy)=Pkb(ϵx,ϵy)+Pkc(ϵx,ϵy)的积分为1. (19)
∫0∞∫0ϵyPk(ϵx,ϵy)dϵxdϵy=(N−1k)∫0∞(1−pi(ϵy))N−k−1∫0ϵy∂∂x(kqik−1(∂qi∂ϵy))dϵxdϵy=k(N−1k)∫0∞(1−pi(ϵy))N−k−1pi(ϵy)k−1dϵy=k(N−1k)Beta(k,N−k)=1 \begin{aligned} &\int_0^{\infin}\int_0^{\epsilon_y}P_k(\epsilon_x,\epsilon_y)d\epsilon_xd\epsilon_y\\ &=\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}\int_0^{\epsilon_y}\frac{\partial}{\partial x}({kq_i^{k-1}(\frac{\partial q_i}{\partial\epsilon_y}}))d\epsilon_xd\epsilon_y\\ &=k\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}p_i(\epsilon_y)^{k-1}d\epsilon_y\\ &=k\tbinom{N-1}{k}Beta(k,N-k)=1 \end{aligned} 00ϵyPk(ϵx,ϵy)dϵxdϵy=(kN1)0(1pi(ϵy))Nk10ϵyx(kqik1(ϵyqi))dϵxdϵy=k(kN1)0(1pi(ϵy))Nk1pi(ϵy)k1dϵy=k(kN1)Beta(k,Nk)=1
下面证明E[logqi(ϵx,ϵy)]=ψ(k)−ψ(N)−1/kE[logq_i(\epsilon_x,\epsilon_y)]=\psi(k)-\psi(N)-1/kE[logqi(ϵx,ϵy)]=ψ(k)ψ(N)1/k,首先在第二个积分号进行分部积分: (20)
E[logqi(ϵx,ϵy)]=∫0∞∫0ϵylog(qi)Pk(ϵx,ϵy)dϵxdϵy=(N−1k)∫0∞(1−pi(ϵy))N−k−1∫0ϵy∂∂x(kqik−1(∂qi∂ϵy))dϵxdϵy=(N−1k)∫0∞(1−pi(ϵy))N−k−1[log(qi)kqik−1(∂qi∂ϵy)∣0ϵy−∫0ϵy{kqik−2∂qi∂ϵx(∂qi∂ϵy)}dϵx]dϵy \begin{aligned} E[logq_i(\epsilon_x,\epsilon_y)]&=\int_0^{\infin}\int_0^{\epsilon_y}log(q_i)P_k(\epsilon_x,\epsilon_y)d\epsilon_xd\epsilon_y\\ &=\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}\int_0^{\epsilon_y}\frac{\partial}{\partial x}({kq_i^{k-1}(\frac{\partial q_i}{\partial\epsilon_y}}))d\epsilon_xd\epsilon_y\\ &=\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}[log(q_i)kq_i^{k-1}(\frac{\partial q_i}{\partial \epsilon_y})|_{0}^{\epsilon_y}\\&-\int_0^{\epsilon_y}\{kq_i^{k-2}\frac{\partial q_i}{\partial \epsilon_x}(\frac{\partial q_i}{\partial \epsilon_y})\}d\epsilon_x]d\epsilon_y\\ \end{aligned} E[logqi(ϵx,ϵy)]=00ϵylog(qi)Pk(ϵx,ϵy)dϵxdϵy=(kN1)0(1pi(ϵy))Nk10ϵyx(kqik1(ϵyqi))dϵxdϵy=(kN1)0(1pi(ϵy))Nk1[log(qi)kqik1(ϵyqi)0ϵy0ϵy{kqik2ϵxqi(ϵyqi)}dϵx]dϵy
由最开始的Lemma1可知X∼Beta(α,β),E[log(X)]=ψ(α)−ψ(α+β),ψ()=Γ′()Γ()X\sim Beta(\alpha, \beta),\mathbb{E}[log(X)]=\psi(\alpha)-\psi(\alpha+\beta),\psi()=\frac{\Gamma'()}{\Gamma()}XBeta(α,β),E[log(X)]=ψ(α)ψ(α+β),ψ()=Γ()Γ()

可以观察到第一部分: (21)
=(N−1k)∫0∞(1−pi(ϵy))N−k−1[log(qi)kqik−1(∂qi∂ϵy)∣0ϵydϵxdϵy=(N−1k)∫0∞(1−pi(ϵy))N−k−1log(pi(ϵy))kpi(ϵy)k−1(∂pi(ϵy)∂ϵy)dϵy=(N−1k)∫0∞(1−pi(ϵy))N−k−1log(pi(ϵy))kpi(ϵy)k−1(∂pi(ϵy)∂ϵy)dϵy=(N−1k)∫0∞(1−pi(ϵy))N−k−1log(pi(ϵy))kpi(ϵy)k−1dpi(ϵy)=(N−1k)∫01(1−u)N−k−1log(u)kuk−1du=ψ(k)−ψ(N) \begin{aligned} &=\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}[log(q_i)kq_i^{k-1}(\frac{\partial q_i}{\partial \epsilon_y})|_{0}^{\epsilon_y}d\epsilon_xd\epsilon_y\\ &=\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}log(p_i(\epsilon_y))kp_i(\epsilon_y)^{k-1}(\frac{\partial p_i(\epsilon_y)}{\partial \epsilon_y})d\epsilon_y\\ &=\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}log(p_i(\epsilon_y))kp_i(\epsilon_y)^{k-1}(\frac{\partial p_i(\epsilon_y)}{\partial \epsilon_y})d\epsilon_y\\ &=\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}log(p_i(\epsilon_y))kp_i(\epsilon_y)^{k-1}dp_i(\epsilon_y)\\ &=\tbinom{N-1}{k}\int_0^{1}(1-u)^{N-k-1}log(u)ku^{k-1}du=\psi(k)-\psi(N) \end{aligned} =(kN1)0(1pi(ϵy))Nk1[log(qi)kqik1(ϵyqi)0ϵydϵxdϵy=(kN1)0(1pi(ϵy))Nk1log(pi(ϵy))kpi(ϵy)k1(ϵypi(ϵy))dϵy=(kN1)0(1pi(ϵy))Nk1log(pi(ϵy))kpi(ϵy)k1(ϵypi(ϵy))dϵy=(kN1)0(1pi(ϵy))Nk1log(pi(ϵy))kpi(ϵy)k1dpi(ϵy)=(kN1)01(1u)Nk1log(u)kuk1du=ψ(k)ψ(N)
其中,注意到qi(0)=0q_i(0)=0qi(0)=0和当ϵx=ϵy\epsilon_x=\epsilon_yϵx=ϵy时有q(ϵy,ϵy)q(\epsilon_y,\epsilon_y)q(ϵy,ϵy)为一个超立方体,恰是pi(ϵy)p_i(\epsilon_y)pi(ϵy),最后的部分由前面Prososition 2.的结论得出。

第二部分: (22)
−(N−1k)∫0∞(1−pi(ϵy))N−k−1(∫0ϵy{kqik−2∂qi∂ϵx(∂qi∂ϵy)}dϵx)dϵy=−(N−1k)∫0∞(1−pi(ϵy))N−k−1(∫0ϵy{kqik−2∂qi∂ϵx(∂qi∂ϵy)}dϵx)dϵy \begin{aligned} &-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}(\int_0^{\epsilon_y}\{kq_i^{k-2}\frac{\partial q_i}{\partial \epsilon_x}(\frac{\partial q_i}{\partial \epsilon_y})\}d\epsilon_x)d\epsilon_y\\ &=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}(\int_0^{\epsilon_y}\{kq_i^{k-2}\frac{\partial q_i}{\partial \epsilon_x}(\frac{\partial q_i}{\partial \epsilon_y})\}d\epsilon_x)d\epsilon_y \end{aligned} (kN1)0(1pi(ϵy))Nk1(0ϵy{kqik2ϵxqi(ϵyqi)}dϵx)dϵy=(kN1)0(1pi(ϵy))Nk1(0ϵy{kqik2ϵxqi(ϵyqi)}dϵx)dϵy
由局部均匀假设: (23)
qi(ϵx,ϵy)≈μ(xi,yi)cdxcdyϵxdxϵydy→∂∂ϵyqi(ϵx,ϵy)=μ(xi,yi)cdxcdyϵxdxdyϵydy−1=qi(ϵx,ϵy)dyϵy \begin{aligned} &q_i(\epsilon_x,\epsilon_y)\approx\mu(x_i,y_i)c_{dx}c_{dy}\epsilon_x^{dx}\epsilon_y^{dy}\\ \rightarrow&\frac{\partial}{\partial \epsilon_y}q_i(\epsilon_x,\epsilon_y)=\mu(x_i,y_i)c_{dx}c_{dy}\epsilon_x^{dx}dy\epsilon_y^{dy-1}=q_i(\epsilon_x,\epsilon_y)\frac{dy}{\epsilon_y} \end{aligned} qi(ϵx,ϵy)μ(xi,yi)cdxcdyϵxdxϵydyϵyqi(ϵx,ϵy)=μ(xi,yi)cdxcdyϵxdxdyϵydy1=qi(ϵx,ϵy)ϵydy
并且由刚才分部积分时提及的当ϵx=ϵy\epsilon_x=\epsilon_yϵx=ϵy时有 q(ϵy,ϵy)q(\epsilon_y,\epsilon_y)q(ϵy,ϵy)为一个超立方体,因此 (24)
∂∂ϵyqi(ϵx,ϵy)∣ϵx=ϵy=pi(ϵy)dyϵy \frac{\partial}{\partial \epsilon_y}q_i(\epsilon_x,\epsilon_y)|_{\epsilon_x=\epsilon_y}=p_i(\epsilon_y)\frac{dy}{\epsilon_y} ϵyqi(ϵx,ϵy)ϵx=ϵy=pi(ϵy)ϵydy

则(22)继续简化有: (25)
=−(N−1k)∫0∞(1−pi(ϵy))N−k−1(∫0ϵy{kqik−2∂qi∂ϵx(∂qi∂ϵy)}dϵx)dϵy=−(N−1k)∫0∞(1−pi(ϵy))N−k−1(∫0ϵy{kqik−2∂qi∂ϵx(qidyϵy)}dϵx)dϵy=−(N−1k)∫0∞(1−pi(ϵy))N−k−1(dyϵy)(∫0ϵy{kqik−1∂qi∂ϵx}dϵx)dϵy=−(N−1k)∫0∞(1−pi(ϵy))N−k−1(dyϵy)pi(ϵy)k)dϵy=−(N−1k)∫0∞(1−pi(ϵy))N−k−1(dyϵy)pi(ϵy)k)dϵy \begin{aligned} &=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}(\int_0^{\epsilon_y}\{kq_i^{k-2}\frac{\partial q_i}{\partial \epsilon_x}(\frac{\partial q_i}{\partial \epsilon_y})\}d\epsilon_x)d\epsilon_y\\ &=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}(\int_0^{\epsilon_y}\{kq_i^{k-2}\frac{\partial q_i}{\partial \epsilon_x}(q_i\frac{dy}{\epsilon_y})\}d\epsilon_x)d\epsilon_y\\ &=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}(\frac{dy}{\epsilon_y})(\int_0^{\epsilon_y}\{kq_i^{k-1}\frac{\partial q_i}{\partial \epsilon_x}\}d\epsilon_x)d\epsilon_y\\ &=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}(\frac{dy}{\epsilon_y})p_i(\epsilon_y)^k)d\epsilon_y\\ &=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}(\frac{dy}{\epsilon_y})p_i(\epsilon_y)^k)d\epsilon_y \end{aligned} =(kN1)0(1pi(ϵy))Nk1(0ϵy{kqik2ϵxqi(ϵyqi)}dϵx)dϵy=(kN1)0(1pi(ϵy))Nk1(0ϵy{kqik2ϵxqi(qiϵydy)}dϵx)dϵy=(kN1)0(1pi(ϵy))Nk1(ϵydy)(0ϵy{kqik1ϵxqi}dϵx)dϵy=(kN1)0(1pi(ϵy))Nk1(ϵydy)pi(ϵy)k)dϵy=(kN1)0(1pi(ϵy))Nk1(ϵydy)pi(ϵy)k)dϵy
其中第三个等号注意到了dyϵy\frac{dy}{\epsilon_y}ϵydy是常数,注意dydydyYYY的维度不是微分。第四个等号是将kqik−1kq_i^{k-1}kqik1吸收到微分,再由积分所得。

进一步根据(24)有:(26)
ddϵypi(ϵy)=d∂ϵyqi(ϵx,ϵy)∣ϵx=ϵy=qi(ϵx,ϵy)∣ϵx=ϵydydϵy=pi(ϵy)dyϵy \frac{d}{d\epsilon_y}p_i(\epsilon_y)=\frac{d}{\partial \epsilon_y}q_i(\epsilon_x,\epsilon_y)|_{\epsilon_x=\epsilon_y}=q_i(\epsilon_x,\epsilon_y)|_{\epsilon_x=\epsilon_y}\frac{dy}{d\epsilon_y}=p_i(\epsilon_y)\frac{dy}{\epsilon_y} dϵydpi(ϵy)=ϵydqi(ϵx,ϵy)ϵx=ϵy=qi(ϵx,ϵy)ϵx=ϵydϵydy=pi(ϵy)ϵydy
因此,(25)简化为(26)
=−(N−1k)∫0∞(1−pi(ϵy))N−k−1(dyϵy)pi(ϵy)k)dϵy=−(N−1k)∫0∞(1−pi(ϵy))N−k−1pi(ϵy)k−1∂∂ypi(ϵy)dϵy=−(N−1k)∫0∞(1−pi(ϵy))N−k−1pi(ϵy)k−1dpi=−(N−1k)Beta(k,N−k)=−1k \begin{aligned} &=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}(\frac{dy}{\epsilon_y})p_i(\epsilon_y)^k)d\epsilon_y\\ &=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}p_i(\epsilon_y)^{k-1}\frac{\partial}{\partial y}p_i(\epsilon_y)d\epsilon_y\\ &=-\tbinom{N-1}{k}\int_0^{\infin}(1-p_i(\epsilon_y))^{N-k-1}p_i(\epsilon_y)^{k-1}dp_i=-\tbinom{N-1}{k}Beta(k,N-k)\\ &=-\frac{1}{k} \end{aligned} =(kN1)0(1pi(ϵy))Nk1(ϵydy)pi(ϵy)k)dϵy=(kN1)0(1pi(ϵy))Nk1pi(ϵy)k1ypi(ϵy)dϵy=(kN1)0(1pi(ϵy))Nk1pi(ϵy)k1dpi=(kN1)Beta(k,Nk)=k1
联立(21)和(26),得到(27)
E[logqi(ϵx,ϵy)]=ψ(k)−ψ(N)−1k E[logq_i(\epsilon_x,\epsilon_y)]=\psi(k)-\psi(N)-\frac{1}{k} E[logqi(ϵx,ϵy)]=ψ(k)ψ(N)k1
进一步的,根据公式(23),自然可得(28)
logμ(xi,yi)≈logqi(ϵx,ϵy)−log(cdxcdy)−dxlogϵx−dylogϵy \begin{aligned} log\mu(x_i,y_i)\approx logq_i(\epsilon_x,\epsilon_y)-log(c_{dx}c_{dy})-dxlog\epsilon_x-dylog\epsilon_y \end{aligned} lo(xi,yi)logqi(ϵx,ϵy)log(cdxcdy)dxlogϵxdylogϵy
求期望有:(29)
E[logμ(xi,xy)]=ψ(k)−ψ(N)−1k−log(cdxcdy)−dx1N∑logϵx−dy1N∑logϵy E[log\mu(x_i,x_y)]=\\\psi(k)-\psi(N)-\frac{1}{k}-log(c_{dx}c_{dy})-dx\frac{1}{N}\sum log\epsilon_x-dy\frac{1}{N}\sum log\epsilon_y E[lo(xi,xy)]=ψ(k)ψ(N)k1log(cdxcdy)dxN1logϵxdyN1logϵy
再进一步利用(1)式可得KSG2估计器:(30)
I(2)(X,Y)=ψ(k)+ψ(N)−1k−1N∑ψ(nx)−1N∑ψ(ny) I^{(2)}(X,Y)=\psi(k)+\psi(N)-\frac{1}{k}-\frac{1}{N}\sum\psi(n_x)-\frac{1}{N}\sum\psi(n_y) I(2)(X,Y)=ψ(k)+ψ(N)k1N1ψ(nx)N1ψ(ny)

注意,这里是ψ(nx)\psi(n_x)ψ(nx),这是因为nxn_xnx此时定义为距离小于等于ϵx2\frac{\epsilon_x}{2}2ϵx的点的个数。

至此,KSG估计器的形式推导完毕。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值