Noise-Contrastive Estimation

This post introduces Noise-Contrastive Estimation (NCE), a method widely used in unsupervised learning and information retrieval that estimates probability distributions by training a classifier to contrast data samples against constructed noise samples. In deep learning, NCE is used to train models whose likelihoods are expensive to normalize, such as word-embedding and other unnormalized (energy-based) models, on large amounts of unlabeled data. With NCE, high-dimensional, complex distributions can be estimated efficiently while avoiding direct computation of the intractable normalizing constant (see the general formulation after the link below).
  • https://leimao.github.io/article/Noise-Contrastive-Estimation/
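For context, before the paper excerpt below, it helps to state the generic NCE formulation of Gutmann & Hyvärinen (2010), which the linked article develops in detail. Given the data distribution $p_d$, a noise distribution $p_n$, and $m$ noise samples per data sample, NCE fits a (possibly unnormalized) model $p_\theta$ by logistic discrimination between data and noise:

$$
h(x; \theta) = \frac{p_\theta(x)}{p_\theta(x) + m \, p_n(x)}, \qquad
J(\theta) = \mathbb{E}_{x \sim p_d}\!\left[\log h(x; \theta)\right] + m \, \mathbb{E}_{x \sim p_n}\!\left[\log\left(1 - h(x; \theta)\right)\right].
$$

Maximizing $J(\theta)$ recovers the normalized data density at the optimum, which is why the normalizing constant can be treated as an extra parameter (or a fixed estimate) rather than computed exactly. The excerpt below specializes exactly this template, with $p_\theta$ replaced by $P(i \mid v)$ and uniform noise $p_n = 1/n$.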
3.2. Noise-Contrastive Estimation

Computing the non-parametric softmax in Eq. (2) is cost-prohibitive when the number of classes n is very large, e.g. at the scale of millions. Similar problems have been well addressed in the literature on learning word embeddings [25, 24], where the number of words can also scale to millions. Popular techniques to reduce computation include hierarchical softmax [26], noise-contrastive estimation (NCE) [9], and negative sampling [24]. We use NCE [9] to approximate the full softmax.

We adapt NCE to our problem, in order to tackle the difficulty of computing the similarity to all the instances in the training set. The basic idea is to cast the multi-class classification problem into a set of binary classification problems, where the binary classification task is to discriminate between data samples and noise samples. Specifically, the probability that feature representation $v$ in the memory bank corresponds to the $i$-th example under our model is

$$
P(i \mid v) = \frac{\exp(v^{\top} f_i / \tau)}{Z_i}, \tag{4}
$$

$$
Z_i = \sum_{j=1}^{n} \exp\!\left(v_j^{\top} f_i / \tau\right), \tag{5}
$$

where $Z_i$ is the normalizing constant. We formalize the noise distribution as a uniform distribution: $P_n = 1/n$. Following prior work, we assume that noise samples are $m$ times more frequent than data samples. The posterior probability of sample $i$ with feature $v$ being from the data distribution (denoted by $D = 1$) is then

$$
h(i, v) := P(D = 1 \mid i, v) = \frac{P(i \mid v)}{P(i \mid v) + m P_n(i)}. \tag{6}
$$

Our approximated training objective is to minimize the negative log-posterior distribution of data and noise samples,

$$
J_{\mathrm{NCE}}(\theta) = -\mathbb{E}_{P_d}\!\left[\log h(i, v)\right] - m \cdot \mathbb{E}_{P_n}\!\left[\log\left(1 - h(i, v')\right)\right]. \tag{7}
$$

Here, $P_d$ denotes the actual data distribution. For $P_d$, $v$ is the feature corresponding to $x_i$; whereas for $P_n$, $v'$ is the feature of another image, randomly sampled according to the noise distribution $P_n$. In our model, both $v$ and $v'$ are sampled from the non-parametric memory bank $V$.

Computing the normalizing constant $Z_i$ according to Eq. (5) is expensive. We follow [25], treating it as a constant and estimating its value via Monte Carlo approximation:

$$
Z \simeq Z_i \simeq n \, \mathbb{E}_j\!\left[\exp(v_j^{\top} f_i / \tau)\right] = \frac{n}{m} \sum_{k=1}^{m} \exp\!\left(v_{j_k}^{\top} f_i / \tau\right), \tag{8}
$$

where $\{j_k\}$ is a random subset of indices. Empirically, we find the approximation derived from the initial batches sufficient to work well in practice.

NCE reduces the computational complexity from $O(n)$ to $O(1)$ per sample. With such a drastic reduction, our experiments still yield competitive performance.
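As a concrete illustration of Eqs. (4)-(8), here is a minimal PyTorch sketch, assuming a fixed, L2-normalized memory bank and uniform noise. The function names (estimate_Z, nce_loss), tensor shapes, and defaults such as tau=0.07 and m=4096 are illustrative assumptions, not the authors' released implementation.

```python
import torch

def estimate_Z(features, memory_bank, tau=0.07, num_samples=4096):
    """Monte Carlo estimate of the normalizing constant Z, Eq. (8).
    Computed once from an initial batch and then held fixed."""
    with torch.no_grad():
        n = memory_bank.shape[0]
        idx = torch.randint(0, n, (num_samples,), device=features.device)
        # Z ≈ n · E_j[exp(v_j^T f_i / τ)], averaged over the batch.
        return (n * torch.exp(features @ memory_bank[idx].t() / tau).mean()).item()

def nce_loss(features, indices, memory_bank, Z, tau=0.07, m=4096):
    """NCE objective of Eqs. (4)-(7).

    features:    (B, D) L2-normalized batch features f_i
    indices:     (B,)   dataset indices i of those samples
    memory_bank: (n, D) non-parametric memory bank V
    Z:           fixed normalizing constant from estimate_Z
    """
    n = memory_bank.shape[0]
    B = features.shape[0]
    Pn = 1.0 / n  # uniform noise distribution, P_n = 1/n

    # Data term: v is the memory-bank entry of example i; P(i|v), Eq. (4).
    v = memory_bank[indices]                                      # (B, D)
    p_data = torch.exp((v * features).sum(dim=1) / tau) / Z       # (B,)

    # Noise term: m features v' drawn uniformly from the memory bank.
    noise_idx = torch.randint(0, n, (B, m), device=features.device)
    v_noise = memory_bank[noise_idx]                              # (B, m, D)
    p_noise = torch.exp(
        torch.einsum('bmd,bd->bm', v_noise, features) / tau) / Z  # (B, m)

    # Posterior h(i, v) = P(i|v) / (P(i|v) + m·P_n), Eq. (6).
    h_data = p_data / (p_data + m * Pn)
    h_noise = p_noise / (p_noise + m * Pn)

    # J_NCE = -E_Pd[log h] - m·E_Pn[log(1 - h)], Eq. (7): per sample,
    # one positive log-term plus the sum over its m noise samples.
    return -(torch.log(h_data) + torch.log(1.0 - h_noise).sum(dim=1)).mean()
```

In this sketch the memory bank is treated as a fixed buffer; in the paper it is updated outside the loss, and Z is frozen after the initial estimate, which matches the two-function split above. The per-sample cost is O(m) in the number of sampled negatives, independent of n, in line with the O(n)-to-O(1) reduction described in the excerpt.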