Understanding Deep Learning

This post takes a close look at the two efficient training strategies in the Word2Vec model, hierarchical softmax and negative sampling, and explains how they address the inefficiency of the traditional softmax over large vocabularies. It also covers Word2Vec's loss function, cross entropy, and its role in measuring how close the predicted distribution is to the true distribution.


I. word2vec
1. Hierarchical Softmax
The final layer of a traditional softmax has to compute a probability for every word in the vocabulary, which is too inefficient, so an alternative was proposed: hierarchical softmax.

Hierarchical softmax rests on the following idea: rather than modeling P(y|x) directly, we can first define a partition function c() that assigns y to a region C, and then:
P(y|x) = P(C(y)|x) * P(y|C(y), x)
That is, to compute the probability of y given x, first compute the probability, given x, of the region containing y, and then the probability of y given that region. The method can be nested: region C can itself be partitioned further. By repeatedly splitting the sample space (the vocabulary) into two complementary sets of equal size, the sample space is halved each time, and the target sample point is reached after log2(|V|) steps.
References:
https://zhuanlan.zhihu.com/p/57028381
https://www.zhihu.com/question/43378064
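
As a minimal sketch of this idea, assume a fixed binary tree in which every internal node carries a learned vector, and the path from the root to the leaf of word y is given as a list of node indices plus the 0/1 branch taken at each node (all names below are illustrative, not from any particular implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hs_probability(h, path_nodes, path_codes, node_vecs):
    """P(y | x) as a product of binary decisions down the tree path.

    h          -- context/hidden vector for the input x
    path_nodes -- internal-node indices from the root to the leaf of y
    path_codes -- branch taken at each node (1 = right, 0 = left)
    node_vecs  -- matrix of learned vectors, one row per internal node
    """
    p = 1.0
    for node, code in zip(path_nodes, path_codes):
        s = sigmoid(np.dot(node_vecs[node], h))  # P(go right | x) at this node
        p *= s if code == 1 else (1.0 - s)
    return p
```

Since each leaf sits at depth about log2(|V|), computing P(y|x) touches only O(log |V|) node vectors instead of |V| output weights.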

2. Negative Sampling
Negative samples are drawn according to some distribution, and the loss is then the binary cross-entropy: -log(σ(positive score)) - Σ log(1 - σ(negative score)), where σ is the logistic function.
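
A minimal numpy sketch of this loss (the vectors and names below are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(h, v_pos, v_negs):
    """-log(sigmoid(pos score)) - sum over negatives of log(1 - sigmoid(neg score))."""
    loss = -np.log(sigmoid(np.dot(v_pos, h)))
    for v_neg in v_negs:
        loss -= np.log(1.0 - sigmoid(np.dot(v_neg, h)))
    return loss

# toy usage with random vectors
rng = np.random.default_rng(0)
h, v_pos = rng.normal(size=10), rng.normal(size=10)
v_negs = rng.normal(size=(5, 10))
print(negative_sampling_loss(h, v_pos, v_negs))
```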
For the detailed theory and source code, see my other post.
Here the focus is on the sampling procedure. The TensorFlow implementation of word2vec (candidate_sampling_ops.py) states that it samples from "an approximately log-uniform or Zipfian distribution", i.e. the formula: P(class) = (log(class + 2) - log(class + 1)) / log(range_max + 1) (this requires the vocabulary to be sorted by descending frequency).
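
A quick numeric check of that formula (range_max below is an assumed vocabulary size; rank 0 is the most frequent word):

```python
import numpy as np

range_max = 10000                      # assumed vocabulary size
classes = np.arange(range_max)         # rank 0 = most frequent word
p = (np.log(classes + 2) - np.log(classes + 1)) / np.log(range_max + 1)
print(p[:5])    # the most frequent words get the largest probabilities
print(p.sum())  # telescoping sum: equals 1.0 over the full range
```

This is the distribution behind TensorFlow's log_uniform_candidate_sampler; the sum telescopes to log(range_max + 1) / log(range_max + 1) = 1, so it is a proper distribution over the whole range.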
But the paper and this tutorial say the sampling follows the distribution:

P(w_i) = f(w_i)^(3/4) / Σ_j f(w_j)^(3/4), where f(w) is the word's frequency (count) in the corpus.
So it is a bit confusing: which of the two is more appropriate? The two are in fact related: under Zipf's law a word's frequency is roughly proportional to 1/rank, so the log-uniform distribution over rank-sorted words approximates frequency-based sampling, while the 3/4 power in the paper is an empirical smoothing that gives rare words slightly more weight.

The C-language sampling implementation


Build a unigram table: an array of 100 million elements, filled with the vocabulary indices of the words; the array contains repeats, i.e. some words appear multiple times. The number of times each word's index appears in the array is given by the formula P(w_i) * table_size; that is, the computed negative-sampling probability × 100 million = the number of times the word appears in the table.
So my understanding of the procedure is: (1) compute, via P(w_i) * table_size, how many times each word should appear in the table; (2) put that many copies of the word's index into the table; after repeating this for all words the table should be exactly full (because the probabilities p sum to 1); (3) generate a random number in [0, table_size); using it as a key, the corresponding value in the table is the sampled index.
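
A Python sketch of these three steps, using the 3/4 power from the paper (the original C code uses table_size = 1e8; the counts and table size below are toy values, and rounding means the table may be off by a few slots rather than exactly full):

```python
import numpy as np

def build_unigram_table(counts, table_size, power=0.75):
    """Steps (1)-(2): word index i occupies about P(w_i) * table_size slots."""
    probs = counts ** power
    probs /= probs.sum()                       # P(w_i); sums to 1
    reps = np.round(probs * table_size).astype(int)
    return np.repeat(np.arange(len(counts)), reps)

def sample_negatives(table, k, rng):
    """Step (3): random positions into the table; stored values are word indices."""
    return table[rng.integers(0, len(table), size=k)]

counts = np.array([100.0, 60.0, 30.0, 8.0, 2.0])   # toy word frequencies
table = build_unigram_table(counts, table_size=1000)
print(sample_negatives(table, 10, np.random.default_rng(0)))
```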


II. Loss
1. Cross entropy for softmax:
How close is the predicted distribution to the true distribution? That is what the cross-entropy loss determines. Reference: https://stackoverflow.com/questions/41990250/what-is-cross-entropy
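
As a small numeric sketch (made-up values): with a one-hot true distribution, cross-entropy reduces to the negative log-probability that the softmax assigns to the correct class.

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])              # hypothetical class scores
probs = np.exp(logits) / np.exp(logits).sum()   # softmax distribution
true_class = 0                                  # one-hot true distribution
loss = -np.log(probs[true_class])               # cross-entropy vs. one-hot target
print(probs, loss)
```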
