Outline:
- The general structure and assumptions of the two models: CBOW and Skip-Gram.
- The simplest parameter-update case: the ‘one-word context’ version.
a. Notation and the optimization objective
b. Updating the hidden-to-output-layer parameters
c. Updating the input-to-hidden-layer parameters
d. An intuitive interpretation
- Updating parameters, version 2: the ‘multi-word context’ version.
- Two techniques for speeding up the parameter updates
a. Huffman coding & hierarchical softmax
b. Negative sampling
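As a rough sketch of the ‘one-word context’ updates in items (b) and (c) above, the following toy example does one full-softmax SGD step. The sizes, names, and learning rate are made-up illustrations, not values from the referenced papers:

```python
import numpy as np

# Hypothetical toy sizes: V words in the vocabulary, N-dimensional embeddings.
V, N = 5, 3
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, N))      # input -> hidden weights
W_out = rng.normal(scale=0.1, size=(N, V))  # hidden -> output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_pair(W, W_out, w_in, w_target, eta=0.1):
    """One SGD step for a (input word index, target word index) pair."""
    h = W[w_in].copy()             # hidden layer = the input word's embedding row
    y = softmax(W_out.T @ h)       # predicted distribution over the vocabulary
    e = y.copy()
    e[w_target] -= 1.0             # prediction error e = y - t
    EH = W_out @ e                 # error backpropagated to the hidden layer
    W_out -= eta * np.outer(h, e)  # (b) hidden -> output update
    W[w_in] -= eta * EH            # (c) input -> hidden update (one row changes)

# One step should raise the predicted probability of the target word.
p_before = softmax(W_out.T @ W[2])[4]
train_pair(W, W_out, w_in=2, w_target=4)
p_after = softmax(W_out.T @ W[2])[4]
```

Note that only the input word’s row of `W` is updated, while the full `W_out` matrix is touched; this per-step cost of O(V) in the output layer is exactly what the two speed-up techniques above address.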
Handwritten notes (to be added)
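Negative sampling (item b above) can be sketched in the same spirit. The sizes, the fixed list of negatives, and all names here are hypothetical; a real implementation samples negatives from a noise distribution:

```python
import numpy as np

# Hypothetical toy setup: V words, N-dimensional vectors.
V, N = 10, 4
rng = np.random.default_rng(1)
W_in = rng.normal(scale=0.1, size=(V, N))   # input (center) word vectors
W_out = rng.normal(scale=0.1, size=(V, N))  # output (context) word vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pair_loss(center, context, negatives):
    """Negative-sampling loss for one (center, context) pair."""
    v = W_in[center]
    loss = -np.log(sigmoid(v @ W_out[context]))
    for n in negatives:
        loss -= np.log(sigmoid(-(v @ W_out[n])))
    return loss

def ns_step(center, context, negatives, eta=0.25):
    """One SGD step: pull the true context closer, push negatives away."""
    v = W_in[center].copy()
    grad_v = np.zeros_like(v)
    for idx, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[idx]
        g = sigmoid(v @ u) - label  # d(loss) / d(score)
        grad_v += g * u
        W_out[idx] -= eta * g * v   # update this output vector
    W_in[center] -= eta * grad_v    # update the center word vector

negs = [5, 6, 7]                    # negatives chosen arbitrarily here
loss_before = pair_loss(0, 1, negs)
ns_step(0, 1, negs)
loss_after = pair_loss(0, 1, negs)
```

Each step touches only 1 + K output vectors instead of all V, which is the whole point of the technique.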
Main Reference:
- 《word2vec Parameter Learning Explained》
- 《word2vec中的数学》 (The Mathematics in word2vec)
- 《word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method》
Other Reference:
- The three original papers by Tomas Mikolov:
- 《Efficient Estimation of Word Representations in Vector Space》
- 《Distributed Representations of Words and Phrases and their Compositionality》
- 《Distributed Representations of Sentences and Documents》
- Some translated articles:
a. https://www.cnblogs.com/peghoty/p/3857839.html
b. https://www.cnblogs.com/conan-ai/p/11354926.html
c. https://blog.youkuaiyun.com/u010555997/article/details/76598666
d. https://www.jianshu.com/p/4517181ca9c3
e. https://blog.youkuaiyun.com/lanyu_01/article/details/80097350