Efficient estimation of word representations in vector space

Sharp tools make good work. (工欲善其事，必先利其器)

Today I’ll explore the word vectors presented by Mikolov et al. in the paper “Efficient estimation of word representations in vector space”. The paper proposes two novel model architectures for learning vector representations of words that significantly improve the quality of the word vectors at a lower computational cost. The resulting vectors are evaluated on a word similarity task using a word-offset technique, in which simple algebraic operations are performed on the word vectors.

In this paper, Mikolov et al. give a short summary of previously proposed model architectures, including the well-known NNLM and RNNLM, and propose two new log-linear models called CBOW and Skip-gram.

The CBOW model is similar to the feedforward NNLM, except that the non-linear hidden layer is removed and the projection layer is shared for all words. Its objective is to use words from both the history and the future to correctly classify the current (middle) word. Unlike a standard bag-of-words model, it uses a continuous distributed representation of the context.

The Skip-gram model is similar to CBOW, but instead of predicting the current word from its context, it tries to maximize classification of a word based on another word in the same sentence. More precisely, each current word is fed into a log-linear classifier with a continuous projection layer, and the output is used to predict words within a certain range before and after the current word. Increasing this range improves the quality of the resulting word vectors, but it also increases the computational cost.
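To make the two training objectives concrete, here is a minimal sketch (my own, not from the paper) of how training pairs can be built from a tokenized sentence: CBOW pairs the 2C surrounding word indices with the center word, while Skip-gram pairs the center word with each surrounding word individually. The window size C and the toy indices are assumptions for illustration.

# Minimal sketch of building training pairs (illustrative only).
C = 2  # context window size on each side (assumed value)

def build_pairs(token_ids):
    cbow_pairs, skipgram_pairs = [], []
    for i in range(C, len(token_ids) - C):
        context = token_ids[i - C:i] + token_ids[i + 1:i + C + 1]  # the 2C surrounding words
        center = token_ids[i]
        cbow_pairs.append((context, center))                 # CBOW: context -> center word
        skipgram_pairs.extend((center, w) for w in context)  # Skip-gram: center word -> each context word
    return cbow_pairs, skipgram_pairs

# Toy example: a sentence already encoded as word indices
cbow_pairs, skipgram_pairs = build_pairs([5, 12, 7, 3, 9, 4, 8])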

The architectures of the two models are shown below.
[Figure: CBOW and Skip-gram model architectures]

Below is my PyTorch code implementing the Skip-gram and CBOW models.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hyperparameters (example values; set these to match your corpus)
MAX_VOCAB_SIZE = 30000             # vocabulary size
EMBEDDING_SIZE = 100               # dimensionality of the word vectors
INIT_RANGE = 0.5 / EMBEDDING_SIZE  # range for uniform weight initialisation
C = 2                              # context window size on each side


class Skip_gram(nn.Module):
    def __init__(self):
        super(Skip_gram, self).__init__()
        # Input embedding: one EMBEDDING_SIZE-dimensional vector per vocabulary word
        self.embedding = nn.Embedding(MAX_VOCAB_SIZE, EMBEDDING_SIZE)
        self.embedding.weight.data.uniform_(-INIT_RANGE, INIT_RANGE)

        # Output projection back onto the vocabulary
        self.outLayer = nn.Linear(EMBEDDING_SIZE, MAX_VOCAB_SIZE)

    def forward(self, X):
        # X: (B,) tensor of center-word indices
        embedded = self.embedding(X)          # B x EMBEDDING_SIZE
        output = self.outLayer(embedded)      # B x MAX_VOCAB_SIZE
        # Log-probabilities over the vocabulary; pair with nn.NLLLoss during training
        return F.log_softmax(output, dim=-1)


class CBOW(nn.Module):
    def __init__(self):
        super(CBOW, self).__init__()
        # Shared projection layer: the same embedding matrix is used for every context position
        self.embedding = nn.Embedding(MAX_VOCAB_SIZE, EMBEDDING_SIZE)
        self.embedding.weight.data.uniform_(-INIT_RANGE, INIT_RANGE)

        self.outLayer = nn.Linear(EMBEDDING_SIZE, MAX_VOCAB_SIZE)

    def forward(self, X):
        # X: (B, 2C) tensor of context-word indices
        embedded = self.embedding(X)          # B x 2C x EMBEDDING_SIZE
        embedded = embedded.sum(1) / (2 * C)  # average the context vectors -> B x EMBEDDING_SIZE
        output = self.outLayer(embedded)      # B x MAX_VOCAB_SIZE
        return F.log_softmax(output, dim=-1)
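For completeness, here is a rough sketch of how the Skip_gram model above could be trained on the skipgram_pairs built earlier. The optimizer, learning rate, and single-example batches are my own assumptions; the paper itself relies on tricks such as hierarchical softmax to avoid computing the full softmax over the vocabulary.

# Rough training-loop sketch (assumed setup, not the paper's training procedure).
model = Skip_gram()
optimizer = torch.optim.SGD(model.parameters(), lr=0.025)
criterion = nn.NLLLoss()  # expects the log-probabilities returned by forward()

for epoch in range(5):
    for center, context in skipgram_pairs:
        center_t = torch.tensor([center])    # batch of size 1
        context_t = torch.tensor([context])
        log_probs = model(center_t)          # 1 x MAX_VOCAB_SIZE
        loss = criterion(log_probs, context_t)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()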

To compare the quality of different versions of word vectors, previous papers typically show a table of example words together with their most similar words, which readers judge intuitively. However, there can be many different kinds of similarity between words; for example, big is similar to bigger in the same sense that small is similar to smaller. Mikolov et al. therefore ask how to find a word that is similar to small in the same sense that biggest is similar to big. The question can be answered by simply computing the vector X = vector(“biggest”) - vector(“big”) + vector(“small”) and searching the vector space for the word closest to X by cosine distance. When the word vectors are well trained, this method finds the correct answer (smallest).
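As an illustration, here is a small sketch of that vector-offset search over the embedding matrix learned above, using cosine similarity; the word2idx and idx2word vocabulary mappings are assumed to exist from whatever preprocessing produced the training data.

# Sketch of the vector-offset analogy search (word2idx / idx2word are assumed
# vocabulary mappings from the preprocessing step).
def analogy(a, b, c, model, word2idx, idx2word, topk=5):
    E = model.embedding.weight.data                       # MAX_VOCAB_SIZE x EMBEDDING_SIZE
    x = E[word2idx[a]] - E[word2idx[b]] + E[word2idx[c]]  # e.g. vector("biggest") - vector("big") + vector("small")
    sims = F.cosine_similarity(x.unsqueeze(0), E)         # cosine similarity to every word vector
    best = sims.topk(topk + 3).indices.tolist()           # a few extra so the query words can be skipped
    return [idx2word[i] for i in best if idx2word[i] not in (a, b, c)][:topk]

# analogy("biggest", "big", "small", model, word2idx, idx2word)
# should rank "smallest" near the top when the vectors are well trained.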

From my perspective, the most valuable contribution of this paper is the two novel, computationally efficient model architectures for obtaining high-quality word vectors. Many existing NLP applications, such as machine translation, information retrieval, and question answering systems, can benefit from these architectures, and they may enable other future applications yet to be invented.
