[Paper Reading Notes] Using the Output Embedding to Improve Language Models

 

Let U denote the input word embeddings and V the output word embeddings of an embedding-training model such as Word2Vec. After training, usually only U is kept as pretrained embeddings for downstream models, while V is discarded. This paper examines how well U and V each perform, as well as the effect of using them jointly (tying them), and draws the following conclusions:

1. In the Word2Vec skip-gram model, the output embedding performs slightly worse than the input embedding.

2. In RNN-based language models, the output embedding outperforms the input embedding.

3. When the two embeddings are tied together, i.e., U = V is enforced, the joint embedding behaves more like the output embedding of the untied model than like its input embedding.

4. Tying the input and output embeddings improves the perplexity of a variety of language models (see the sketch after this list).

5. When dropout is not used, it is recommended to add an extra projection P before V and to apply regularization to P.

6. Weight tying in neural translation models can reduce their size (number of parameters) to less than half of the original without hurting performance.
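
The following is a minimal sketch of what weight tying looks like in PyTorch. The class name `TiedLanguageModel`, the single-layer LSTM architecture, and the `use_projection` flag are illustrative assumptions made for this note, not code from the paper; the only requirement for direct tying is that the decoder's weight matrix has the same shape as the input embedding matrix, which is why the hidden size equals the embedding size here.

```
import torch
import torch.nn as nn

class TiedLanguageModel(nn.Module):
    """Minimal LSTM language model with tied input/output embeddings (U = V). Illustrative sketch."""

    def __init__(self, vocab_size, embedding_dim, use_projection=False):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)            # U
        # Hidden size equals embedding_dim so the output weights can be tied directly.
        self.rnn = nn.LSTM(embedding_dim, embedding_dim, batch_first=True)
        # Optional extra projection P before V (conclusion 5, for the no-dropout case).
        self.projection = nn.Linear(embedding_dim, embedding_dim) if use_projection else None
        self.decoder = nn.Linear(embedding_dim, vocab_size, bias=False)      # V
        # Weight tying: the output embedding reuses the input embedding matrix.
        self.decoder.weight = self.embedding.weight

    def forward(self, inputs):
        # inputs: (batch_size, seq_len) token ids
        hidden, _ = self.rnn(self.embedding(inputs))
        if self.projection is not None:
            hidden = self.projection(hidden)
        # Next-token logits of shape (batch_size, seq_len, vocab_size).
        return self.decoder(hidden)
```

Regularizing P would amount to adding, for example, a weight-decay term on `self.projection.weight` to the training loss; that choice is left to the training loop in this sketch.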

As a more general illustration of embedding layers in PyTorch, here is a model that uses an embedding followed by an LSTM classifier:

```
import torch
import torch.nn as nn

class EmbeddingModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_size, num_classes):
        super(EmbeddingModel, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.rnn = nn.LSTM(embedding_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, inputs):
        # inputs shape: (batch_size, seq_len)
        embedded = self.embedding(inputs)
        # embedded shape: (batch_size, seq_len, embedding_dim)
        output, _ = self.rnn(embedded)
        # output shape: (batch_size, seq_len, hidden_size)
        logits = self.fc(output[:, -1, :])
        # logits shape: (batch_size, num_classes)
        return logits
```

This model takes sequences of integers (word ids) and outputs a classification. The `EmbeddingModel` class inherits from `nn.Module` and defines three layers:

1. An `Embedding` layer that learns an embedding for each word in the vocabulary. `vocab_size` is the number of unique words and `embedding_dim` is the size of the learned embeddings.
2. An `LSTM` layer that takes the embedded input sequence and produces a sequence of hidden states; `hidden_size` is the number of hidden units.
3. A fully connected `Linear` layer that maps the final LSTM hidden state to the output logits; `num_classes` is the number of target classes.

In `forward`, the input sequence is first passed through the embedding layer, then through the LSTM, and finally the last hidden state (corresponding to the end of the sequence) is passed through the fully connected layer to produce the logits. `batch_first=True` makes the LSTM accept and return tensors shaped `(batch_size, seq_len, ...)` rather than `(seq_len, batch_size, ...)`, which is purely a readability choice.
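
As a usage sketch for the model above (the hyperparameter values are arbitrary and chosen only for illustration):

```
# Illustrative hyperparameters, not taken from the paper.
model = EmbeddingModel(vocab_size=10000, embedding_dim=128, hidden_size=256, num_classes=5)
batch = torch.randint(0, 10000, (32, 20))   # 32 sequences of 20 token ids
logits = model(batch)                        # shape: (32, 5)
```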