torch.nn.Embedding

最新推荐文章于 2025-09-08 10:54:46 发布

原创最新推荐文章于 2025-09-08 10:54:46 发布 · 263 阅读

·

0

·

CC 4.0 BY-SA版权

版权声明：本文为博主原创文章，遵循 CC 4.0 BY-SA 版权协议，转载请附上原文出处链接和本声明。

博客介绍了在PyTorch里实现词向量的方法。先要用数字表示单词，如用0表示“hello”。定义词向量矩阵，如2个词、5维度就是2x5矩阵。初始词向量未优化，需建神经网络修改参数。还说明了获取词向量及打印的代码操作。

部署运行你感兴趣的模型镜像

在pytorch里面实现word embedding是通过一个函数来实现的:nn.Embedding

1

2

3

4

5

6

7

8

9

10

11

12

13

# -*- coding: utf-8 -*-

import numpy as np

import torch

import torch.nn as nn

import torch.nn.functional as F

from torch.autograd import Variable

word_to_ix = {'hello': 0, 'world': 1}

embeds = nn.Embedding(2, 5)

hello_idx = torch.LongTensor([word_to_ix['hello']])

hello_idx = Variable(hello_idx)

hello_embed = embeds(hello_idx)

print(hello_embed)

这就是我们输出的hello这个词的word embedding，代码会输出如下内容，接下来我们解析一下代码：

1

2

3

Variable containing:

0.4606 0.6847 -1.9592 0.9434 0.2316

[torch.FloatTensor of size 1x5]

首先我们需要word_to_ix = {'hello': 0, 'world': 1}，每个单词我们需要用一个数字去表示他，这样我们需要hello的时候，就用0来表示它。

接着就是word embedding的定义nn.Embedding(2, 5)，这里的2表示有2个词，5表示5维度，其实也就是一个2x5的矩阵，所以如果你有1000个词，每个词希望是100维，你就可以这样建立一个word embedding，nn.Embedding(1000, 100)。如何访问每一个词的词向量是下面两行的代码，注意这里的词向量的建立只是初始的词向量，并没有经过任何修改优化，我们需要建立神经网络通过learning的办法修改word embedding里面的参数使得word embedding每一个词向量能够表示每一个不同的词。

1 2	`hello_idx = torch.LongTensor([word_to_ix['hello']])` `hello_idx = Variable(hello_idx)`

接着这两行代码表示得到一个Variable，它的值是hello这个词的index，也就是0。这里要特别注意一下我们需要Variable，因为我们需要访问nn.Embedding里面定义的元素，并且word embeding算是神经网络里面的参数，所以我们需要定义Variable。

hello_embed = embeds(hello_idx)这一行表示得到word embedding里面关于hello这个词的初始词向量，最后我们就可以print出来。

您可能感兴趣的与本文相关的镜像

PyTorch 2.5

PyTorch 2.5

PyTorch

Cuda

PyTorch 是一个开源的 Python 机器学习库，基于 Torch 库，底层由 C++ 实现，应用于人工智能领域，如计算机视觉和自然语言处理

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。