Implementing word2vec in Python

一只燃

License: CC 4.0 BY-SA

Category: DataWhale

Article link: https://blog.youkuaiyun.com/weixin_42856002/article/details/98958199

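This article implements the Word2Vec skip-gram model with numpy and Google Sheets and walks through how word vectors are generated in natural language processing. It covers the internals of Word2Vec from data preprocessing through model training, loss computation, and parameter updates.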

https://towardsdatascience.com/an-implementation-guide-to-word2vec-using-numpy-and-google-sheets-13445eebd281
https://www.leiphone.com/news/201812/2o1E1Xh53PAfoXgD.html
Read the two links above side by side.

What is implemented here is the skip-gram model.
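To make the skip-gram idea concrete, here is a minimal numpy sketch of a single forward pass: a one-hot target word is projected to a hidden layer (its embedding), then scored against every word in the vocabulary and normalized with softmax. The names `forward_pass`, `w1`, `w2`, the embedding dimension of 10, and the random initialization are assumptions made for illustration, not necessarily what the linked articles use.

```python
import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def forward_pass(x, w1, w2):
    """One skip-gram forward pass for a one-hot target vector x."""
    h = w1.T @ x          # hidden layer: the embedding row of the target word
    u = w2.T @ h          # one raw score per vocabulary word
    y_pred = softmax(u)   # predicted probability of each word being in the context
    return y_pred, h, u

rng = np.random.default_rng(0)
vocab_size, embed_dim = 9, 10                       # assumed sizes (9 unique words in the sample text)
w1 = rng.uniform(-1, 1, (vocab_size, embed_dim))    # input-to-hidden weights (the word embeddings)
w2 = rng.uniform(-1, 1, (embed_dim, vocab_size))    # hidden-to-output weights

x = np.zeros(vocab_size)
x[0] = 1                                            # one-hot vector for the word at index 0
y_pred, h, u = forward_pass(x, w1, w2)
```

Because `x` is one-hot, `h` is simply the row of `w1` that belongs to the target word, which is why the rows of `w1` can be read off as word vectors after training.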

text = "natural language processing and machine learning is fun and exciting"

# Note the .lower(): upper and lower case are treated the same in our implementation
# [['natural', 'language', 'processing', 'and', 'machine', 'learning', 'is', 'fun', 'and', 'exciting']]
corpus = [[word.lower() for word in text.split()]]

Data processing: pair each target word with its corresponding context words.
Target words and context words
Format after processing
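The pairing step can be sketched as follows. This is a minimal illustration, assuming a context window of 2 words on each side and one-hot encoding for both target and context words; the helper names `build_vocab`, `one_hot`, and `generate_training_data` are hypothetical and chosen here for readability.

```python
def build_vocab(corpus):
    # Map each unique word to an integer index, in order of first appearance.
    word_index = {}
    for sentence in corpus:
        for word in sentence:
            if word not in word_index:
                word_index[word] = len(word_index)
    return word_index

def one_hot(word, word_index):
    # Vector of zeros with a single 1 at the word's index.
    vec = [0] * len(word_index)
    vec[word_index[word]] = 1
    return vec

def generate_training_data(corpus, window_size=2):
    # window_size=2 is an assumed setting, not taken from the text above.
    word_index = build_vocab(corpus)
    training_data = []
    for sentence in corpus:
        for i, target in enumerate(sentence):
            lo = max(0, i - window_size)
            hi = min(len(sentence), i + window_size + 1)
            # Context words are the neighbours inside the window, excluding the target itself.
            context = [one_hot(sentence[j], word_index)
                       for j in range(lo, hi) if j != i]
            training_data.append((one_hot(target, word_index), context))
    return training_data, word_index

text = "natural language processing and machine learning is fun and exciting"
corpus = [[word.lower() for word in text.split()]]
training_data, word_index = generate_training_data(corpus, window_size=2)
```

Each element of `training_data` is a `(target, context)` pair: the one-hot vector of the target word and a list of one-hot vectors for its context words. Words near the start or end of the sentence simply get fewer context words.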