CS224N notes_chapter2_word2vec

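These notes cover how word2vec works: the ideas behind it, the Skip-gram and CBOW algorithms, the objective function that is optimized to learn good word vectors, and parameter optimization with gradient descent.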


Lecture 2: word2vec

1. Word meaning

Meaning: the idea that is represented by a word, phrase, piece of writing, art, etc.
How do we get usable meaning into a computer?
Common answer: a taxonomy like WordNet, which has hypernym (is-a) relations and synonym sets.
Problems with a taxonomy:

  • missing nuances: for example, "proficient" describes an expert better than "good" does, but the taxonomy treats the two as synonyms
  • missing new words
  • subjective
  • requires human labor to create and adapt
  • hard to compute accurate word similarity

Problem with discrete representations: a one-hot vector has one dimension per vocabulary word,
[0,0,0,...,1,...,0]
and one-hot vectors do not capture any relation or similarity between words.
Distributional similarity: you can get a lot of value by representing a word by means of its neighbors.
Next, we want to use vectors to represent words.
distributional: understand a word's meaning from its context.
distributed: use dense vectors to represent the meaning of words.
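
To make the one-hot problem concrete, here is a minimal sketch in plain Python. The three-word vocabulary is a made-up toy example, not taken from the lecture. It shows that the dot product between the one-hot vectors of two different words is always 0, so "good" looks exactly as unrelated to "proficient" as it does to "zebra".

```python
def one_hot(index, vocab_size):
    """Return a one-hot vector with a 1 at position `index`."""
    vec = [0] * vocab_size
    vec[index] = 1
    return vec


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy vocabulary, chosen only for illustration.
vocab = ["good", "proficient", "zebra"]
word_to_index = {word: i for i, word in enumerate(vocab)}
one_hot_vectors = {w: one_hot(i, len(vocab)) for w, i in word_to_index.items()}

same_word = dot(one_hot_vectors["good"], one_hot_vectors["good"])
near_synonyms = dot(one_hot_vectors["good"], one_hot_vectors["proficient"])
unrelated = dot(one_hot_vectors["good"], one_hot_vectors["zebra"])

print(same_word, near_synonyms, unrelated)  # prints: 1 0 0
```

Dense (distributed) vectors avoid this because their dot products can take any value, so similar words can end up with similar vectors.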

2. Word2vec intro

Basic idea of learning neural network word embeddings:
We define a model that predicts between a center word $w_t$ and its context words in terms of word vectors,
$$p(\text{context} \mid w_t)$$
which has a loss function like
$$J = 1 - p(w_{-t} \mid w_t)$$
Here $w_{-t}$ denotes the neighbors of $w_t$, i.e. the surrounding words excluding $w_t$ itself.
Main idea of word2vec: predict between every word and its context words.
Two algorithms.

  1. Skip-gram (SG)
    Predict context words given the target (center) word, independent of position.
    … turning into banking crises as …
    banking: center word $w_t$
    turning: context word, predicted with $p(w_{t-2} \mid w_t)$
    For each word $t = 1, \dots, T$, we predict the surrounding words in a window of "radius" $m$ around it.
    Likelihood to maximize:
    $$J'(\theta) = \prod_{t=1}^{T} \prod_{-m \leq j \leq m,\ j \neq 0} P(w_{t+j} \mid w_t; \theta)$$
    Equivalent objective to minimize (average negative log-likelihood):
    $$J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{-m \leq j \leq m,\ j \neq 0} \log P(w_{t+j} \mid w_t; \theta)$$
    hyperparameter: window size $m$
    We use
    $$p(w_{t+j} \mid w_t) = \frac{\exp(u_o^T v_c)}{\sum_{w=1}^{V} \exp(u_w^T v_c)}$$
    where $v_c$ is the center-word vector of $w_t$, $u_o$ is the outside-word vector of $w_{t+j}$, and $V$ is the vocabulary size.
    The dot product is larger when two words are more similar, and the softmax maps the scores to a probability distribution.

  2. Continuous Bag of Words (CBOW)
    Predict target word from bag-of-words context.
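
The Skip-gram objective above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the course's reference implementation. The function names, the 4-dimensional random vectors, and the choice to truncate windows at the sentence boundary are all assumptions made for the example.

```python
import math
import random


def skipgram_pairs(tokens, m):
    """Collect (center, outside) pairs for every position t and offset j,
    with -m <= j <= m and j != 0. Windows are truncated at the edges
    (an assumption of this sketch)."""
    pairs = []
    for t in range(len(tokens)):
        for j in range(-m, m + 1):
            if j != 0 and 0 <= t + j < len(tokens):
                pairs.append((tokens[t], tokens[t + j]))
    return pairs


def softmax_prob(o, c, outside_vecs, center_vecs):
    """p(o | c) = exp(u_o . v_c) / sum_w exp(u_w . v_c)."""
    v_c = center_vecs[c]
    scores = [sum(a * b for a, b in zip(u_w, v_c)) for u_w in outside_vecs]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    return exps[o] / sum(exps)


def skipgram_loss(tokens, m, outside_vecs, center_vecs):
    """J(theta): negative log-likelihood of all pairs, divided by T."""
    total = 0.0
    for c, o in skipgram_pairs(tokens, m):
        total += math.log(softmax_prob(o, c, outside_vecs, center_vecs))
    return -total / len(tokens)


# The example window from the notes, mapped to integer word ids.
sentence = ["turning", "into", "banking", "crises", "as"]
word_to_id = {w: i for i, w in enumerate(sentence)}
tokens = [word_to_id[w] for w in sentence]

rng = random.Random(0)
dim = 4  # hypothetical embedding size
outside_vecs = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in sentence]  # u_w
center_vecs = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in sentence]   # v_w

pairs = skipgram_pairs(tokens, m=2)
banking = word_to_id["banking"]
probs_given_banking = [
    softmax_prob(o, banking, outside_vecs, center_vecs) for o in range(len(sentence))
]
loss = skipgram_loss(tokens, 2, outside_vecs, center_vecs)

print(len(pairs))  # 14 pairs for 5 tokens with m = 2
print(loss)
```

With randomly initialized vectors the probabilities are close to uniform. Training lowers the loss by raising $p(o \mid c)$ for pairs that actually co-occur.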

3. Research highlight

(omitted)

4. Word2vec objective function gradients

All parameters of the model are collected into one vector:
$$\theta = \begin{bmatrix} v_a \\ \vdots \\ v_{zebra} \\ u_a \\ \vdots \\ u_{zebra} \end{bmatrix}$$
We optimize these parameters by training the model with gradient descent.
For one (center, outside) pair, the gradient of $\log p(o \mid c)$ with respect to the center vector $v_c$ is
$$\begin{aligned} &\frac{\partial}{\partial v_c} \left[ \log \exp(u_o^T v_c) - \log \sum_{w=1}^{V} \exp(u_w^T v_c) \right] \\ =\ & u_o - \frac{\sum_{x=1}^{V} u_x \exp(u_x^T v_c)}{\sum_{w=1}^{V} \exp(u_w^T v_c)} \\ =\ & u_o - \sum_{x=1}^{V} p(x \mid c)\, u_x \end{aligned}$$
That is, the observed outside vector minus the expected outside vector under the model.
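
As a sanity check on the derivation, the sketch below compares the closed-form gradient $u_o - \sum_x p(x \mid c)\, u_x$ with a central finite-difference estimate of $\partial \log p(o \mid c) / \partial v_c$. The vectors are arbitrary made-up numbers and the helper names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_prob(o, v_c, U):
    """log p(o | c) = u_o . v_c - log sum_w exp(u_w . v_c)."""
    scores = [dot(u, v_c) for u in U]
    max_score = max(scores)
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return scores[o] - log_norm


def analytic_grad(o, v_c, U):
    """Closed form from the derivation: u_o - sum_x p(x | c) u_x."""
    scores = [dot(u, v_c) for u in U]
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    expected_u = [sum(p * u[k] for p, u in zip(probs, U)) for k in range(len(v_c))]
    return [U[o][k] - expected_u[k] for k in range(len(v_c))]


def numeric_grad(o, v_c, U, eps=1e-6):
    """Central finite differences, one coordinate of v_c at a time."""
    grad = []
    for k in range(len(v_c)):
        plus = list(v_c)
        minus = list(v_c)
        plus[k] += eps
        minus[k] -= eps
        grad.append((log_prob(o, plus, U) - log_prob(o, minus, U)) / (2 * eps))
    return grad


# Arbitrary illustrative values: 4 outside vectors u_w, one center vector v_c.
U = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.5], [-0.3, 0.1, 0.2], [0.5, 0.5, -0.1]]
v_c = [0.1, -0.2, 0.3]
o = 2  # index of the observed outside word

max_diff = max(
    abs(a - n) for a, n in zip(analytic_grad(o, v_c, U), numeric_grad(o, v_c, U))
)
print(max_diff)  # should be very small if the derivation is right
```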

5. Optimization refresher

We have the gradient at the current point $\theta^{old}$, and we take a step in the direction of the negative gradient:
$$\theta_j^{new} = \theta_j^{old} - \alpha \frac{\partial}{\partial \theta_j^{old}} J(\theta)$$
$\alpha$: step size.
In matrix notation for all parameters:
$$\theta^{new} = \theta^{old} - \alpha \nabla_\theta J(\theta)$$
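
A minimal sketch of this update rule on a toy objective, assuming $J(\theta) = \sum_j (\theta_j - \text{target}_j)^2$. That objective is a stand-in chosen for illustration and is not the word2vec loss. The step size and the number of steps are arbitrary choices.

```python
def grad_J(theta, target):
    """Gradient of J(theta) = sum_j (theta_j - target_j)^2."""
    return [2.0 * (t - g) for t, g in zip(theta, target)]


def gradient_descent(theta, target, alpha=0.1, steps=100):
    """Repeat theta_new = theta_old - alpha * grad J(theta_old)."""
    for _ in range(steps):
        g = grad_J(theta, target)
        theta = [t - alpha * g_j for t, g_j in zip(theta, g)]
    return theta


target = [1.0, -2.0, 0.5]   # minimizer of the toy objective
theta0 = [0.0, 0.0, 0.0]    # starting point
theta = gradient_descent(theta0, target)
print(theta)  # close to the target
```

If $\alpha$ is too large the iterates overshoot and can diverge. If it is too small, convergence is slow.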
Stochastic Gradient Descent:

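  • global update (gradient over the whole corpus for every step) -> takes too much time
  • mini-batch (gradient over a small sample of windows for every step) -> also a good idea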

6. Assignment 1 notes

7. Usefulness of word2vec
