Vector Length in Python: What Does the Length of a Word2vec Vector Mean?

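The length of a Word2vec word vector is closely related to the word's frequency. Low-frequency words tend to be used consistently, so the more often such a word occurs, the longer its vector becomes. High-frequency words are used in many different contexts, and averaging over those contexts shortens their vectors. Figure 3 shows this trend: vector length first rises with term frequency, reverses at a frequency of about 30, and then falls, ending with punctuation marks and stop words, which have short vectors.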
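Here is a minimal sketch of how the two quantities discussed here could be measured. The Python below computes the Euclidean (L2) length of each word vector and pairs it with the word's term frequency. The input mapping `vocab` (word → raw vector and count) is a hypothetical structure assumed for illustration. It is not part of any library.

```python
import math
from typing import Dict, List, Sequence, Tuple


def vector_length(vec: Sequence[float]) -> float:
    """Euclidean (L2) norm of a word vector."""
    return math.sqrt(sum(x * x for x in vec))


def length_vs_frequency(
    vocab: Dict[str, Tuple[Sequence[float], int]],
) -> List[Tuple[str, int, float]]:
    """Return (word, term frequency, vector length) triples sorted by frequency.

    `vocab` is a hypothetical mapping: word -> (raw, unnormalized vector, count).
    """
    rows = [(word, count, vector_length(vec)) for word, (vec, count) in vocab.items()]
    rows.sort(key=lambda row: row[1])  # ascending term frequency
    return rows
```

Suppose the model was trained with gensim 4.x (an assumption about the reader's setup). Such a mapping could then presumably be built as `{w: (model.wv[w], model.wv.get_vecattr(w, "count")) for w in model.wv.key_to_index}`. Use the raw vectors, not unit-normalized ones. Normalized vectors all have length 1 and carry no length information.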


For a given term frequency, the vector length is seen to take values only in a narrow interval. That interval initially shifts upwards with increasing frequency. Around a frequency of about 30, that trend reverses and the interval shifts downwards.
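The sketch below shows one way to check this pattern on your own model: lengths that cluster in a narrow interval at each frequency, with a reversal near tf ≈ 30. It takes hypothetical `(term frequency, vector length)` pairs computed from a trained model. It reports the min/max interval at each frequency and returns the frequency where the mean length peaks. The function names are illustrative.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def group_by_frequency(rows: Iterable[Tuple[int, float]]) -> Dict[int, List[float]]:
    """Collect the vector lengths observed at each term frequency."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for tf, length in rows:
        groups[tf].append(length)
    return groups


def length_interval_by_frequency(
    rows: Iterable[Tuple[int, float]],
) -> Dict[int, Tuple[float, float]]:
    """Map each term frequency to the (min, max) interval of vector lengths."""
    groups = group_by_frequency(rows)
    return {tf: (min(ls), max(ls)) for tf, ls in sorted(groups.items())}


def turning_point(rows: Iterable[Tuple[int, float]]) -> int:
    """Term frequency at which the mean vector length is largest."""
    groups = group_by_frequency(rows)
    return max(groups, key=lambda tf: mean(groups[tf]))
```

On real data, the highest frequencies often hold a single word each, so exact-frequency means are noisy. The peak is more reliably read off after smoothing, for example with logarithmic frequency bins like those described in the Figure 3 caption.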

...

Both forces determining the length of a word vector are seen at work here. Small-frequency words tend to be used consistently, so that the more frequently such words appear, the longer their vectors. This tendency is reflected by the upwards trend in Fig. 3 at low frequencies. High-frequency words, on the other hand, tend to be used in many different contexts, the more so, the more frequently they occur. The averaging over an increasing number of different contexts shortens the vectors representing such words. This tendency is clearly reflected by the downwards trend in Fig. 3 at high frequencies, culminating in punctuation marks and stop words with short vectors at the very end.
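The averaging argument can be illustrated with a toy model. This model is an assumption of the sketch and not how word2vec is actually trained. Treat each occurrence of a word as contributing one unit "context" vector, and take the word's vector to be their mean. When the contexts point in nearly the same direction (consistent use), the mean stays close to length 1. When they point in many unrelated directions (diverse use), the contributions largely cancel and the mean shrinks, roughly like 1/√n for n random directions. The sketch illustrates only this shrinking effect. `DIM`, `spread` and the function names are hypothetical parameters chosen for illustration.

```python
import math
import random
from typing import List

DIM = 50  # hypothetical embedding dimensionality for the toy model


def norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def unit(v: List[float]) -> List[float]:
    n = norm(v)
    return [x / n for x in v]


def random_unit(rng: random.Random) -> List[float]:
    # Independent Gaussian components give a direction uniform on the sphere.
    return unit([rng.gauss(0.0, 1.0) for _ in range(DIM)])


def average(vectors: List[List[float]]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def averaged_length(n_contexts: int, spread: float, rng: random.Random) -> float:
    """Length of the mean of `n_contexts` unit context vectors.

    spread=0 makes every context point the same way (consistent use);
    spread=1 makes every context a random direction (diverse use).
    """
    base = random_unit(rng)
    contexts = []
    for _ in range(n_contexts):
        noise = random_unit(rng)
        mixed = [(1 - spread) * b + spread * z for b, z in zip(base, noise)]
        contexts.append(unit(mixed))
    return norm(average(contexts))
```

With a fixed seed, for example `rng = random.Random(0)`, `averaged_length(200, 0.1, rng)` comes out close to 1, while `averaged_length(200, 1.0, rng)` comes out near 1/√200 ≈ 0.07.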

...

Figure 3: Word vector length v versus term frequency tf of all words in the hep-th vocabulary. Note the logarithmic scale used on the frequency axis. The dark symbols denote bin means, with the kth bin containing the frequencies in the interval $[2^{k-1}, 2^k - 1]$ with $k = 1, 2, 3, \ldots$. These means are included as a guide to the eye. The horizontal line indicates the length v = 1.37 of the mean vector.
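The binning described in the caption can be reproduced in a few lines. The bin $[2^{k-1}, 2^k - 1]$ containing a positive integer frequency tf is exactly the one whose k equals the number of bits in tf, which Python exposes as `int.bit_length()`. The sketch below computes per-bin mean lengths from hypothetical `(tf, length)` pairs. The function names are illustrative.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def bin_index(tf: int) -> int:
    """Index k of the bin [2**(k-1), 2**k - 1] that contains term frequency tf."""
    if tf < 1:
        raise ValueError("term frequency must be >= 1")
    return tf.bit_length()  # e.g. 1 -> 1, 2..3 -> 2, 4..7 -> 3


def bin_means(rows: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Mean vector length per logarithmic frequency bin, keyed by k."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for tf, length in rows:
        groups[bin_index(tf)].append(length)
    return {k: mean(ls) for k, ls in sorted(groups.items())}
```

Because each bin doubles in width, the bins are evenly spaced on the logarithmic frequency axis used in the figure.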
