CS224N Assignment 1

Original post: https://blog.youkuaiyun.com/qq_39642801/article/details/88987417

This is the start of a new series. My skill is limited, so if you spot any errors, please point them out.
Q1.1
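The main step is to flatten the list of lists (the corpus is a list of tokenized documents) into a single sequence of words, remove duplicates with a set, and then sort the result.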

    # Flatten the corpus (a list of token lists), deduplicate with set, then sort
    corpus_words = sorted(set(word for sentence in corpus for word in sentence))
    num_corpus_words = len(corpus_words)
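As a cross-check, the same flatten, deduplicate, and sort step can also be written with itertools.chain.from_iterable. This is a minimal standalone sketch: the function name distinct_words is assumed to match the assignment's template, and toy_corpus is a made-up example, not data from the assignment.

```python
from itertools import chain


def distinct_words(corpus):
    """Return the sorted distinct words of a list of token lists, and their count."""
    # chain.from_iterable flattens one level: [[a, b], [b, c]] -> a, b, b, c
    corpus_words = sorted(set(chain.from_iterable(corpus)))
    return corpus_words, len(corpus_words)


# Hypothetical toy corpus: two tokenized "documents"
toy_corpus = [["<START>", "all", "that", "glitters", "<END>"],
              ["<START>", "all", "is", "well", "<END>"]]
words, n = distinct_words(toy_corpus)
print(words, n)
```

Sorting uses plain string order, so tokens starting with "<" come before lowercase words.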

Q1.2
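Note that when counting, you only need to increment M[worda, wordb] once for each (context, center) pair; there is no need to also increment M[wordb, worda]. The window relation is symmetric: if wordb lies inside worda's window, then worda lies inside wordb's window, so the reverse pair is counted automatically when wordb becomes the center word. Incrementing both would double every count.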

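    # words and num_words come from distinct_words(corpus) in the assignment template
    M = np.zeros((num_words, num_words))
    # Map each distinct word to its row/column index
    word2Ind = {word: index for index, word in enumerate(words)}
    for sen in corpus:
        sen_len = len(sen)
        for cur in range(sen_len):
            # Clip the window to the sentence; start/end avoid shadowing min/max
            start = max(0, cur - window_size)
            end = min(sen_len - 1, cur + window_size)
            for slide_index in range(start, end + 1):
                if slide_index == cur:
                    continue  # a word is not its own context
                # One increment per (context, center) pair; symmetry comes for free
                M[word2Ind[sen[slide_index]], word2Ind[sen[cur]]] += 1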
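To see why a single increment is enough, here is a small standalone sketch. The helper co_occurrence and the one-sentence corpus are hypothetical; the sketch counts each (center, context) pair exactly once and then checks that the matrix still comes out symmetric.

```python
import numpy as np


def co_occurrence(corpus, window_size):
    """Window-based co-occurrence counts; each (center, context) pair is counted once."""
    words = sorted({w for sentence in corpus for w in sentence})
    word2ind = {w: i for i, w in enumerate(words)}
    M = np.zeros((len(words), len(words)))
    for sentence in corpus:
        for cur, center in enumerate(sentence):
            start = max(0, cur - window_size)
            end = min(len(sentence), cur + window_size + 1)  # exclusive bound
            for ctx in range(start, end):
                if ctx != cur:
                    # Only one increment: the reverse pair is counted later,
                    # when sentence[ctx] itself is the center word
                    M[word2ind[center], word2ind[sentence[ctx]]] += 1
    return M, word2ind


M, word2ind = co_occurrence([["a", "b", "c", "a"]], window_size=1)
print(M)
print(np.array_equal(M, M.T))  # symmetric without a second increment
```

With a window of 1, the four-word sentence yields 6 (center, context) pairs, and every count appears once above and once below the diagonal.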

Q1.3

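    from sklearn.decomposition import TruncatedSVD
    # Keep the top-k singular directions; fit_transform returns U_k * Sigma_k
    svd = TruncatedSVD(n_components=k)
    M_reduced = svd.fit_transform(M)  # shape: (num_words, k)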
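What the reduction computes can be sketched with plain NumPy. Writing M = U Σ Vᵀ and keeping the top k singular values gives the k-dimensional row embeddings U_k Σ_k, which is the quantity TruncatedSVD.fit_transform approximates (its default randomized solver is approximate, and the sign of each component may differ). The helper name reduce_to_k_dim_numpy and the 3×3 matrix are hypothetical.

```python
import numpy as np


def reduce_to_k_dim_numpy(M, k):
    """Return (U_k * S_k, Vt_k): rows of M expressed in the top-k singular directions."""
    # full_matrices=False gives U: (n, r), S: (r,), Vt: (r, m) with r = min(n, m)
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    # Broadcasting S[:k] scales column j of U_k by the j-th singular value
    return U[:, :k] * S[:k], Vt[:k]


# Hypothetical small symmetric co-occurrence matrix
M = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
M_reduced, Vt_k = reduce_to_k_dim_numpy(M, k=2)
print(M_reduced.shape)

# The same numbers viewed as a projection of M onto V_k: U_k Sigma_k = M V_k
print(np.allclose(M_reduced, M @ Vt_k.T))
```

The projection view explains why each row of M_reduced can be used as a word vector: it is that word's co-occurrence row projected onto the k directions that capture the most variance.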

Q1.4

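    import matplotlib.pyplot as plt
    for word in words:
        x, y = M_reduced[word2Ind[word]][:2]  # 2-D coordinates of this word
        plt.scatter(x, y, marker='x', color='red')
        plt.text(x, y, word)  # label each point so the plot is readable
    plt.show()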

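The remaining questions are open-ended, so I won't post code for them; instead, here is a rough outline of the approach to Q2.4.
This question asks you to find analogies between words. In the example, man is to king as woman is to queen. So positive contains woman and king, which pulls the result toward both woman and king (the word that man maps to in the analogy), while negative contains man, which pushes the result as far away from man as possible. In vector terms, the query looks for the word closest to king - man + woman.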
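The positive/negative logic can be sketched with cosine similarity on hypothetical toy vectors. This follows the idea behind gensim's KeyedVectors.most_similar(positive=..., negative=...), which is presumably where the positive and negative arguments come from, but it is a simplified stand-in rather than gensim's implementation: normalize each vector, add the positive ones, subtract the negative ones, and return the closest word that is not part of the query.

```python
import numpy as np

# Hypothetical toy embeddings (dims: royalty, maleness, fruitiness); not real word vectors
vectors = {
    "king":  np.array([1.0,  1.0, 0.0]),
    "queen": np.array([1.0, -1.0, 0.0]),
    "man":   np.array([0.0,  1.0, 0.0]),
    "woman": np.array([0.0, -1.0, 0.0]),
    "apple": np.array([0.0,  0.0, 1.0]),
}


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def analogy(positive, negative, vectors):
    """Return the word most cosine-similar to sum(positive) - sum(negative),
    skipping the query words themselves."""
    target = sum(unit(vectors[w]) for w in positive) - sum(unit(vectors[w]) for w in negative)
    target = unit(target)
    best_word, best_sim = None, -np.inf
    for word, vec in vectors.items():
        if word in positive or word in negative:
            continue  # the query words would otherwise trivially win
        sim = float(np.dot(target, unit(vec)))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word, best_sim


word, sim = analogy(positive=["woman", "king"], negative=["man"], vectors=vectors)
print(word, round(sim, 3))
```

With these vectors the query returns queen: subtracting man removes the maleness that king contributes, adding woman pushes the target toward the female direction, and the royalty component of king survives.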

Further reading:
Skip-gram paper: https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
Hierarchical softmax: https://blog.youkuaiyun.com/itplus/article/details/37969817