CS224N (Natural Language Processing with Deep Learning) summary: models, tasks, assignments, and notable code used in the assignments


Licensed under CC 4.0 BY-SA


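This post summarizes the CS224N course. It covers the models word2vec, GloVe, RNN, GRU, LSTM, attention, and CNN, along with the tasks of machine translation, dependency parsing, and coreference resolution. The assignments include skip-gram, sentiment analysis, named entity recognition, and reading comprehension, with an emphasis on training tips and notable code.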

Models: word2vec (skip-gram, CBOW), GloVe, DNN/BP/Tips for training, RNN/GRU/LSTM, Attention, CNN, TreeRNN

Applications: Neural Machine Translation, Dependency Parsing, Coreference Resolution

Assignments: skip-gram, window-based sentiment classification; dependency parsing; named entity recognition, RNN, GRU; question answering!

I learned a great deal from this course!

=======================Models and methods covered in the course

word2vec: skip-gram (predicts the surrounding words from the center word) and CBOW (predicts the center word from the (average of the) surrounding words)

This captures co-occurrence of words one at a time

==> W is generally used directly as the final embedding
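To make the difference between the two word2vec variants concrete, here is a minimal sketch of how training examples could be built from a tokenized sentence. The function names `skipgram_pairs` and `cbow_examples` and the `window` parameter are illustrative assumptions, not code from the course assignments.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: the center word predicts each neighbor."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """Return (context_words, center) examples: the context predicts the center."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


# Hypothetical toy sentence, only for illustration.
sentence = ["the", "cat", "sat", "down"]
print(skipgram_pairs(sentence, window=1))
print(cbow_examples(sentence, window=1))
```

Skip-gram yields one training pair per (center, neighbor) combination, while CBOW yields one example per position whose input is the whole context window (whose vectors would then be averaged).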

GloVe: takes into account word-word co-occurrence as well as the frequency of each individual word

This captures co-occurrence counts directly

==> U+V is generally used as the final embedding
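As a rough sketch of how GloVe uses the co-occurrence counts directly, the snippet below computes the weighted squared error for a single word pair and the U+V embedding. The function names are hypothetical, and the defaults `x_max=100` and `alpha=0.75` are the values reported in the GloVe paper, not something stated in this post. The two bias terms are assumed to be where per-word effects are absorbed.

```python
import math


def glove_weight(count, x_max=100.0, alpha=0.75):
    """Weighting function f(X_ij): down-weights rare pairs, caps frequent ones at 1."""
    return (count / x_max) ** alpha if count < x_max else 1.0


def glove_pair_loss(u_i, v_j, b_i, c_j, count, x_max=100.0, alpha=0.75):
    """Loss for one pair: f(X_ij) * (u_i . v_j + b_i + c_j - log X_ij)^2."""
    dot = sum(a * b for a, b in zip(u_i, v_j))
    residual = dot + b_i + c_j - math.log(count)
    return glove_weight(count, x_max, alpha) * residual ** 2


def final_embedding(u, v):
    """Sum the two learned vectors of a word (U + V) to get its final embedding."""
    return [a + b for a, b in zip(u, v)]
```

The full objective would sum `glove_pair_loss` over all word pairs with a nonzero co-occurrence count.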

The lecture notes are very well written on the following topics: gradient check, regularization, dropout, activation function, data preprocessing, parameter initialization, optimizer
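Of the topics above, gradient checking is the easiest to illustrate in a few lines. The following is a minimal sketch using a centered difference, with a toy quadratic loss chosen only for demonstration. All names (`numerical_gradient`, `max_relative_error`, `loss`, `loss_grad`) are assumptions for this example.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Estimate df/dx_i with the centered difference (f(x+eps) - f(x-eps)) / (2*eps)."""
    grad = []
    for i in range(len(x)):
        old = x[i]
        x[i] = old + eps
        f_plus = f(x)
        x[i] = old - eps
        f_minus = f(x)
        x[i] = old  # restore the original value before moving on
        grad.append((f_plus - f_minus) / (2 * eps))
    return grad


def max_relative_error(analytic, numeric):
    """Largest per-coordinate relative error between two gradients."""
    return max(abs(a - n) / max(1.0, abs(a), abs(n)) for a, n in zip(analytic, numeric))


def loss(x):
    """Toy loss: sum of squares."""
    return sum(v ** 2 for v in x)


def loss_grad(x):
    """Analytic gradient of the toy loss."""
    return [2 * v for v in x]


point = [1.0, -2.0, 0.5]
err = max_relative_error(loss_grad(point), numerical_gradient(loss, point))
print(err < 1e-6)
```

If the analytic gradient had a bug, the relative error would be far above the tolerance.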

Vanishing gradients: orthogonal initialization + ReLU; or use a GRU or LSTM

Exploding gradients: gradient clipping; [-5, +5] is a good choice
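The post does not say whether [-5, +5] refers to an elementwise range or to a threshold on the gradient norm, so the sketch below shows both readings. The function names and the flat-list representation of the gradient are assumptions for illustration.

```python
import math


def clip_by_value(grad, limit=5.0):
    """Clamp every gradient component into [-limit, +limit]."""
    return [max(-limit, min(limit, g)) for g in grad]


def clip_by_norm(grad, max_norm=5.0):
    """Rescale the whole gradient if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


print(clip_by_value([10.0, -7.0, 3.0]))
print(clip_by_norm([30.0, 40.0]))
```

Clipping by value can change the direction of the gradient, whereas clipping by norm only shortens it.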

Fancy RNNs: GRU, LSTM, bi-directional RNN, multi-layer RNN

Vanilla attention: global attention/local attention, soft attention/hard attention → dot-product attention; multiplicative attention; additive attention.
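The three scoring variants named above differ only in how a query (for example a decoder state) is compared with each key (for example an encoder state). Below is a minimal pure-Python sketch. The parameter names `w`, `w_q`, `w_k`, and `v` and the helper `attend` are assumptions for illustration, not names from the course code.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, vec):
    return [dot(row, vec) for row in m]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_score(query, key):
    """Dot-product attention: q . k"""
    return dot(query, key)


def multiplicative_score(query, key, w):
    """Multiplicative attention: q^T W k"""
    return dot(query, matvec(w, key))


def additive_score(query, key, w_q, w_k, v):
    """Additive attention: v^T tanh(W_q q + W_k k)"""
    hidden = [math.tanh(a + b) for a, b in zip(matvec(w_q, query), matvec(w_k, key))]
    return dot(v, hidden)


def attend(query, keys, score_fn):
    """Soft attention: softmax over scores, then a weighted sum of the keys."""
    weights = softmax([score_fn(query, k) for k in keys])
    dim = len(keys[0])
    context = [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
    return weights, context


weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], dot_score)
print(weights, context)
```

Here the keys also serve as the values. Hard attention would instead pick a single position rather than taking the weighted sum.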

Controlling where attention goes: encourage covering ALL important parts; prevent attending to the same part repeatedly. (This is mainly done by modifying the attention weights.)

Self-attention: within the same RNN, the current position attends to all previous positions
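A minimal sketch of this idea, assuming the RNN hidden states are already computed and that dot-product scoring is used (the post does not specify the scoring function). The name `self_attend_previous` is hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attend_previous(states):
    """For each position t, build a context vector from states[0..t-1].

    The first position has nothing to attend to, so its context is None.
    """
    contexts = []
    for t, current in enumerate(states):
        previous = states[:t]
        if not previous:
            contexts.append(None)
            continue
        scores = [sum(a * b for a, b in zip(current, p)) for p in previous]
        weights = softmax(scores)
        dim = len(current)
        context = [sum(w * p[d] for w, p in zip(weights, previous)) for d in range(dim)]
        contexts.append(context)
    return contexts


# Hypothetical hidden states of one RNN over three time steps.
hidden_states = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
contexts = self_attend_previous(hidden_states)
print(contexts)
```

Each context vector would then typically be combined with the current hidden state before the next prediction.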
