CS224N notes_chapter13_CNN

Article link: https://blog.youkuaiyun.com/lirt15/article/details/95265012

Lecture 13: CNN

##Unfinished, and may not be continued

From RNN to CNN

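An RNN can only capture a phrase given its left-side context.
#In other words, to get the RNN's output for a particular input vector, you have to run all the inputs before it through the network first; you cannot pull out just one part of the sequence.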

Main CNN idea:
Compute vectors for every possible phrase.

Regardless of whether phrase is grammatical
Not very linguistically or cognitively plausible

#This also means that a CNN can only observe local features.
#I skip the definition of convolution here.

Single Layer CNN

Example:
Word vectors: $\mathbf x_i\in \mathbb{R}^k$
Sentence:
$\mathbf x_{1:n}=\mathbf x_1 \oplus \mathbf x_2 \oplus \dots \oplus \mathbf x_n$
Concatenation of words in range:
$\mathbf x_{i:i+j}$
Convolutional filter: $\mathbf w \in \mathbb{R}^{hk}$
#Each time, the filter reads in h words and outputs one score.
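A minimal NumPy sketch of the step above: one filter $\mathbf w$ slides over the sentence and produces one score per window of h words. The `tanh` nonlinearity, the bias `b`, the function name `conv_feature_map`, and the toy sizes are assumptions made for illustration; the notes only say that each window yields one score.

```python
import numpy as np

def conv_feature_map(X, w, b=0.0):
    """Slide one filter over a sentence (no padding).

    X: (n, k) array, one k-dimensional word vector per row.
    w: (h*k,) filter covering h consecutive words.
    Returns n - h + 1 scores, one per window.
    """
    n, k = X.shape
    h = w.size // k
    # Flattening rows i..i+h-1 is the concatenation x_i ⊕ ... ⊕ x_{i+h-1}.
    # tanh and the bias b are assumed here, not taken from the notes.
    return np.array([np.tanh(w @ X[i:i + h].reshape(-1) + b)
                     for i in range(n - h + 1)])

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 4))   # toy sentence: n = 7 words, k = 4
w = rng.normal(size=2 * 4)    # one filter with window size h = 2
c = conv_feature_map(X, w)
print(c.shape)                # (6,)
```

Without padding, a sentence of n words and a window of h words give n - h + 1 scores, which is why the toy example returns 6 values.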
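Padding: add zero vectors at the ends of the sentence so that the output has the same length as the input.
Pooling: capture the most important activation, e.g. by taking the maximum over all positions (max-pooling).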
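#The lecture explains pooling like this: suppose your convolution operates on 2-grams. As the filter slides along the sentence, it responds strongly to particular 2-gram phrases, and max-pooling records the strongest of these responses.
#Likewise, we can add 3-gram and 4-gram filters, max-pool each of them, and concatenate the results, which gives a feature vector that can see 2-, 3-, and 4-grams.
#We then use this feature vector for classification or other tasks.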
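The pipeline described above (filters of width 2, 3, and 4, max-pooling over each feature map, then concatenation) can be sketched in NumPy as follows. This is only an illustration under assumptions: the helper names `feature_map` and `sentence_features`, the `tanh` nonlinearity, the way the zero padding is split between the two ends, and the use of a single random filter per width are all made up for the example.

```python
import numpy as np

def feature_map(X, w, pad=False):
    """Scores of one filter w (length h*k) over sentence X of shape (n, k)."""
    n, k = X.shape
    h = w.size // k
    if pad:
        # Zero-pad h - 1 rows in total so the output has exactly n scores.
        left = (h - 1) // 2
        right = h - 1 - left
        X = np.vstack([np.zeros((left, k)), X, np.zeros((right, k))])
    # Each row of `windows` is the concatenation of h consecutive word vectors.
    windows = np.array([X[i:i + h].reshape(-1)
                        for i in range(X.shape[0] - h + 1)])
    return np.tanh(windows @ w)  # tanh is an assumed nonlinearity

def sentence_features(X, filters, pad=False):
    # Max-pool each feature map over positions, then concatenate the maxima.
    return np.array([feature_map(X, w, pad).max() for w in filters])

rng = np.random.default_rng(0)
n, k = 7, 4
X = rng.normal(size=(n, k))                             # toy sentence
filters = [rng.normal(size=h * k) for h in (2, 3, 4)]   # one filter per n-gram width
feats = sentence_features(X, filters)
print(feats.shape)                                      # (3,)
```

Because max-pooling keeps a single number per filter, the feature vector has one entry per filter regardless of sentence length, so it can be fed directly to a classifier.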