Answers to the Written Part
The 2020 CS224n Assignment 5 differs quite a bit from the 2019 version, so I'm recording my own answers here (still in progress, for reference only).
Problem 1.
(a) We learned in class that recurrent neural architectures can operate over variable length input (i.e., the shape of the model parameters is independent of the length of the input sentence). Is the same true of convolutional architectures? Write one sentence to explain why or why not.
Solution: This is true for convolutional architectures as well: the kernel weights depend only on the kernel size and the channel dimensions, not on the input length, so the same kernel can slide over an input of any length, and pooling over the length dimension then yields a fixed-size output.
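A quick way to convince yourself of this (a minimal sketch, assuming PyTorch; the layer sizes below are made up for illustration, not taken from the assignment):

```python
import torch
import torch.nn as nn

# The conv kernel's weight shape is (out_channels, in_channels, kernel_size),
# independent of the input length, so one layer handles inputs of any length.
conv = nn.Conv1d(in_channels=50, out_channels=100, kernel_size=5)

short_sent = torch.randn(1, 50, 8)    # (batch, embed_dim, length=8)
long_sent = torch.randn(1, 50, 40)    # same layer, length=40

out_short = conv(short_sent)          # (1, 100, 4)
out_long = conv(long_sent)            # (1, 100, 36)

# Max-pooling over the length dimension gives a fixed-size vector either way.
print(out_short.max(dim=2).values.shape)  # torch.Size([1, 100])
print(out_long.max(dim=2).values.shape)   # torch.Size([1, 100])
```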
(b) In 1D convolutions, we do padding, i.e. we add some zeros to both sides of our input, so that the kernel sliding over the input can be applied to at least one complete window.
In this case, if we use the kernel size k = 5, what will be the size of the padding (i.e. the additional number of zeros on each side) we need for the 1-dimensional convolution, such that there exists at least one window for all possible values of $m_{word}$ in our dataset? Explain your reasoning.
Solution: Because the kernel size is $k = 5$, the padded input $x'_{padded}$ must have length at least 5 for the kernel to fit at least one complete window. The shortest possible word is a single character, e.g. "a", which after adding the <START> and <END> tokens has length 3, so we need a padding of 1 zero on each side: 0 + <START> + "a" + <END> + 0, giving exactly one window of size 5. Hence the padding size is 1.
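A small sanity check of this arithmetic (a sketch, assuming PyTorch; the toy character-embedding size `e_char` is an assumption, not a value from the assignment):

```python
import torch
import torch.nn.functional as F

k = 5                                  # kernel size from the question
e_char = 4                             # toy character-embedding size (assumption)

# A 1-character word "a" becomes <START> a <END>: length 3 before zero-padding.
x = torch.randn(1, e_char, 3)
x_padded = F.pad(x, (1, 1))            # 1 zero on each side -> length 5

weight = torch.randn(1, e_char, k)     # a single conv filter of width k
out = F.conv1d(x_padded, weight)
print(out.shape)                       # torch.Size([1, 1, 1]): exactly one window
```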
(c) In step 4, we introduce a Highway Network with $x_{highway} = x_{gate} \odot x_{proj} + (1 - x_{gate}) \odot x_{conv\_out}$.
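For reference, a minimal Highway layer implementing exactly this formula (a sketch in PyTorch; the class and attribute names are my own choices, not necessarily those of the assignment skeleton):

```python
import torch
import torch.nn as nn

class Highway(nn.Module):
    """x_highway = x_gate * x_proj + (1 - x_gate) * x_conv_out."""
    def __init__(self, embed_size: int):
        super().__init__()
        self.proj = nn.Linear(embed_size, embed_size)  # W_proj, b_proj
        self.gate = nn.Linear(embed_size, embed_size)  # W_gate, b_gate

    def forward(self, x_conv_out: torch.Tensor) -> torch.Tensor:
        x_proj = torch.relu(self.proj(x_conv_out))
        x_gate = torch.sigmoid(self.gate(x_conv_out))
        # The gate interpolates between the transformed and the original input,
        # so the layer can pass x_conv_out through unchanged when that is useful.
        return x_gate * x_proj + (1.0 - x_gate) * x_conv_out

layer = Highway(embed_size=100)
print(layer(torch.randn(8, 100)).shape)   # torch.Size([8, 100])
```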
