As a chronic procrastinator, I'm finally getting back to finishing Assignment 2...
Q1: Fully-connected Neural Network (25 points)
For this question we need to complete several functions in layers.py. The first is affine_forward, i.e. the forward pass, which is fairly simple:
N = x.shape[0]
x_reshape = x.reshape([N, -1])   # flatten each example into a row: [N, D]
x_plus_w = x_reshape.dot(w)      # [N, M]
out = x_plus_w + b

Next is the backward pass. A useful trick here is to first write down the symbolic expression for each derivative; the concrete details (whether to transpose, where to sum, etc.) can then be pinned down from the shapes of the terms in that expression. This trick is covered in the "Gradients for vectorized operations" section of the lecture 4 backprop notes.
N = x.shape[0]
x_reshape = x.reshape([N, -1])          # [N, D]
dx = dout.dot(w.T).reshape(*x.shape)    # back to [N, d1, ..., d_k]
dw = (x_reshape.T).dot(dout)            # [D, M]
db = np.sum(dout, axis=0)               # [M]
For example, for dx: from the forward expression above, the gradient must be some product of dout and w. Since dout.shape == (N, M), w.shape == (D, M), and dx.shape == (N, d1, ..., d_k), the only shape-consistent combination is dout.dot(w.T), reshaped back to x's shape.
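To double-check the shape-based derivation, a numerical gradient check helps. The sketch below is self-contained NumPy; the num_grad helper and the toy shapes are mine, not part of the assignment (the notebook does the same thing with its own eval_numerical_gradient_array utility):

import numpy as np

def num_grad(f, x, df, h=1e-5):
    # centered-difference gradient of sum(f(x) * df) with respect to x
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index
        old = x[ix]
        x[ix] = old + h
        pos = f(x).copy()
        x[ix] = old - h
        neg = f(x).copy()
        x[ix] = old
        grad[ix] = np.sum((pos - neg) * df) / (2 * h)
        it.iternext()
    return grad

x = np.random.randn(4, 3, 5)                 # N=4, d1=3, d2=5, so D=15
w = np.random.randn(15, 6)
b = np.random.randn(6)
dout = np.random.randn(4, 6)

dx = dout.dot(w.T).reshape(*x.shape)         # analytic gradient from above
f = lambda x: x.reshape(x.shape[0], -1).dot(w) + b
print(np.max(np.abs(dx - num_grad(f, x, dout))))   # should be tiny, around 1e-10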
For relu_forward, the code follows directly from the definition:
out = np.maximum(0, x)
And relu_backward behaves exactly like the max gate discussed in the backprop notes; the code is:
dx = dout * (x > 0)   # gradient flows only where the input was positive; at x == 0 the usual convention is a zero subgradient
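A tiny example (toy values of my own) makes the masking behavior concrete, calling the two layers.py functions with their assignment signatures (relu_forward returns its input as the cache):

x = np.array([[-1.0,  0.0],
              [ 2.0, -3.0]])
out, cache = relu_forward(x)                 # out == [[0, 0], [2, 0]]
dx = relu_backward(np.ones_like(x), cache)
print(dx)                                    # [[0, 0], [1, 0]]: nonzero only where x > 0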
Next, take a look at the svm_loss and softmax_loss functions that come pre-implemented in this assignment.
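For reference, the provided softmax_loss is essentially the standard vectorized version below (my paraphrase; the row-max subtraction is for numerical stability, and the exact provided code may differ in detail):

def softmax_loss(x, y):
    # x: scores of shape (N, C); y: integer labels of shape (N,)
    shifted = x - np.max(x, axis=1, keepdims=True)       # numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    probs = np.exp(log_probs)
    N = x.shape[0]
    loss = -np.sum(log_probs[np.arange(N), y]) / N       # mean cross-entropy
    dx = probs.copy()
    dx[np.arange(N), y] -= 1                             # dL/dscores = p - 1{y}
    dx /= N
    return loss, dx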
Implementing the two-layer network is straightforward. Following the stated initialization requirements, initialize the weights and biases:
self.params['W1'] = weight_scale * np.random.randn(input_dim, hidden_dim)
self.params['b1'] = np.zeros(hidden_dim)
self.params['W2'] = weight_scale * np.random.randn(hidden_dim, num_classes)
self.params['b2'] = np.zeros(num_classes)
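As a quick shape sanity check, something like the following works (the constructor arguments mirror the assignment's defaults; the exact values here are illustrative):

model = TwoLayerNet(input_dim=3*32*32, hidden_dim=100, num_classes=10,
                    weight_scale=1e-3, reg=0.0)
X = np.random.randn(5, 3*32*32)
scores = model.loss(X)        # with y=None, loss() returns the raw scores
print(scores.shape)           # (5, 10)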
Then, inside the loss function, build the two-layer network:
layer1_out, layer1_cache = affine_relu_forward(X, self.params['W1'], self.params['b1'])
layer2_out, layer2_cache = affine_forward(layer1_out, self.params['W2'], self.params['b2'])
scores = layer2_out
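The affine_relu_forward / affine_relu_backward helpers used here live in layer_utils.py; they simply chain the primitives from layers.py and bundle their caches, roughly like this:

def affine_relu_forward(x, w, b):
    a, fc_cache = affine_forward(x, w, b)
    out, relu_cache = relu_forward(a)
    return out, (fc_cache, relu_cache)

def affine_relu_backward(dout, cache):
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)
    return affine_backward(da, fc_cache)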
Then compute the loss and the gradients. Note the 0.5 factor on the L2 regularization terms: it makes each weight's regularization gradient come out to simply reg * W.
loss, dscores = softmax_loss(scores, y)
loss = loss + 0.5 * self.reg * np.sum(self.params['W1'] * self.params['W1']) + \
       0.5 * self.reg * np.sum(self.params['W2'] * self.params['W2'])
d1_out, dw2, db2 = affine_backward(dscores, layer2_cache)
grads['W2'] = dw2 + self.reg * self.params['W2']
grads['b2'] = db2
dx1, dw1, db1 = affine_relu_backward(d1_out, layer1_cache)
grads['W1'] = dw1 + self.reg * self.params['W1']
grads['b1'] = db1
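Before training, the notebook verifies these gradients numerically against the model's analytic ones. A sketch of that check using the assignment's cs231n.gradient_check utility (rel_error is defined as in the notebooks; the toy X and y here are mine):

from cs231n.gradient_check import eval_numerical_gradient

def rel_error(a, b):
    # relative error, as defined in the assignment notebooks
    return np.max(np.abs(a - b) / (np.maximum(1e-8, np.abs(a) + np.abs(b))))

X = np.random.randn(5, 3*32*32)
y = np.random.randint(10, size=5)
loss, grads = model.loss(X, y)
for name in sorted(grads):
    # eval_numerical_gradient perturbs model.params[name] in place,
    # so the lambda just recomputes the loss with the perturbed weights
    f = lambda _: model.loss(X, y)[0]
    grad_num = eval_numerical_gradient(f, model.params[name], verbose=False)
    print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))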
Next, everything gets tied together by a Solver object. solver.py already documents how the object is used, so we can use it directly:
solver = Solver(model, data,
                update_rule='sgd',
                optim_config={'learning_rate': 1e-3},
                lr_decay=0.80,
                num_epochs=10,
                batch_size=100,
                print_every=100)
solver.train()
scores = solver.model.loss(data['X_test'])   # y=None, so loss() returns scores
y_pred = np.argmax(scores, axis=1)
acc = np.mean(y_pred == data['y_test'])
print("test acc: ", acc)
This ultimately reaches 52.3% accuracy on the test set. Then plot the training curves:
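The plotting code is not reproduced in the post; a minimal sketch using the histories that Solver records (loss_history, train_acc_history, val_acc_history) would be:

import matplotlib.pyplot as plt

plt.subplot(2, 1, 1)
plt.plot(solver.loss_history, 'o')
plt.xlabel('iteration'); plt.ylabel('loss')

plt.subplot(2, 1, 2)
plt.plot(solver.train_acc_history, '-o', label='train')
plt.plot(solver.val_acc_history, '-o', label='val')
plt.xlabel('epoch'); plt.ylabel('accuracy'); plt.legend()
plt.show()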
Pay attention to the training and validation accuracies here: if the gap between them is large and validation accuracy improves only slowly, consider whether the model is overfitting.
Next up is FullyConnectedNet, which follows the same pattern. The layer structure is:
{affine - [batch norm] - relu - [dropout]} x (L - 1) - affine - softmax
In the constructor, initialize W, b, gamma, and beta; just follow the hints:
shape1 = input_dim
for i, shape2 in enumerate(hidden_dims):
    self.params['W'+str(i+1)] = weight_scale * np.random.randn(shape1, shape2)
    self.params['b'+str(i+1)] = np.zeros(shape2)
    shape1 = shape2
    if self.use_batchnorm:
        self.params['gamma'+str(i+1)] = np.ones(shape2)    # scale starts at 1
        self.params['beta'+str(i+1)] = np.zeros(shape2)    # shift starts at 0
# the final affine layer (no batchnorm/relu) maps to the class scores
self.params['W'+str(self.num_layers)] = weight_scale * np.random.randn(shape1, num_classes)
self.params['b'+str(self.num_layers)] = np.zeros(num_classes)
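A quick constructor smoke test (the hyperparameter values are illustrative, and use_batchnorm is the flag name in this version of the assignment):

model = FullyConnectedNet([100, 100, 100],
                          input_dim=3*32*32,
                          num_classes=10,
                          weight_scale=5e-2,
                          use_batchnorm=True,
                          reg=0.0)
print(sorted(model.params.keys()))
# expect W1..W4 and b1..b4, plus gamma/beta for the three hidden layers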