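The LeNet model
LeNet consists of two parts: a convolutional block and a fully connected block. We describe each in turn.
The basic unit of the convolutional block is a convolutional layer followed by a max pooling layer: the convolutional layer recognizes spatial patterns in the image, such as lines and parts of objects, and the max pooling layer that follows reduces the convolutional layer's sensitivity to position. The convolutional block consists of two such units stacked one after the other. Every convolutional layer in the block uses a 5×5 window and applies a sigmoid activation function to its output. The first convolutional layer has 6 output channels and the second has 16. The input to the second convolutional layer has a smaller height and width than the input to the first, so the number of output channels is increased to keep the two layers comparable in size. Both max pooling layers in the block use a 2×2 window with a stride of 2. Because the pooling window and the stride have the same shape, the regions covered by successive positions of the window do not overlap.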
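To make the non-overlapping pooling windows concrete, here is a minimal sketch in plain Python rather than MXNet. The helper `max_pool2d` and the 4×4 input are illustrative assumptions, not part of the original code; it only mimics what a 2×2 max pooling layer with stride 2 does to a single channel.

```python
def max_pool2d(x, pool_size=2, stride=2):
    """Max-pool a 2-D list of numbers with a square window and no padding."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(0, h - pool_size + 1, stride):
        row = []
        for j in range(0, w - pool_size + 1, stride):
            # Collect every element covered by the window at position (i, j)
            window = [x[i + di][j + dj]
                      for di in range(pool_size)
                      for dj in range(pool_size)]
            row.append(max(window))
        out.append(row)
    return out


x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]

# Window 2 and stride 2: each input element falls into exactly one window
pooled = max_pool2d(x)
print(pooled)  # [[6, 8], [14, 16]]
```

With a stride of 1 the same helper would produce a 3×3 output from overlapping windows, which is the case the stride of 2 avoids.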
Code
import d2lzh as d2l
import mxnet as mx
from mxnet import autograd, gluon, init, nd
from mxnet.gluon import loss as gloss, nn
import time

net = nn.Sequential()
net.add(
    # Convolutional block: two (convolution, max pooling) units
    nn.Conv2D(channels=6, kernel_size=5, activation='sigmoid'),
    nn.MaxPool2D(pool_size=2, strides=2),
    nn.Conv2D(channels=16, kernel_size=5, activation='sigmoid'),
    nn.MaxPool2D(pool_size=2, strides=2),
    # Fully connected block. Dense flattens its input by default, turning
    # (batch, channels, height, width) into (batch, channels * height * width)
    nn.Dense(120, activation='sigmoid'),
    nn.Dense(84, activation='sigmoid'),
    nn.Dense(10),
)
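As a rough check on the size of this network, the sketch below counts the learnable parameters of each layer by hand in plain Python. The helpers `conv_params` and `dense_params` are hypothetical names introduced here, and the count assumes a single-channel 28×28 input, so the first fully connected layer receives 16 × 4 × 4 = 256 values.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel for a square kernel."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels


def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units


# Assumes a 1x28x28 input, which leaves 16 channels of 4x4 after the second pooling layer
layer_params = {
    'conv0': conv_params(1, 6, 5),
    'conv1': conv_params(6, 16, 5),
    'dense0': dense_params(16 * 4 * 4, 120),
    'dense1': dense_params(120, 84),
    'dense2': dense_params(84, 10),
}
total = sum(layer_params.values())

for name, count in layer_params.items():
    print(name, count)
print('total', total)  # total 44426
```

Under these assumptions most of the parameters sit in the first fully connected layer, not in the convolutional block.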
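Next we construct a single-channel data sample with a height and width of 28 and run the forward computation one layer at a time to see the output shape of each layer.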
X = nd.random.uniform(shape=(1, 1, 28, 28))
net.initialize()
for layer in net:
    X = layer(X)
    print(layer.name, 'output shape:\t', X.shape)
Output:
conv0 output shape: (1, 6, 24, 24)
pool0 output shape: (1, 6, 12, 12)
conv1 output shape: (1, 16, 8, 8)
pool1 output shape: (1, 16, 4, 4)
dense0 output shape: (1, 120)
dense1 output shape: (1, 84)
dense2 output shape: (1, 10)
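As the output shows, the height and width of the input shrink layer by layer through the convolutional block. Each convolutional layer uses a kernel with a height and width of 5 and therefore reduces the height and width by 4, while each pooling layer halves them. Meanwhile, the number of channels grows from 1 to 16. The fully connected layers then reduce the number of outputs step by step until it equals the number of image classes, 10.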
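The shape arithmetic can be reproduced without MXNet. The sketch below assumes no padding and uses the usual output-size formula (size − window) // stride + 1; the helper `out_size` and the layer table are illustrative names, not part of the original code.

```python
def out_size(size, window, stride):
    """Output height/width for a square window with no padding."""
    return (size - window) // stride + 1


# (layer name, output channels or None to keep them, window, stride)
conv_block = [
    ('conv0', 6, 5, 1),
    ('pool0', None, 2, 2),
    ('conv1', 16, 5, 1),
    ('pool1', None, 2, 2),
]

size, channels = 28, 1
trace = []
for name, out_channels, window, stride in conv_block:
    size = out_size(size, window, stride)
    if out_channels is not None:
        channels = out_channels
    trace.append((name, (1, channels, size, size)))
    print(name, 'output shape:', (1, channels, size, size))

# Number of values handed to the first fully connected layer after flattening
flat = channels * size * size
print('flattened size:', flat)  # flattened size: 256
```

The traced shapes match the conv0 through pool1 rows of the output listed above.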
Getting the data and training
Next we experiment with the LeNet model. As before, we use Fashion-MNIST as the training dataset.
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)

# ctx was not defined in the original listing; here we assume d2l.try_gpu(),
# which returns a GPU context if one is available and the CPU otherwise
ctx = d2l.try_gpu()

def evaluate_accuracy(data_iter, net, ctx):
    acc_sum, n = nd.array([0], ctx=ctx), 0
    for X, y in data_iter:
        # If ctx refers to a GPU and its memory, copy the data onto that device
        X, y = X.as_in_context(ctx), y.as_in_context(ctx).astype('float32')
        acc_sum += (net(X).argmax(axis=1) == y).sum()
        n += y.size
    return acc_sum.asscalar() / n

def train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx,
              num_epochs):
    print('training on', ctx)
    loss = gloss.SoftmaxCrossEntropyLoss()
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()
        for X, y in train_iter:
            X, y = X.as_in_context(ctx), y.as_in_context(ctx)
            with autograd.record():
                y_hat = net(X)
                l = loss(y_hat, y).sum()
            l.backward()
            # The loss is summed over the batch, so step() rescales by 1/batch_size
            trainer.step(batch_size)
            y = y.astype('float32')
            train_l_sum += l.asscalar()
            train_acc_sum += (y_hat.argmax(axis=1) == y).sum().asscalar()
            n += y.size
        test_acc = evaluate_accuracy(test_iter, net, ctx)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, '
              'time %.1f sec'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc,
                 time.time() - start))

lr, num_epochs = 0.9, 5
# Re-initialize the parameters on ctx using Xavier initialization
net.initialize(force_reinit=True, ctx=ctx, init=init.Xavier())
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': lr})
train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx, num_epochs)
Output:
epoch 1, loss 2.3202, train acc 0.102, test acc 0.100, time 2.1 sec
epoch 2, loss 1.9571, train acc 0.249, test acc 0.566, time 1.9 sec
epoch 3, loss 0.9643, train acc 0.618, test acc 0.668, time 1.8 sec
epoch 4, loss 0.7567, train acc 0.705, test acc 0.733, time 1.8 sec
epoch 5, loss 0.6704, train acc 0.733, test acc 0.754, time 1.8 sec
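The accuracy figures above are the fraction of samples whose highest-scoring class equals the label. The sketch below shows that computation in plain Python on a made-up batch; the function `accuracy` and the scores are illustrative assumptions and are not taken from the training run.

```python
def accuracy(scores, labels):
    """Fraction of rows whose argmax equals the corresponding label."""
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score, the counterpart of argmax(axis=1)
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


# Hypothetical scores for 4 samples and 3 classes
scores = [[0.1, 0.7, 0.2],
          [0.8, 0.1, 0.1],
          [0.3, 0.3, 0.4],
          [0.2, 0.5, 0.3]]
labels = [1, 0, 1, 1]

acc = accuracy(scores, labels)
print(acc)  # 0.75
```

In the training loop the same count is accumulated over every batch and divided by the total number of samples `n` at the end of each epoch.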
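Summary
•A convolutional neural network is a network that contains convolutional layers.
•LeNet alternates convolutional layers and max pooling layers, followed by fully connected layers, to classify images.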
Source: Berkeley textbook\d2l-zh\chapter_convolutional-neural-networks\lenet