LSTM classification on the MNIST handwritten digit dataset (shape transformations in the code)


答萌答萌-



License: CC 4.0 BY-SA

Categories: Problems Encountered While Programming (Solved), lstm · Tags: lstm, code implementation, rnn, tensorflow

Article link: https://blog.youkuaiyun.com/youngdoris/article/details/84106178


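This post walks through the code of a basic single-layer LSTM classifier for MNIST, explaining the input and output formats, the input hidden-layer projection, and how the final result is taken from the output of the last time step.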


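I've been learning RNNs and LSTMs for a while, mostly by reading blog posts about the theory. Recently I've been studying SLSTM and got stuck while implementing it. I wanted to start by comparing how the two are implemented, and that's when I realized I wasn't really clear on LSTM's input/output formats, the details of its outputs, and so on. So I'm taking this opportunity to sort it all out for future reference.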

The code below comes from the mnist_master project on GitHub. It's very basic and clean, and it was the first example I learned RNNs and CNNs from. Thanks!

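# TensorFlow 1.x code: tf.contrib, tf.placeholder, tf.Session and tensorflow.examples are not available in TF 2.x
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf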


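def RNN(X, weights, biases):
    # input hidden layer: project each row (time step) of n_inputs pixels to n_hidden_units values
    print('the shape of X is', X.shape)
    # (batch, n_steps, n_inputs) -> (batch * n_steps, n_inputs)
    X = tf.reshape(X, [-1, n_inputs])
    # (batch * n_steps, n_inputs) x (n_inputs, n_hidden_units) -> (batch * n_steps, n_hidden_units)
    X_in = tf.matmul(X, weights['in']) + biases['in']
    # back to (batch, n_steps, n_hidden_units)
    X_in = tf.reshape(X_in, [-1, n_steps, n_hidden_units])
    print('the shape of X_in is', X_in.shape)
    # LSTM cell; with state_is_tuple=True the state is an LSTMStateTuple (c, h)
    lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden_units, forget_bias=1.0, state_is_tuple=True)
    # the zero state has batch_size rows, so every batch fed to this graph must contain exactly batch_size samples
    _init_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)
    # time_major=False: inputs and outputs are laid out as (batch, n_steps, n_hidden_units)
    outputs, states = tf.nn.dynamic_rnn(lstm_cell, X_in, initial_state=_init_state, time_major=False)
    print('the shape of outputs is', outputs.shape)
    # output layer producing the final results
    # results = tf.matmul(states[1], weights['out']) + biases['out']  # states[1] is h after the last step
    # or
    # (batch, n_steps, n_hidden_units) -> (n_steps, batch, n_hidden_units)
    outputs = tf.transpose(outputs, [1, 0, 2])
    print('the shape of transpose outputs is', outputs.shape)
    # list of n_steps tensors, each (batch, n_hidden_units); tf.unstack returns a new list, so assign it
    outputs = tf.unstack(outputs)
    # last time step: (batch, n_hidden_units) x (n_hidden_units, n_classes) -> (batch, n_classes)
    results = tf.matmul(outputs[-1], weights['out']) + biases['out']

    return results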


# load mnist data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# parameters init
l_r = 0.001
training_iters = 100000
batch_size = 128

n_inputs = 28
n_steps = 28
n_hidden_units = 128
n_classes = 10

# define placeholder for input
x = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_classes])

# define w and b
weights = {
    'in': tf.Variable(tf.random_normal([n_inputs, n_hidden_units])),
    'out': tf.Variable(tf.random_normal([n_hidden_units, n_classes]))
}
biases = {
    'in': tf.Variable(tf.constant(0.1, shape=[n_hidden_units, ])),
    'out': tf.Variable(tf.constant(0.1, shape=[n_classes, ]))
}

pred = RNN(x, weights, biases)

print('the output of RNN is',pred.shape)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
train_op = tf.train.AdamOptimizer(l_r).minimize(cost)

correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# init session
sess = tf.Session()
# init all variables
sess.run(tf.global_variables_initializer())
# start training

for i in range(training_iters):
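    # fetch the next training batch
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    # (batch_size, 784) -> (batch_size, n_steps, n_inputs): each image becomes 28 rows of 28 pixels
    batch_x = batch_x.reshape([batch_size, n_steps, n_inputs])
    sess.run(train_op, feed_dict={x: batch_x, y: batch_y})
    if i % 50 == 0:
        # accuracy on the training batch just fed, not on held-out data
        print(sess.run(accuracy, feed_dict={x: batch_x, y: batch_y}))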
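# note: _init_state was built with batch_size rows, so feeding all test images at once fails as written
# test_data = mnist.test.images.reshape([-1, n_steps, n_inputs])
# test_label = mnist.test.labels
# print("Testing Accuracy: ", sess.run(accuracy, feed_dict={x: test_data, y: test_label}))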

The printed shapes are:

the shape of X is (?, 28, 28)
the shape of X_in is (?, 28, 128)
the shape of outputs is (128, 28, 128)
the shape of transpose outputs is (28, 128, 128)
the output of RNN is (128, 10)

the shape of X: ? = batch size, not yet known when the graph is built; the first 28 = n_steps; the second 28 = n_inputs

the shape of X_in: ? = batch; 28 = n_steps; 128 = n_hidden_units

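(Note: from this point on, the 28 input values of each step have been mapped by the input hidden layer (X·W(in) + B(in)) to n_hidden_units = 128 values (that's how I understand it), so every 28 below means the number of steps. This really tripped me up. I hadn't understood it before, and as a result my understanding of the whole code was muddled.)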
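To convince myself of the note above, here's a small pure-Python sketch (no TensorFlow; nested lists stand in for tensors, and the toy sizes batch = 2, n_steps = 3, n_inputs = 4, n_hidden_units = 5 and the weights are made up). It checks that the reshape, matmul, reshape sequence in RNN() gives exactly the same result as applying W(in) and B(in) to each step of each sample separately, which is why the middle dimension still counts steps afterwards.

```python
# Pure-Python sketch (no TensorFlow); nested lists stand in for tensors.

def matmul(a, b):
    """(m, k) x (k, n) -> (m, n) for nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add_bias(rows, bias):
    """Add the same bias vector to every row (broadcasting over rows)."""
    return [[v + c for v, c in zip(row, bias)] for row in rows]


def project_flat(x, w, b):
    """Like RNN(): reshape to (-1, n_inputs), matmul + bias, reshape to (-1, n_steps, n_hidden)."""
    n_steps = len(x[0])
    flat = [row for sample in x for row in sample]  # (batch * n_steps, n_inputs)
    proj = add_bias(matmul(flat, w), b)  # (batch * n_steps, n_hidden)
    return [proj[i:i + n_steps] for i in range(0, len(proj), n_steps)]  # (batch, n_steps, n_hidden)


def project_per_step(x, w, b):
    """Apply W_in and b_in to each time step of each sample on its own."""
    return [[add_bias(matmul([row], w), b)[0] for row in sample] for sample in x]


# Toy sizes (hypothetical): batch=2, n_steps=3, n_inputs=4, n_hidden_units=5
batch, n_steps, n_inputs, n_hidden = 2, 3, 4, 5
x = [[[float(100 * s + 10 * t + i) for i in range(n_inputs)] for t in range(n_steps)]
     for s in range(batch)]
w_in = [[float(i - j) for j in range(n_hidden)] for i in range(n_inputs)]
b_in = [0.1 * j for j in range(n_hidden)]

x_in = project_flat(x, w_in, b_in)
print(len(x_in), len(x_in[0]), len(x_in[0][0]))  # 2 3 5
print(x_in == project_per_step(x, w_in, b_in))  # True
```

In other words, flattening batch and steps together is just a way to run one big matmul instead of 28 small ones per image; the same W(in) is shared by all steps.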

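the shape of outputs: nothing special here; the three numbers mean the same as in X_in (batch, n_steps, n_hidden_units). The batch dimension now shows as a concrete 128 instead of ?, presumably because the initial state was built with zero_state(batch_size, ...).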

the shape of transpose outputs: this just swaps dimensions 0 and 1, giving (n_steps, batch, n_hidden_units).
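What tf.transpose(outputs, [1, 0, 2]) does to the indices, as a pure-Python sketch on nested lists (the toy sizes are my own choice): element [i][j][k] moves to [j][i][k]. Each value encodes its own position as 100*sample + 10*step + unit, so it's easy to see where things end up.

```python
def transpose_102(t):
    """Pure-Python equivalent of tf.transpose(t, [1, 0, 2]): result[j][i][k] == t[i][j][k]."""
    return [[t[i][j] for i in range(len(t))] for j in range(len(t[0]))]


def shape(t):
    """Shape of a (non-empty) nested list, read off the first element at each level."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


# Toy sizes (hypothetical): batch=4, n_steps=3, n_hidden_units=2.
# Each value encodes its own index as 100*sample + 10*step + unit.
batch, n_steps, n_hidden = 4, 3, 2
outputs = [[[100 * b + 10 * s + h for h in range(n_hidden)] for s in range(n_steps)]
           for b in range(batch)]

swapped = transpose_102(outputs)
print(shape(outputs), shape(swapped))  # (4, 3, 2) (3, 4, 2)
print(swapped[2][1])  # [120, 121], i.e. step 2 of sample 1
```

After the swap, the first index selects a time step, and each swapped[step] holds that step's output for the whole batch.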

Unstacking outputs after this gives 28 arrays of shape (128, 128): the first 128 = batch, the second 128 = n_hidden_units.

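Then take the output of the last time step, i.e. outputs[-1], multiply it (matmul) by the hidden layer's output weights W(out) and add the output bias B(out). This gives the final result for the batch (here, 128 images).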
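A pure-Python sketch of this last step (nested lists stand in for tensors; the toy sizes, the fake outputs and the W(out)/B(out) values are all made up): transposing with [1, 0, 2] and unstacking gives one (batch, n_hidden_units) slice per step, and the last slice is exactly what outputs[:, -1, :] picks from the untransposed tensor. Multiplying it by W(out) leaves one row of n_classes scores per image.

```python
def transpose_102(t):
    """tf.transpose(t, [1, 0, 2]) on nested lists."""
    return [[t[i][j] for i in range(len(t))] for j in range(len(t[0]))]


def matmul(a, b):
    """(m, k) x (k, n) -> (m, n) on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Toy sizes (hypothetical): batch=4, n_steps=3, n_hidden_units=2, n_classes=5
batch, n_steps, n_hidden, n_classes = 4, 3, 2, 5
# stand-in for dynamic_rnn's outputs, (batch, n_steps, n_hidden_units); value = 100*sample + 10*step + unit
outputs = [[[float(100 * b + 10 * s + h) for h in range(n_hidden)] for s in range(n_steps)]
           for b in range(batch)]

# transpose + unstack: a list of n_steps slices, each (batch, n_hidden_units)
per_step = transpose_102(outputs)
last_via_unstack = per_step[-1]
# the same thing without transposing, like outputs[:, -1, :]
last_direct = [sample[-1] for sample in outputs]
print(last_via_unstack == last_direct)  # True

# output layer with made-up W(out) and B(out): (batch, n_hidden_units) x (n_hidden_units, n_classes)
w_out = [[0.1 * (h + c) for c in range(n_classes)] for h in range(n_hidden)]
b_out = [0.1] * n_classes
results = [[v + bo for v, bo in zip(row, b_out)] for row in matmul(last_direct, w_out)]
print(len(results), len(results[0]))  # 4 5
```

As far as I can tell, the transpose + unstack isn't strictly needed in TF 1.x either: outputs[:, -1, :] slices the same tensor directly. The commented-out line using states[1] should also give the same values here, since with state_is_tuple=True the final state is an LSTMStateTuple (c, h), and h equals the last step's output when every sequence has the full 28 steps.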

the output of RNN: 128 = batch, 10 = n_classes, i.e. the unnormalized scores (logits) of each of these 128 images for each of the 10 classes.

After that, softmax_cross_entropy_with_logits(logits=pred, labels=y) and so on computes the cost (y.shape = (128, 10), one-hot).
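For reference, with one-hot labels the per-image softmax cross-entropy is -Σ_k y_k · log softmax(pred)_k, and tf.reduce_mean then averages it over the batch. Below is a pure-Python sketch of that formula. The logits and labels are made-up toy values, and subtracting the max for numerical stability is my own choice, not necessarily how TF computes it internally.

```python
import math


def softmax_cross_entropy(logits, labels):
    """Per-example -sum_k y_k * log(softmax(z)_k), with the max subtracted for stability."""
    losses = []
    for z, y in zip(logits, labels):
        m = max(z)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        # log softmax(z)_k = z_k - log(sum_j exp(z_j))
        losses.append(-sum(yk * (zk - log_sum_exp) for zk, yk in zip(z, y)))
    return losses


# Made-up scores for 2 images and 3 classes, with one-hot labels
logits = [[2.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
labels = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0]]
losses = softmax_cross_entropy(logits, labels)
cost = sum(losses) / len(losses)  # what tf.reduce_mean does over the batch
print(losses, cost)
```

Because the label is one-hot, only one term of the sum survives, so each loss is just -log of the probability given to the correct class; the second toy image, whose scores are all equal, costs exactly log 3.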

Then Adam minimizes the cost, and the accuracy is computed and printed...
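The accuracy lines (tf.argmax, tf.equal, tf.cast, tf.reduce_mean) boil down to: compare the index of the highest score with the index of the 1 in the one-hot label, then average over the batch. A pure-Python sketch with made-up scores:

```python
def argmax(row):
    """Index of the largest value (first one on ties), roughly what tf.argmax(..., 1) gives per row."""
    return max(range(len(row)), key=lambda k: row[k])


def accuracy(pred, y):
    """Fraction of rows whose predicted class matches the one-hot label."""
    correct = [argmax(p) == argmax(t) for p, t in zip(pred, y)]  # like tf.equal(tf.argmax(...), tf.argmax(...))
    return sum(correct) / len(correct)  # like tf.reduce_mean(tf.cast(correct_pred, tf.float32))


# Made-up scores for 4 images and 3 classes, with one-hot labels
pred = [[0.1, 2.0, -1.0],
        [3.0, 0.5, 0.2],
        [0.0, 0.1, 0.9],
        [1.0, 1.5, 0.3]]
y = [[0, 1, 0],
     [1, 0, 0],
     [1, 0, 0],
     [0, 1, 0]]
print(accuracy(pred, y))  # 0.75: the third image is predicted as class 2 but labeled 0
```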

And that's the whole pipeline of the most basic single-layer LSTM.

I won't explain how LSTM works internally; there are plenty of excellent explanations out there.

Random thoughts:

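I had read lots of blog posts and some Zhihu answers before and was still confused afterwards. Most of them focus on explaining LSTM's gates, but without understanding the input format, everything I learned after that felt like a castle in the air. Yet now that I've written it up, it all seems so simple... It actually took me a whole afternoon to figure out this little bit = =

I did find a quick trick, though: print the shapes of any quantities you're unsure about. If many of the numbers coincide, temporarily set them to different values so they're easy to match up. In this example you could set batch = 10, n_hidden_units = 5, n_steps = 20, n_inputs = 28, and everything becomes clear at a glance. (It will raise errors, but it still helps a lot for sorting out your thinking; TF builds the graph first and feeds the data afterwards, after all.) Also: always work it out with pen and paper!!!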
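Here's the same trick as a pure-Python sketch: instead of building the TF graph, it applies the shape rules of the ops used in RNN() (reshape with one inferred -1, 2-D matmul, transpose, indexing the first axis) to plain shape tuples. The helper names reshape, matmul, transpose and trace_rnn are hypothetical, not TF functions. With the suggested sizes the dimensions are easy to tell apart (only n_classes has to stay 10 to match the labels, so it coincides with batch). The last lines also show where the error comes from: a 784-pixel image can't be reshaped into 20 × 28.

```python
from math import prod


def reshape(shape, new_shape):
    """Shape rule of tf.reshape: at most one -1, inferred from the total size."""
    total = prod(shape)
    known = prod(d for d in new_shape if d != -1)
    if -1 in new_shape:
        if total % known:
            raise ValueError(f"cannot reshape {shape} into {new_shape}")
        return tuple(total // known if d == -1 else d for d in new_shape)
    if total != known:
        raise ValueError(f"cannot reshape {shape} into {new_shape}")
    return tuple(new_shape)


def matmul(a, b):
    """Shape rule of a 2-D tf.matmul: (m, k) x (k, n) -> (m, n)."""
    if a[1] != b[0]:
        raise ValueError(f"inner dimensions differ: {a} x {b}")
    return (a[0], b[1])


def transpose(shape, perm):
    """Shape rule of tf.transpose."""
    return tuple(shape[p] for p in perm)


def trace_rnn(batch_size, n_steps, n_inputs, n_hidden_units, n_classes):
    """Follow the shapes through RNN() step by step; returns (label, shape) pairs."""
    trace = [("X", (batch_size, n_steps, n_inputs))]
    x = reshape(trace[-1][1], (-1, n_inputs))
    trace.append(("reshape(X)", x))
    x_in = matmul(x, (n_inputs, n_hidden_units))
    trace.append(("X * W_in", x_in))
    x_in = reshape(x_in, (-1, n_steps, n_hidden_units))
    trace.append(("X_in", x_in))
    # dynamic_rnn output: (batch, n_steps, num_units), and the cell has num_units = n_hidden_units
    outputs = (x_in[0], x_in[1], n_hidden_units)
    trace.append(("outputs", outputs))
    outputs = transpose(outputs, (1, 0, 2))
    trace.append(("transpose outputs", outputs))
    last = outputs[1:]  # outputs[-1] drops the first (step) axis
    trace.append(("outputs[-1]", last))
    trace.append(("results", matmul(last, (n_hidden_units, n_classes))))
    return trace


for label, shape in trace_rnn(batch_size=10, n_steps=20, n_inputs=28, n_hidden_units=5, n_classes=10):
    print(f"{label:>18}: {shape}")

# Feeding real data is where it breaks: one MNIST image has 28 * 28 = 784 pixels, not 20 * 28.
try:
    reshape((10, 784), (10, 20, 28))
except ValueError as err:
    print("error:", err)
```

Running trace_rnn with the original sizes (128, 28, 28, 128, 10) reproduces the shapes printed earlier, with 128 in place of ?.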