A walkthrough of implementing a CNN with the high-level APIs tf.estimator.Estimator and tf.layers

This post explains in detail how to build a convolutional neural network (CNN) with TensorFlow's tf.layers module, covering the parameters of the convolution, pooling, and dropout layers along with code, and then trains and evaluates the model on the MNIST dataset.


tf.contrib.layers.flatten

Assumes the first dimension of the input (inputs) is batch_size. It keeps batch_size and reshapes the input to: [batch_size, k]

tf.contrib.layers.flatten(
    inputs,
    outputs_collections=None,
    scope=None
)

'''
Args:
	inputs: A tensor of size [batch_size, ...].
	outputs_collections: Collection to add the outputs.
	scope: Optional scope for name_scope.
Returns:
     A flattened tensor with shape [batch_size, k].
'''
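A minimal usage sketch (assuming TensorFlow 1.x graph mode and an arbitrary feature map as input): every dimension except the batch dimension is collapsed into one.

```python
import tensorflow as tf

# e.g. a pooled feature map of shape [batch_size, 7, 7, 64]
feature_map = tf.placeholder(tf.float32, [None, 7, 7, 64])

# Keep the batch dimension, collapse the rest: 7 * 7 * 64 = 3136
flat = tf.contrib.layers.flatten(feature_map)
print(flat.shape)  # (?, 3136)
```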

tf.layers.conv2d

Most of this part is adapted from another blog post.
Functional interface for the 2D convolution layer.
This layer creates a convolution kernel that is convolved with the input to produce an output tensor. If use_bias is True (and a bias_initializer is provided), a bias vector is created and added to the output. Finally, if activation is not None, the activation function is applied to the output as well.

tf.layers.conv2d(
    inputs,
    filters,
    kernel_size,
    strides=(1, 1),
    padding='valid',
    data_format='channels_last',
    dilation_rate=(1, 1),
    activation=None,
    use_bias=True,
    kernel_initializer=None,
    bias_initializer=tf.zeros_initializer(),
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    trainable=True,
    name=None,
    reuse=None
)
'''
Arguments:
		inputs: Tensor input.
		filters: Integer, the dimensionality of the output space (i.e. the number of filters in the convolution).
		kernel_size: An integer or tuple/list of 2 integers, specifying the height and width of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
		strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.
		padding: One of "valid" or "same" (case-insensitive).
		data_format: A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).
		
		dilation_rate: An integer or tuple/list of 2 integers, specifying the dilation rate to use for dilated convolution. Can be a single integer to specify the same value for all spatial dimensions. Currently, specifying any dilation_rate value != 1 is incompatible with specifying any stride value != 1.
		
		activation: Activation function. Set it to None to maintain a linear activation.
		
		use_bias: Boolean, whether the layer uses a bias.
		
		kernel_initializer: An initializer for the convolution kernel.
		
		bias_initializer: An initializer for the bias vector. If None, the default initializer will be used.
		
		kernel_regularizer: Optional regularizer for the convolution kernel.
		
		bias_regularizer: Optional regularizer for the bias vector.
		
		activity_regularizer: Optional regularizer function for the output.
		
		kernel_constraint: Optional projection function to be applied to the kernel after being updated by an Optimizer (e.g. used to implement norm constraints or value constraints for layer weights). The function must take as input the unprojected variable and must return the projected variable (which must have the same shape). Constraints are not safe to use when doing asynchronous distributed training.
		
		bias_constraint: Optional projection function to be applied to the bias after being updated by an Optimizer.
		
		trainable: Boolean, if True also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
		
		name: A string, the name of the layer.
		
		reuse: Boolean, whether to reuse the weights of a previous layer by the same name.

Returns:
   Output tensor.
'''
Key parameters

inputs: the input tensor.
filters: the number of convolution kernels, an integer (i.e. the number of output channels).
kernel_size: the size of the convolution window. An integer or a tuple/list of 2 integers giving the height and width of the 2D convolution window; a single integer means the same value in both spatial directions (since this is a 2D convolution, the window only slides along the height and width).
strides: same format as kernel_size, but it gives the step size of the convolution window along the height and width. Note that strides != 1 and dilation_rate != 1 cannot be specified at the same time.
For the remaining parameters, see the English docstring above.
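A minimal sketch of how filters, kernel_size, and padding determine the output shape (assuming TensorFlow 1.x and a 28x28 single-channel input):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1])

# 32 filters, 5x5 kernel, default strides=(1, 1), padding='valid':
# output height/width = 28 - 5 + 1 = 24
conv_valid = tf.layers.conv2d(x, filters=32, kernel_size=5, activation=tf.nn.relu)
print(conv_valid.shape)  # (?, 24, 24, 32)

# The same kernel with padding='same' keeps the spatial size unchanged
conv_same = tf.layers.conv2d(x, filters=32, kernel_size=5, padding='same')
print(conv_same.shape)   # (?, 28, 28, 32)
```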

tf.layers.max_pooling2d

tf.layers.max_pooling2d(
    inputs,
    pool_size,
    strides,
    padding='valid',
    data_format='channels_last',
    name=None
)
'''
Arguments:
		inputs: The tensor over which to pool. Must have rank 4.
		pool_size: An integer or tuple/list of 2 integers: (pool_height, pool_width) specifying the size of the pooling window. Can be a single integer to specify the same value for all spatial dimensions.
		strides: An integer or tuple/list of 2 integers, specifying the strides of the pooling operation. Can be a single integer to specify the same value for all spatial dimensions.
		padding: A string. The padding method, either 'valid' or 'same'. Case-insensitive.
		data_format: A string. The ordering of the dimensions in the inputs. channels_last (default) and channels_first are supported. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).
		name: A string, the name of the layer.
Returns:
     Output tensor.

'''

Notes

The parameters mean almost the same as in tf.layers.conv2d above (pool_size is the size of the pooling window, which again only moves along the height and width).
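A minimal sketch (assuming a 24x24 feature map like the conv2d output above): a 2x2 window with stride 2 halves both spatial dimensions.

```python
import tensorflow as tf

# e.g. the output of a convolution layer: [batch, 24, 24, 32]
conv = tf.placeholder(tf.float32, [None, 24, 24, 32])

# 2x2 pooling window moved with stride 2: 24 / 2 = 12 along height and width
pool = tf.layers.max_pooling2d(conv, pool_size=2, strides=2)
print(pool.shape)  # (?, 12, 12, 32)
```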

tf.layers.dropout (pay close attention to the meaning of its rate parameter)

tf.layers.dropout(
    inputs,
    rate=0.5,
    noise_shape=None,
    seed=None,
    training=False,
    name=None
)

It implements exactly the same functionality as tf.nn.dropout; see my earlier post on tf.nn.dropout for details.
Only two points need attention:

  1. The rate parameter is the drop probability (drop_prob), not the keep probability (keep_prob).
  2. The training parameter: when it is True, dropout is applied to the input; when it is False, dropout has no effect (see the sketch below).
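
A minimal sketch of both points (assuming TensorFlow 1.x): rate=0.25 zeroes roughly 25% of the activations, and only when training is True.

```python
import tensorflow as tf

fc = tf.placeholder(tf.float32, [None, 1024])
is_training = tf.placeholder(tf.bool)  # feed True while training, False at eval/predict time

# rate is the DROP probability: about 25% of the units are zeroed
# (the equivalent tf.nn.dropout keep_prob would be 1 - rate = 0.75)
out = tf.layers.dropout(fc, rate=0.25, training=is_training)
```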

Putting the above together: a complete CNN

#!/usr/bin/env python
# coding: utf-8


from __future__ import division,print_function,absolute_import


# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=False)

import tensorflow as tf

import matplotlib.pyplot as plt
import numpy as np
# Training parameters
learning_rate=0.001
num_steps=2000
batch_size=128

# Network parameters
num_input=784
num_classes=10
dropout=0.25


def conv_net(x_dict,n_classes,dropout,reuse,is_training):
    with tf.variable_scope('ConvNet',reuse=reuse):
        x=x_dict['image']
        x=tf.reshape(x,[-1,28,28,1])
        conv1=tf.layers.conv2d(x,32,5,activation=tf.nn.relu)
        conv1=tf.layers.max_pooling2d(conv1,2,2)
        
        conv2=tf.layers.conv2d(conv1,64,3,activation=tf.nn.relu)
        conv2=tf.layers.max_pooling2d(conv2,2,2)
        
        # Flatten the multi-channel feature map produced by the last conv/pool layer
        fc1=tf.contrib.layers.flatten(conv2)
        fc1=tf.layers.dense(fc1,1024)  # Fully connected layer: arguments are the input and the number of output units
        fc1=tf.layers.dropout(fc1,rate=dropout,training=is_training)
        out=tf.layers.dense(fc1,n_classes)
    return out
        
def model_fn(features,labels,mode):
    logits_train=conv_net(features,num_classes,dropout,reuse=False,is_training=True)
    logits_test=conv_net(features,num_classes,dropout,reuse=True,is_training=False)
    
    pred_classes=tf.argmax(logits_test,-1)
    pred_probas=tf.nn.softmax(logits_test)
    
    if mode==tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode,predictions=pred_classes)
    # Loss and optimizer; pass global_step so the Estimator can count training steps
    loss_op=tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits_train,labels=tf.cast(labels,tf.int32)))
    optimizer=tf.train.AdamOptimizer(learning_rate=learning_rate)
    train_op=optimizer.minimize(loss_op,global_step=tf.train.get_global_step())

    # Accuracy metric for evaluation
    accuracy_op=tf.metrics.accuracy(labels=labels,predictions=pred_classes)

    return tf.estimator.EstimatorSpec(
        mode,
        predictions=pred_classes,
        loss=loss_op,
        train_op=train_op,
        eval_metric_ops={"accuracy":accuracy_op}
        
    )
    
model=tf.estimator.Estimator(model_fn)
input_fn=tf.estimator.inputs.numpy_input_fn(x={'image':mnist.train.images},
                                            y=mnist.train.labels,
                                            batch_size=batch_size,
                                            num_epochs=None,
                                            shuffle=True)
model.train(input_fn,steps=num_steps)
# Evaluate the model
# Define the input function for evaluating (the feature key must match the one used in conv_net: 'image')
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'image': mnist.test.images}, y=mnist.test.labels,
    batch_size=batch_size, shuffle=False)
# Use the Estimator 'evaluate' method
model.evaluate(input_fn)




# Predict single images
n_images = 4
# Get images from test set
test_images = mnist.test.images[:n_images]
# Prepare the input data
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'image': test_images}, shuffle=False)
# Use the model to predict the images class
preds = list(model.predict(input_fn))

# Display
for i in range(n_images):
    plt.imshow(np.reshape(test_images[i], [28, 28]), cmap='gray')
    plt.show()
    print("Model prediction:", preds[i])

Below is another simple convolutional neural network for image classification, for reference only:

```python
import tensorflow as tf

# Define the CNN model
def cnn_model_fn(features, labels, mode):
    # Input layer
    input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])
    # Convolution layer 1
    conv1 = tf.layers.conv2d(
        inputs=input_layer, filters=32, kernel_size=[5, 5],
        padding="same", activation=tf.nn.relu)
    # Pooling layer 1
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
    # Convolution layer 2
    conv2 = tf.layers.conv2d(
        inputs=pool1, filters=64, kernel_size=[5, 5],
        padding="same", activation=tf.nn.relu)
    # Pooling layer 2
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)
    # Fully connected layer
    pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
    dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)
    dropout = tf.layers.dropout(
        inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)
    # Output layer
    logits = tf.layers.dense(inputs=dropout, units=10)

    predictions = {
        "classes": tf.argmax(input=logits, axis=1),
        "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
    }
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    # Loss
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    # Training op
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
        train_op = optimizer.minimize(
            loss=loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

    # Evaluation op
    eval_metric_ops = {
        "accuracy": tf.metrics.accuracy(
            labels=labels, predictions=predictions["classes"])}
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)

# Load the MNIST dataset
mnist = tf.contrib.learn.datasets.load_dataset("mnist")

# Create the Estimator
mnist_classifier = tf.estimator.Estimator(
    model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")

# Train
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": mnist.train.images}, y=mnist.train.labels.astype(int),
    batch_size=100, num_epochs=None, shuffle=True)
mnist_classifier.train(input_fn=train_input_fn, steps=20000, hooks=None)

# Evaluate
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": mnist.test.images}, y=mnist.test.labels.astype(int),
    num_epochs=1, shuffle=False)
eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
print(eval_results)
```

This code implements a simple convolutional neural network that classifies the handwritten digits in the MNIST dataset; see the comments in the code for the details of each step.