An Introduction to the High-Level Keras API in TensorFlow

Article link: https://blog.youkuaiyun.com/weixin_34086714/article/details/113669153

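This article shows how to build and train neural network models with TensorFlow's high-level Keras API, covering data preparation, model building, compilation, training, regularization, and model evaluation and saving. It demonstrates the Dense layer with an example and discusses overfitting, along with L1 and L2 regularization and dropout as ways to prevent it.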


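TensorFlow is the best-known library for building deep learning models. However, it is not especially easy to use, and its learning curve is steep. To address this, TensorFlow's high-level Keras API provides building blocks that make creating and training deep learning models much easier. Keras models are also made by connecting configurable building blocks together with few restrictions, which makes the API highly modular.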

Classification

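To perform simple classification with the high-level Keras API, follow these steps:

1. Import the required modules.
2. Prepare the data in a format suitable for the API.
3. Build the neural network model with the tf.keras (TensorFlow-Keras) APIs and compile it.
4. Train the model on the prepared data, addressing underfitting and overfitting along the way.
5. Evaluate the model.
6. Save and restore the model for later use.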

Importing the required modules

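First, import NumPy and pandas, which are needed for data processing and preparation. Import TensorFlow for low-level operations and the high-level Keras API for building the model. Finally, import Matplotlib for plotting, for example to analyze performance and accuracy graphically.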

```python
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# In a Jupyter notebook, run the magic below (it is not valid plain Python):
# %matplotlib inline
# With this backend, the output of plotting commands is displayed inline
# within frontends like the Jupyter notebook, directly below the code cell
# that produced it. The resulting plots are also stored in the notebook document.
```

Data preparation

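The data preparation stage takes the raw data, gives it structure, and removes noise. It also changes the data's format and shape to suit the model you are designing. Data comes in different forms: image data differs from text data, for example, and each needs its own processing and preprocessing. If we use the Fashion-MNIST data to build a classifier for clothing items, the data can be prepared as follows: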

```python
import numpy as np
import pandas as pd

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

train_df = pd.read_csv('data/fashion-mnist_train.csv', sep=',')
test_df = pd.read_csv('data/fashion-mnist_test.csv', sep=',')
train_df.head()

# Convert the DataFrames into float32 NumPy arrays, the form that
# TensorFlow and Keras accept.
train_data = np.array(train_df, dtype='float32')
test_data = np.array(test_df, dtype='float32')

# Column 0 holds the label; the remaining columns are pixel values.
# Scale the pixels from 0-255 to 0-1 before feeding them to the model.
x_train = train_data[:, 1:] / 255
y_train = train_data[:, 0]
x_test = test_data[:, 1:] / 255
y_test = test_data[:, 0]
```
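The CSV files are not included here, so the following sketch builds a tiny synthetic DataFrame to show what the slicing and scaling above produce. It assumes the same layout: a label column followed by 784 pixel columns, here named label and pixel1 to pixel784 (the column names are an assumption). The values are random, not Fashion-MNIST data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for fashion-mnist_train.csv: 3 rows, a "label"
# column followed by 784 pixel columns with raw values in 0-255.
rng = np.random.default_rng(0)
columns = ["label"] + [f"pixel{i}" for i in range(1, 785)]
values = np.column_stack([
    np.array([0, 9, 3]),                    # class indices
    rng.integers(0, 256, size=(3, 784)),    # raw pixel intensities
])
fake_df = pd.DataFrame(values, columns=columns)

data = np.array(fake_df, dtype="float32")
x = data[:, 1:] / 255    # pixels scaled to [0, 1]
y = data[:, 0]           # labels, kept as float32

# Each row can be reshaped back into a 28x28 image, e.g. for plt.imshow.
first_image = x[0].reshape(28, 28)
```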

Building the neural network model

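The basic building block of a neural network is the layer. Layers extract representations from the data fed into them. For example: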

```python
model = keras.Sequential([
    # The input rows are already flat (784 values), so Flatten passes them through.
    keras.layers.Flatten(input_shape=(784,)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
```

After the Flatten layer, the network consists of two tf.keras.layers.Dense layers. These are densely connected, or fully connected, neural layers.

The first Dense layer has 128 nodes (or neurons).
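To make the Dense computation concrete, here is a NumPy sketch of what the first layer computes for a batch: relu(x @ kernel + bias). It assumes Keras's default Dense initializers (glorot_uniform for the kernel, zeros for the bias) and uses a random batch in place of real images. It is an illustration only, not Keras's own implementation.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    # Keras' default kernel_initializer for Dense is "glorot_uniform":
    # samples from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out)).astype("float32")

def dense_relu(x, kernel, bias):
    # A Dense layer computes activation(x @ kernel + bias); here activation = ReLU.
    return np.maximum(x @ kernel + bias, 0.0)

rng = np.random.default_rng(42)
kernel = glorot_uniform(784, 128, rng)
bias = np.zeros(128, dtype="float32")    # default bias_initializer is "zeros"

batch = rng.random((32, 784), dtype=np.float32)    # 32 hypothetical flattened images
hidden = dense_relu(batch, kernel, bias)

n_params = kernel.size + bias.size    # 784 * 128 + 128
```

The parameter count, 784 × 128 + 128 = 100,480, is the figure model.summary() reports for this layer.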

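The second (and last) layer is a 10-node softmax layer that returns an array of 10 probability scores summing to 1. Each node holds the probability that the current image belongs to the corresponding one of the 10 classes.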
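The softmax step can be sketched in a few lines of NumPy. The 10 raw scores (logits) below are made-up values standing in for what the last layer computes before its activation.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; this does not change the result.
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

# Hypothetical raw scores for the 10 clothing classes.
logits = np.array([2.0, 1.0, 0.1, -1.0, 0.0, 0.5, 1.5, -0.5, 0.3, 0.2])
probs = softmax(logits)
predicted_class = int(np.argmax(probs))    # index of the most likely class
```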

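Before the model is ready for training, it needs a few more settings. These are added in the model's compile step:

Loss function: measures how accurate the model is during training. We want to minimize this function to "steer" the model in the right direction.

Optimizer: determines how the model is updated based on the data it sees and its loss function.

Metrics: used to monitor the training and testing steps. The following example uses accuracy, the fraction of images that are correctly classified.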
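The three compile settings can be seen working together in a deliberately tiny NumPy sketch. This is not what Keras does internally. It swaps in a one-weight logistic model, binary cross-entropy, and plain gradient descent (instead of Adam) on made-up data, purely to show an optimizer minimizing a loss while a metric is tracked.

```python
import numpy as np

# Hypothetical toy data: 1-D inputs whose label is 1 exactly when x > 0.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

def predict(x, w, b):
    # A one-weight "model": sigmoid(w * x + b) is the probability of class 1.
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def loss_fn(p, y):
    # Loss function: mean binary cross-entropy (lower is better).
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def accuracy_metric(p, y):
    # Metric: fraction of samples classified correctly (threshold 0.5).
    return np.mean((p > 0.5) == (y == 1.0))

w, b, learning_rate = 0.0, 0.0, 0.5
losses = []
for _ in range(100):
    p = predict(x, w, b)
    losses.append(loss_fn(p, y))
    # Optimizer: plain gradient descent. For sigmoid + binary cross-entropy,
    # the gradient of the mean loss w.r.t. each logit is (p - y) / n.
    w -= learning_rate * np.mean((p - y) * x)
    b -= learning_rate * np.mean(p - y)

final_accuracy = accuracy_metric(predict(x, w, b), y)
```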

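```python
# For multi-class classification, categorical cross-entropy and sparse
# categorical cross-entropy are both widely used. For a comparison, see
# https://jovianlin.io/cat-crossentropy-vs-sparse-cat-crossentropy/
# The labels here are integer class indices, so the sparse variant fits.
# (tf.train.AdamOptimizer is the TensorFlow 1.x name; TF 2.x uses tf.keras.optimizers.Adam.)
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```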
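The two losses mentioned in the comment differ only in their label format. This sketch computes both by hand on hypothetical probabilities to show it. Sparse categorical cross-entropy takes integer class indices (like y_train here), while categorical cross-entropy takes one-hot vectors. The helper names are illustrative, not Keras functions.

```python
import numpy as np

# Predicted probabilities for 3 samples over 4 classes (hypothetical values).
probs = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4],
])
int_labels = np.array([0, 1, 3])    # format used by sparse_categorical_crossentropy
one_hot = np.eye(4)[int_labels]     # format used by categorical_crossentropy

def sparse_cce(labels, p):
    # Pick the predicted probability of the true class for each sample.
    return -np.mean(np.log(p[np.arange(len(labels)), labels]))

def categorical_cce(one_hot_labels, p):
    # The one-hot vector zeroes out every term except the true class.
    return -np.mean(np.sum(one_hot_labels * np.log(p), axis=1))

sparse_loss = sparse_cce(int_labels, probs)
cat_loss = categorical_cce(one_hot, probs)
```

Both give the same number. Choose the sparse version when labels are stored as class indices, which avoids building one-hot arrays.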

Training the model

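Training the neural network model involves the following steps:

1. Feed the training data to the model. In this example, that is the x_train and y_train arrays.
2. The model learns to associate images and labels.
3. Ask the model to make predictions on a test set, in this example the x_test array, and verify that the predictions match the labels in the y_test array.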

```python
model.fit(x_train, y_train, epochs=10)

test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)
print('Test loss:', test_loss)

# Output:
# 10000/10000 [==============================] - 1s 52us/step
# Test accuracy: 0.8963
# Test loss: 0.3374745888918638
```
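Step 3 above (checking predictions against labels) can be sketched in NumPy. The predictions array below is a hypothetical stand-in for the output of model.predict(x_test[:4]), with one row of 10 probabilities per image. argmax turns each row into a class index, which can then be compared with the labels and mapped to class_names.

```python
import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Hypothetical model.predict output for 4 images (values made up).
predictions = np.full((4, 10), 0.02)
predictions[0, 9] = 0.82    # most likely "Ankle boot"
predictions[1, 2] = 0.82    # "Pullover"
predictions[2, 1] = 0.82    # "Trouser"
predictions[3, 6] = 0.82    # "Shirt"
y_true = np.array([9.0, 2.0, 1.0, 4.0])    # float labels, like y_test

predicted_labels = np.argmax(predictions, axis=1)
matches = predicted_labels == y_true.astype(int)
accuracy = matches.mean()
predicted_names = [class_names[i] for i in predicted_labels]
```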

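Two major problems when training a model are underfitting and overfitting. Underfitting can be avoided by training sufficiently. To avoid overfitting, two common solutions are adding weight regularization and adding dropout.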

Adding weight regularization

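A common way to mitigate overfitting is to constrain the network's complexity by forcing its weights to take only small values, which makes the distribution of weight values more "regular". This is called "weight regularization". It works by adding a cost associated with large weights to the network's loss function. This cost comes in two flavors: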

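L1 regularization, where the added cost is proportional to the absolute value of the weight coefficients (i.e., to the "L1 norm" of the weights).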

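L2 regularization, where the added cost is proportional to the square of the weight coefficients (i.e., to the squared "L2 norm" of the weights). In neural networks, L2 regularization is also called weight decay. Don't let the different names confuse you: for plain gradient descent, weight decay is mathematically the same as L2 regularization.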
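The two penalties can be written out as equations. The notation is introduced here: $\mathcal{L}_{\text{data}}$ is the original loss, $w_i$ are the weights, and $\lambda_1, \lambda_2$ are the regularization factors. The last line shows a plain gradient-descent step with learning rate $\eta$ on the L2-regularized loss, which is where the weight-decay equivalence comes from.

```latex
\mathcal{L}_{\text{L1}} = \mathcal{L}_{\text{data}} + \lambda_1 \sum_i |w_i|

\mathcal{L}_{\text{L2}} = \mathcal{L}_{\text{data}} + \lambda_2 \sum_i w_i^2
                        = \mathcal{L}_{\text{data}} + \lambda_2 \lVert \mathbf{w} \rVert_2^2

w_i \leftarrow w_i - \eta \left( \frac{\partial \mathcal{L}_{\text{data}}}{\partial w_i} + 2\lambda_2 w_i \right)
    = (1 - 2\eta\lambda_2)\, w_i - \eta \, \frac{\partial \mathcal{L}_{\text{data}}}{\partial w_i}
```

The factor (1 - 2ηλ₂) shrinks every weight slightly at each step, which is why the technique is called "weight decay".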

The only change needed is in the model architecture:

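```python
# NUM_WORDS is not defined in this article; it is assumed to be the input
# dimension, e.g. the vocabulary size of a multi-hot encoded text dataset.
# Note that this example is a binary classifier (one sigmoid output unit).
NUM_WORDS = 10000

l2_model = keras.models.Sequential([
    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
                       activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
                       activation=tf.nn.relu),
    keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
```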
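The following sketch shows what keras.regularizers.l2(0.001) contributes for one kernel: 0.001 times the sum of its squared weights. The l1 regularizer uses absolute values instead. The NumPy helpers, the 2×2 kernel, and the data loss below are hypothetical. In Keras, each layer's penalty is collected in model.losses and added to the training loss automatically.

```python
import numpy as np

def l2_penalty(kernel, factor=0.001):
    # Penalty in the form keras.regularizers.l2 uses: factor * sum of squared weights.
    return factor * np.sum(np.square(kernel))

def l1_penalty(kernel, factor=0.001):
    # The L1 counterpart: factor * sum of absolute weights.
    return factor * np.sum(np.abs(kernel))

kernel = np.array([[0.5, -1.0], [2.0, 0.0]])    # hypothetical 2x2 kernel
data_loss = 0.30                                 # hypothetical loss before regularization
total_loss = data_loss + l2_penalty(kernel)      # what training would minimize
```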

Adding dropout

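Dropout is one of the most effective and most commonly used regularization techniques for neural networks. It was developed by Hinton and his students at the University of Toronto.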

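Dropout, applied to a layer, consists of randomly "dropping out" (setting to zero) a number of the layer's output features during training. Suppose a given layer would normally return the vector [0.2, 0.5, 1.3, 0.8, 1.1] for a given input sample during training. After applying dropout, this vector will have a few zero entries distributed at random, e.g. [0, 0.5, 1.3, 0, 1.1].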

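The "dropout rate" is the fraction of the features that are zeroed out; it is usually set between 0.2 and 0.5. At test time no units are dropped out. Instead, the layer's output values are scaled down by the keep probability (1 - dropout rate), to balance the fact that more units are active than at training time. Keras's Dropout layer implements the equivalent "inverted" form: during training it scales the surviving outputs up by 1 / (1 - rate), and at test time it passes the outputs through unchanged.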
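The following NumPy sketch shows the inverted form described above, applied to the example vector from the text. The function name is hypothetical; in a real model you would simply use keras.layers.Dropout.

```python
import numpy as np

def inverted_dropout(x, rate, rng, training=True):
    # Training: zero each element with probability `rate` and scale the
    # survivors by 1 / (1 - rate) so the expected output equals the input.
    # Inference: return the input unchanged.
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
layer_output = np.array([0.2, 0.5, 1.3, 0.8, 1.1])
train_out = inverted_dropout(layer_output, rate=0.5, rng=rng)
test_out = inverted_dropout(layer_output, rate=0.5, rng=rng, training=False)

# Averaged over many elements, the training-time output approaches the input.
ones = np.ones(100_000)
mean_after = inverted_dropout(ones, rate=0.5, rng=rng).mean()
```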

Again, the only change needed is in the model architecture:

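```python
# NUM_WORDS is not defined in this article; it is assumed to be the input
# dimension, e.g. the vocabulary size of a multi-hot encoded text dataset.
NUM_WORDS = 10000

dpt_model = keras.models.Sequential([
    keras.layers.Dense(16, activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
    keras.layers.Dropout(0.5),    # drop 50% of this layer's outputs during training
    keras.layers.Dense(16, activation=tf.nn.relu),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
```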

Model evaluation

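There are many metrics for evaluating classification, regression, clustering, and other tasks. Here we use accuracy to measure classification performance. (Accuracy, precision, recall, and the F-measure are the main metrics used to measure classifier performance.)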
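The classification metrics listed above can be computed by hand. This sketch treats one class as positive and all others as negative (one-vs-rest), a convention assumed here, and uses hypothetical label arrays. The helper name is illustrative.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred, positive_class):
    # Treat `positive_class` as positive and every other class as negative.
    tp = np.sum((y_pred == positive_class) & (y_true == positive_class))
    fp = np.sum((y_pred == positive_class) & (y_true != positive_class))
    fn = np.sum((y_pred != positive_class) & (y_true == positive_class))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical true and predicted class indices for 8 test images.
y_true = np.array([0, 0, 1, 1, 2, 2, 6, 6])
y_pred = np.array([0, 6, 1, 1, 2, 0, 6, 0])

accuracy = np.mean(y_true == y_pred)
p, r, f1 = precision_recall_f1(y_true, y_pred, positive_class=0)
```

Here class 0 is predicted three times but is correct only once (precision 1/3), and one of its two true instances is found (recall 1/2). Accuracy alone (5/8) hides this imbalance.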

```python
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)
print('Test loss:', test_loss)

# Output:
# 10000/10000 [==============================] - 1s 52us/step
# Test accuracy: 0.8963
# Test loss: 0.3374745888918638
```

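We can also see how the metrics change while the model is being trained. To do this, keep the model's training history and plot it to show how training progressed.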

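```python
# Keep the History object returned by fit() to plot the metrics per epoch.
history = model.fit(x_train, y_train, epochs=50, batch_size=512,
                    validation_split=0.2)

# With metrics=['accuracy'], TF 2.x records the keys 'accuracy' and
# 'val_accuracy' (older Keras versions used 'acc' and 'val_acc').
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training Acc')
plt.plot(epochs, val_acc, 'b', label='Validation Acc')
plt.title('Training and validation Acc')
plt.xlabel('Epochs')
plt.ylabel('Acc')
plt.legend()
plt.show()
```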

Saving and restoring the model

Models can be saved and restored in TensorFlow with the tf.keras.Model API.

```python
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Save weights to a TensorFlow Checkpoint file.
# (tf.keras 2.x API; in Keras 3 the filename must end in '.weights.h5'.)
model.save_weights('./weights/my_model')

# Restore the model's state;
# this requires a model with the same architecture.
model.load_weights('./weights/my_model')

# The weights can also be saved in the Keras HDF5 format.
model.save_weights('my_model.h5', save_format='h5')
# Restore the model's state
model.load_weights('my_model.h5')

# The model architecture can also be saved to JSON.
# Serialize the model to JSON format
json_string = model.to_json()
print(json_string)

# It can then be restored as below. The restored model has the same
# architecture but freshly initialized weights, and it is not compiled.
latest_model = tf.keras.models.model_from_json(json_string)
```