TensorFlow Study Notes 10: Data Preparation (Part 6)

机器不学习我也不学习

Last modified 2025-04-02 13:58:18

First published 2025-03-26 13:47:19

Article link: https://blog.youkuaiyun.com/m0_58297458/article/details/146527560

II. Data Preprocessing

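After loading data as described in the previous two sections, the data usually needs preprocessing. Each data format is covered separately.

Preprocessing converts raw data into a form suited to analysis and modeling and improves data quality, which in turn helps the model stay accurate and reliable.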

4. Dataset preprocessing

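The tf.cast() function changes the data type of a tensor:

tf.cast(x, dtype, name=None)  # x is the tensor to convert, dtype is the target type, and name defaults to None.

When the whole tensor is passed to tf.cast(), every element is converted to the new type.
When a condition on the tensor is passed instead (for example x > 0), the boolean result is what gets converted, which produces a mask.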
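The two uses of tf.cast() described above can be sketched as follows. This is a minimal illustration rather than part of the original notes; the tensor `values` and the threshold 1.0 are made up for the example.

```python
import tensorflow as tf

# Hypothetical sample data, chosen only for illustration.
values = tf.constant([0.5, 1.7, 2.3, 4.0])

# Whole-tensor cast: every element changes dtype.
# For these positive floats the fractional part is dropped.
as_int = tf.cast(values, tf.int32)

# Conditional cast: the comparison yields a bool tensor,
# and casting that bool tensor gives a 0/1 mask.
mask = tf.cast(values > 1.0, tf.float32)

# The mask can then zero out elements that fail the condition.
masked = values * mask

print(as_int.numpy())
print(mask.numpy())
print(masked.numpy())
```

Multiplying by the mask is one common way to use it; the same bool tensor could also be passed to tf.boolean_mask to drop the elements instead of zeroing them.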

Code:

import tensorflow as tf
from tensorflow.keras import datasets  # built-in classic datasets

# Load a built-in dataset
(x, y), (x_test, y_test) = datasets.mnist.load_data()  # load MNIST
print('x:', x.shape, 'y:', y.shape, 'x_test:', x_test.shape, 'y_test:', y_test.shape)

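# Convert the loaded arrays into Dataset objects
train = tf.data.Dataset.from_tensor_slices((x, y))
test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
# Preprocess the MNIST dataset
# Shuffle the samples. Each (image, label) pair stays together, so the mapping
# between samples and labels is preserved. Shuffling keeps the model from
# learning the order of the data and helps it generalize.
# A buffer of 10000 shuffles only approximately; a buffer at least as large
# as the dataset (60000 here) gives a full shuffle.
train = train.shuffle(10000)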
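# Custom preprocessing function
def train_preprocess(x, y):  # some projects also apply image augmentation to the training data here
    # map() passes each (x, y) sample to this function automatically
    # Scale pixel values to the range 0-1
    x = tf.cast(x, dtype=tf.float32) / 255.
    # Flatten one 28x28 image into a 784-vector. map() runs per sample here
    # (before any batching), so [-1, 28*28] would add an extra leading axis of size 1.
    x = tf.reshape(x, [28 * 28])
    y = tf.cast(y, dtype=tf.int32)  # convert to an integer tensor
    # one_hot takes integer indices and returns float32 by default
    y = tf.one_hot(y, depth=10)
    # The returned x and y replace the ones passed in, which is how the preprocessing takes effect
    return x, y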

def test_preprocess(x_test, y_test):
    # Unlike train_preprocess, images keep their 28x28 shape and labels stay integer class indices
    x_test = tf.cast(x_test, dtype=tf.float32) / 255.
    y_test = tf.cast(y_test, dtype=tf.int32)
    return x_test, y_test

# The preprocessing logic lives in the two functions above. map() applies a function
# to every sample of a dataset; pass the function itself, without calling it.
train = train.map(train_preprocess)
test = test.map(test_preprocess)
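To check what shuffle() and map() produce, it helps to pull one batch out of the pipeline and inspect it. The sketch below is not from the original notes. It assumes random arrays as a stand-in for MNIST so that nothing has to be downloaded, flattens each image with [28 * 28] (a per-sample reshape), and adds a batch(32) step that the code above does not include. The names `images`, `labels`, and `preprocess` are hypothetical.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for MNIST (assumption: same dtype and per-image shape).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=(100,), dtype=np.uint8)

def preprocess(x, y):
    # Runs once per (image, label) sample because map() comes before batch().
    x = tf.cast(x, tf.float32) / 255.0              # scale pixels to [0, 1]
    x = tf.reshape(x, [28 * 28])                    # flatten one image
    y = tf.one_hot(tf.cast(y, tf.int32), depth=10)  # integer label -> one-hot vector
    return x, y

ds = (tf.data.Dataset.from_tensor_slices((images, labels))
      .shuffle(100)      # buffer covers the whole toy dataset
      .map(preprocess)
      .batch(32))        # added here only to inspect a batch

x_batch, y_batch = next(iter(ds))
print(x_batch.shape, x_batch.dtype)
print(y_batch.shape, y_batch.dtype)
```

Because the reshape happens per sample, batching afterwards gives a two-dimensional image tensor of shape (batch, 784) with no extra axis.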