Standardization and Normalization of Data

Latest recommended article published on 2024-07-04 20:26:48

License: CC 4.0 BY-SA

Tags: #standardization #normalization

Included in the Pattern Recognition column

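Methods for Standardization and Normalization

Standardization:

Transform the data so that it has a mean of 0 and a standard deviation of 1, where $\mu$ is the mean of $x$ and $\sigma$ is its standard deviation:
$x' = \frac{x - \mu}{\sigma}$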

Normalization:

Rescale the data so that it lies in the range $[0, 1]$:
$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$
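
As a quick check of the two formulas, here is a minimal sketch on a small array. The toy values are an assumption chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical toy data, chosen so the arithmetic is easy to follow.
x = np.array([2.0, 4.0, 6.0, 8.0])

# Standardization: subtract the mean, divide by the standard deviation.
# np.std defaults to the population standard deviation (ddof=0).
x_standardized = (x - x.mean()) / x.std()

# Normalization: shift by the minimum, divide by the range.
x_normalized = (x - x.min()) / (x.max() - x.min())

print("standardized:", x_standardized)
print("mean and std after standardization:", x_standardized.mean(), x_standardized.std())
print("normalized:", x_normalized)
```

After standardization the mean is 0 and the standard deviation is 1 (up to floating-point error). After normalization the smallest value maps to 0 and the largest to 1.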

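Both standardization and normalization can be used as data preprocessing steps. Standardization is also less sensitive than normalization to a few extremely large or extremely small values: in normalization a single extreme value sets the minimum or maximum and compresses every other value into a narrow band, which can hurt downstream results. The mean and standard deviation are still influenced by such values, only to a lesser degree.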
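
To illustrate the point about extreme values, the sketch below compares how much each denominator (the range for normalization, the standard deviation for standardization) grows when one extreme value is appended. The data size, the seed, the value 100.0, and the helper `value_range` are all assumptions made for illustration.

```python
import numpy as np

# Assumed setup: 1000 uniform values in [0, 1) with a fixed seed,
# plus one hypothetical extreme value of 100.0.
rng = np.random.default_rng(0)
clean = rng.random(1000)
with_outlier = np.append(clean, 100.0)


def value_range(a):
    """Denominator used by min-max normalization."""
    return a.max() - a.min()


# How much each denominator grows once the extreme value is present.
range_inflation = value_range(with_outlier) / value_range(clean)
std_inflation = with_outlier.std() / clean.std()

print(f"range grows by a factor of {range_inflation:.1f}")
print(f"std grows by a factor of {std_inflation:.1f}")
```

In this setup the range grows far more than the standard deviation, so the ordinary values are compressed more strongly by normalization. Standardization is still affected (its factor is above 1), just less so.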

Show me the code

Imports

# Data preprocessing module
from sklearn import preprocessing

import numpy as np
import matplotlib.pyplot as plt

Generate data

# Generate a sequence of n_points random values in [0, 1)
n_points = 10
ar = np.random.random(n_points).astype("float64")

Standardization

# Manual standardization
ar_by_us = (ar - np.mean(ar)) / (np.std(ar))

# Standardization with the library function
ar_by_library = preprocessing.StandardScaler().fit_transform(ar.reshape(-1, 1))

print("Left: manual standardization, right: library standardization:")
for i in range(n_points):
    print(i, "  ----  ", ar_by_us[i], ar_by_library[i, 0])
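
Rather than comparing the printed columns by eye, the agreement can be checked numerically. The sketch below is self-contained and uses a fixed seed (an assumption, for reproducibility). It also shows why the two results match: `StandardScaler` uses the population standard deviation, the same as the default `ddof=0` of `np.std`.

```python
import numpy as np
from sklearn import preprocessing

rng = np.random.default_rng(0)  # fixed seed is an assumption, for reproducibility
ar = rng.random(10)

# Manual version; np.std uses ddof=0 (population standard deviation) by default.
manual = (ar - ar.mean()) / ar.std()

# Library version; StandardScaler expects a 2-D array, so reshape and flatten back.
library = preprocessing.StandardScaler().fit_transform(ar.reshape(-1, 1)).ravel()

matches = np.allclose(manual, library)
print("manual matches library:", matches)

# With the sample standard deviation (ddof=1) the results no longer agree.
manual_ddof1 = (ar - ar.mean()) / ar.std(ddof=1)
matches_ddof1 = np.allclose(manual_ddof1, library)
print("ddof=1 matches library:", matches_ddof1)
```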

Normalization

# Manual normalization
ar_by_us = (ar - ar.min()) / (ar.max() - ar.min())

# Normalization with the library function
ar_by_library = preprocessing.minmax_scale(ar)

print("Left: manual normalization, right: library normalization:")
for i in range(n_points):
    print(i, "  ----  ", ar_by_us[i], ar_by_library[i])
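
The same numerical check works for normalization. This sketch again uses a fixed seed (an assumption) and also confirms that the result spans exactly $[0, 1]$. It relies on the default `feature_range=(0, 1)` of `minmax_scale`.

```python
import numpy as np
from sklearn import preprocessing

rng = np.random.default_rng(0)  # fixed seed is an assumption, for reproducibility
ar = rng.random(10)

# Manual version: shift by the minimum, divide by the range.
manual = (ar - ar.min()) / (ar.max() - ar.min())

# Library version; minmax_scale accepts a 1-D array and scales to [0, 1] by default.
library = preprocessing.minmax_scale(ar)

matches = np.allclose(manual, library)
print("manual matches library:", matches)
print("min and max after scaling:", library.min(), library.max())
```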