In-Depth Analysis: From MNIST to Multiclass Classification with SGD, Random Forests, and a Deeper Understanding

Article link: https://blog.youkuaiyun.com/DuLNode/article/details/120144085

Types of Classification and Model Evaluation

Classification

Fetching the MNIST dataset

from sklearn.datasets import fetch_openml
import numpy as np

mnist = fetch_openml('mnist_784', version=1)
mnist.keys()

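Output:
[image: keys of the mnist object]
Where:
DESCR: a description of the dataset
data: an array with one row per instance and one column per feature
target: an array of labels, one per instance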

Getting the training data and labels

X, y = mnist['data'], mnist['target']

import matplotlib.pyplot as plt
import matplotlib as mpl

some_digit = np.array(X)[0]
some_digit_image = some_digit.reshape(28, 28)

plt.imshow(some_digit_image, cmap="binary")
plt.axis("off")
plt.show()

This displays the image at index 0:
[image: the digit at index 0]
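
Each row of data holds the 784 pixel values of one image, flattened row by row, and reshape(28, 28) cuts that vector back into 28 rows of 28 pixels. Here is a minimal pure-Python sketch of the same idea on a made-up 2x3 "image". The helper name reshape_flat is hypothetical and only mimics NumPy's default row-major behaviour.

```python
def reshape_flat(flat, n_rows, n_cols):
    """Cut a flat, row-major pixel list into n_rows lists of n_cols values."""
    if len(flat) != n_rows * n_cols:
        raise ValueError("flat length does not match the requested shape")
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


# A made-up 2x3 "image" stored as one flat row, like one row of mnist['data'].
flat_pixels = [0, 255, 0, 128, 64, 32]
image = reshape_flat(flat_pixels, 2, 3)

# An MNIST-sized vector has 28 * 28 = 784 entries.
mnist_sized = reshape_flat(list(range(784)), 28, 28)
```

The second row of mnist_sized starts at flat position 28, which is why a vector of the wrong length cannot be reshaped.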

Data preparation and dataset split

The labels are stored as strings, so we convert them to unsigned 8-bit integers.

y = y.astype(np.uint8)

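The MNIST dataset is already ordered as a training set (the first 60,000 samples) followed by a test set (the last 10,000), so we can simply slice it.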

X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

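Training a binary classifier

Preparing the targets

Here the original 0-9 labels are reduced to two classes: 5 and not-5.

y_train_5 = (y_train == 5)  # True where the digit is 5, False otherwise
y_test_5 = (y_test == 5)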
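
The comparison y_train == 5 is applied element by element and yields one boolean per label. A small pure-Python sketch with made-up labels (not the real MNIST targets) walks through the string-to-integer conversion and the 5 / not-5 relabeling:

```python
# Made-up digit labels stored as strings, as the post describes for MNIST.
labels_as_text = ["5", "0", "4", "1", "9", "2", "1", "3", "1", "5"]

# Step 1: convert strings to integers (the role of y.astype(np.uint8)).
labels = [int(label) for label in labels_as_text]

# Step 2: one boolean per sample (the role of y_train == 5).
is_five = [label == 5 for label in labels]

# Share of positive samples in this toy list.
positive_ratio = sum(is_five) / len(is_five)
```

Skipping step 1 would silently break step 2: the string "5" is not equal to the integer 5, so every sample would be labeled False.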

Stochastic gradient descent classifier

from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(random_state=42)  # random_state fixes the random seed for reproducibility; any other integer works too
sgd_clf.fit(X_train, y_train_5)  # train on the 5 / not-5 targets
sgd_clf.predict([some_digit])  # some_digit is the image of a 5 plotted earlier
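
As background: with its default settings, SGDClassifier fits a linear model using the hinge loss and updates the weights one sample at a time. The following is a minimal pure-Python sketch of that update rule on made-up 2-D points. It assumes no regularization and a constant learning rate, whereas scikit-learn's defaults add an L2 penalty and a learning-rate schedule, so it is not a reimplementation of the library. The names sgd_hinge_fit and sgd_predict are hypothetical helpers, not scikit-learn APIs.

```python
import random


def sgd_hinge_fit(X, y, epochs=100, learning_rate=0.1, seed=42):
    """Fit a linear classifier with plain SGD on the hinge loss.

    X is a list of feature lists, y a list of booleans.
    Simplifying assumptions: no regularization, constant learning rate.
    """
    rng = random.Random(seed)  # fixed seed, in the spirit of random_state=42
    weights = [0.0] * len(X[0])
    bias = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            target = 1.0 if y[i] else -1.0
            score = sum(w * x for w, x in zip(weights, X[i])) + bias
            if target * score < 1.0:  # sample violates the margin: take a step
                weights = [w + learning_rate * target * x
                           for w, x in zip(weights, X[i])]
                bias += learning_rate * target
    return weights, bias


def sgd_predict(weights, bias, X):
    """Predict True where the linear score is positive."""
    return [sum(w * x for w, x in zip(weights, row)) + bias > 0 for row in X]


# Made-up, linearly separable toy data: three negatives, three positives.
toy_X = [[0.0, 0.2], [0.3, 0.1], [0.2, 0.4], [2.0, 2.2], [2.3, 1.9], [1.8, 2.1]]
toy_y = [False, False, False, True, True, True]

weights, bias = sgd_hinge_fit(toy_X, toy_y)
predictions = sgd_predict(weights, bias, toy_X)
```

Because each update uses a single sample, the cost per step does not depend on the dataset size, which is one reason this family of methods copes well with 60,000 images.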

Output:
[image]

Performance evaluation

Measuring accuracy with cross-validation

Stratified k-fold sampling:

from sklearn.model_selection import StratifiedKFold  # stratified k-fold splitter
from sklearn.base import clone

skfolds = StratifiedKFold(n_splits=3)  # split into 3 folds

for train_index, test_index in skfolds.split(X_train, y_train_5):  # still the 5 / not-5 task
    clone_clf = clone(sgd_clf)  # unfitted copy of sgd_clf with the same hyperparameters
    # training part of this split
    X_train_folds = np.array(X_train)[train_index]
    y_train_folds = np.array(y_train_5)[train_index]
    # validation part of this split
    X_test_fold = np.array(X_train)[test_index]
    y_test_fold = np.array(y_train_5)[test_index]

    clone_clf.fit(X_train_folds, y_train_folds)  # train on the training folds
    y_pred = clone_clf.predict(X_test_fold)  # predict on the held-out fold
    n_correct = sum(y_pred == y_test_fold)
    print(n_correct / len(y_pred))  # accuracy on the held-out fold
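
"Stratified" means that every fold keeps roughly the same class proportions as the full dataset, which matters when one class is rare. The sketch below illustrates the idea in pure Python by dealing the indices of each class out to the folds in turn. This round-robin scheme is an assumption made for illustration and not scikit-learn's actual assignment algorithm, and stratified_folds is a hypothetical helper name.

```python
def stratified_folds(labels, n_splits):
    """Return n_splits index lists, each with a near-equal share of every class."""
    folds = [[] for _ in range(n_splits)]
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    # Deal the indices of each class out to the folds in turn.
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return [sorted(fold) for fold in folds]


# Made-up imbalanced labels: 3 positives among 30 samples.
toy_labels = [True] * 3 + [False] * 27

for test_fold in stratified_folds(toy_labels, n_splits=3):
    positives = sum(toy_labels[i] for i in test_fold)
    print(len(test_fold), positives)  # fold size and number of positives
```

An unstratified split of the same list into three contiguous blocks would put all three positives into the first block and none into the other two.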

Output:
[image]
Cross-validation:

from sklearn.model_selection import cross_val_score  # cross-validation helper
cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy")

Output:
[image]

A dumb baseline classifier
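
A classifier that never predicts the positive class can still score well on accuracy when positives are rare. The short sketch below shows the arithmetic on made-up labels, under the assumption that one sample in ten is a 5 (roughly the share of a single digit among ten); the numbers are illustrative and are not results from MNIST.

```python
# Made-up targets: exactly 1 sample in 10 is a "5" (an assumption for illustration).
toy_is_five = [i % 10 == 0 for i in range(1000)]

# A baseline that answers "not 5" for every sample.
always_negative = [False] * len(toy_is_five)

# Accuracy: share of samples where prediction and truth agree.
correct = sum(pred == truth for pred, truth in zip(always_negative, toy_is_five))
accuracy = correct / len(toy_is_five)

# Number of actual 5s the baseline managed to find.
found_fives = sum(pred and truth for pred, truth in zip(always_negative, toy_is_five))
```

The baseline is right on every not-5 sample and wrong on every 5, so its accuracy equals the share of negatives even though it never finds a single 5. The scikit-learn version of such a baseline follows.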

from sklearn.base import BaseEstimator

class Never5Classifier(BaseEstimator):  # a deliberately dumb classifier
    def fit(self, X, y=None):
        return self  # "training" does nothing
    def predict(self, X):
        return np.zeros((len(X), 1), dtype=bool)  # always predicts False (not 5), whatever the input
        
never_5_clf = Never5Classifier()
cross_val_score(never_5_clf, X_train, y_train_5, cv=