Machine Learning: Multi-class Classification with Logistic Regression (Andrew Ng's Handwritten Digit Recognition)

Oct11_F

License: CC 4.0 BY-SA

Tags: machine learning

Article link: https://blog.youkuaiyun.com/Fred_18/article/details/89220555

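This article shows how to use logistic regression for multi-class classification, using the handwritten digit recognition exercise from Andrew Ng's course as the example. It focuses on reshape(), which changes the number of rows and columns of an array and is commonly applied to NumPy arrays, including values taken from pandas objects. In the example, reshape() turns the label vector y_i into a 5000x1 matrix so that it matches the shape of the training set.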

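The reshape() function
(1) Introduction to reshape()
reshape is often used to change the number of rows and columns of data, for example when preparing the values that go into a DataFrame. It is available for NumPy's ndarray (the type returned by np.array). Current pandas DataFrame and Series objects have no reshape method of their own, so their values are reshaped as NumPy arrays, for example after calling .to_numpy() or .values.

reshape(rows, columns) converts the data to the given number of rows and columns, provided the total number of elements stays the same. The forms you will meet most often are reshape(1, -1) and reshape(-1, 1):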

import numpy as np
import pandas as pd
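
The two common forms mentioned above can be sketched as follows. This is a minimal illustration on a made-up six-element array (the array `a` and the variable names are assumptions, not part of the original exercise):

```python
import numpy as np

a = np.arange(6)          # 1-D array with 6 elements, shape (6,)

row = a.reshape(1, -1)    # one row; the column count is inferred
col = a.reshape(-1, 1)    # one column; the row count is inferred
grid = a.reshape(2, 3)    # rows and columns given explicitly

print(a.shape)     # (6,)
print(row.shape)   # (1, 6)
print(col.shape)   # (6, 1)
print(grid.shape)  # (2, 3)
```

In every case the total number of elements (6) is unchanged; only the arrangement into rows and columns differs.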

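How should the -1 be understood here?

According to the NumPy documentation, -1 stands for an unspecified value: that dimension is not given explicitly, and NumPy infers it from the length of the array and the remaining dimensions. For example, if you only care about the number of rows, specify the rows and pass -1 for the columns, and vice versa. It is not a free wildcard, though: only one dimension may be -1, and the total number of elements must be divisible by the product of the dimensions you do specify.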
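
A short sketch of how the -1 dimension is inferred, and of the two cases where NumPy refuses to infer it. The 12-element array and the flag names are made up for illustration:

```python
import numpy as np

a = np.arange(12)             # 12 elements

# The -1 dimension is a.size divided by the product of the other dimensions.
b = a.reshape(3, -1)          # 12 / 3 = 4 columns
c = a.reshape(-1, 2, 3)       # 12 / (2 * 3) = 2 blocks

print(b.shape)  # (3, 4)
print(c.shape)  # (2, 2, 3)

# 12 is not divisible by 5, so the missing dimension cannot be inferred.
try:
    a.reshape(5, -1)
    indivisible_ok = True
except ValueError:
    indivisible_ok = False

# Only one dimension may be left unspecified.
try:
    a.reshape(-1, -1)
    two_unknowns_ok = True
except ValueError:
    two_unknowns_ok = False

print(indivisible_ok, two_unknowns_ok)  # False False
```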

(2)
Andrew Ng's handwritten digit recognition exercise includes the following code for building the classifiers:

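import numpy as np
from scipy.optimize import minimize

# cost and gradient are not defined in this post; they are assumed to come
# from earlier in the exercise.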

def one_vs_all(X, y, num_labels, learning_rate):
    rows = X.shape[0]
    params = X.shape[1]
    
    # k X (n + 1) array for the parameters of each of the k classifiers
    all_theta = np.zeros((num_labels, params + 1))
    
    # insert a column of ones at the beginning for the intercept term
    X = np.insert(X, 0, values=np.ones(rows), axis=1)
    
    # labels are 1-indexed instead of 0-indexed
    for i in range(1, num_labels + 1):
        theta = np.zeros(params + 1)
        y_i = np.array([1 if label == i else 0 for label in y])
        y_i = np.reshape(y_i, (rows, 1))
        
        # minimize the objective function
        fmin = minimize(fun=cost, x0=theta, args=(X, y_i, learning_rate), method='TNC', jac=gradient)
        all_theta[i-1,:] = fmin.x
    
    return all_theta

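Here **np.reshape(y_i, (rows, 1))** converts y_i from a one-dimensional array of length 5000 into a 5000x1 column matrix (the training set contains 5000 samples, each a 20x20 image).

References: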
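
The label step can be tried in isolation. The sketch below uses randomly generated stand-in labels rather than the real data set, and a made-up prediction column `h`; both are assumptions for illustration only. It shows the shape before and after the reshape, and one plausible reason the column shape matters: subtracting a flat array from a column triggers broadcasting instead of element-wise arithmetic.

```python
import numpy as np

rows = 5000
rng = np.random.default_rng(0)
# Hypothetical stand-in for the real labels: values 1..10 in a (5000, 1) column.
y = rng.integers(1, 11, size=(rows, 1))

i = 3  # the class currently being trained, as in the loop of one_vs_all
y_flat = np.array([1 if label == i else 0 for label in y])
y_i = np.reshape(y_flat, (rows, 1))

print(y_flat.shape)  # (5000,)
print(y_i.shape)     # (5000, 1)

# A vectorized comparison gives the same 0/1 column directly.
y_vec = (y == i).astype(int)
print(np.array_equal(y_i, y_vec))  # True

# Hypothetical predictions stored as a column, one per sample.
h = np.full((rows, 1), 0.5)

good = h[:4] - y_i[:4]     # column minus column: element-wise
bad = h[:4] - y_flat[:4]   # column minus flat array: broadcast to a square
print(good.shape)  # (4, 1)
print(bad.shape)   # (4, 4)
```

Only the first four rows are used in the subtraction so the broadcast result stays small; on all 5000 rows the mismatched version would produce a 5000x5000 array.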
https://www.jianshu.com/p/d9df005636a6
https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html