Python Machine Learning for Beginners (sklearn), with Detailed Code Explanations

This article walks through hands-on machine learning with linear regression, logistic regression, and related algorithms. Using a house-price prediction case and a breast-cancer classification case, it explains the normal equation, gradient descent, and ridge regression, and provides Python implementations together with model-evaluation techniques.

This project is well suited for beginners who want to understand the principles of machine learning and get some hands-on practice. The code is mainly based on the itheima (黑马程序员) course material, typed out entirely by hand, with plenty of my own interpretation added.

The Python files are roughly as follows; a Baidu Netdisk link is attached:

Link: https://pan.baidu.com/s/1uYkjcL6xa2xmPC9HK_TpDg
Extraction code: a2th

No download points are required. If you find it useful, please leave comments, share your thoughts, and give it a like. Thank you!

Below are some screenshots and the rough file layout.

Note: the datasets are already included in the download, so you do not need to look for the data yourself; everything can be used directly, which saves a lot of trouble.

All of the code is commented.

A small portion of the code is attached below:

Linear regression:

# Linear models cover both linear and non-linear relationships
# A linear model is one whose parameters are first power, or whose input variables are first power
# A linear relationship is always a linear model, but the reverse is not necessarily true
# There are two optimization methods: the normal equation and gradient descent
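# For reference (my own note, the standard textbook forms of the two methods):
#   Normal equation:  w = (X^T X)^(-1) X^T y   -- solved in a single step
#   Gradient descent: w := w - eta * dJ/dw     -- iterative updates with learning rate eta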

# This part trains a model to predict house prices
from sklearn.linear_model import LinearRegression, SGDRegressor, Ridge, RidgeCV
from sklearn.datasets import load_boston
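# Note (my addition): load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this code needs scikit-learn < 1.2, or swap in another dataset such as fetch_california_housing()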
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error  # mean squared error
"""
Author: Siliang Liu
15/08/2020
Reference itheima.com
"""


def load_data():
    boston_data = load_boston()
    print("特征数量为:(样本数,特征数)", boston_data.data.shape)
    x_train, x_test, y_train, y_test = train_test_split(boston_data.data,
                                                        boston_data.target, random_state=22)
    return x_train, x_test, y_train, y_test


# Normal equation
def linear_Regression():
    """
    Optimization via the normal equation.
    Cannot address overfitting problems.
    Solves for the weights in a single step.
    Suited to small datasets.
    :return:
    """
    x_train, x_test, y_train, y_test = load_data()
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    estimator = LinearRegression()
    estimator.fit(x_train, y_train)

    print("正规方程_权重系数为: ", estimator.coef_)
    print("正规方程_偏置为:", estimator.intercept_)

    y_predict = estimator.predict(x_test)
    error = mean_squared_error(y_test, y_predict)
    print("正规方程_房价预测:", y_predict)
    print("正规方程_均分误差:", error)
    return None


# Gradient descent
def linear_SGDRegressor():
    """
    Optimization via gradient descent.
    Solves iteratively.
    Suited to large datasets.
    :return:
    """
    x_train, x_test, y_train, y_test = load_data()
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # It is worth reading this estimator's API; the values below are its defaults
    # (note: the loss was renamed from "squared_loss" to "squared_error" in newer scikit-learn)
    # estimator = SGDRegressor(loss="squared_loss", fit_intercept=True, eta0=0.01,
    #                          power_t=0.25)

    estimator = SGDRegressor(learning_rate="constant", eta0=0.01, max_iter=10000)
    # estimator = SGDRegressor(penalty='l2', loss="squared_loss")  # this is effectively ridge regression, but using the Ridge class is recommended
    estimator.fit(x_train, y_train)

    print("梯度下降_权重系数为: ", estimator.coef_)
    print("梯度下降_偏置为:", estimator.intercept_)

    y_predict = estimator.predict(x_test)
    error = mean_squared_error(y_test, y_predict)
    print("梯度下降_房价预测:", y_predict)
    print("梯度下降_均分误差:", error)

    return None


def linear_Ridge():
    """
    Ridge: ridge regression (linear regression with an L2 penalty)
    :return:
    """
    x_train, x_test, y_train, y_test = load_data()
    transfer = StandardScaler()  # standardizing the data is recommended
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    estimator = Ridge(max_iter=10000, alpha=0.5)  # ridge regression
    # estimator = RidgeCV(alphas=[0.1, 0.2, 0.3, 0.5])  # ridge regression with built-in cross-validation
    estimator.fit(x_train, y_train)

    print("岭回归_权重系数为: ", estimator.coef_)
    print("岭回归_偏置为:", estimator.intercept_)

    y_predict = estimator.predict(x_test)
    error = mean_squared_error(y_test, y_predict)
    print("岭回归_房价预测:", y_predict)
    print("岭回归_均分误差:", error)

    return None


if __name__ == '__main__':
    linear_Regression()
    linear_SGDRegressor()
    linear_Ridge()
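
To make the "one-step solution" mentioned in the docstrings concrete, here is a minimal NumPy sketch of the closed-form solutions that LinearRegression and Ridge are based on. It uses a small synthetic dataset; the variable names (X, y, alpha) are illustrative and not part of the files above.

import numpy as np

# tiny synthetic regression problem: y = 3*x1 - 2*x2 + 1 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 1 + rng.normal(scale=0.1, size=100)

# append a bias column so the intercept is estimated together with the weights
Xb = np.hstack([X, np.ones((X.shape[0], 1))])

# normal equation: w = (X^T X)^(-1) X^T y  (what LinearRegression solves, up to numerics)
w_ols = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# ridge closed form: w = (X^T X + alpha * I)^(-1) X^T y  (the L2 penalty shrinks the weights;
# note that scikit-learn's Ridge does not penalize the intercept, this sketch does for simplicity)
alpha = 0.5
w_ridge = np.linalg.solve(Xb.T @ Xb + alpha * np.eye(Xb.shape[1]), Xb.T @ y)

print("normal equation weights:", w_ols)    # close to [3, -2, 1]
print("ridge weights:          ", w_ridge)  # slightly shrunk toward zero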

Logistic regression:

# Logistic regression is generally used for binary classification problems
"""
This part uses logistic regression to classify whether a breast tumor is benign.
A few things worth noting (see also the sketch after this listing):
    LogisticRegression is roughly equivalent to SGDClassifier(loss="log", penalty="l2")
    SGDClassifier implements plain stochastic gradient descent learning,
    and also supports averaged SGD (ASGD), which can be enabled by setting average=True
"""
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, roc_auc_score, roc_curve
import matplotlib.pyplot as plt
from sklearn_learning.model_load_store.Util_model import *
"""
Author: Siliang Liu
15/08/2020
Reference itheima.com
"""


def load_data():
    """
    First obtain the data.
    Process the data
        (there are missing values).
    Split the dataset into training and test sets.
    Feature engineering:
        scaling - standardization (do not use min-max normalization; see earlier notes).
    Logistic regression estimator.
    Model evaluation.
    :return x_train, x_test, y_train, y_test:
    """
    column_name = ['Sample code number', 'Clump Thickness',
                   'Uniformity of Cell Size', 'Uniformity of Cell Shape', 'Marginal Adhesion',
                   'Single Epithelial Cell Size',
                   'Bare Nuclei', 'Bland Chromatin',
                   'Normal Nucleoli', 'Mitoses', 'Class']
    # # Download directly from the web
    # path = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
    # original_data = pd.read_csv(path, names=column_name)

    # Read from a local file
    original_data = pd.read_csv("../../resources/cancer/breast-cancer-wisconsin.data", names=column_name)

    # Handle missing values
    # Step 1: replace "?" with NaN
    data = original_data.replace(to_replace="?", value=np.nan)
    # Step 2: either use the NaN filtering written by hand in earlier notes, or use the simpler approach below:
    data.dropna(inplace=True)
    print("Check whether any missing values remain (all False means none)\n", data.isnull().any())  # check for remaining missing values

    # Step 3: select the features and the target
    x = data.iloc[:, 1:-1]  # keep every row, and the columns from the first feature through the second-to-last (drop the ID and Class columns)
    y = data["Class"]
    x_train, x_test, y_train, y_test = train_test_split(x, y)
    return x_train, x_test, y_train, y_test


def logic_Regression():
    """
    For logistic regression the target is a class label, i.e. whether a sample belongs to a category; this differs from linear regression.
    Linear regression loss: sum of (y_predict - y_true)^2 divided by the number of samples.
    Logistic regression loss: the log-likelihood loss (https://blog.youkuaiyun.com/u014182497/article/details/82252456).
    Taking the sigmoid function as the example: the linear output has to be mapped through the sigmoid.
    There are two cases (see the screenshots); the y-axis is the loss, the x-axis is the mapped output (a piecewise function).
    When the true label is 1, see 对数似然损失-1.png
    When the true label is 0, see 对数似然损失-2.png
    Not hard to understand when read alongside the figures.
    Once the logistic regression loss is obtained, it is optimized with gradient descent;
    the rest proceeds much as before.
    :return:
    """

    x_train, x_test, y_train, y_test = load_data()
    # Step 4: feature engineering
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # Step 5: estimator workflow
    estimator = LogisticRegression()  # default parameters
    estimator.fit(x_train, y_train)
    print("逻辑回归_权重系数为: ", estimator.coef_)
    print("逻辑回归_偏置为:", estimator.intercept_)

    # store_model(estimator, "logic_regression_model01.pkl")  # save the model
    # estimator = load_model("logic_regression_model01.pkl")  # load the model

    # Step 6: model evaluation
    y_predict = estimator.predict(x_test)
    print("逻辑回归_预测结果", y_predict)
    print("逻辑回归_预测结果对比:", y_test == y_predict)
    score = estimator.score(x_test, y_test)
    print("准确率为:", score)
    # 2是良性的 4是恶性的
    """
    In practice, however, accuracy alone is not what we want; the above only tells us whether predictions are right overall.
    What we really need is a metric that reflects how well the malignant cases are detected, i.e. the recall,
    and the F1-score can be checked as a measure of robustness
    (see the notes and screenshots on recall and precision).
    So a different evaluation method is used below.
    """

    Score = classification_report(y_test, y_predict, labels=[2, 4],
                                  target_names=["benign", "malignant"])
    print("Precision, recall and F1-score\n", Score)
    # support is the number of samples in each class

    """
    ROC curve and AUC (this method can be used when the classes are imbalanced)
    AUC = 0.5 means the model is guessing at random
    AUC = 1 is the best possible model
    AUC < 0.5 means the model is worse than random guessing (its predictions are inverted)
    See the screenshots for more
    """
    # the labels need to be converted to a 0/1 representation
    y_true = np.where(y_test > 3, 1, 0)  # values greater than 3 become 1, otherwise 0 (the Class values are 2 and 4)
    return_value = roc_auc_score(y_true, y_predict)
    print("AUC (area under the ROC curve):", return_value)

    fpr, tpr, thresholds = roc_curve(y_true, y_predict)
    plt.plot(fpr, tpr)
    plt.show()
    return None


if __name__ == '__main__':
    logic_Regression()
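
Two follow-ups to the notes in the header comment, written as a minimal sketch under the assumption that it is appended to the script above (so load_data, StandardScaler, SGDClassifier, np and roc_auc_score are already available); the variable names and parameter values are illustrative, not from the original files. It shows the SGDClassifier counterpart of LogisticRegression and an ROC/AUC computed from predicted probabilities, which gives a smoother curve than the hard 2/4 labels used above.

# minimal sketch (assumes the imports and load_data defined above)
x_train2, x_test2, y_train2, y_test2 = load_data()
scaler = StandardScaler()
x_train2 = scaler.fit_transform(x_train2)
x_test2 = scaler.transform(x_test2)

# SGD counterpart of LogisticRegression: logistic loss + L2 penalty, trained by SGD
# (the loss is named "log_loss" in recent scikit-learn; older versions use "log")
sgd_clf = SGDClassifier(loss="log_loss", penalty="l2", max_iter=10000, average=True)
sgd_clf.fit(x_train2, y_train2)
print("SGDClassifier accuracy:", sgd_clf.score(x_test2, y_test2))

# ROC/AUC from the predicted probability of the malignant class (label 4, second column of predict_proba)
y_true2 = np.where(y_test2 > 3, 1, 0)
y_score2 = sgd_clf.predict_proba(x_test2)[:, 1]
print("AUC from probabilities:", roc_auc_score(y_true2, y_score2))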

I hope this is helpful to everyone!!!
