Hands-On Machine Learning with Scikit-Learn and TensorFlow学习笔记(一)

本文探讨了机器学习的应用及分类,包括监督学习、非监督学习等类型,并通过实际案例展示了如何使用Scikit-Learn训练线性模型。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

Hands-On Machine Learning with Scikit-Learn and TensorFlow学习记录

机器学习真的是深入到了每一个学科。从今天起,博主将会一边看这本书一边做一些笔记。
本文内容大多基于此书。翻译参考git上的翻译
第0章引言部分,这里我们不多赘述。
第1章 机器学习概述

机器学习善于:

  • 需要进行大量手工调整或需要拥有长串规则才能解决的问题:机器学习算法通常可以简化代码、提高性能。
  • 问题复杂,传统方法难以解决:最好的机器学习方法可以找到解决方案。
  • 环境有波动:机器学习算法可以适应新数据。
  • 洞察复杂问题和大量数据

机器学习有多种类型,可以根据如下规则进行分类:

  • 是否在人类监督下进行训练(监督,非监督,半监督和强化学习)
  • 是否可以动态渐进学习(在线学习 vs 批量学习)
  • 它们是否只是通过简单地比较新的数据点和已知的数据点,或者在训练数据中进行模式识别,以建立一个预测模型,就像科学家所做的那样(基于实例学习 vs 基于模型学习)

监督/非监督学习

机器学习可以根据训练时监督的量和类型进行分类。主要有四类:监督学习、非监督学习、半监督学习和强化学习。

监督学习

在监督学习中,用来训练算法的训练数据包含了答案,称为标签

例子

垃圾邮件分类系统,预测目标数值(回归)等。
还有一些比较经典的例如K近邻算法,线性回归,逻辑回归,支持向量机(SVM),决策树和随机森林,神经网络。

非监督学习

与监督训练相反,训练数据是没有标签的。

例子

聚类:K 均值,层次聚类分析(Hierarchical Cluster Analysis,HCA),期望最大值
可视化和降维:主成分分析(Principal Component Analysis,PCA),核主成分分析,局部线性嵌入(Locally-Linear Embedding,LLE),t-分布邻域嵌入算法(t-distributed Stochastic Neighbor Embedding,t-SNE)
关联性规则学习:Apriori 算法,Eclat 算法

半监督学习

一些算法可以处理部分带标签的训练数据,通常是大量不带标签数据加上小部分带标签数据。这称作半监督学习

强化学习

这个就很厉害了强化学习

批量和在线学习

另一个用来分类机器学习的准则是,它是否能从导入的数据流进行持续学习。

批量学习

在批量学习中,系统不能进行持续学习:必须用所有可用数据进行训练。这通常会占用大量时间和计算资源,所以一般是线下做的。首先是进行训练,然后部署在生产环境且停止学习,它只是使用已经学到的策略。这称为离线学习。
如果你想让一个批量学习系统明白新数据(例如垃圾邮件的新类型),就需要从头训练一个系统的新版本,使用全部数据集(不仅有新数据也有老数据),然后停掉老系统,换上新系统。

在线学习

在在线学习中,是用数据实例持续地进行训练,可以一次一个或一次几个实例(称为小批量)。每个学习步骤都很快且廉价,所以系统可以动态地学习到达的新数据。
在线学习很适合系统接收连续流的数据(比如,股票价格),且需要自动对改变作出调整。如果计算资源有限,在线学习是一个不错的方案:一旦在线学习系统学习了新的数据实例,它就不再需要这些数据了,所以扔掉这些数据(除非你想滚回到之前的一个状态,再次使用数据)。这样可以节省大量的空间。
在线学习算法也可以当机器的内存存不下大量数据集时,用来训练系统(这称作核外学习,out-of-core learning)。算法加载部分的数据,用这些数据进行训练,重复这个过程,直到用所有数据都进行了训练。
在线学习可能是离线的,我们将它想象成持续学习比较合适。
在线学习系统的一个重要参数是,它们可以多快地适应数据的改变:这被称为学习速率。如果你设定一个高学习速率,系统就可以快速适应新数据,但是也会快速忘记老数据(你可不想让垃圾邮件过滤器只标记最新的垃圾邮件种类)。相反的,如果你设定的学习速率低,系统的惰性就会强:即,它学的更慢,但对新数据中的噪声或没有代表性的数据点结果不那么敏感。

基于实例 vs 基于模型学习

另一种分类机器学习的方法是判断它们是如何进行归纳推广的。大多机器学习任务是关于预测的。这意味着给定一定数量的训练样本,系统需要能推广到之前没见到过的样本。对训练数据集有很好的性能还不够,真正的目标是对新实例预测的性能。
这个不详细写了。
这里我们遇到了本书的第一部分代码。
也是第一个案例使用 Scikit-Learn 训练并运行线性模型。

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn
# 加载数据
oecd_bli = pd.read_csv("oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("gdp_per_capita.csv",thousands=',',delimiter='\t',
encoding='latin1', na_values="n/a")
# 准备数据
country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
X = np.c_[country_stats["GDP per capita"]]
y = np.c_[country_stats["Life satisfaction"]]
# 可视化数据
country_stats.plot(kind='scatter', x="GDP per capita", y='Life satisfaction')
plt.show()
# 选择线性模型
lin_reg_model = sklearn.linear_model.LinearRegression()
# 训练模型
lin_reg_model.fit(X, y)
# 对塞浦路斯进行预测
X_new = [[22587]] # 塞浦路斯的人均GDP
print(lin_reg_model.predict(X_new)) # outputs [[ 5.96242338]]

上个例子就这样吧。
这次先这样,下次我们继续说机器学习目前遇到的主要挑战,并且继续说第二章的内容。

When most people hear “Machine Learning,” they picture a robot: a dependable butler or a deadly Terminator depending on who you ask. But Machine Learning is not just a futuristic fantasy, it’s already here. In fact, it has been around for decades in some specialized applications, such as Optical Character Recognition (OCR). But the first ML application that really became mainstream, improving the lives of hundreds of millions of people, took over the world back in the 1990s: it was the spam filter. Not exactly a self-aware Skynet, but it does technically qualify as Machine Learning (it has actually learned so well that you seldom need to flag an email as spam anymore). It was followed by hundreds of ML applications that now quietly power hundreds of products and features that you use regularly, from better recommendations to voice search. Where does Machine Learning start and where does it end? What exactly does it mean for a machine to learn something? If I download a copy of Wikipedia, has my computer really “learned” something? Is it suddenly smarter? In this chapter we will start by clarifying what Machine Learning is and why you may want to use it. Then, before we set out to explore the Machine Learning continent, we will take a look at the map and learn about the main regions and the most notable landmarks: supervised versus unsupervised learning, online versus batch learning, instance-based versus model-based learning. Then we will look at the workflow of a typical ML project, discuss the main challenges you may face, and cover how to evaluate and fine-tune a Machine Learning system. This chapter introduces a lot of fundamental concepts (and jargon) that every data scientist should know by heart. It will be a high-level overview (the only chapter without much code), all rather simple, but you should make sure everything is crystal-clear to you before continuing to the rest of the book. So grab a coffee and let’s get started!
When most people hear “Machine Learning,” they picture a robot: a dependable butler or a deadly Terminator depending on who you ask. But Machine Learning is not just a futuristic fantasy, it’s already here. In fact, it has been around for decades in some specialized applications, such as Optical Character Recognition (OCR). But the first ML application that really became mainstream, improving the lives of hundreds of millions of people, took over the world back in the 1990s: it was the spam filter. Not exactly a self-aware Skynet, but it does technically qualify as Machine Learning (it has actually learned so well that you seldom need to flag an email as spam anymore). It was followed by hundreds of ML applications that now quietly power hundreds of products and features that you use regularly, from better recommendations to voice search. Where does Machine Learning start and where does it end? What exactly does it mean for a machine to learn something? If I download a copy of Wikipedia, has my computer really “learned” something? Is it suddenly smarter? In this chapter we will start by clarifying what Machine Learning is and why you may want to use it. Then, before we set out to explore the Machine Learning continent, we will take a look at the map and learn about the main regions and the most notable landmarks: supervised versus unsupervised learning, online versus batch learning, instance-based versus model-based learning. Then we will look at the workflow of a typical ML project, discuss the main challenges you may face, and cover how to evaluate and fine-tune a Machine Learning system. This chapter introduces a lot of fundamental concepts (and jargon) that every data scientist should know by heart. It will be a high-level overview (the only chapter without much code), all rather simple, but you should make sure everything is crystal-clear to you before continuing to the rest of the book. So grab a coffee and let’s get started!
Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron English | 2017 | ISBN: 1491962291 | 566 Pages | EPUB | 8.41 MB Through a series of recent breakthroughs, deep learning has boosted the entire field of machine learning. Now, even programmers who know close to nothing about this technology can use simple, efficient tools to implement programs capable of learning from data. This practical book shows you how. By using concrete examples, minimal theory, and two production-ready Python frameworks—scikit-learn and TensorFlow—author Aurélien Géron helps you gain an intuitive understanding of the concepts and tools for building intelligent systems. You’ll learn a range of techniques, starting with simple linear regression and progressing to deep neural networks. With exercises in each chapter to help you apply what you’ve learned, all you need is programming experience to get started. Explore the machine learning landscape, particularly neural nets Use scikit-learn to track an example machine-learning project end-to-end Explore several training models, including support vector machines, decision trees, random forests, and ensemble methods Use the TensorFlow library to build and train neural nets Dive into neural net architectures, including convolutional nets, recurrent nets, and deep reinforcement learning Learn techniques for training and scaling deep neural nets Apply practical code examples without acquiring excessive machine learning theory or algorithm details
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值