The Fundamentals of Machine Learning

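This post covers the basic concepts of machine learning, including supervised, unsupervised, semi-supervised, and reinforcement learning. It compares batch learning with online learning and instance-based learning with model-based learning, and it looks at the main challenges of machine learning, such as insufficient data and overfitting.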


What is Machine Learning?

  1. Machine Learning is the science (and art) of programming computers so they can learn from data.
  2. ML is the field of study that gives computers the ability to learn without being explicitly programmed. -- Arthur Samuel, 1959
  3. A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. -- Tom Mitchell, 1997


Types of ML

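By degree of supervision

Supervised Learning

The training data comes with labels assigned by humans. For example, in a spam filter, the emails that users have flagged as spam serve as the labeled training set.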

  • k-Nearest Neighbors

  • Linear Regression

  • Logistic Regression

  • Support Vector Machines

  • Decision Trees and Random Forests

  • Neural networks

Unsupervised Learning

The training set is unlabeled.

  • Clustering (k-Means, Hierarchical Cluster Analysis, Expectation Maximization)

  • Visualization and dimensionality reduction (Principal Component Analysis, Kernel PCA, Locally-Linear Embedding, t-distributed Stochastic Neighbor Embedding)

  • Association rule learning (Apriori, Eclat)
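
To make the clustering idea concrete, here is a minimal k-means sketch in plain Python. It is not taken from any library: the function name, the toy points, and the choice to start from the first k points are all assumptions made for illustration (real implementations usually pick starting centroids at random or with k-means++).

```python
def kmeans(points, k, iterations=10):
    """Group unlabeled 2-D points into k clusters (Lloyd's algorithm)."""
    # Assumption for reproducibility: start from the first k points.
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster ends up empty
                centroids[i] = (
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                )
    return centroids, clusters


# Toy data with no labels: two visually separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

No label is ever given to the algorithm; the two groups emerge only from the distances between points.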

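Semi-supervised Learning

Only part of the training set is labeled.

Reinforcement Learning

The learning system is called an agent. The agent chooses actions and receives rewards or penalties in return, and it learns the strategy that earns the most reward over time. AlphaGo works this way.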
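
The agent/reward loop can be sketched with a very small example. The code below is an epsilon-greedy agent on a three-action "bandit" problem, which is a heavily simplified stand-in for reinforcement learning and has nothing to do with how AlphaGo is actually built. The reward values, `run_agent`, and its parameters are all made up for illustration.

```python
import random


def run_agent(rewards, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # the agent's value estimate per action
    counts = [0] * len(rewards)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))      # explore a random action
        else:
            action = estimates.index(max(estimates))  # exploit the best estimate
        reward = rewards[action]  # feedback from the environment
        counts[action] += 1
        # Running average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


# Hypothetical environment: action 0 is penalized, action 1 pays best.
estimates = run_agent(rewards=[-1.0, 0.9, 0.5])
best_action = estimates.index(max(estimates))
print(best_action)
```

The agent is never told which action is correct; it only sees the reward or penalty that follows each choice.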

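By training process

Batch Learning

Also called offline learning: the system is trained on all available data at once and then deployed without learning any further. It suits frequently updated data poorly, because taking in new data means retraining from scratch on the full dataset, which consumes a lot of computing resources, and the training set keeps growing as data accumulates.

Online Learning

The system is trained incrementally on data fed in small groups called mini-batches. This is practical when computing resources are tight: mini-batches that have already been used for training can be discarded, and the system can be rolled back to a previous state if needed. Out-of-core learning applies the same approach to training data that is far larger than the machine's memory.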
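
A rough sketch of incremental training on mini-batches, written in plain Python. The data stream is simulated (`stream_batches` is a hypothetical generator producing points on y = 2x + 1), and the learning rate and snapshot interval are arbitrary choices for illustration.

```python
def sgd_update(weights, batch, learning_rate=0.01):
    """One incremental update of y = a*x + b using a single mini-batch."""
    a, b = weights
    grad_a = grad_b = 0.0
    for x, y in batch:
        error = (a * x + b) - y
        grad_a += 2 * error * x / len(batch)  # d(MSE)/da
        grad_b += 2 * error / len(batch)      # d(MSE)/db
    return (a - learning_rate * grad_a, b - learning_rate * grad_b)


def stream_batches(n_batches, batch_size=4):
    """Hypothetical data stream: yields mini-batches drawn from y = 2x + 1."""
    for i in range(n_batches):
        xs = [((i * batch_size + j) % 10) / 2 for j in range(batch_size)]
        yield [(x, 2 * x + 1) for x in xs]


weights = (0.0, 0.0)
snapshots = []  # saved states that the model could be rolled back to
for step, batch in enumerate(stream_batches(5000)):
    weights = sgd_update(weights, batch)
    if step % 1000 == 0:
        snapshots.append(weights)
    # The batch is not stored anywhere: once used, it can be discarded.

a, b = weights
print(round(a, 3), round(b, 3))
```

Only one mini-batch is in memory at any time, which is what makes the same loop usable for out-of-core training.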

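By how the system generalizes

Instance-based learning

The system learns the training examples by heart and handles new data by measuring its similarity to the stored examples. For a spam filter, a new email is flagged if it closely resembles known spam. The quality of the similarity measure matters: if every spam email in the training set happens to have an odd word count and the measure relies on that, the system may conclude that any email with an odd word count is spam.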
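
A minimal sketch of instance-based classification using k-nearest neighbors. The two features and all the example emails are invented for illustration; a real spam filter would use a much richer similarity measure.

```python
from collections import Counter


def knn_predict(training_set, query, k=3):
    """Label a new point by majority vote among its k most similar stored examples."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    neighbors = sorted(training_set, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical features: (number of links, number of "free"/"win" words)
training_set = [
    ((0, 0), "ham"), ((1, 0), "ham"), ((0, 1), "ham"),
    ((5, 4), "spam"), ((6, 3), "spam"), ((4, 5), "spam"),
]
print(knn_predict(training_set, (5, 5)))  # spam
print(knn_predict(training_set, (1, 1)))  # ham
```

There is no training step here: the "model" is simply the stored examples plus the distance function.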

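Model-based learning

A model is chosen by hand, for example life satisfaction = a * annual income + b, which is a linear model. A performance measure (a utility function or a cost function) must then be defined to judge how good the current values of the model parameters are.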
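
As a sketch of this idea, the code below fits a and b for a one-feature linear model and uses mean squared error as the cost function that scores a given pair of parameters. The income and satisfaction numbers are made up for illustration, and MSE is just one common choice of cost function.

```python
def mse(a, b, xs, ys):
    """Cost function: mean squared error of the model y = a*x + b on the data."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_linear(xs, ys):
    """Parameters (a, b) that minimize the MSE, using the closed-form solution."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up numbers: annual income (in thousands) and a satisfaction score.
incomes = [10.0, 20.0, 30.0, 40.0, 50.0]
satisfaction = [4.9, 5.6, 6.1, 6.4, 7.0]

a, b = fit_linear(incomes, satisfaction)  # a is about 0.05, b about 4.5
print(mse(a, b, incomes, satisfaction) < mse(a + 0.01, b, incomes, satisfaction))  # True
```

Any other (a, b) gives a higher cost on this data, which is what "better parameters" means under this measure.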


Challenges of Machine Learning

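  • Insufficient training data (once there is enough data, the performance of many different algorithms improves)

  • Nonrepresentative training data (including biased samples)

  • Poor-quality training data (errors, outliers, and noise should be cleaned out)

  • Irrelevant features (feature extraction combines existing features into more useful ones)

  • Overfitting (the model fits the training set too closely and treats incidental details as meaningful patterns, so it performs poorly on the test set. Common remedies are constraining the model, which is called regularization, simplifying it, and reducing noise in the training set.)

  • Underfitting (remedies: choose a more powerful model, use better features, or reduce the constraints on the model)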
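
One way to picture "constraining the model" is an L2 penalty on the slope of a linear model. The sketch below assumes the penalty is `alpha * a**2` with an unpenalized intercept (the ridge regression form for one feature); the data and the `alpha` values are arbitrary.

```python
def ridge_slope(xs, ys, alpha):
    """Slope a of y = a*x + b minimizing squared error + alpha * a**2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / (sxx + alpha)  # alpha = 0 gives the unconstrained fit


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.2, 1.9, 3.2, 3.7]

# A larger alpha constrains the model more and pulls the slope toward zero.
slopes = [ridge_slope(xs, ys, alpha) for alpha in (0.0, 5.0, 50.0)]
print([round(s, 2) for s in slopes])  # [0.88, 0.44, 0.08]
```

Too little constraint risks overfitting and too much risks underfitting, so `alpha` has to be tuned rather than learned from the training data.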


Testing and Validating

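A common split is 80% of the data for training and 20% for testing.

Tuning against test-set scores alone causes the model and its hyperparameters to overfit the test set. (Hyperparameters are settings of the learning algorithm, such as the amount of regularization. In the life satisfaction example, a and b are model parameters learned during training, not hyperparameters.)

Therefore, use cross-validation: split the training set into several equal parts, train on some of them and hold out the rest as a validation set, then tune the model and hyperparameters on the validation results.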
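
The splitting described here can be sketched in a few lines of plain Python. The helper names, the seed, and the stand-in dataset are assumptions for illustration; libraries such as scikit-learn provide ready-made equivalents.

```python
import random


def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data and hold out test_ratio of it as the test set."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold(data, k=4):
    """Yield (sub_train, validation) pairs; each fold is the validation set once.

    If len(data) is not divisible by k, the leftover samples stay in sub_train.
    """
    fold_size = len(data) // k
    for i in range(k):
        start, stop = i * fold_size, (i + 1) * fold_size
        yield data[:start] + data[stop:], data[start:stop]


data = list(range(100))  # stand-in for 100 samples
train, test = train_test_split(data)
folds = list(k_fold(train, k=4))
print(len(train), len(test))                  # 80 20
print([(len(t), len(v)) for t, v in folds])   # [(60, 20), (60, 20), (60, 20), (60, 20)]
```

Hyperparameters would be compared using the average validation score across the folds, and the test set would be used only once at the end.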

 
