Reading Notes on "Learning from Data", Chapter 3: The Linear Model

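These are reading notes on Chapter 3 of "Learning from Data". The chapter covers linear classification, linear regression and logistic regression. The notes look at how a linear classifier handles data that is not linearly separable, least-squares linear regression, and logistic regression for predicting probabilities. They also discuss gradient descent for optimization and the effect of nonlinear transforms on the model.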


Chapter 3: The Linear Model

Three main topics: classification, regression and probability estimation.

3.1 Linear Classification 

Binary classification with a linear classifier:

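① A linear model has d_vc = d + 1, so the VC generalization bound from Chapter 2 gives

$$E_{out}(g) = E_{in}(g) + O\left(\sqrt{\frac{d}{N}\ln N}\right)$$

Hence E_out approaches E_in when N is large enough.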

② For linearly separable data, the PLA from Chapter 1 classifies every point correctly, so E_in = 0.

① + ②: by the VC bound, this model generalizes well out of sample.
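The following NumPy sketch puts ① and ② side by side. It first runs PLA on linearly separable data until E_in = 0. The target weights `target` and the 0.1 margin filter are hypothetical. They are chosen only so that PLA is guaranteed to stop quickly.

It then evaluates the Chapter 2 form of the VC bound term for growing N, with d_vc = d + 1:

$$\sqrt{\frac{8}{N}\ln\frac{4\,m_H(2N)}{\delta}}, \qquad m_H(N) \le N^{d_{vc}} + 1$$

The confidence level δ = 0.05 is also an assumption.

```python
import numpy as np

def pla(X, y, max_iter=10_000):
    """Perceptron learning algorithm; X already includes the bias column x0 = 1."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        wrong = np.where(np.sign(X @ w) != y)[0]
        if wrong.size == 0:
            break                  # every point classified correctly: E_in = 0
        n = wrong[0]
        w = w + y[n] * X[n]        # PLA update: w(t+1) = w(t) + y_n x_n
    return w

def binary_error(w, X, y):
    """E_in(w): fraction of points where sign(w^T x_n) != y_n."""
    return np.mean(np.sign(X @ w) != y)

def vc_bound_term(n, d_vc, delta=0.05):
    """sqrt(8/N * ln(4 m_H(2N) / delta)) using the polynomial bound m_H(N) <= N^d_vc + 1."""
    growth = (2 * n) ** d_vc + 1
    return np.sqrt(8.0 / n * np.log(4.0 * growth / delta))

rng = np.random.default_rng(0)
d = 2
target = np.array([0.1, 1.0, -1.0])                 # hypothetical target weights
X = np.hstack([np.ones((200, 1)), rng.uniform(-1, 1, size=(200, d))])
X = X[np.abs(X @ target) > 0.1]                     # keep a margin so PLA stops fast
y = np.sign(X @ target)                             # separable labels in {-1, +1}

w = pla(X, y)
e_in = binary_error(w, X, y)
print("E_in after PLA:", e_in)

# Generalization term for a perceptron in d dimensions (d_vc = d + 1)
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(float(vc_bound_term(n, d_vc=d + 1)), 3))
```

The bound term is loose in absolute terms; for N = 100 it is larger than 1. It still shrinks steadily as N grows, which is what ① relies on.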

3.1.1 Non-Separable Data 

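PLA cannot stop on either of the two data sets in the book's figures (a) and (b), because no hypothesis achieves E_in = 0. PLA is also unstable on such data: a single update can turn a hypothesis that performs well into one that performs very badly.

In figure (a), separating the data with a line still looks reasonable, but we must tolerate noise. We settle for a hypothesis with a small E_in, not necessarily E_in = 0. In figure (b), no line can separate the data at all. It needs a nonlinear transformation, which is covered in Section 3.4.

To find a hypothesis with the minimum E_in, we need to solve the following optimization problem:

$$\min_{w \in \mathbb{R}^{d+1}} \; E_{in}(w) = \frac{1}{N}\sum_{n=1}^{N} [\![\operatorname{sign}(w^{T}x_n) \neq y_n]\!]$$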
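To make the objective concrete, here is a minimal NumPy sketch of the binary-classification E_in(w). It also runs PLA for 20 updates on a hypothetical four-point XOR data set. XOR is an assumption, chosen because no line can separate it, much like figure (b). Some point is always misclassified, so PLA never stops, and its E_in jumps around from one update to the next.

```python
import numpy as np

def in_sample_error(w, X, y):
    """E_in(w) = (1/N) * sum_n [[sign(w^T x_n) != y_n]]."""
    return np.mean(np.sign(X @ w) != y)

# Hypothetical XOR data; the first column is the bias coordinate x0 = 1.
X = np.array([[1, -1, -1],
              [1, -1,  1],
              [1,  1, -1],
              [1,  1,  1]], dtype=float)
y = np.array([1, -1, -1, 1], dtype=float)   # y = x1 * x2: not linearly separable

w = np.zeros(3)
history = []
for t in range(20):
    wrong = np.where(np.sign(X @ w) != y)[0]  # never empty on this data
    n = wrong[t % len(wrong)]                 # cycle through misclassified points
    w = w + y[n] * X[n]                       # ordinary PLA update
    history.append(float(in_sample_error(w, X, y)))

print(history)
```

With this deterministic choice of update point, the first four errors are 0.75, 0.5, 0.25 and 1.0. A single update takes PLA from its best hypothesis so far back to w = 0, where sign(w^T x_n) = 0 counts as an error for every point.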

However, minimizing E_in(w) above is NP-hard, so we settle for approximately minimizing E_in.

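One approach extends PLA into the pocket algorithm. It keeps the best w found so far "in the pocket" and replaces it only when a better one comes along.

Compared with the original PLA, the pocket algorithm adds one step per iteration: it evaluates E_in(w(t + 1)), which involves all N data points. This makes it slower, and there is no guarantee on how fast it converges to a good E_in. Because it is simple, though, it is easy to get started with.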
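Below is a minimal NumPy sketch of the pocket algorithm. It assumes a hypothetical target line x1 + x2 = 0 with about 10% of the labels flipped to simulate noise, as in figure (a). The update step is plain PLA. The only addition is the extra E_in(w(t + 1)) evaluation over all N points, which decides whether the pocket is replaced. Two further choices are assumptions of this sketch: picking a random misclassified point at each step, and stopping after `max_iter` iterations.

```python
import numpy as np

def in_sample_error(w, X, y):
    """Fraction of points with sign(w^T x_n) != y_n."""
    return np.mean(np.sign(X @ w) != y)

def pocket(X, y, max_iter=1000, seed=0):
    """Run PLA updates but keep ("pocket") the best weights seen so far."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    best_w, best_err = w.copy(), in_sample_error(w, X, y)
    for _ in range(max_iter):
        wrong = np.where(np.sign(X @ w) != y)[0]
        if wrong.size == 0:               # data happened to be separable
            return w.copy(), 0.0, w
        n = rng.choice(wrong)             # a random misclassified point
        w = w + y[n] * X[n]               # PLA update gives w(t+1)
        err = in_sample_error(w, X, y)    # extra O(N) step: evaluate E_in(w(t+1))
        if err < best_err:                # better than the pocket: replace it
            best_w, best_err = w.copy(), err
    return best_w, best_err, w            # pocket weights, their E_in, last PLA iterate

rng = np.random.default_rng(1)
N = 200
X = np.hstack([np.ones((N, 1)), rng.uniform(-1, 1, size=(N, 2))])
y = np.sign(X @ np.array([0.0, 1.0, 1.0]))   # hypothetical target line x1 + x2 = 0
y[rng.random(N) < 0.1] *= -1                 # flip about 10% of labels as noise

w_best, e_best, w_last = pocket(X, y)
print("pocket E_in:", e_best, " last PLA iterate E_in:", in_sample_error(w_last, X, y))
```

The last PLA iterate can be noticeably worse than the pocketed weights, because plain PLA keeps moving on noisy data.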

Exercises

3.2 Linear Regression

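In the credit-card example, the target y_n is no longer a simple approve/deny decision (±1). It is now a real value, such as the credit limit to grant.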

3.2.1 The Algorithm 

Main idea: minimize the squared error

$$E_{in}(w) = \frac{1}{N}\sum_{n=1}^{N}\left(w^{T}x_n - y_n\right)^2$$
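The book solves this minimization in one step with the pseudo-inverse, w_lin = X†y. Here X† = (X^T X)^{-1} X^T when X^T X is invertible.

Below is a minimal NumPy sketch that assumes synthetic data generated from hypothetical weights `w_true` plus small Gaussian noise. `np.linalg.pinv` computes the pseudo-inverse via SVD, which also covers the case where X^T X is singular.

```python
import numpy as np

def linear_regression(X, y):
    """Return w_lin = pinv(X) @ y, the minimizer of E_in(w) = (1/N) * ||Xw - y||^2."""
    return np.linalg.pinv(X) @ y

rng = np.random.default_rng(0)
N = 100
X = np.hstack([np.ones((N, 1)), rng.uniform(0, 1, size=(N, 2))])  # bias column x0 = 1
w_true = np.array([0.5, 2.0, -1.0])              # hypothetical target weights
y = X @ w_true + rng.normal(scale=0.1, size=N)   # real-valued targets with noise

w_lin = linear_regression(X, y)
e_in = np.mean((X @ w_lin - y) ** 2)             # in-sample squared error
print("w_lin:", np.round(w_lin, 3), "E_in:", round(float(e_in), 4))
```

Setting the gradient of E_in to zero gives the normal equations X^T X w = X^T y. The pseudo-inverse solution satisfies them, so X^T (X w_lin - y) is numerically zero.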
