Machine Learning with Scikit-Learn and TensorFlow 6.5 Computational Complexity

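This post shows that the time complexity of making a prediction with a Decision Tree is O(log₂ m), independent of the number of features, so predictions stay fast even with a very large training set. It also analyzes the training complexity, O(n × m log₂ m), and discusses presorting as a way to speed up training.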


Book information
Hands-On Machine Learning with Scikit-Learn and TensorFlow
Publisher: O’Reilly Media, Inc, USA
Paperback: 566 pages
Language: English
ISBN: 1491962291
Barcode: 9781491962299
Product dimensions: 18 x 2.9 x 23.3 cm
ASIN: 1491962291

This blog series was written as a Chinese translation of the book.
Code and data download: https://github.com/ageron/handson-ml

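Making a prediction with a Decision Tree means walking from the root node to a leaf node. Since Decision Trees are usually approximately balanced, a prediction traverses roughly O(log₂ m) nodes. Because each node only needs to check the value of one specific feature, the overall prediction complexity is O(log₂ m), independent of the number of features. Predictions are therefore very fast, even for a very large training set.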
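The traversal can be sketched in plain Python. This is a toy illustration, not scikit-learn's implementation: the `Node` class, `build_balanced_tree`, and `predict` are hypothetical names, and the tree is forced to be perfectly balanced with one training instance per leaf.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Node:
    feature: int = 0                 # index of the single feature tested at this node
    threshold: float = 0.0           # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[float] = None    # set only on leaves


def build_balanced_tree(sorted_values: Sequence[float]) -> Node:
    """Build a perfectly balanced toy tree over feature 0 (illustration only)."""
    if len(sorted_values) == 1:
        return Node(value=sorted_values[0])
    mid = len(sorted_values) // 2
    return Node(
        feature=0,
        threshold=sorted_values[mid - 1],
        left=build_balanced_tree(sorted_values[:mid]),
        right=build_balanced_tree(sorted_values[mid:]),
    )


def predict(root: Node, x: Sequence[float]) -> Tuple[float, int]:
    """Return the leaf value and the number of feature comparisons made."""
    node, comparisons = root, 0
    while node.value is None:
        comparisons += 1  # exactly one feature is inspected per node
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value, comparisons


m = 1024  # number of training instances (one leaf each in this toy tree)
tree = build_balanced_tree([float(i) for i in range(m)])
# x has 50 features, but only x[0] is ever read along the path
leaf_value, comparisons = predict(tree, [700.0] + [0.0] * 49)
print(leaf_value, comparisons, math.log2(m))
```

With m = 1024 the path from root to leaf makes 10 comparisons, which equals log₂ m, and the other 49 features of the input are never touched.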

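However, at each node the training algorithm compares all features of all samples, which gives a training complexity of O(n × m log₂ m). For small training sets, scikit-learn can speed up training by presorting the data (set presort=True), but for very large training sets this slows training down.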
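To get a feel for the gap between the two bounds, here is a rough back-of-the-envelope sketch. It reads n as the number of features and m as the number of training instances, drops all constant factors, and uses hypothetical helper names, so the numbers are orders of magnitude rather than timings.

```python
import math


def prediction_cost(m: int) -> float:
    """Approximate node visits for one prediction: log2(m)."""
    return math.log2(m)


def training_cost(n: int, m: int) -> float:
    """Approximate elementary comparisons for training: n * m * log2(m)."""
    return n * m * math.log2(m)


n = 100  # assumed number of features, chosen only for illustration
for m in (1_000, 1_000_000):
    print(f"m={m:>9,}  predict ~{prediction_cost(m):5.1f}  train ~{training_cost(n, m):.2e}")
```

Growing m from a thousand to a million roughly doubles the prediction estimate, while the training estimate grows by a factor of about two thousand.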

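Translator's note:
Here n appears to be the number of features.
Here m appears to be the number of samples.
As CART grows the tree, it treats every value of every feature as a split candidate and computes an evaluation metric (information gain or Gini impurity) for each one, so each level costs O(n × m). A tree with log₂ m levels therefore costs O(n × m log₂ m) in total.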
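The split search described in the note can be sketched for a single node as follows. This is a minimal illustration using Gini impurity, not scikit-learn's actual code: `gini_from_counts` and `best_split` are hypothetical names, and thresholds are assumed to sit halfway between consecutive distinct feature values.

```python
from collections import Counter
from typing import List, Optional, Tuple


def gini_from_counts(counts: Counter, total: int) -> float:
    """Gini impurity 1 - sum_k p_k^2, computed from class counts."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(
    X: List[List[float]], y: List[int]
) -> Tuple[Optional[int], Optional[float], float, int]:
    """Scan every feature and every candidate position at one node.

    Returns (feature index, threshold, weighted Gini, candidates visited).
    """
    m, n = len(X), len(X[0])
    best_feature, best_threshold, best_score = None, None, float("inf")
    visited = 0
    for j in range(n):                                   # all n features
        order = sorted(range(m), key=lambda i: X[i][j])  # sort samples by feature j
        left, right = Counter(), Counter(y)
        for pos in range(m - 1):                         # all m - 1 positions
            i, nxt = order[pos], order[pos + 1]
            left[y[i]] += 1                              # move one sample left
            right[y[i]] -= 1
            visited += 1
            if X[i][j] == X[nxt][j]:
                continue                                 # no threshold between equal values
            n_left, n_right = pos + 1, m - pos - 1
            score = (n_left * gini_from_counts(left, n_left)
                     + n_right * gini_from_counts(right, n_right)) / m
            if score < best_score:
                best_feature = j
                best_threshold = (X[i][j] + X[nxt][j]) / 2
                best_score = score
    return best_feature, best_threshold, best_score, visited


X = [[2.0, 7.0], [3.0, 1.0], [10.0, 6.0], [11.0, 2.0]]
y = [0, 0, 1, 1]
print(best_split(X, y))
```

The sweep visits n × (m − 1) candidate positions, which matches the O(n × m) per-level estimate. Note that this sketch also sorts each feature at every node; doing that sort once up front is presumably the kind of work that presorting is meant to save.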
