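Applied Predictive Modeling, Chapter 4 (Over-Fitting and Model Tuning), Exercise 4.3: choosing a parsimonious model with the one-standard-error rule and tolerance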

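This post looks at using partial least squares (PLS) to predict yield for a chemical manufacturing process. The one-standard-error rule is used to identify the most parsimonious model, and tolerance values are computed to trade model complexity against performance. Several other models, such as random forests and SVM, are then compared on R^2, prediction time, and model complexity to choose a final model.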


Applied Predictive Modeling (2013) by Max Kuhn and Kjell Johnson; Chinese edition 《应用预测建模》 translated by 林荟 et al.

Chapter 4: Over-Fitting and Model Tuning

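4.3 Partial least squares (Section 6.3) was used to model the yield of the chemical manufacturing process described in Section 1.4. The data can be found in the AppliedPredictiveModeling package and can be loaded using: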
> library(AppliedPredictiveModeling)
> data(ChemicalManufacturingProcess)
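The objective of this analysis is to find the number of PLS components that yields the optimal R^2 value (Section 5.1). PLS models with 1 through 10 components were each evaluated using repeated 10-fold cross-validation, and the results are presented in the following table: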

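(a) Using the "one-standard-error" method, what number of PLS components provides the most parsimonious model?
(b) Compute the tolerance values for this example. If a 10% loss in R^2 is acceptable, then what is the optimal number of PLS components?
(c) Several other models (discussed in Part II of the book) with varying degrees of complexity were trained and tuned, and the results are presented in Figure 4-13. If the goal is to select the model that optimizes R^2, which model would you choose, and why?

Figure 4-13: Estimated model performance (horizontal axis) versus the time needed to predict 500,000 new samples (vertical axis), using the chemical manufacturing data

(d) Prediction time and model complexity (Section 4.8) are further factors to consider when selecting a model. Given each model's prediction time, model complexity, and R^2 estimate, which model would you choose, and why?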



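(a) Using the "one-standard-error" method, what number of PLS components provides the most parsimonious model?

One-standard-error rule (p. 54):

As the table of results shows, the best R^2 is obtained with 4 PLS components, with a mean R^2 of 0.545 and a standard error of 0.0308. The one-standard-error rule accepts any model whose R^2 is no lower than 0.545 - 0.0308 = 0.5142. The 2-component model (R^2 = 0.500) falls below this bound, while the 3-component model (R^2 = 0.533) is above it, so the simplest model within the range is the one with 3 components.

That is, 3 PLS components give the most parsimonious model.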
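The selection above can be reproduced with a few lines of base R. This is a minimal sketch, not the book's code: the means and standard errors are typed in from the rounded table shown in part (b), the standard error for 4 components uses the 0.0308 quoted above, and the names `pls_results`, `threshold`, and `one_se_choice` are made up for illustration.

```r
# Resampled R^2 summary, typed in from the rounded table in part (b).
# The standard error for 4 components (0.0308) is the value quoted in part (a).
pls_results <- data.frame(
  ncomp = 1:10,
  mean  = c(0.444, 0.500, 0.533, 0.545, 0.542, 0.537, 0.534, 0.534, 0.520, 0.507),
  se    = c(0.027, 0.030, 0.030, 0.0308, 0.032, 0.033, 0.033, 0.033, 0.033, 0.032)
)

# Row with the numerically best mean R^2
best <- which.max(pls_results$mean)

# Lower bound: best mean minus one standard error of the best model
threshold <- pls_results$mean[best] - pls_results$se[best]

# Simplest (fewest components) model whose mean R^2 is at or above the bound
candidates <- pls_results$ncomp[pls_results$mean >= threshold]
one_se_choice <- min(candidates)

print(threshold)
print(one_se_choice)
```

When the models are fit with caret's `train()`, the package's `oneSE()` function can be passed as the `selectionFunction` in `trainControl()` to apply the same rule during tuning.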



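(b) Compute the tolerance values for this example. If a 10% loss in R^2 is acceptable, then what is the optimal number of PLS components?

Tolerance (p. 54):

The tolerance is computed as (X - O)/O, where X is the mean R^2 of a given model and O is the numerically optimal R^2 (here 0.545). This gives the following table: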

Components   Mean    Std. error   Tolerance
1            0.444   0.027        -18.53%
2            0.500   0.030         -8.26%
3            0.533   0.030         -2.20%
4            0.545   0.031          0.00%
5            0.542   0.032         -0.55%
6            0.537   0.033         -1.47%
7            0.534   0.033         -2.02%
8            0.534   0.033         -2.02%
9            0.520   0.033         -4.59%
10           0.507   0.032         -6.97%

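If a 10% loss in R^2 is acceptable, the optimal number of PLS components is 2. It is the simplest model whose R^2 stays within 10% of the best value (a loss of 8.26%), whereas the 1-component model loses 18.53%.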
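The tolerance column and the resulting choice can be checked with a short base R sketch. The mean R^2 values are taken from the rounded table above; the variable names (`mean_r2`, `tolerance_pct`, `tolerance_choice`) and the 10% cutoff expressed as `-10` are illustrative assumptions, not code from the book.

```r
# Mean resampled R^2 for 1 to 10 PLS components, from the table above
ncomp   <- 1:10
mean_r2 <- c(0.444, 0.500, 0.533, 0.545, 0.542, 0.537, 0.534, 0.534, 0.520, 0.507)

# Tolerance in percent: (X - O) / O, where O is the best mean R^2
best_r2       <- max(mean_r2)
tolerance_pct <- 100 * (mean_r2 - best_r2) / best_r2

# Simplest model that loses no more than 10% relative to the best R^2
acceptable       <- ncomp[tolerance_pct >= -10]
tolerance_choice <- min(acceptable)

print(round(tolerance_pct, 2))
print(tolerance_choice)
```

caret also provides a `tolerance()` selection function that can be supplied to `trainControl()` via `selectionFunction`; its default `tol` argument is in percent, so it would need to be set to match the 10% used here.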



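(c) Several other models (discussed in Part II of the book) with varying degrees of complexity were trained and tuned, and the results are presented in Figure 4-13. If the goal is to select the model that optimizes R^2, which model would you choose, and why?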


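If the goal is only to optimize R^2, choose the random forest, because it has the largest R^2. Note, however, that judging from Figure 4-13 the random forest's R^2 does not appear to be statistically significantly greater than that of the support vector machine.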



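(d) Prediction time and model complexity (Section 4.8) are further factors to consider when selecting a model. Given each model's prediction time, model complexity, and R^2 estimate, which model would you choose, and why?

If prediction time and model complexity are considered along with R^2, choose the support vector machine (SVM). Its R^2 is similar to that of the random forest, but its prediction time is much shorter.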
