What Is the Difference Between Statistics and Machine Learning?

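This post explores the differences and connections between statistics and machine learning. Although both are concerned with how to learn from data, historical and sociological factors have given them different emphases. Statistics focuses more on formal statistical inference in low-dimensional problems, while machine learning focuses more on high-dimensional prediction problems.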


Reposting a blog post by the eminent statistician Larry Wasserman.

 

 

Statistics Versus Machine Learning

——Larry Wasserman, posted on June 12, 2012 at 7:46 pm

 

Welcome to my blog, which will discuss topics in Statistics and Machine Learning. Some posts will be technical and others will be non-technical. Since this blog is about topics in both Statistics and Machine Learning, perhaps I should address the question: What is the difference between these two fields?

The short answer is: None. They are both concerned with the same question: how do we learn from data?

But a more nuanced view reveals that there are differences due to historical and sociological reasons. Statistics is an older field than Machine Learning (but young compared to Math, Physics, etc.). Thus, ideas about collecting and analyzing data in Statistics are rooted in the times before computers even existed. Of course, the field has adapted as times have changed, but history matters, and the result is that the way Statisticians think, teach, approach problems and choose research topics is often different from that of their colleagues in Machine Learning. I am fortunate to be at an institution (Carnegie Mellon) which is active in both (and I have appointments in both departments) so I get to see the similarities and differences.

If I had to summarize the main difference between the two fields I would say:

Statistics emphasizes formal statistical inference (confidence intervals, hypothesis tests, optimal estimators) in low dimensional problems.

Machine Learning emphasizes high dimensional prediction problems.

But this is a gross over-simplification. Perhaps it is better to list some topics that receive more attention from one field rather than the other. For example:

Statistics: survival analysis, spatial analysis, multiple testing, minimax theory, deconvolution, semiparametric inference, bootstrapping, time series.

Machine Learning: online learning, semisupervised learning, manifold learning, active learning, boosting.
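To make the inference-versus-prediction contrast in the summary above concrete, here is a minimal Python sketch. It is not part of Wasserman's post: the simulated data, the function names (`fit_line`, `slope_confidence_interval`, `held_out_mse`) and the normal-approximation interval are all assumptions chosen for illustration. The same fitted line is used in two ways: once to build an approximate confidence interval for the slope (the emphasis attributed to Statistics), and once to measure error on held-out data (the emphasis attributed to Machine Learning).

```python
import random
from statistics import NormalDist, fmean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one covariate)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def slope_confidence_interval(x, y, level=0.95):
    """Inference view: approximate interval for the slope.

    Uses a normal quantile rather than a t quantile, which is an
    assumption that is only reasonable for fairly large samples.
    """
    n = len(x)
    intercept, slope = fit_line(x, y)
    x_bar = fmean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = (rss / (n - 2) / sxx) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope - z * std_err, slope + z * std_err


def held_out_mse(x_train, y_train, x_test, y_test):
    """Prediction view: mean squared error on data not used for fitting."""
    intercept, slope = fit_line(x_train, y_train)
    errors = [(yi - intercept - slope * xi) ** 2 for xi, yi in zip(x_test, y_test)]
    return fmean(errors)


# Simulated data (hypothetical): true intercept 1.0, true slope 2.0, unit noise.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(400)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

ci_low, ci_high = slope_confidence_interval(x[:300], y[:300])
test_mse = held_out_mse(x[:300], y[:300], x[300:], y[300:])

print(f"95% interval for the slope: ({ci_low:.3f}, {ci_high:.3f})")
print(f"held-out mean squared error: {test_mse:.3f}")
```

Both numbers come from the same least-squares fit. The interval answers a question about a parameter, while the held-out error answers a question about future predictions, which fits the post's remark that the two fields share one underlying problem.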

But the differences become blurrier all the time. Check out two flagship journals:

The Annals of Statistics and The Journal of Machine Learning Research.

The overlap in topics is striking. And many topics get started in one field and then are developed further in the other. For example, Reproducing Kernel Hilbert Space (RKHS) methods are hot in Machine Learning but they began in Statistics (thanks to Manny Parzen and Grace Wahba). Similarly, much of online learning has its roots in the work of the statisticians David Blackwell and Jim Hannan. And of course there are topics that are highly active in both areas such as concentration of measure, sparsity and convex optimization. There are also differences in terminology. Here are some examples:

Statistics        Machine Learning
----------------------------------
Estimation        Learning
Classifier        Hypothesis
Data point        Example/Instance
Regression        Supervised Learning
Classification    Supervised Learning
Covariate         Feature
Response          Label

 

and of course:

Statisticians use R.

Machine Learners use Matlab.

 

Overall, the two fields are blending together more and more and I think this is a good thing.

 


from: http://normaldeviate.wordpress.com/2012/06/12/statistics-versus-machine-learning-5-2/

Reposted from: https://www.cnblogs.com/nn0p/archive/2012/11/14/2770668.html
