Deep Learning book notes: ML

Latest recommended article published 2024-08-18 10:25:23


License: CC 4.0 BY-SA

Tags:

#machine-learning #statistics #deeplearning


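These notes look at what machine learning is from a statistical point of view, introduce the two main approaches to statistics (frequentist estimation and Bayesian inference), and cover how to describe a training dataset, the difference between a linear and an affine mapping, and how training-set and test-set error relate under the data-sampling process.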


1. One explanation I came across of what machine learning essentially is:

--Machine learning is essentially a form of applied statistics with increased emphasis on the use of computers to statistically estimate complicated functions and a decreased emphasis on proving confidence intervals around these functions.

2. The two statistical approaches used in machine learning:

--present the two central approaches to statistics: frequentist estimators and Bayesian inference.
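As a small illustration of the contrast (not taken from the book passage above), the sketch below estimates the bias of a coin in both ways. The flip data, the Beta(1, 1) prior, and all function names are assumptions chosen for the example.

```python
from fractions import Fraction

def mle_bernoulli(flips):
    """Frequentist point estimate: the sample mean maximizes the Bernoulli likelihood."""
    return Fraction(sum(flips), len(flips))

def beta_posterior(flips, alpha=1, beta=1):
    """Bayesian update: a Beta(alpha, beta) prior becomes a Beta posterior."""
    heads = sum(flips)
    tails = len(flips) - heads
    return alpha + heads, beta + tails

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return Fraction(alpha, alpha + beta)

# Made-up data: 6 heads (1) and 2 tails (0).
flips = [1, 1, 1, 0, 1, 1, 0, 1]

theta_mle = mle_bernoulli(flips)         # 3/4, a single number
post_a, post_b = beta_posterior(flips)   # (7, 3), a whole distribution over theta
theta_bayes = beta_mean(post_a, post_b)  # 7/10, one possible summary of that distribution

print(theta_mle, (post_a, post_b), theta_bayes)
```

The frequentist estimator returns one value of the parameter, while the Bayesian computation returns a distribution over it. The posterior mean is pulled slightly toward the prior mean of 1/2.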

3. How to lay out a training dataset (image datasets excepted):

--One common way of describing a dataset is with a design matrix. A design matrix is a matrix containing a different example in each row. Each column of the matrix corresponds to a different feature.
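A minimal sketch of that layout, using plain Python lists. The records, the feature names, and the values are made up for illustration.

```python
# Hypothetical raw records: one dict per example.
examples = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4},
    {"sepal_length": 7.0, "sepal_width": 3.2, "petal_length": 4.7},
    {"sepal_length": 6.3, "sepal_width": 3.3, "petal_length": 6.0},
    {"sepal_length": 4.9, "sepal_width": 3.0, "petal_length": 1.4},
]

# Fix a column order once so every row uses the same feature positions.
feature_names = ["sepal_length", "sepal_width", "petal_length"]

# Design matrix X: row i is example i, column j is feature j.
X = [[example[name] for name in feature_names] for example in examples]

n_examples, n_features = len(X), len(X[0])

# X[i][j] is feature j of example i; a column collects one feature across all examples.
sepal_width_column = [row[1] for row in X]

print(n_examples, n_features)
print(sepal_width_column)
```

This only works when every example has the same number of features, which is presumably why the note above sets image datasets aside: images of different sizes do not fit into one fixed-width matrix without extra processing.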

4. Distinguishing a linear mapping from an affine mapping:

--the mapping from parameters to predictions is still a linear function but the mapping from features to predictions is now an affine function.
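The quote refers to a model with a bias term, such as ŷ = wᵀx + b. The sketch below checks both claims numerically, assuming that form. The weights, inputs, and function names are illustrative.

```python
def predict(w, b, x):
    """Prediction w . x + b for one example x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict_theta(theta, x):
    """Same model with the bias folded into the parameters: theta = w + [b], input x + [1]."""
    augmented = list(x) + [1.0]
    return sum(ti * xi for ti, xi in zip(theta, augmented))

w, b = [2.0, -1.0], 0.5
x1, x2 = [1.0, 2.0], [3.0, 0.0]

# In the features the map is affine, not linear: additivity fails by exactly b.
f_of_sum = predict(w, b, [a + c for a, c in zip(x1, x2)])  # 6.5
sum_of_f = predict(w, b, x1) + predict(w, b, x2)           # 0.5 + 6.5 = 7.0
affine_gap = sum_of_f - f_of_sum                           # 0.5, equal to b

# In the parameters the map is linear: additivity holds for a fixed input.
theta_a = [2.0, -1.0, 0.5]
theta_b = [1.0, 1.0, 1.0]
theta_sum = [p + q for p, q in zip(theta_a, theta_b)]
lhs = predict_theta(theta_sum, x1)                             # 4.5
rhs = predict_theta(theta_a, x1) + predict_theta(theta_b, x1)  # 0.5 + 4.0 = 4.5

print(affine_gap, lhs, rhs)
```

With b = 0 the gap vanishes and the map from features to predictions is linear as well.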

5. How the data-sampling process links training-set and test-set error:

--Suppose we have a probability distribution p(x, y) and we sample from it repeatedly to generate the train set and the test set. For some fixed value w, the expected training set error is exactly the same as the expected test set error, because both expectations are formed using the same dataset sampling process. The only difference between the two conditions is the name we assign to the dataset we sample.

--Of course, when we use a machine learning algorithm, we do not fix the parameters ahead of time, then sample both datasets. We sample the training set, then use it to choose the parameters to reduce training set error, then sample the test set.
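Both statements can be checked with a small simulation. The sketch below assumes a particular p(x, y) (x uniform on [-1, 1], y = 2x plus Gaussian noise), a one-parameter model ŷ = w·x, and mean squared error. None of these choices come from the book; they are only there to make the experiment runnable.

```python
import random

def sample_dataset(rng, n):
    """Draw n (x, y) pairs from the assumed p(x, y): x ~ U(-1, 1), y = 2x + N(0, 0.5^2)."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 2.0 * x + rng.gauss(0.0, 0.5)
        data.append((x, y))
    return data

def mse(w, data):
    """Mean squared error of the model y_hat = w * x on a dataset."""
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

def fit_least_squares(data):
    """Choose w to minimize MSE on the given dataset (closed form for one parameter)."""
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    return sxy / sxx

def run(trials=4000, n=10, fixed_w=1.0, seed=0):
    rng = random.Random(seed)
    totals = {"fixed_train": 0.0, "fixed_test": 0.0, "fit_train": 0.0, "fit_test": 0.0}
    for _ in range(trials):
        train = sample_dataset(rng, n)
        test = sample_dataset(rng, n)
        # Case 1: w is fixed before any data is seen.
        totals["fixed_train"] += mse(fixed_w, train)
        totals["fixed_test"] += mse(fixed_w, test)
        # Case 2: w is chosen using the training set, then evaluated on both sets.
        w_hat = fit_least_squares(train)
        totals["fit_train"] += mse(w_hat, train)
        totals["fit_test"] += mse(w_hat, test)
    return {name: total / trials for name, total in totals.items()}

results = run()
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Averaged over many sampled datasets, the two errors for the fixed w should come out nearly equal, since "train" and "test" are just two names for draws from the same process. Once w is fitted to the training set, the average training error should fall below the average test error.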