Statistical learning: the setting and the estimator object in scikit-learn
Datasets
Scikit-learn deals with learning information from one or more datasets that are represented as 2D arrays. They can be understood as a list of multi-dimensional observations. We say that the first axis of these arrays is the samples axis, while the second is the features axis.
A simple example shipped with scikit-learn: the iris dataset
It is made of 150 observations of irises, each described by 4 features: their sepal and petal length and width, as detailed in iris.DESCR.
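As a minimal sketch of the description above, the following loads the iris data with load_iris from sklearn.datasets and inspects its shape. The variable names (iris, data) are illustrative choices, not something the text prescribes.

```python
from sklearn import datasets

# Load the bundled iris dataset; the feature matrix lives in the `data` attribute.
iris = datasets.load_iris()
data = iris.data

# First axis: samples (observations); second axis: features.
n_samples, n_features = data.shape
print(data.shape)          # (150, 4)
print(iris.feature_names)  # names of the four sepal/petal measurements
```

The full description mentioned above can be read with print(iris.DESCR).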
When the data is not initially in the (n_samples, n_features) shape, it needs to be preprocessed in order to be used by scikit-learn.
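One common form of such preprocessing is flattening. The sketch below assumes a hypothetical stack of ten 8x8 "images" built with NumPy (not a dataset named in the text) and reshapes it so that each image becomes one row of features.

```python
import numpy as np

# Hypothetical input: 10 observations, each an 8x8 grid of values.
images = np.arange(10 * 8 * 8, dtype=float).reshape(10, 8, 8)

# Keep the samples axis and flatten everything else into the features axis.
data = images.reshape((images.shape[0], -1))

print(images.shape)  # (10, 8, 8)
print(data.shape)    # (10, 64)
```

Passing -1 lets NumPy infer the number of features, so the same line works for observations of any fixed size.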
Estimator objects (predictive models)
Fitting data: the main API implemented by scikit-learn is that of the estimator. An estimator is any object that learns from data; it may be a classification, regression or clustering algorithm or a transformer that extracts/filters useful features from raw data.
All estimator objects expose a fit method that takes a dataset (usually a 2D array).
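To illustrate the fit call, here is a hedged sketch that uses KMeans as one example of an estimator and the iris data as the dataset. The choice of estimator and the parameter values (n_clusters=3, n_init=10, random_state=0) are assumptions made for this example only.

```python
from sklearn import datasets
from sklearn.cluster import KMeans

iris = datasets.load_iris()
data = iris.data  # 2D array of shape (n_samples, n_features)

# Parameters are set when the estimator is created (illustrative values).
estimator = KMeans(n_clusters=3, n_init=10, random_state=0)

# fit learns from the data and returns the estimator itself.
fitted = estimator.fit(data)

print(fitted is estimator)
# What was learned is stored in attributes ending with an underscore.
print(estimator.cluster_centers_.shape)
```

Supervised estimators follow the same pattern but also take the target values, as in estimator.fit(X, y).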




