Turns Out…
We can see from the scores above that our Naive Bayes model does a pretty good job of classifying spam and “ham.” However, let’s take a look at a few additional models to see whether we can improve on it anyway.
Specifically, in this notebook we will take a look at the following techniques: BaggingClassifier, RandomForestClassifier, and AdaBoostClassifier.
Another really useful guide to ensemble methods can be found in the scikit-learn documentation.
These ensemble methods use a combination of techniques you have seen throughout this lesson (a code sketch follows the list):
- Bootstrapping the data passed to a learner (bagging).
- Subsetting the features used by a learner (combined with bagging, these are the two random components of random forests).
- Combining learners so that those that perform best in certain areas have the largest impact (boosting).
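As a quick illustration, each of these techniques maps onto a scikit-learn estimator; here is a minimal sketch, with hyperparameter values chosen purely for illustration:

```python
from sklearn.ensemble import (BaggingClassifier,
                              RandomForestClassifier,
                              AdaBoostClassifier)

# Bagging: each base learner sees a bootstrap resample of the training data
bagging = BaggingClassifier(n_estimators=200)

# Random forest: bagging plus a random subset of features at each split
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt")

# Boosting: learners are fit sequentially, each paying more attention to
# the examples the previous ones misclassified
boosting = AdaBoostClassifier(n_estimators=300, learning_rate=0.2)
```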
In this notebook, let’s get some practice with these methods, which will also help you get comfortable with the general process used for performing supervised machine learning in Python.
Since you cleaned and vectorized the text in the previous notebook, this notebook can focus on the fun part - the machine learning.
This Process Looks Familiar…
In general, there is a five-step process you can use each time you want to apply a supervised learning method (which is exactly what you did above; a minimal code sketch follows the list):
- Import the model.
- Instantiate the model with the hyperparameters of interest.
- Fit the model to the training data.
- Predict on the test data.
- Score the model by comparing the predictions to the actual values.
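As a minimal sketch of those five steps, assuming the train/test splits from the previous notebook are named X_train, X_test, y_train, and y_test (the names are an assumption, not fixed by the lesson):

```python
# 1. Import the model (plus a scoring helper)
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 2. Instantiate the model with the hyperparameters of interest
model = RandomForestClassifier(n_estimators=200, random_state=42)

# 3. Fit the model to the training data
#    (X_train, y_train, etc. are assumed to exist from the previous notebook)
model.fit(X_train, y_train)

# 4. Predict on the test data
predictions = model.predict(X_test)

# 5. Score the model by comparing the predictions to the actual values
print(accuracy_score(y_test, predictions))
```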
Work through this notebook to apply these five steps with each of the ensemble methods: BaggingClassifier, RandomForestClassifier, and AdaBoostClassifier.
Step 1: First use the documentation to import all three of the models.
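All three classifiers live in sklearn.ensemble, so the imports could look like this:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
```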