Turns Out…
We can see from the scores above that our Naive Bayes model does a pretty good job of classifying spam and “ham.” However, let’s take a look at a few additional models to see whether we can improve on it anyway.
Specifically, in this notebook we will take a look at the following techniques: BaggingClassifier, RandomForestClassifier, and AdaBoostClassifier.
Another really useful guide to ensemble methods can be found in the scikit-learn documentation.
These ensemble methods use a combination of techniques you have seen throughout this lesson (a code sketch follows the list):
- Bootstrapping the data passed to a learner (bagging).
- Subsetting the features used by a learner (combined with bagging, these are the two random components of random forests).
- Combining learners so that those that perform best in certain areas have the largest impact (boosting).
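As a quick illustration, each of these techniques maps onto a scikit-learn estimator; here is a minimal sketch, with hyperparameter values chosen purely for illustration:

```python
from sklearn.ensemble import (BaggingClassifier,
                              RandomForestClassifier,
                              AdaBoostClassifier)

# Bagging: each base learner sees a bootstrap resample of the training data
bagging = BaggingClassifier(n_estimators=200)

# Random forest: bagging plus a random subset of features at each split
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt")

# Boosting: learners are fit sequentially, each paying more attention to
# the examples the previous ones misclassified
boosting = AdaBoostClassifier(n_estimators=300, learning_rate=0.2)
```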
In this notebook, let’s get some practice with these methods, which will also help you get comfortable with the general process used for performing supervised machine learning in Python.
Since you cleaned and vectorized the text in the previous notebook, this notebook can focus on the fun part - the machine learning.
This Process Looks Familiar…
In general, there is a five-step process you can use each time you want to apply a supervised learning method (which is exactly what you did above; a minimal code sketch follows the list):
- Import the model.
- Instantiate the model with the hyperparameters of interest.
- Fit the model to the training data.
- Predict on the test data.
- Score the model by comparing the predictions to the actual values.
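As a minimal sketch of those five steps, assuming the train/test splits from the previous notebook are named X_train, X_test, y_train, and y_test (the names are an assumption, not fixed by the lesson):

```python
# 1. Import the model (plus a scoring helper)
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 2. Instantiate the model with the hyperparameters of interest
model = RandomForestClassifier(n_estimators=200, random_state=42)

# 3. Fit the model to the training data
#    (X_train, y_train, etc. are assumed to exist from the previous notebook)
model.fit(X_train, y_train)

# 4. Predict on the test data
predictions = model.predict(X_test)

# 5. Score the model by comparing the predictions to the actual values
print(accuracy_score(y_test, predictions))
```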
Work through this notebook to apply these five steps with each of the ensemble methods: BaggingClassifier, RandomForestClassifier, and AdaBoostClassifier.
Step 1: First use the documentation to import all three of the models.
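All three classifiers live in sklearn.ensemble, so the imports could look like this:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
```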