
Machine Learning
Leyooo
Collecting the bits and pieces of knowledge I pick up outside of work, and sharing some personal views with everyone.
Articles in this column
Training a logistic regression model with scikit-learn
1. Since scikit-learn implements a highly optimized version of logistic regression that also supports multiclass settings off the shelf, we will skip the implementation and use the sklearn.linear_mod… Translation · 2017-04-08 22:17:52 · 509 views · 0 comments
Leveraging weak learners via adaptive boosting (AdaBoost)
In this section about ensemble methods, we discuss boosting with a special focus on its most common implementation, AdaBoost. 1. Via the base_estimator attribute, we train the AdaBoostClassifier… Translation · 2017-04-25 21:07:56 · 336 views · 0 comments
Bagging – building an ensemble of classifiers from bootstrap samples
1. Create a more complex classification problem using the Wine dataset: import pandas as pd df_wine = pd.read_csv('./datasets/wine/wine.data', header=None) # https://archive.… Translation · 2017-04-24 21:55:08 · 369 views · 0 comments
Evaluating and tuning the ensemble classifier
1. Compute the ROC curves from the test set to check whether the MajorityVoteClassifier generalizes well to unseen data. from sklearn.metrics import roc_curve from sklearn.metrics import auc colors =… Translation · 2017-04-22 13:55:24 · 334 views · 0 comments
Implementing a simple majority vote classifier
We start with a warm-up exercise and implement a simple ensemble classifier for majority voting in Python. Although the following algorithm also generalizes to multi-class settings via plurality voting,… Translation · 2017-04-22 12:49:59 · 770 views · 0 comments
Learning with ensembles
The goal behind ensemble methods is to combine different classifiers into a meta-classifier that has better generalization performance than each individual classifier alone. We focus on the most popular… Translation · 2017-04-22 12:19:48 · 372 views · 0 comments
Kernel principal component analysis in scikit-learn
For our convenience, scikit-learn implements a kernel PCA class in the sklearn.decomposition submodule. The usage is similar to the standard PCA class, and we can specify the kernel via the kernel p… Translation · 2017-04-16 16:44:03 · 455 views · 0 comments
Projecting new data points
In this section, we learn how to project data points that were not part of the training dataset. 1. Modify the rbf_kernel_pca function so that it returns the eigenvalues of the kernel matrix: from sc… Translation · 2017-04-16 16:33:06 · 313 views · 0 comments
Plotting a receiver operating characteristic (ROC)
Receiver operating characteristic (ROC) graphs are useful tools for selecting classification models based on their performance with respect to the false positive and true positive rates, which are… Translation · 2017-04-19 21:00:37 · 349 views · 0 comments
Using kernel principal component analysis for nonlinear mappings
We take a look at a kernelized version of PCA, or kernel PCA, which relates to the concepts of kernel SVM. Using kernel PCA, we will learn how to transform data that is not linearly separable onto a n… Translation · 2017-04-15 10:16:54 · 306 views · 0 comments
Learning Best Practices for Model Evaluation and Hyperparameter Tuning
1. Streamlining workflows with pipelines 1.1 Loading the Breast Cancer Wisconsin dataset: from distutils.version import LooseVersion as Version from sklearn import __version__ as sklearn_vers… Translation · 2017-04-18 22:41:51 · 554 views · 0 comments
Supervised data compression via LDA
Linear Discriminant Analysis (LDA) can be used as a technique for feature extraction to increase computational efficiency and reduce the degree of overfitting due to the curse of dimensionality i… Translation · 2017-04-12 21:36:11 · 450 views · 0 comments
Unsupervised dimensionality reduction via PCA
1. Total and explained variance 1.1 Process the Wine data into separate training and test sets: import pandas as pd from sklearn.model_selection import train_test_split from sklearn.prep… Translation · 2017-04-11 22:57:03 · 630 views · 0 comments
Sparse solutions with L1 regularization
1. Common solutions to reduce the generalization error are as follows: • Collect more training data • Introduce a penalty for complexity via regularization (L1, L2) • Choose a simple… Translation · 2017-04-10 22:37:59 · 454 views · 0 comments
K-nearest neighbors (KNN) learning algorithm
1. Call the function: from sklearn.neighbors import KNeighborsClassifier knn = KNeighborsClassifier(n_neighbors=5, p=2, metric='minkowski') knn.fit(X_train_std, y_train) 2. Plot the result: plot_de… Translation · 2017-04-09 16:15:16 · 369 views · 0 comments
Decision tree learning
1. Building a decision tree: from sklearn.tree import DecisionTreeClassifier tree = DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0) tree.fit(X_train, y_train) X_combined… Translation · 2017-04-09 12:46:23 · 427 views · 0 comments
Solving non-linear problems using a kernel SVM
1. Create a simple dataset that has the form of an XOR gate using the logical_xor function from NumPy. np.random.seed(0) X_xor = np.random.randn(200, 2) y_xor = np.logical_xor(X_xor[:, 0] > 0,… Translation · 2017-04-09 12:21:14 · 335 views · 0 comments
Dealing with the nonlinearly separable case using slack variables
1. Train an SVM model: from sklearn.svm import SVC svm = SVC(kernel='linear', C=1.0, random_state=0) svm.fit(X_train_std, y_train) plot_decision_regions(X_combined_std, y_combined, classifier=svm… Translation · 2017-04-09 11:04:18 · 330 views · 0 comments
Applying Machine Learning to Sentiment Analysis
1. Obtaining the IMDb movie review dataset: a compressed archive of the movie review dataset is available at http://ai.stanford.edu/~amaas/data/sentiment/ import pandas as pd df = pd.read_csv('./datasets… Translation · 2017-04-27 19:39:44 · 530 views · 0 comments