Source: the official sklearn documentation at http://scikit-learn.org/stable/modules/preprocessing.html#
sklearn version: 0.18.2
Binarization
Feature binarization is the process of thresholding numerical features to get boolean values. This can be useful for downstream probabilistic estimators that make assumption that the input data is distributed according to a multi-variate Bernoulli distribution. For instance, this is the case for the sklearn.neural_network.BernoulliRBM. ——scikit-learn.org
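Binarization discretizes a continuous variable against a chosen threshold, turning each value into 0 or 1. This has the following advantages:
- The result can be stored as a sparse matrix, which saves storage space and speeds up computation.
- Missing values (NA) can be handled effectively.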
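The thresholding rule can be tried directly with `sklearn.preprocessing.Binarizer`. The sketch below assumes a scikit-learn install with SciPy available; the toy matrix `X` is made up for illustration. Values strictly greater than `threshold` become 1 and all others become 0, and a sparse input stays sparse, which is what the first advantage above relies on.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import Binarizer

# Toy feature matrix (made up for illustration): 3 samples x 3 features
X = np.array([[1.0, -1.0, 2.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])

# Values strictly greater than `threshold` become 1, all others become 0.
binarizer = Binarizer(threshold=0.0)
X_bin = binarizer.fit_transform(X)
# X_bin == [[1, 0, 1], [1, 0, 0], [0, 1, 0]]
print(X_bin)

# The same rule written directly with NumPy, for comparison.
X_manual = (X > 0.0).astype(X.dtype)
assert np.array_equal(X_bin, X_manual)

# Binarizer also accepts SciPy sparse input and returns a sparse result,
# so only the nonzero entries have to be stored.
X_sparse = sparse.csr_matrix(X)
X_sparse_bin = binarizer.transform(X_sparse)
print(X_sparse_bin.toarray())
```

Note that for sparse input scikit-learn rejects a negative threshold: with `threshold < 0` every implicit zero would have to become 1, which would destroy the sparsity.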
**Sparse matrix**