Normal Distribution, Gaussian Distribution, Poisson Distribution
For a detailed implementation in Python, check my GitHub repository.
Introduction
Some machine learning models, such as linear and logistic regression, assume a Gaussian (normal) distribution. One of the first steps in a statistical analysis of your data is therefore to check how the data is distributed.
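As a rough illustration of what checking the distribution can look like, the sketch below summarizes a sample with its skewness, excess kurtosis, and a Shapiro-Wilk test from SciPy. The function name `describe_normality`, the 0.05 threshold, and the synthetic samples are assumptions made for this example, not something taken from the article's repository.

```python
import numpy as np
from scipy import stats


def describe_normality(values, alpha=0.05):
    """Summarize how close a 1-D sample is to a normal distribution."""
    x = np.asarray(values, dtype=float)
    _, p_value = stats.shapiro(x)  # Shapiro-Wilk normality test
    return {
        "skewness": float(stats.skew(x)),             # 0 for a symmetric distribution
        "excess_kurtosis": float(stats.kurtosis(x)),  # 0 for a normal distribution
        "shapiro_p_value": float(p_value),
        # Not rejecting normality is not proof that the data is normal
        "looks_normal": bool(p_value > alpha),
    }


# Synthetic samples, generated only for illustration
rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=50, scale=5, size=500)
skewed_sample = rng.exponential(scale=2.0, size=500)

print(describe_normality(normal_sample))
print(describe_normality(skewed_sample))
```

A histogram or Q-Q plot is usually worth looking at alongside these numbers, since a single test statistic can hide the shape of the deviation.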
The familiar bell curve shows a normal distribution.

If your data has a Gaussian distribution, the methods that assume it are powerful and well understood.
Many data scientists report more accurate results when they transform the predictor variables.
To transform data, you perform a mathematical operation on each observation, then use the transformed data in your model.
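A minimal sketch of that idea, assuming a single right-skewed predictor and a log transform: the `income` values are made up for illustration, and the log is just one possible operation.

```python
import numpy as np

# Hypothetical right-skewed predictor, made up for illustration
income = np.array([20_000, 35_000, 50_000, 120_000, 1_000_000], dtype=float)

# Apply the same mathematical operation to every observation
log_income = np.log(income)

# The transformed values replace the original column as model input
# (reshaped to the 2-D layout most modelling libraries expect)
features = log_income.reshape(-1, 1)

print(features.ravel())
```

The transformation keeps the ordering of the observations but compresses the large values, which is why it can pull in a long right tail.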
Why do we need a normal distribution?
If a method assumes a Gaussian distribution and your data was drawn from a different distribution, the findings may be misleading or plain wrong.
It is possible that your data does not look Gaussian or fails a normality test, but can be transformed to make it fit a Gaussian distribution.
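To make this concrete, here is a small sketch using synthetic log-normal data, which is strongly right-skewed but becomes exactly normal after taking the logarithm. The sample, seed, and choice of the Shapiro-Wilk test are assumptions for the example; real data will rarely transform this cleanly.

```python
import numpy as np
from scipy import stats

# Synthetic data: log-normal values are skewed, their logarithm is normal
rng = np.random.default_rng(42)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=1000)
transformed = np.log(raw)

# Compare skewness and the normality test before and after the transform
for name, sample in [("raw", raw), ("log", transformed)]:
    _, p_value = stats.shapiro(sample)
    print(f"{name}: skewness={stats.skew(sample):.2f}, Shapiro-Wilk p={p_value:.3g}")
```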
Types of transformation
- Log Transformation
- Reciprocal Transformation
- Square-Root Transformation
- Cube-Root Transformation
- Exponential Transformation
- Box-Cox Transformation
- Yeo-Johnson Transformation
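The transformations listed above can be sketched with NumPy and SciPy as follows. This is an illustrative outline, not the author's implementation: the helper name `apply_transformations` is hypothetical, the input is assumed to be strictly positive, and "exponential" is interpreted here as raising values to a fixed power (1/1.2, an arbitrary choice), since the text does not define it.

```python
import numpy as np
from scipy import stats


def apply_transformations(values):
    """Return each listed transformation of a strictly positive 1-D sample."""
    x = np.asarray(values, dtype=float)
    if np.any(x <= 0):
        raise ValueError("log, reciprocal and Box-Cox need strictly positive values")
    # With no lambda given, SciPy fits it by maximum likelihood and returns it too
    boxcox_values, _ = stats.boxcox(x)
    yeojohnson_values, _ = stats.yeojohnson(x)
    return {
        "log": np.log(x),
        "reciprocal": 1.0 / x,
        "square_root": np.sqrt(x),
        "cube_root": np.cbrt(x),
        "exponential": x ** (1 / 1.2),  # assumed meaning: fixed-power transform
        "box_cox": boxcox_values,
        "yeo_johnson": yeojohnson_values,
    }


# Synthetic right-skewed sample, generated only for illustration
rng = np.random.default_rng(1)
sample = rng.lognormal(mean=1.0, sigma=0.6, size=1000)

results = apply_transformations(sample)
print(f"{'original':12s} skewness = {stats.skew(sample): .3f}")
for name, transformed in results.items():
    print(f"{name:12s} skewness = {stats.skew(transformed): .3f}")
```

Comparing the skewness of each result is one simple way to pick a candidate; Yeo-Johnson is the option to reach for when the data contains zeros or negative values, which Box-Cox and the log cannot handle.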