Click to view the original problem statement



#matplotlib inline
import random
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
import math
#Part 1
anscombe = sns.load_dataset("anscombe")
print("The mean of both x and y")
# Select several columns with a list; the tuple form ['x', 'y'] is rejected by recent pandas
print(anscombe.groupby('dataset')[['x', 'y']].mean())
print("\nThe variance of both x and y")
print(anscombe.groupby('dataset')[['x', 'y']].var())
print("\nThe correlation coefficient between x and y")
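# Pooled over all four datasets; Series.cov avoids DataFrame.cov() failing on the non-numeric 'dataset' column
print(anscombe['x'].cov(anscombe['y']) / math.sqrt(anscombe['x'].var() * anscombe['y'].var()))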
print("\nThe linear regression line: \n\t(hint: use statsmodels and look at the Statsmodels notebook)")
print(smf.ols('y ~ x', anscombe).fit().summary())
#Part 2
# seaborn 0.9 renamed the 'size' argument to 'height'
g = sns.FacetGrid(anscombe, col="dataset", hue="dataset", height=3)
g.map(plt.scatter, 'x', 'y')
plt.show()
Result:
Part 1:
The mean of both x and y
           x         y
dataset
I        9.0  7.500909
II       9.0  7.500909
III      9.0  7.500000
IV       9.0  7.500909
The variance of both x and y
            x         y
dataset
I        11.0  4.127269
II       11.0  4.127629
III      11.0  4.122620
IV       11.0  4.123249
The correlation coefficient between x and y
0.81636624276147
The linear regression line:
(hint: use statsmodels and look at the Statsmodels notebook)
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.666
Model:                            OLS   Adj. R-squared:                  0.659
Method:                 Least Squares   F-statistic:                     83.92
Date:                Wed, 13 Jun 2018   Prob (F-statistic):           1.44e-11
Time:                        17:44:49   Log-Likelihood:                -67.358
No. Observations:                  44   AIC:                             138.7
Df Residuals:                      42   BIC:                             142.3
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.0013      0.521      5.765      0.000       1.951       4.052
x              0.4999      0.055      9.161      0.000       0.390       0.610
==============================================================================
Omnibus:                        1.513   Durbin-Watson:                   2.327
Prob(Omnibus):                  0.469   Jarque-Bera (JB):                0.896
Skew:                           0.339   Prob(JB):                        0.639
Kurtosis:                       3.167   Cond. No.                         29.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
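Note that the correlation coefficient and the regression above are computed on all 44 observations pooled together, not on each dataset separately. As a rough sketch of the per-dataset version, the snippet below computes Pearson's r and a least-squares line for each of the four groups using only the standard library. The data values are the standard published Anscombe figures typed in by hand (an assumption made so the sketch runs without seaborn or a network connection), and the names `QUARTET`, `pearson_and_line`, and `per_dataset` are hypothetical helpers, not part of the original script.

```python
from math import sqrt
from statistics import mean

# Standard published values of Anscombe's quartet, entered by hand so the
# sketch does not depend on sns.load_dataset (which downloads the data).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_and_line(xs, ys):
    """Return (r, slope, intercept) for a simple least-squares fit of y on x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)                    # spread of x
    syy = sum((y - my) ** 2 for y in ys)                    # spread of y
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    return r, slope, intercept


# One (r, slope, intercept) triple per dataset instead of one pooled result.
per_dataset = {name: pearson_and_line(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, (r, slope, intercept) in per_dataset.items():
    print(f"{name:>3}: r = {r:.3f}, y = {intercept:.2f} + {slope:.3f} * x")
```

Each dataset should come out with r close to 0.816 and a fitted line close to y = 3 + 0.5x, which is why the pooled figures above look the same as any single group. With pandas available, the same per-group computation could presumably be done by looping over `anscombe.groupby('dataset')` and calling `smf.ols('y ~ x', group).fit()` on each group.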
Part 2:
[Figure: scatter plots of y against x, one panel per dataset (I, II, III, IV)]
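This post uses Python's data science libraries to run a statistical analysis of the four Anscombe datasets, covering the mean, variance, correlation coefficient, and linear regression, and then visualizes the four datasets.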