Statistics and Linear Algebra 4

Latest recommended article published on 2025-09-06 09:26:50

weixin_30443895

License: CC 4.0 BY-SA

Tags: python

Original link: http://www.cnblogs.com/kingoscar/p/6124330.html

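This article describes a method for predicting wine quality with simple linear regression: compute the slope and intercept, build the prediction model, and evaluate the model's error and standard error.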

1. The slope is the covariance of x and y divided by the variance of x:

    from numpy import cov
    # cov(x, y) is the NumPy function; it returns a 2x2 covariance matrix whose [0, 1] element is cov(x, y).
    # .var() is the pandas method; both use the sample (n - 1) denominator by default.
    slope_density = cov(wine_quality["density"], wine_quality["quality"])[0, 1] / wine_quality["density"].var()
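The same ratio can be checked by hand on a small dataset. The following is a minimal, dependency-free sketch; the numbers are made up for illustration and are not taken from the wine dataset, and `sample_covariance` is a hypothetical helper name.

```python
from statistics import mean, variance

# Made-up toy data, not taken from the wine dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 denominator (the numpy.cov default)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / (len(xs) - 1)

# slope = cov(x, y) / var(x); statistics.variance also divides by n - 1.
slope = sample_covariance(x, y) / variance(x)
print(slope)
```

Here the covariance is 1.5 and the variance of x is 2.5, so the slope comes out at 0.6 up to floating-point rounding. The two denominators must match (both n - 1 or both n), otherwise the ratio is off by a factor of n / (n - 1).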

2. The intercept is b = mean(y) - a * mean(x), where a is the slope and the means are taken over each column:

    intercept_density = wine_quality["quality"].mean() - slope_density * wine_quality["density"].mean()
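The intercept formula follows from least squares: setting the derivative of the squared-error sum with respect to b to zero forces the fitted line through the point of means. A short derivation, with a as the slope and b as the intercept (the index i, the sample size n and the name S are notation introduced here):

```latex
\begin{aligned}
S(a, b) &= \sum_{i=1}^{n} (y_i - a x_i - b)^2 \\
\frac{\partial S}{\partial b} &= -2 \sum_{i=1}^{n} (y_i - a x_i - b) = 0 \\
\sum_{i=1}^{n} y_i - a \sum_{i=1}^{n} x_i - n b &= 0 \\
b &= \bar{y} - a \bar{x}
\end{aligned}
```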

3. Making predictions: the slope and intercept computed from the dataset define the prediction model y = slope * x + intercept. Applying the model to every density value gives the array of predicted qualities.

    def predict_quality(x):
        # Return the quality predicted by the model for a density value x.
        return slope_density * x + intercept_density

    predicted_quality = wine_quality["density"].apply(predict_quality)
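One way to avoid recomputing the coefficients for every row is to fix them once in a closure. This is a minimal sketch: `make_predictor` is a hypothetical helper, and the coefficients 0.6 and 2.2 are illustrative values, not results from the wine dataset.

```python
def make_predictor(slope, intercept):
    """Return a function that maps x to slope * x + intercept."""
    def predict(x):
        return slope * x + intercept
    return predict

# Hypothetical coefficients, chosen for illustration only.
predict = make_predictor(slope=0.6, intercept=2.2)

# Apply the model to a few made-up x values.
predictions = [predict(x) for x in [1.0, 2.0, 3.0]]
print(predictions)
```

With a pandas Series, the returned function can be passed to `.apply` in the same way as `predict_quality`.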

4. Finding the error: subtract the predicted values from the actual values to get the residuals, then add up the squared residuals (the residual sum of squares, RSS) to evaluate the model:

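    wine_quality["predicted"] = wine_quality["density"] * slope_density + intercept_density
    # Keep the squared residuals in their own column instead of overwriting the predictions.
    wine_quality["squared_residual"] = (wine_quality["quality"] - wine_quality["predicted"]) ** 2
    rss = wine_quality["squared_residual"].sum()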
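The RSS arithmetic can be traced on a handful of values. A minimal sketch, assuming made-up actual and predicted values rather than the wine data:

```python
# Made-up actual and predicted values, for illustration only.
actual = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

# Residual = actual minus predicted, one per data point.
residuals = [a - p for a, p in zip(actual, predicted)]

# RSS = sum of the squared residuals; squaring stops positive and
# negative errors from cancelling each other out.
rss = sum(r ** 2 for r in residuals)
print(round(rss, 6))
```

The squared residuals here are roughly 0.64, 0.36, 1.0, 0.36 and 0.04, so the RSS is about 2.4. A smaller RSS means the line sits closer to the data.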

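5. Standard error: estimates how far the actual values typically fall from the regression line in the whole population. Take the sum of squared residuals, divide by the number of y-points minus two (two parameters, the slope and the intercept, were estimated from the data), and then take the square root. The code below also counts what share of the actual values lies within 1, 2 and 3 standard errors of the prediction: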

    import numpy as np

    predicted_y = predicted_quality  # predictions from step 3
    standard_error = (rss / (len(predicted_y) - 2)) ** 0.5  # standard error of the model
    result = np.asarray(wine_quality["quality"] - predicted_y)  # residuals
    count_one = 0
    count_two = 0
    count_three = 0

    # Each residual is counted once, in the narrowest band it falls into.
    for ele in result:
        if abs(ele) <= standard_error:
            count_one += 1
        elif abs(ele) <= standard_error * 2:
            count_two += 1
        elif abs(ele) <= standard_error * 3:
            count_three += 1

    # Cumulative fractions of actual y values within 1, 2 and 3 standard errors of the predicted y value.
    within_one = count_one / len(result)
    within_two = (count_one + count_two) / len(result)
    within_three = (count_one + count_two + count_three) / len(result)
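The band counting can also be written as a direct cumulative fraction, which avoids the three counters. A minimal, dependency-free sketch on made-up residuals; `standard_error` and `fraction_within` are hypothetical helper names introduced for this example:

```python
import math

def standard_error(residuals):
    """sqrt(RSS / (n - 2)); two degrees of freedom go to slope and intercept."""
    rss = sum(r ** 2 for r in residuals)
    return math.sqrt(rss / (len(residuals) - 2))

def fraction_within(residuals, k, se):
    """Fraction of residuals whose absolute value is at most k standard errors."""
    return sum(abs(r) <= k * se for r in residuals) / len(residuals)

# Made-up residuals, for illustration only.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
se = standard_error(residuals)
within = {k: fraction_within(residuals, k, se) for k in (1, 2, 3)}
print(se, within)
```

For these five residuals the standard error is about 0.894, four of the five values fall within one standard error, and all of them fall within two. If the residuals are roughly normally distributed, the three fractions on real data should land near 68%, 95% and 99.7%.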