Statistics and Linear Algebra 4

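This post shows how to predict wine quality with simple linear regression: compute the slope and intercept, build the prediction model, and evaluate it using the residual error and the standard error.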

1. Calculating the slope: divide the covariance of x and y by the variance of x.

  from numpy import cov
  # cov(x, y) is the NumPy function; it returns a 2x2 covariance matrix, and entry [0, 1] is cov(x, y).
  # .var() is the pandas method. Both use the sample (n - 1) denominator by default.
  slope_density = cov(wine_quality["quality"], wine_quality["density"])[0, 1] / wine_quality["density"].var()
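
As a quick sanity check of the covariance-over-variance formula, here is a minimal sketch on made-up numbers. The `toy` DataFrame and its values are assumptions for illustration only, not the real wine data. It compares the result with the same slope written out as explicit sums.

```python
import numpy as np
import pandas as pd

# Made-up toy values (an assumption), standing in for the wine dataset.
toy = pd.DataFrame({
    "density": [0.990, 0.992, 0.994, 0.996, 0.998],
    "quality": [7.0, 6.0, 6.0, 5.0, 5.0],
})

x = toy["density"]
y = toy["quality"]

# Slope as cov(x, y) / var(x); np.cov and Series.var both default to ddof=1.
slope_cov = np.cov(x, y)[0, 1] / x.var()

# The same slope from explicit sums; the (n - 1) factors cancel out.
slope_sums = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()

print(slope_cov, slope_sums)
```

The two values should agree up to floating-point rounding, and the negative sign reflects that quality falls as density rises in this toy sample.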

2. Getting the intercept: b = y - ax, where a is the slope and x and y are the mean values of the two columns.

  # Reuse the slope computed in step 1 instead of recomputing it.
  intercept_density = wine_quality["quality"].mean() - slope_density * wine_quality["density"].mean()
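
To illustrate the intercept formula, the sketch below uses made-up arrays (an assumption, not the wine data) and cross-checks the hand-computed slope and intercept against `np.polyfit`. It also shows a consequence of b = y - ax: the fitted line passes through the point of means.

```python
import numpy as np

# Made-up toy values (an assumption), not the real wine data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Slope and intercept computed as in steps 1 and 2.
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line.
ref_slope, ref_intercept = np.polyfit(x, y, 1)

# Evaluating the line at mean(x) gives back mean(y).
at_mean = slope * x.mean() + intercept

print(slope, intercept, ref_slope, ref_intercept, at_mean)
```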

3. Making predictions: the slope and intercept computed from the dataset define the prediction model. Applying the model to each density value gives the array of predicted qualities.

  def predict_quality(x):
    # Return the quality predicted by the fitted line for a density value x,
    # using the slope and intercept from steps 1 and 2.
    return slope_density * x + intercept_density

  predicted_quality = wine_quality["density"].apply(predict_quality) 
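
Because the model is a single multiply and add, the same predictions can also be computed on the whole column at once. The sketch below compares the two approaches. The `slope` and `intercept` values and the `toy` DataFrame are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical slope and intercept (assumptions), not fitted to the wine data.
slope = -250.0
intercept = 254.3

toy = pd.DataFrame({"density": [0.990, 0.994, 0.998]})

def predict_quality(x):
    # Predicted quality for one density value.
    return slope * x + intercept

# Element by element with apply, as in the text above.
by_apply = toy["density"].apply(predict_quality)

# The whole column at once, using pandas arithmetic.
vectorized = toy["density"] * slope + intercept

print(by_apply.tolist(), vectorized.tolist())
```

Both forms give the same numbers; the column-wise form is usually faster on large datasets.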

4. Finding the error: subtract the predicted values from the actual values to get the residuals, then add up the squared residuals (the residual sum of squares, RSS) to evaluate the model:

  wine_quality["predicted"] = wine_quality["density"] * slope_density + intercept_density
  # Keep the predictions intact and store the squared residuals separately.
  squared_residuals = (wine_quality["quality"] - wine_quality["predicted"]) ** 2
  rss = squared_residuals.sum()
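
A small worked example may make the RSS concrete. The arrays below are made up (an assumption, not the wine data). The sketch fits the line, computes the residuals and the RSS, and then shifts the line by a hypothetical 0.5 to show that the RSS grows.

```python
import numpy as np

# Made-up toy values (an assumption), not the real wine data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

# Residuals are actual minus predicted values.
predicted = slope * x + intercept
residuals = y - predicted
rss = np.sum(residuals ** 2)

# Moving the line up by 0.5 (an arbitrary shift) makes the fit worse.
rss_shifted = np.sum((y - (predicted + 0.5)) ** 2)

print(rss, rss_shifted, residuals.sum())
```

For a least-squares line with an intercept, the residuals sum to zero up to rounding, which is why any vertical shift of the line increases the RSS.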

 

5. Standard error: estimates the typical size of the prediction error for the whole population. Divide the sum of squared residuals by the number of y-points minus two, then take the square root:

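  import numpy as np

  predicted_y = wine_quality["density"] * slope_density + intercept_density
  # Use 0.5 rather than 1/2, which evaluates to 0 under Python 2 integer division.
  standard_error = (rss / (len(predicted_y) - 2)) ** 0.5  # standard error of the model
  result = np.asarray(wine_quality["quality"] - predicted_y)  # residuals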
  count_one = 0
  count_two = 0
  count_three = 0

  for ele in result:
    if abs(ele) <= standard_error:
      count_one += 1
    elif abs(ele) <= standard_error * 2:
      count_two += 1
    elif abs(ele) <= standard_error * 3:
      count_three += 1
  # Fraction of actual y values within 1, 2 and 3 standard errors of the predicted y value.
  within_one = count_one / len(result)
  within_two = (count_one + count_two) / len(result)
  within_three = (count_one + count_two + count_three) / len(result)
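
The counting loop above can also be written with NumPy boolean masks. The following is a minimal sketch on made-up `actual` and `predicted` arrays (assumptions for illustration, not output of the wine model). The mean of a boolean mask is the fraction of entries that are true.

```python
import numpy as np

# Made-up actual and predicted values (assumptions), not the wine data.
actual = np.array([5.0, 6.0, 5.0, 7.0, 6.0, 5.0, 6.0, 8.0])
predicted = np.array([5.5, 5.8, 5.6, 6.1, 5.9, 5.4, 6.0, 6.0])

residuals = actual - predicted
rss = np.sum(residuals ** 2)

# Standard error: square root of RSS divided by (n - 2).
standard_error = np.sqrt(rss / (len(actual) - 2))

# Fraction of points within 1, 2 and 3 standard errors of the prediction.
abs_residuals = np.abs(residuals)
within_one = np.mean(abs_residuals <= standard_error)
within_two = np.mean(abs_residuals <= 2 * standard_error)
within_three = np.mean(abs_residuals <= 3 * standard_error)

print(standard_error, within_one, within_two, within_three)
```

Each fraction is cumulative, so `within_one` can never exceed `within_two`, which in turn can never exceed `within_three`.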

Reposted from: https://www.cnblogs.com/kingoscar/p/6124330.html
