ISLR;R语言; 机器学习 ;线性回归
一些专业词汇只知道英语的,中文可能不标准,请轻喷
8.利用简单的线性回归处理Auto数据集
library(MASS)
library(ISLR)
library(car)
Auto=read.csv("Auto.csv",header=T,na.strings="?")
Auto=na.omit(Auto)
attach(Auto)
summary(Auto)
输出结果:
mpg cylinders displacement horsepower
Min. : 9.00 Min. :3.000 Min. : 68.0 Min. : 46.0
1st Qu.:17.00 1st Qu.:4.000 1st Qu.:105.0 1st Qu.: 75.0
Median :22.75 Median :4.000 Median :151.0 Median : 93.5
Mean :23.45 Mean :5.472 Mean :194.4 Mean :104.5
3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:275.8 3rd Qu.:126.0
Max. :46.60 Max. :8.000 Max. :455.0 Max. :230.0
weight acceleration year origin
Min. :1613 Min. : 8.00 Min. :70.00 Min. :1.000
1st Qu.:2225 1st Qu.:13.78 1st Qu.:73.00 1st Qu.:1.000
Median :2804 Median :15.50 Median :76.00 Median :1.000
Mean :2978 Mean :15.54 Mean :75.98 Mean :1.577
3rd Qu.:3615 3rd Qu.:17.02 3rd Qu.:79.00 3rd Qu.:2.000
Max. :5140 Max. :24.80 Max. :82.00 Max. :3.000
name
amc matador : 5
ford pinto : 5
toyota corolla : 5
amc gremlin : 4
amc hornet : 4
chevrolet chevette: 4
(Other) :365
线性回归:
lm.fit=lm(mpg~horsepower)
summary(lm.fit)
输出结果:
Call:
lm(formula = mpg ~ horsepower)
Residuals:
Min 1Q Median 3Q Max
-13.5710 -3.2592 -0.3435 2.7630 16.9240
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 39.935861 0.717499 55.66 <2e-16 ***
horsepower -0.157845 0.006446 -24.49 <2e-16 ***
---
Signif. codes: 0 ‘\*\*\*’ 0.001 ‘\*\*’ 0.01 ‘\*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.906 on 390 degrees of freedom
Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
a)
- 零假设 H 0:βhorsepower=0,假设horsepower与mpg不相关。
由于F-statistic值远大于1,p值接近于0,拒绝原假设,则horsepower和mpg具有统计显著关系。 - mpg的平均值为23.45,线性回归的RSE为4.906,有20.9248%的相对误差。R-squared为0.6059,说明60.5948%的mpg可以被horsepower解释。
- 线性回归系数小于零,说明mpg与horsepower之间的关系是消极的。
预测mpg
predict(lm.fit,data.frame(mpg=c(98)),interval="prediction") Warning message: 'newdata'必需有1行 但变量里有392行
修改办法:
predictor=mpg
response=horsepower
lm.fit2=lm(predictor~response)
predict(lm.fit2,data.frame(response=c(98)),interval="confidence")
fit lwr upr
1 24.47 23.97 24.96
predict(lm.fit2,data.frame(response=c(98)),interval="prediction")
fit lwr upr
1 24.46708 14.8094 34.12476
b)绘制mpg与horsepower散点图和最小二乘直线
plot(response,predictor)