回归系数依旧是4.425,但是参数检验中p值增大(但是依旧足够小,回归依旧高度显著)
所以X1和X2可能有很强的相关性,所以导致删除X2后模型依旧显著。
3.7
(1)直接对所有参数做线性回归:
进行逐步回归:
逐步回归得到的线性回归方程中,保留了x1,x2,x3.
但是x3不够显著,所以只对x1,x2进行回归。
所以最优的回归方程为:Y=53.00609+1.41589X1+0.65029X2
(2)p值为1.58e-9,95%置信区间运行结果:
(3)红圈为观测值,蓝点为预测值
(这里y=0指的是yi=0这个事件没有发生。prob列表示的是模型预测的大小。)
(2)逐步回归运行结果:
两个模型中概率的差值为:
可以注意到,两个模型对于训练集中的所有预测结果都是一样的,概率的相差也在0.1以内。
所以在本题中,逐步回归法效果较好,在减少变量的同时预测结果也保持良好的性质。这可能是由于x2,x3,x5与y之间并没有很大的影响。
4.1
4.6
使用协方差矩阵,运行结果如下
使用相关系数矩阵,运行结果如下:
4.11
运行结果:
公共因子F1,即PA3,主体为x1,x2,x5,x5,意义为年轻人对这个食物的爱好程度。
公共因子F2,即PA2,主体为x1-x5,意义为男性对这个食物的爱好程度。
公共因子F3,即图中PA1,主体为x6-x10,意义为女性对这个食物的爱好程度。
三个因子的累计贡献度为:93.43%
5.7
(1)距离判别:
回代法运行结果:
混淆矩阵为:
交叉确认法运行结果为:
混淆矩阵为:
误判率为2/50=0.04
(2)Bayes判别:
回代法:
混淆矩阵为:
误判率为0.02
交叉确认法运行结果为:
混淆矩阵为:
误判率为0.06
5.11
(1)回代法运行结果:
混淆矩阵为:
序号5、9、12.误判率为:3/150=0.02
(2)后验概率矩阵为:
5.13
利用klaR包中的greedy.wilks函数进行选择结果如下:
由于x1的p值相较于其他几个较大,所以我们在逐步判别中,选择c~x2+x3+x4
回代法:混淆矩阵:
误判率:0.02
交叉确认法:混淆矩阵:
误判率:0.033
6.3
6.6
谱系图结果如下:
6.8:
(1)
分为三类:最长距离法:3、6、11、16;10、15、19;剩余
类平均:19;3、10、11、15、16;剩余
(2)
分类结果:
6.10
(1)分类结果:
(2)分类结果:
(3)分类结果:
类平均聚类谱系图:
所以分成三类为:j1,j2为一类
j3,j4,j5,j6,j7,j8,j11为一类
j9,j10,j12为一类。
源代码:
3.4
#1
data3.4 <- read.table(file = “3.4.txt”,header= TRUE);
data3.4
lm.reg<-lm(v3~v1+v2, data=data3.4)
summary(lm.reg)
anova(lm.reg)
confint(lm.reg) #模型参数的置信区间
#2
exa<-data.frame(v1=5, v2=4)
lm.pred<-predict(lm.reg, exa, interval=“prediction”, level=0.99)
lm.pred
#3
lm.reg<-lm(v3~v1, data=data3.4)
summary(lm.reg)
anova(lm.reg)
confint(lm.reg) #模型参数的置信区间
3.7
#1
data3.7 <- read.table(“3.7.txt”,header=TRUE)
data3.7
reg3.7 <- lm(y~x1+x2+x3+x4, data=data3.7)
summary(reg3.7)
reg.Stepwise3.7 <-step(reg3.7,direction=“both”)
summary(reg.Stepwise3.7)
#2
reg3.7 <- lm(y~x1+x2, data=data3.7)
summary(reg3.7)
confint(reg3.7)
#3
lm.pred<-predict(reg3.7,data3.7[,-c(3,4,5)], interval=“prediction”, level=0.95)
lm.pred
dataplot3.7=data.frame(lm.pred,data3.7[,c(5)],c(1:13))
plot(dataplot3.2[,c(5,4)],col=2) #真实值点
points(dataplot3.2[,c(5,1)],pch=20,col=4) #预测值点
lines(dataplot3.2[,c(5,2)]) #下95%置信曲线
lines(dataplot3.2[,c(5,3)])
3.8
#1
data3.8 <- read.table(“3.8.txt”,header = TRUE);
data3.8
y
[
d
a
t
a
3.8
y[data3.8
y[data3.8y0]=2;
data3.8
y
[
d
a
t
a
3.8
y[data3.8
y[data3.8y1]=0;
data3.8
y
[
d
a
t
a
3.8
y[data3.8
y[data3.8y2]=1;
Logistic.Reg3.8 <- glm(y~x1+x2+x3+x4+x5, family = binomial(), data = data3.8)
summary(Logistic.Reg3.8)
data3.8
p
r
o
b
<
−
p
r
e
d
i
c
t
(
L
o
g
i
s
t
i
c
.
R
e
g
3.8
,
t
y
p
e
=
"
r
e
s
p
o
n
s
e
"
)
d
a
t
a
3.8
prob <- predict(Logistic.Reg3.8, type = "response") data3.8
prob<−predict(Logistic.Reg3.8,type="response")data3.8TrueorFalse=data3.8
y
=
=
(
d
a
t
a
3.8
y==(data3.8
y==(data3.8prob>0.5)*1
data3.8
#2
data3.8.stepwise <-data3.8
Logistic.Reg3.8.stepwise <- step(Logistic.Reg3.8,direction=“both”)
summary(Logistic.Reg3.8.stepwise)
data3.8.stepwise
p
r
o
b
<
−
p
r
e
d
i
c
t
(
L
o
g
i
s
t
i
c
.
R
e
g
3.8.
s
t
e
p
w
i
s
e
,
t
y
p
e
=
"
r
e
s
p
o
n
s
e
"
)
d
a
t
a
3.8.
s
t
e
p
w
i
s
e
prob <- predict(Logistic.Reg3.8.stepwise, type = "response") data3.8.stepwise
prob<−predict(Logistic.Reg3.8.stepwise,type="response")data3.8.stepwiseTrueorFalse=data3.8.stepwise
y
=
=
(
d
a
t
a
3.8.
s
t
e
p
w
i
s
e
y==(data3.8.stepwise
y==(data3.8.stepwiseprob>0.5)*1
data3.8.stepwise
data3.8.dif=data3.8-data3.8.stepwise
data3.8.dif
4.6
data4.6<-read.table(“4.6.txt”,header = TRUE);
data4.6
princomp4.6 <- princomp(data4.6,scores = T)#协方差矩阵
summary(princomp4.6,loadings=T)
princomp4.62 <- princomp(data4.6,scores = F)#相关系数矩阵
summary(princomp4.62,loadings=T)
4.11
data4.11 <-read.table(“4.11.txt”,header=TRUE);
data4.11
library(psych)
Factor4.11 <-
fa(data4.11[,-1], nfactors=3, SMC=FALSE,
covar=F,rotate=“none”,max.iter=1, fm=“pa”)
Factor4.11_rotate <-
fa(data4.11[,-1], nfactors=3, SMC=FALSE,
covar=F,rotate=“varimax”,max.iter=1, fm=“pa”,scores = T)
Factor4.11_rotateKaTeX parse error: Expected 'EOF', got '#' at position 26: … #̲因子得分系数 print(Fa…c[data5.7
t
y
p
e
=
=
"
y
1
"
]
=
1
d
a
t
a
5.7
type=="y1"]=1 data5.7
type=="y1"]=1data5.7c[data5.7
t
y
p
e
=
=
"
y
2
"
]
=
2
d
a
t
a
5.7
l
i
b
r
a
r
y
(
M
A
S
S
)
d
a
t
a
5.
7
d
i
s
c
r
i
m
<
−
l
d
a
(
c
.
,
d
a
t
a
5.7
[
,
−
1
]
,
p
r
i
o
r
=
c
(
1
,
1
)
/
2
)
d
a
t
a
5.
7
d
i
s
c
r
i
m
d
a
t
a
5.
7
p
r
e
d
i
c
t
<
−
p
r
e
d
i
c
t
(
d
a
t
a
5.
7
d
i
s
c
r
i
m
)
type=="y2"]=2 data5.7 library(MASS) data5.7_discrim <- lda(c~.,data5.7[,-1],prior=c(1,1)/2) data5.7_discrim data5.7_predict <- predict(data5.7_discrim)
type=="y2"]=2data5.7library(MASS)data5.7discrim<−lda(c .,data5.7[,−1],prior=c(1,1)/2)data5.7discrimdata5.7predict<−predict(data5.7discrim)c
data5.7_predict
#原始结果与判别结果比较:
(Compare=data.frame(年份=c(1952:2001),
原结果=data5.7
c
,
判
别
结
果
=
d
a
t
a
5.
7
p
r
e
d
i
c
t
,
正
确
与
否
=
c
(
d
a
t
a
5.7
c, 判别结果=data5.7_predict, 正确与否=c(data5.7
c,判别结果=data5.7predict,正确与否=c(data5.7cdata5.7_predict)))
table(data5.7_predict, data5.7KaTeX parse error: Expected 'EOF', got '#' at position 4: c) #̲混淆矩阵:查看回代误判情况 d…c,
判别结果=data5.7_discrim_cross
c
l
a
s
s
,
正
确
与
否
=
c
(
d
a
t
a
5.7
class, 正确与否=c(data5.7
class,正确与否=c(data5.7cdata5.7_discrim_cross
c
l
a
s
s
)
)
)
t
a
b
l
e
(
d
a
t
a
5.
7
d
i
s
c
r
i
m
c
r
o
s
s
class))) table(data5.7_discrim_cross
class)))table(data5.7discrimcrossclass, data5.7KaTeX parse error: Expected 'EOF', got '#' at position 4: c) #̲混淆矩阵:查看回代误判情况 #…c
data5.7_bayes_predict
(Compare=data.frame(年份=c(1952:2001),
原结果=data5.7
c
,
判
别
结
果
=
d
a
t
a
5.
7
b
a
y
e
s
p
r
e
d
i
c
t
,
正
确
与
否
=
c
(
d
a
t
a
5.7
c, 判别结果=data5.7_bayes_predict, 正确与否=c(data5.7
c,判别结果=data5.7bayespredict,正确与否=c(data5.7cdata5.7_bayes_predict)))
table(data5.7_bayes_predict, data5.7KaTeX parse error: Expected 'EOF', got '#' at position 4: c) #̲混淆矩阵:查看回代误判情况 d…c,
判别结果=data5.7_bayes_cross
c
l
a
s
s
,
正
确
与
否
=
c
(
d
a
t
a
5.7
class, 正确与否=c(data5.7
class,正确与否=c(data5.7cdata5.7_bayes_cross
c
l
a
s
s
)
)
)
t
a
b
l
e
(
d
a
t
a
5.
7
b
a
y
e
s
c
r
o
s
s
class))) table(data5.7_bayes_cross
class)))table(data5.7bayescrossclass, data5.7KaTeX parse error: Expected 'EOF', got '#' at position 4: c) #̲混淆矩阵:查看回代误判情况 5…c
data5.11_bayes_predict
(Compare=data.frame(序号=data5.11
n
u
m
,
原
结
果
=
d
a
t
a
5.11
num, 原结果=data5.11
num,原结果=data5.11c,
判别结果=data5.11_bayes_predict,
正确与否=c(data5.11
c
=
=
d
a
t
a
5.1
1
b
a
y
e
s
p
r
e
d
i
c
t
)
)
)
t
a
b
l
e
(
d
a
t
a
5.1
1
b
a
y
e
s
p
r
e
d
i
c
t
,
d
a
t
a
5.11
c==data5.11_bayes_predict))) table(data5.11_bayes_predict, data5.11
c==data5.11bayespredict)))table(data5.11bayespredict,data5.11c) #混淆矩阵:查看回代误判情况
data5.11_bayes_cross <- lda(c~.,data5.11[,-1],CV=T)
data5.11_bayes_cross
(Compare=data.frame(年份=data5.11
n
u
m
,
原
结
果
=
d
a
t
a
5.11
num, 原结果=data5.11
num,原结果=data5.11c,
判别结果=data5.11_bayes_cross
c
l
a
s
s
,
正
确
与
否
=
c
(
d
a
t
a
5.11
class, 正确与否=c(data5.11
class,正确与否=c(data5.11cdata5.11_bayes_cross
c
l
a
s
s
)
)
)
t
a
b
l
e
(
d
a
t
a
5.1
1
b
a
y
e
s
c
r
o
s
s
class))) table(data5.11_bayes_cross
class)))table(data5.11bayescrossclass, data5.11KaTeX parse error: Expected 'EOF', got '#' at position 4: c) #̲混淆矩阵:查看回代误判情况 d…posterior
5.13
data5.13 <- read.table(“5.11.txt”,header = TRUE)
data5.13
library(klaR)
Stepdiscrim <- greedy.wilks(c~.,data5.13[,-1])
Stepdiscrim
StepdiscrimKaTeX parse error: Expected 'EOF', got '#' at position 11: formula #̲查看选出的变量 #c ~ x3…c
data5.13_bayes_predict
(Compare=data.frame(序号=data5.13
n
u
m
,
原
结
果
=
d
a
t
a
5.13
num, 原结果=data5.13
num,原结果=data5.13c,
判别结果=data5.13_bayes_predict,
正确与否=c(data5.13
c
=
=
d
a
t
a
5.1
3
b
a
y
e
s
p
r
e
d
i
c
t
)
)
)
t
a
b
l
e
(
d
a
t
a
5.1
3
b
a
y
e
s
p
r
e
d
i
c
t
,
d
a
t
a
5.13
c==data5.13_bayes_predict))) table(data5.13_bayes_predict, data5.13
c==data5.13bayespredict)))table(data5.13bayespredict,data5.13c) #混淆矩阵:查看回代误判情况
data5.13_bayes_cross <- lda(c~.,data5.13[,-c(1,3)],CV=T)
data5.13_bayes_cross
(Compare=data.frame(年份=data5.13
n
u
m
,
原
结
果
=
d
a
t
a
5.13
num, 原结果=data5.13
num,原结果=data5.13c,
判别结果=data5.13_bayes_cross
c
l
a
s
s
,
正
确
与
否
=
c
(
d
a
t
a
5.13
class, 正确与否=c(data5.13
class,正确与否=c(data5.13cdata5.13_bayes_cross
c
l
a
s
s
)
)
)
t
a
b
l
e
(
d
a
t
a
5.1
3
b
a
y
e
s
c
r
o
s
s
class))) table(data5.13_bayes_cross
class)))table(data5.13bayescrossclass, data5.13KaTeX parse error: Expected 'EOF', got '#' at position 4: c) #̲混淆矩阵:查看回代误判情况 6…num<-c(1:5);
opar<-par(mfrow=c(1, 3))
plot(data6.6.cluster_1, hang=-1,labels=data6.6[,6]); #labels指定谱系图的聚类对象名称
plot(data6.6.cluster_2, hang=-1,labels=data6.6[,6]);
plot(data6.6.cluster_3, hang=-1,labels=data6.6[,6]);
par(opar)
6.8
#1
data6.8<-read.table(“6.8.txt”,header=TRUE);
data6.8KaTeX parse error: Expected 'EOF', got '#' at position 131: … ="complete") #̲最长距离法 data6.8.c…cluster))
opar<-par(mfrow=c(2,2))
plot(data6.8[,1],data6.8[,2],
xlab=“地区”,ylab = “x1”,col = data6.8.kmeans
c
l
u
s
t
e
r
,
p
c
h
=
a
s
.
c
h
a
r
a
c
t
e
r
(
d
a
t
a
6.8.
k
m
e
a
n
s
cluster, pch=as.character(data6.8.kmeans
cluster,pch=as.character(data6.8.kmeanscluster),
main=“Kmeans聚类”)
plot(data6.8[,2],data6.8[,3],
xlab=“x1”,ylab = “x2”,col = data6.8.kmeans
c
l
u
s
t
e
r
,
p
c
h
=
a
s
.
c
h
a
r
a
c
t
e
r
(
d
a
t
a
6.8.
k
m
e
a
n
s
cluster, pch=as.character(data6.8.kmeans
cluster,pch=as.character(data6.8.kmeanscluster),
main=“Kmeans聚类”)
points(data6.8.kmeansKaTeX parse error: Expected 'EOF', got '#' at position 42: … = 20, cex = 2)#̲画出各类中心点 #分类结果: …Area[data6.8.kmeans
c
l
u
s
t
e
r
=
=
1
]
d
a
t
a
6.8
cluster==1] data6.8
cluster==1]data6.8Area[data6.8.kmeans
c
l
u
s
t
e
r
=
=
2
]
d
a
t
a
6.8
cluster==2] data6.8
cluster==2]data6.8Area[data6.8.kmeansKaTeX parse error: Expected 'EOF', got '#' at position 18: …uster==3] 6.10 #̲1 data6.10<-rea…j[data6.10.kmeans
c
l
u
s
t
e
r
=
=
1
]
d
a
t
a
6.10
cluster==1] data6.10
cluster==1]data6.10j[data6.10.kmeans
c
l
u
s
t
e
r
=
=
2
]
d
a
t
a
6.10
cluster==2] data6.10
cluster==2]data6.10j[data6.10.kmeansKaTeX parse error: Expected 'EOF', got '#' at position 13: cluster==3] #̲2 library(clust…clustering
#分类结果
data6.10
j
[
d
a
t
a
6.10.
k
m
e
d
o
i
d
s
j[data6.10.kmedoids
j[data6.10.kmedoidsclustering1]
data6.10
j
[
d
a
t
a
6.10.
k
m
e
d
o
i
d
s
j[data6.10.kmedoids
j[data6.10.kmedoidsclustering2]
data6.10
j
[
d
a
t
a
6.10.
k
m
e
d
o
i
d
s
j[data6.10.kmedoids
j[data6.10.kmedoidsclustering3]
data6.10
j
[
d
a
t
a
6.10.
k
m
e
d
o
i
d
s
j[data6.10.kmedoids
j[data6.10.kmedoidsclustering==4]
#3
data6.10_dist<-dist(data6.10)
data6.10_dist
data6.10.cluster <- hclust(data6.10_dist, method =“ave”) #类平均距离法
plot(data6.10.cluster, hang=-1,labels=data6.10[,1]);