Explore Statistics with R (EDX)
WEEK3-Visualizing hypothesis tests in R 视频笔记
1. #one tailed test
#rare in health sciences, more common in industrial process control
x=seq(50,140,length=200)
y1=dnorm(x,80, 10)
plot(x,y1,type='l',lwd=2,col='red')
y2=dnorm(x,110, 10)
lines(x,y2,type='l',lwd=2,col='blue')
abline(v=qnorm(0.95,80,10)) #vertical

左边红的 null hypothesis (H0),右边蓝的 alternative hypothesis (H1),灰的竖线 cut off (临界)
这里是第一类错误(type I error, alpha error),弃真
2. #two tailed test
#common in the health sciences, because in most cases, both
#an increase or a decrease in a variable would affect health
x=seq(50,140,length=200)
y1=dnorm(x,80, 10)
plot(x,y1,type='l',lwd=2,col='red')
y2=dnorm(x,110, 10)
lines(x,y2,type='l',lwd=2,col='blue')
abline(v=qnorm(0.025,80,10))
abline(v=qnorm(0.975,80,10))
#colour the rejection area, also referred to as alpha.
#If the p-value is equal to or lower than alpha - reject the null hypothesis
x=seq(50,140,length=200)
y1=dnorm(x,80, 10)
plot(x,y1,type='l',lwd=2,col='red')
y2=dnorm(x,110, 10)
lines(x,y2,type='l',lwd=2,col='blue')
cord.x1 <- c((round(qnorm(0.975, 80, 10))),seq((round(qnorm(0.975, 80, 10))), 120,1),120)
cord.y1 <- c(0,dnorm(seq((round(qnorm(0.975, 80, 10))), 120, 1), 80, 10),0)
polygon(cord.x1,cord.y1,col='red')
cord.x2 <- c(50,seq(50,round(qnorm(0.025, 80, 10),1)),round(qnorm(0.025, 80, 10)))
cord.y2 <- c(0,dnorm(seq(50,round(qnorm(0.025, 80, 10),1)), 80, 10),0)
polygon(cord.x2,cord.y2,col='red')

#Imagine that the alternative hypothesis were true.
#Beta is the risk that you will keep (=not reject) the false null hypothesis
x=seq(50,140,length=200)
y1=dnorm(x,80, 10)
plot(x,y1,type='l',lwd=2,col='red')
y2=dnorm(x,110, 10)
lines(x,y2,type='l',lwd=2,col='blue')
cord.x2<- c(0,seq((round(1-qnorm(0.025,110,10))),100,1),100)
cord.y2 <- c(0,dnorm(seq((round(1-qnorm(0.025, 110, 10))), 100, 1), 110, 10),0)
polygon(cord.x2,cord.y2,col='red')
abline(v=round(qnorm(0.975, 80, 10, lower.tail=T)))
abline(v=round(qnorm(0.025, 80, 10, lower.tail=T)))
text(95,0.005, "β",xpd=5)

这里就是第二类错误了,取伪
#Statistical power, 1-beta, is the probability to reject a false null hypothesis
x<- seq(50,140,length=200)
y1<- dnorm(x,80, 10)
plot(x,y1,type='l',lwd=2,col='red')
y2<- dnorm(x,110, 10)
lines(x,y2,type='l',lwd=2,col='blue')
cord.x2<- c(0,seq((round(1-qnorm(0.025,110,10))),100,1),100)
cord.y2 <- c(0,dnorm(seq((round(1-qnorm(0.025, 110, 10))), 100, 1), 110, 10),0)
polygon(cord.x2,cord.y2,col='red')
abline(v=round(qnorm(0.975, 80, 10, lower.tail=T)))
abline(v=round(qnorm(0.025, 80, 10, lower.tail=T)))
cord.x1 <- c(100,seq(round(qnorm(0.975, 80, 10, lower.tail=T)), 140,1),140)
cord.y1 <- c(0,dnorm(seq(round(qnorm(0.975, 80, 10, lower.tail=T)),140, 1), 110, 10),0)
polygon(cord.x1,cord.y1,col='6')
text(95,0.005, "β",xpd=5)
text(115,0.005, "1-β",xpd=5)
WEEK3-t-tests in R 视频笔记
1. 说明文档:Student's t-Test
## Default S3 method:
t.test(x, y = NULL,
alternative = c("two.sided", "less", "greater"),
mu = 0, paired = FALSE, var.equal = FALSE,
conf.level = 0.95, ...)
## S3 method for class 'formula'
t.test(formula, data, subset, na.action, ...)
2. 数据
A | B |
97 | 76 |
98 | 87 |
78 | 89 |
80 | 90 |
81 | 94 |
84 | 96 |
85 | 98 |
85 | 99 |
86 | 100 |
94 | 106 |
100 | 109 |
102 | 112 |
103 | 113 |
104 | 105 |
3. 一些 t检验
由于一直狂按快进看视频,今天突然才明白剪贴板读数据这件事:就是选中一部分数据,复制,然后读,这样就行了。
> weight <- read.table("clipboard", header=T)
> t.test(weight)
One Sample t-test
data: weight
t = 48.2568, df = 27, p-value < 2.2e-16 #degrees of freedom = 27 (28 measurements)
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
90.65293 98.70421
sample estimates:
mean of x
94.67857
> t.test(weight$A, weight$B)
Welch Two Sample t-test
data: weight$A and weight$B
t = -1.8423, df = 25.684, p-value = 0.077 #compensated degrees of freedom = 25.684
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-14.6635326 0.8063898
sample estimates:
mean of x mean of y
91.21429 98.14286
> t.test(weight$A, weight$B, var.equal=T)
Two Sample t-test
data: weight$A and weight$B
t = -1.8423, df = 26, p-value = 0.07686 #28 - 1(per group)
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-14.6589062 0.8017634
sample estimates:
mean of x mean of y
91.21429 98.14286
> t.test(weight$A, weight$B, paired=T)
Paired t-test
data: weight$A and weight$B
t = -2.4884, df = 13, p-value = 0.02717
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-12.943697 -0.913446
sample estimates:
mean of the differences
-6.928571
4. 例子
#A function for creating random examples of t-tests
ttest.for.examination <- function(x,y,z,k)
{
subjects <- x
mean1 <- y
mean2 <- z
standarddev <- k
print( c("Number of measurements: ", x))
print( c("Mean of group 1: ", y))
print( c( "Mean of group 2: ",z))
print( c("Standard deviation: ", k))
group1 <- round(rnorm(x, y, k))
group2 <- round(rnorm(x, z, k))
framedata <- cbind(group1, group2)
print(framedata)
print( list (t.test(group1, group2, var.equal = T), t.test(group1, group2, var.equal = T, paired = T)))
}
> ttest.for.examination(13,90,105,10)
[1] "Number of measurements: " "13"
[1] "Mean of group 1: " "90"
[1] "Mean of group 2: " "105"
[1] "Standard deviation: " "10"
group1 group2
[1,] 80 110
[2,] 101 108
[3,] 92 109
[4,] 107 114
[5,] 76 125
[6,] 80 107
[7,] 81 103
[8,] 96 97
[9,] 88 113
[10,] 107 118
[11,] 110 110
[12,] 101 107
[13,] 75 95
[[1]]
Two Sample t-test
data: group1 and group2
t = -4.115, df = 24, p-value = 0.0003939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-25.641951 -8.511895
sample estimates:
mean of x mean of y
91.84615 108.92308
[[2]]
Paired t-test
data: group1 and group2
t = -4.4543, df = 12, p-value = 0.000787
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-25.430113 -8.723733
sample estimates:
mean of the differences
-17.07692
下面这个P值比较大。
> ttest.for.examination(13,92,100,10)
[1] "Number of measurements: " "13"
[1] "Mean of group 1: " "92"
[1] "Mean of group 2: " "100"
[1] "Standard deviation: " "10"
group1 group2
[1,] 102 95
[2,] 99 97
[3,] 76 107
[4,] 102 113
[5,] 97 111
[6,] 88 93
[7,] 95 85
[8,] 103 80
[9,] 86 102
[10,] 85 106
[11,] 96 112
[12,] 109 105
[13,] 94 103
[[1]]
Two Sample t-test
data: group1 and group2
t = -1.5669, df = 24, p-value = 0.1302
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-13.724765 1.878611
sample estimates:
mean of x mean of y
94.76923 100.69231
[[2]]
Paired t-test
data: group1 and group2
t = -1.4568, df = 12, p-value = 0.1708
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-14.781916 2.935762
sample estimates:
mean of the differences
-5.923077