WEEK3-Visualizing hypothesis tests in R, t-tests in R

最新推荐文章于 2025-01-22 15:52:23 发布

Katfly

最新推荐文章于 2025-01-22 15:52:23 发布

阅读量1.2k

点赞数

CC 4.0 BY-SA版权

分类专栏： R 文章标签： R hypothesis test

本文链接：https://blog.youkuaiyun.com/Katfly/article/details/39576651

R 专栏收录该内容

5 篇文章

订阅专栏

这篇博客是关于在R中可视化假设检验，特别是t测试的视频笔记。涵盖了单尾和双尾测试，解释了第一类错误（类型I错误，即alpha错误）和第二类错误（类型II错误）。此外，还介绍了R中的Student's t-Test，包括默认的S3方法和公式方法，并强调了如何从剪贴板读取数据进行t检验。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

Explore Statistics with R (EDX)

WEEK3-Visualizing hypothesis tests in R 视频笔记

1. #one tailed test

#rare in health sciences, more common in industrial process control
x=seq(50,140,length=200)
y1=dnorm(x,80, 10)
plot(x,y1,type='l',lwd=2,col='red')
y2=dnorm(x,110, 10)
lines(x,y2,type='l',lwd=2,col='blue')
abline(v=qnorm(0.95,80,10)) #vertical

左边红的 null hypothesis （H0），右边蓝的 alternative hypothesis （H1），灰的竖线 cut off （临界）

这里是第一类错误（type I error, alpha error），弃真

2. #two tailed test

#common in the health sciences, because in most cases, both
#an increase or a decrease in a variable would affect health
x=seq(50,140,length=200)
y1=dnorm(x,80, 10)
plot(x,y1,type='l',lwd=2,col='red')
y2=dnorm(x,110, 10)
lines(x,y2,type='l',lwd=2,col='blue')
abline(v=qnorm(0.025,80,10))
abline(v=qnorm(0.975,80,10))

#colour the rejection area, also referred to as alpha.

#If the p-value is equal to or lower than alpha - reject the null hypothesis
x=seq(50,140,length=200)
y1=dnorm(x,80, 10)
plot(x,y1,type='l',lwd=2,col='red')
y2=dnorm(x,110, 10)
lines(x,y2,type='l',lwd=2,col='blue')
cord.x1 <- c((round(qnorm(0.975, 80, 10))),seq((round(qnorm(0.975, 80, 10))), 120,1),120) 
cord.y1 <- c(0,dnorm(seq((round(qnorm(0.975, 80, 10))), 120, 1), 80, 10),0) 
polygon(cord.x1,cord.y1,col='red')
cord.x2 <- c(50,seq(50,round(qnorm(0.025, 80, 10),1)),round(qnorm(0.025, 80, 10))) 
cord.y2 <- c(0,dnorm(seq(50,round(qnorm(0.025, 80, 10),1)), 80, 10),0) 
polygon(cord.x2,cord.y2,col='red')

#Imagine that the alternative hypothesis were true.

#Beta is the risk that you will keep (=not reject) the false null hypothesis
x=seq(50,140,length=200)
y1=dnorm(x,80, 10)
plot(x,y1,type='l',lwd=2,col='red')
y2=dnorm(x,110, 10)
lines(x,y2,type='l',lwd=2,col='blue')
cord.x2<- c(0,seq((round(1-qnorm(0.025,110,10))),100,1),100)
cord.y2 <- c(0,dnorm(seq((round(1-qnorm(0.025, 110, 10))), 100, 1), 110, 10),0) 
polygon(cord.x2,cord.y2,col='red')
abline(v=round(qnorm(0.975, 80, 10, lower.tail=T)))
abline(v=round(qnorm(0.025, 80, 10, lower.tail=T)))
text(95,0.005, "β",xpd=5)

这里就是第二类错误了，取伪

#Statistical power, 1-beta, is the probability to reject a false null hypothesis
x<- seq(50,140,length=200)
y1<- dnorm(x,80, 10)
plot(x,y1,type='l',lwd=2,col='red')
y2<- dnorm(x,110, 10)
lines(x,y2,type='l',lwd=2,col='blue')
cord.x2<- c(0,seq((round(1-qnorm(0.025,110,10))),100,1),100)
cord.y2 <- c(0,dnorm(seq((round(1-qnorm(0.025, 110, 10))), 100, 1), 110, 10),0) 
polygon(cord.x2,cord.y2,col='red')
abline(v=round(qnorm(0.975, 80, 10, lower.tail=T)))
abline(v=round(qnorm(0.025, 80, 10, lower.tail=T)))
cord.x1 <- c(100,seq(round(qnorm(0.975, 80, 10, lower.tail=T)), 140,1),140) 
cord.y1 <- c(0,dnorm(seq(round(qnorm(0.975, 80, 10, lower.tail=T)),140, 1), 110, 10),0) 
polygon(cord.x1,cord.y1,col='6')
text(95,0.005, "β",xpd=5)
text(115,0.005, "1-β",xpd=5)

WEEK3-t-tests in R 视频笔记

1. 说明文档：Student's t-Test

## Default S3 method:
t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)

## S3 method for class 'formula'
t.test(formula, data, subset, na.action, ...)

2. 数据

A	B
97	76
98	87
78	89
80	90
81	94
84	96
85	98
85	99
86	100
94	106
100	109
102	112
103	113
104	105

3. 一些 t检验

由于一直狂按快进看视频，今天突然才明白剪贴板读数据这件事：就是选中一部分数据，复制，然后读，这样就行了。

> weight <- read.table("clipboard", header=T)
> t.test(weight)

	One Sample t-test

data:  weight
t = 48.2568, df = 27, p-value < 2.2e-16   #degrees of freedom = 27 (28 measurements)
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 90.65293 98.70421
sample estimates:
mean of x 
 94.67857 

> t.test(weight$A, weight$B)

	Welch Two Sample t-test

data:  weight$A and weight$B
t = -1.8423, df = 25.684, p-value = 0.077   #compensated degrees of freedom = 25.684
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -14.6635326   0.8063898
sample estimates:
mean of x mean of y 
 91.21429  98.14286 

> t.test(weight$A, weight$B, var.equal=T)

	Two Sample t-test

data:  weight$A and weight$B
t = -1.8423, df = 26, p-value = 0.07686   #28 - 1(per group)
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -14.6589062   0.8017634
sample estimates:
mean of x mean of y 
 91.21429  98.14286 

> t.test(weight$A, weight$B, paired=T)

	Paired t-test

data:  weight$A and weight$B
t = -2.4884, df = 13, p-value = 0.02717
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -12.943697  -0.913446
sample estimates:
mean of the differences 
              -6.928571

4. 例子

#A function for creating random examples of t-tests

ttest.for.examination <- function(x,y,z,k)
{
subjects <- x
mean1 <- y
mean2 <- z
standarddev <- k
print( c("Number of measurements:   ", x))
print( c("Mean of group 1: ", y))
print( c( "Mean of group 2: ",z))
print( c("Standard deviation:  ", k))
group1 <- round(rnorm(x, y, k))
group2 <- round(rnorm(x, z, k))
framedata <- cbind(group1, group2)
print(framedata)
print( list (t.test(group1, group2, var.equal = T), t.test(group1, group2, var.equal = T, paired = T)))
}

> ttest.for.examination(13,90,105,10)
[1] "Number of measurements:   " "13"                        
[1] "Mean of group 1: " "90"               
[1] "Mean of group 2: " "105"              
[1] "Standard deviation:  " "10"                   
      group1 group2
 [1,]     80    110
 [2,]    101    108
 [3,]     92    109
 [4,]    107    114
 [5,]     76    125
 [6,]     80    107
 [7,]     81    103
 [8,]     96     97
 [9,]     88    113
[10,]    107    118
[11,]    110    110
[12,]    101    107
[13,]     75     95
[[1]]

	Two Sample t-test

data:  group1 and group2
t = -4.115, df = 24, p-value = 0.0003939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -25.641951  -8.511895
sample estimates:
mean of x mean of y 
 91.84615 108.92308 


[[2]]

	Paired t-test

data:  group1 and group2
t = -4.4543, df = 12, p-value = 0.000787
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -25.430113  -8.723733
sample estimates:
mean of the differences 
              -17.07692

下面这个P值比较大。

> ttest.for.examination(13,92,100,10)
[1] "Number of measurements:   " "13"                        
[1] "Mean of group 1: " "92"               
[1] "Mean of group 2: " "100"              
[1] "Standard deviation:  " "10"                   
      group1 group2
 [1,]    102     95
 [2,]     99     97
 [3,]     76    107
 [4,]    102    113
 [5,]     97    111
 [6,]     88     93
 [7,]     95     85
 [8,]    103     80
 [9,]     86    102
[10,]     85    106
[11,]     96    112
[12,]    109    105
[13,]     94    103
[[1]]

	Two Sample t-test

data:  group1 and group2
t = -1.5669, df = 24, p-value = 0.1302
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -13.724765   1.878611
sample estimates:
mean of x mean of y 
 94.76923 100.69231 


[[2]]

	Paired t-test

data:  group1 and group2
t = -1.4568, df = 12, p-value = 0.1708
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -14.781916   2.935762
sample estimates:
mean of the differences 
              -5.923077