Source: http://www.r-bloggers.com/wilcoxon-signed-rank-test/
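A nonparametric statistical hypothesis test for comparing two paired samples. It assesses whether the paired differences are centered on zero (a location shift), rather than comparing the means directly.
The mayor of a city wants to see whether pollution levels drop after some streets are closed to car traffic. The pollution rate is measured every 60 minutes (8 am to 10 pm, 15 measurements in total), once on a day with traffic open and once on a day with traffic closed. The air pollution values are: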
With traffic: 214, 159, 169, 202, 103, 119, 200, 109, 132, 142, 194, 104, 219, 119, 234
Without traffic: 159, 135, 141, 101, 102, 168, 62, 167, 174, 159, 66, 118, 181, 171, 112
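These two sets of readings are clearly paired: although they were taken on two different days, they come from the same city (with its particular weather, ventilation, and so on). Since the recorded values cannot be assumed to follow a Gaussian distribution, we use a nonparametric test, the Wilcoxon signed-rank test: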
a <- c(214, 159, 169, 202, 103, 119, 200, 109, 132, 142, 194, 104, 219, 119, 234)
b <- c(159, 135, 141, 101, 102, 168, 62, 167, 174, 159, 66, 118, 181, 171, 112)
wilcox.test(a,b, paired=TRUE)
Wilcoxon signed rank test
data: a and b
V = 80, p-value = 0.2769
alternative hypothesis: true location shift is not equal to 0
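The same result can be read from the returned object instead of the printed output. This is a minimal sketch that relies only on the `statistic` and `p.value` components of the object returned by `wilcox.test`; the variable names `res`, `v_stat`, `p_val`, and `reject_h0`, and the 0.05 threshold, are choices made here for illustration.

```r
a <- c(214, 159, 169, 202, 103, 119, 200, 109, 132, 142, 194, 104, 219, 119, 234)
b <- c(159, 135, 141, 101, 102, 168, 62, 167, 174, 159, 66, 118, 181, 171, 112)

# Run the paired test and keep the result object
res <- wilcox.test(a, b, paired = TRUE)

# V statistic: sum of the ranks of the positive differences (80 for these data)
v_stat <- unname(res$statistic)

# Two-sided p-value (printed above as 0.2769)
p_val <- res$p.value

# Decision at the 5% level (illustrative threshold)
reject_h0 <- p_val < 0.05
```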
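Since the p-value is greater than 0.05, we cannot reject the null hypothesis H0: the data give no evidence that the location of the pollution readings changed, so closing traffic for one day did not bring a detectable improvement in the city's pollution. The value V = 80 is the sum of the ranks assigned to the differences with a positive sign. We can compute by hand the sum of the ranks of the positive differences and the sum of the ranks of the negative differences, then compare this interval with the interval read from the Wilcoxon table for paired samples to confirm our statistical decision. The two sums are computed as follows: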
a <- c(214, 159, 169, 202, 103, 119, 200, 109, 132, 142, 194, 104, 219, 119, 234)
b <- c(159, 135, 141, 101, 102, 168, 62, 167, 174, 159, 66, 118, 181, 171, 112)
# vector of paired differences (note: this name shadows base::diff within the session)
diff <- c(a - b)
# drop all differences equal to zero
diff <- diff[ diff!=0 ]
# rank the absolute values of the differences
diff.rank <- rank(abs(diff))
# attach to each rank the sign of the corresponding difference
diff.rank.sign <- diff.rank * sign(diff)
# sum of the ranks of the positive differences
ranks.pos <- sum(diff.rank.sign[diff.rank.sign > 0])
# sum of the ranks of the negative differences (sign flipped to make it positive)
ranks.neg <- -sum(diff.rank.sign[diff.rank.sign < 0])
# ranks.pos is the value V reported by the Wilcoxon signed-rank test
ranks.pos
[1] 80
ranks.neg
[1] 40
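The computed interval is (40, 80). In the Wilcoxon table for paired samples, the tabulated interval for 15 differences is (25, 95). Since the computed interval lies inside the tabulated interval, we do not reject the null hypothesis H0 of no location shift. As the p-value already indicated, closing the roads to traffic did not bring a detectable improvement in the pollution rate.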
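The link between the rank sums and the reported p-value can be sketched with `psignrank`, the distribution function of the signed-rank statistic. This assumes the exact test applies (no zero differences and no tied absolute differences, as is the case for these data); the names `v_pos`, `v_neg`, `p_manual`, and `p_builtin` are introduced here for illustration.

```r
a <- c(214, 159, 169, 202, 103, 119, 200, 109, 132, 142, 194, 104, 219, 119, 234)
b <- c(159, 135, 141, 101, 102, 168, 62, 167, 174, 159, 66, 118, 181, 171, 112)

d <- a - b
n <- length(d)          # 15, since no difference is zero
r <- rank(abs(d))       # ranks of the absolute differences

v_pos <- sum(r[d > 0])  # sum of ranks of positive differences
v_neg <- sum(r[d < 0])  # sum of ranks of negative differences

# Two-sided exact p-value: twice the lower-tail probability of the smaller rank sum
p_manual <- min(1, 2 * psignrank(min(v_pos, v_neg), n))

# Value computed by wilcox.test, for comparison
p_builtin <- wilcox.test(a, b, paired = TRUE)$p.value
```

If the two p-values agree, the table lookup and the p-value are two views of the same exact distribution.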
=========================================
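In fact, wilcox.test applies the signed-rank method with either of the following two forms of input: two samples with paired=TRUE, or a single vector of differences tested against mu=0.
wilcox.test(a, b, paired=TRUE)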
Wilcoxon signed rank test
data: a and b
V = 80, p-value = 0.2769
alternative hypothesis: true location shift is not equal to 0
#----------------------------------------
wilcox.test(a-b, paired=FALSE, mu=0)
Wilcoxon signed rank test
data: a - b
V = 80, p-value = 0.2769
alternative hypothesis: true location is not equal to 0
#----------------------------------------
wilcox.test(b, a, paired=TRUE)
Wilcoxon signed rank test
data: b and a
V = 40, p-value = 0.2769
alternative hypothesis: true location shift is not equal to 0
#----------------------------------------
wilcox.test(b-a, paired=FALSE, mu=0)
Wilcoxon signed rank test
data: b - a
V = 40, p-value = 0.2769
alternative hypothesis: true location is not equal to 0
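The relationship between the four calls above can be checked in code. This sketch only compares the `statistic` components; the variable names are illustrative, and the identity V(a, b) + V(b, a) = n(n+1)/2 assumes that no zero differences were dropped.

```r
a <- c(214, 159, 169, 202, 103, 119, 200, 109, 132, 142, 194, 104, 219, 119, 234)
b <- c(159, 135, 141, 101, 102, 168, 62, 167, 174, 159, 66, 118, 181, 171, 112)

# Paired form, in both argument orders
v_ab <- unname(wilcox.test(a, b, paired = TRUE)$statistic)
v_ba <- unname(wilcox.test(b, a, paired = TRUE)$statistic)

# One-sample form on the differences
v_diff_ab <- unname(wilcox.test(a - b, mu = 0)$statistic)
v_diff_ba <- unname(wilcox.test(b - a, mu = 0)$statistic)

# Swapping the arguments turns positive differences into negative ones,
# so the two V values together account for all ranks 1..n
n <- length(a)
total_ranks <- n * (n + 1) / 2
```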
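Here the first two calls are equivalent, and so are the last two. The resulting V values, 80 and 40, are the sums of positive ranks for a - b and b - a respectively; they add up to n(n+1)/2 = 120 and form the upper and lower ends of the computed interval. The bounds of the acceptance region can be obtained with the following functions:
qsignrank(0.025, length(a-b), lower.tail=TRUE)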
[1] 26
qsignrank(0.025, length(a-b), lower.tail=FALSE)
[1] 94
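The interval (40, 80) clearly lies inside [26, 94], so we do not reject the null hypothesis: there is no significant evidence of a difference in location between a and b. This is not a demonstration that the two are equal.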
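The quantile lookup can be wrapped in a small helper so the comparison is explicit. This is a sketch under stated assumptions: `signrank_bounds` is a hypothetical name, not part of R, and it treats the values returned by `qsignrank` as the inclusive ends of the two-sided acceptance region at level `alpha`, as in the calls above.

```r
# Hypothetical helper: inclusive bounds of the two-sided acceptance region
# for the signed-rank statistic with n nonzero differences
signrank_bounds <- function(n, alpha = 0.05) {
  lower <- qsignrank(alpha / 2, n, lower.tail = TRUE)
  upper <- qsignrank(alpha / 2, n, lower.tail = FALSE)
  c(lower = lower, upper = upper)
}

bounds <- signrank_bounds(15)

# Rank sums computed earlier for the pollution data
v_pos <- 80
v_neg <- 40

# H0 is not rejected when both rank sums fall within the bounds
inside <- all(c(v_pos, v_neg) >= bounds[["lower"]] &
              c(v_pos, v_neg) <= bounds[["upper"]])
```

For 15 differences this gives the bounds 26 and 94 shown above, and `inside` is TRUE for the rank sums 40 and 80.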