R in Action Study Notes - Chapter 9 - Analysis of Variance


License: CC BY-SA 4.0


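This post walks through a one-way analysis of variance (ANOVA) in R: checking normality with a Q-Q plot, running the test with aov(), checking equal variances with bartlett.test(), comparing groups pairwise with TukeyHSD(), and checking for outliers with outlierTest(). The results show that the treatments differ significantly in how much they lower cholesterol.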

One-way ANOVA

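> install.packages("multcomp")

> library(multcomp)  # provides glht(), cld() and the cholesterol data set

> attach(cholesterol)  # makes response and trt available by name (aov() below relies on this)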
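Before testing, it helps to look at the design. The sketch below loads the cholesterol data from multcomp with data() instead of attach() (a choice made here, not in the original) and uses base R to count subjects per treatment and summarize each group.

```r
# Load the cholesterol data shipped with multcomp, without attaching it
data(cholesterol, package = "multcomp")

# Number of subjects in each treatment group
group_sizes <- table(cholesterol$trt)
print(group_sizes)

# Mean and standard deviation of the response for each treatment
group_stats <- aggregate(response ~ trt, data = cholesterol,
                         FUN = function(x) c(mean = mean(x), sd = sd(x)))
print(group_stats)
```

A balanced design (the same n in every group) is consistent with the 4 and 45 degrees of freedom in the ANOVA table further down.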

Assumptions: in a one-way ANOVA, the dependent variable is assumed to be normally distributed and to have equal variance in each group.
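Written as a model, these assumptions say that each observation is its group mean plus independent normal noise with one shared variance. The notation below is standard textbook notation, assumed here rather than taken from the book: $k$ groups, $n_i$ observations in group $i$, $N$ observations in total.

```latex
y_{ij} = \mu_i + \varepsilon_{ij}, \quad \varepsilon_{ij} \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2), \quad i = 1, \dots, k, \; j = 1, \dots, n_i

H_0 : \mu_1 = \mu_2 = \cdots = \mu_k

F = \frac{\sum_{i} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 / (k - 1)}{\sum_{i}\sum_{j} (y_{ij} - \bar{y}_{i\cdot})^2 / (N - k)} \sim F_{k-1,\, N-k} \ \text{under } H_0
```

With $k = 5$ treatments and $N = 50$ subjects, the F statistic has $4$ and $45$ degrees of freedom, which is what the trt and Residuals rows of the ANOVA table report.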

Step 1: Use a Q-Q plot to assess the normality assumption

> install.packages("car")

> library(car)

> qqPlot(lm(response ~ trt, data=cholesterol), simulate=TRUE, main="Q-Q Plot", id=FALSE)  # car >= 3.0 uses id=FALSE; older versions used labels=FALSE

The points fall within the 95 percent confidence envelope, suggesting that the normality assumption has been met fairly well.
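The Q-Q plot is a visual check. If a numeric companion is wanted, one option (not used in the original notes) is the Shapiro-Wilk test from base R applied to the model residuals. A large p-value means there is no evidence against normality, not proof that the residuals are normal.

```r
data(cholesterol, package = "multcomp")

# Same model as in the Q-Q plot: response explained by treatment group
model <- lm(response ~ trt, data = cholesterol)

# Shapiro-Wilk test on the residuals (base R stats package)
sw <- shapiro.test(residuals(model))
print(sw)
```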

Step 2: Run the one-way ANOVA

> fit <- aov(response~trt)

> summary(fit)
Df Sum Sq Mean Sq F value Pr(>F)
trt 4 1351.4 337.8 32.43 9.82e-13 ***
Residuals 45 468.8 10.4
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Conclusion:

The ANOVA F test for treatment (trt) is significant (p < .0001), providing evidence that the five treatments aren't all equally effective.
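To see where the F value comes from, the sketch below recomputes the between-group and within-group sums of squares by hand and cross-checks the result against aov(). Variable names such as ss_between are illustrative, not from the book.

```r
data(cholesterol, package = "multcomp")
y <- cholesterol$response
g <- cholesterol$trt

k <- nlevels(g)   # number of treatment groups
N <- length(y)    # total number of observations

# Between-group sum of squares: group means around the grand mean, weighted by group size
ss_between <- sum(tapply(y, g, length) * (tapply(y, g, mean) - mean(y))^2)
# Within-group sum of squares: each observation around its own group mean
ss_within <- sum((y - ave(y, g))^2)

# F = (between mean square) / (within mean square), with k - 1 and N - k df
f_stat  <- (ss_between / (k - 1)) / (ss_within / (N - k))
p_value <- pf(f_stat, k - 1, N - k, lower.tail = FALSE)
cat("F =", round(f_stat, 2), " p =", signif(p_value, 3), "\n")

# Cross-check against the table produced by summary(aov(...))
aov_table <- summary(aov(response ~ trt, data = cholesterol))[[1]]
print(aov_table)
```

ss_between and ss_within correspond to the Sum Sq column (trt and Residuals), and f_stat to the F value column.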

Visualization:

> library(gplots)

> plotmeans(response~trt,xlab="Treatment",ylab="Response",main="Mean Plot with 95% CI")
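plotmeans() draws each group mean with an error bar. A rough base-R equivalent is sketched below, assuming t-based 95% intervals computed from each group's own standard deviation. The gplots defaults may differ in detail, so treat this as an approximation of the plot, not its exact implementation.

```r
data(cholesterol, package = "multcomp")

# For each treatment: mean and a t-based 95% confidence interval
ci_table <- do.call(rbind, lapply(split(cholesterol$response, cholesterol$trt), function(x) {
  n <- length(x)
  half_width <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  c(mean = mean(x), lower = mean(x) - half_width, upper = mean(x) + half_width)
}))
print(round(ci_table, 2))
```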

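Step 3: Check the equal-variance (homogeneity of variance) assumption. Several tests are available, and for these data they reach the same conclusion.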

Bartlett's test: bartlett.test() (base R stats package)

Fligner-Killeen test: fligner.test() (base R stats package)

Brown-Forsythe test: hov() (HH package)

Bartlett's test is shown here:

> bartlett.test(response~trt,data=cholesterol)

Bartlett test of homogeneity of variances

data: response by trt
Bartlett's K-squared = 0.57975, df = 4, p-value = 0.9653

The p-value (0.9653) indicates that the variances in the five groups don't differ significantly.
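The other two tests can be run in a similar way. fligner.test() takes the same formula interface as bartlett.test(). The Brown-Forsythe test is a one-way ANOVA on the absolute deviations from each group's median, so the sketch below computes it from that definition with base R instead of calling HH::hov(); car::leveneTest(), whose default center is the median, should give the same F statistic.

```r
data(cholesterol, package = "multcomp")

# Fligner-Killeen test (base R)
fk <- fligner.test(response ~ trt, data = cholesterol)
print(fk)

# Brown-Forsythe test: ANOVA on |response - group median|
abs_dev <- with(cholesterol, abs(response - ave(response, trt, FUN = median)))
bf_data <- data.frame(abs_dev = abs_dev, trt = cholesterol$trt)
bf_table <- anova(lm(abs_dev ~ trt, data = bf_data))
print(bf_table)
bf_p <- bf_table[["Pr(>F)"]][1]
```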

Step 4: Identify which treatments differ from one another

The TukeyHSD() function tests all pairwise differences between group means:

> TukeyHSD(fit)
Tukey multiple comparisons of means
95% family-wise confidence level

Fit: aov(formula = response ~ trt)

$trt
                  diff        lwr       upr     p adj
2times-1time   3.44300 -0.6582817  7.544282 0.1380949
4times-1time   6.59281  2.4915283 10.694092 0.0003542
drugD-1time    9.57920  5.4779183 13.680482 0.0000003
drugE-1time   15.16555 11.0642683 19.266832 0.0000000
4times-2times  3.14981 -0.9514717  7.251092 0.2050382
drugD-2times   6.13620  2.0349183 10.237482 0.0009611
drugE-2times  11.72255  7.6212683 15.823832 0.0000000
drugD-4times   2.98639 -1.1148917  7.087672 0.2512446
drugE-4times   8.57274  4.4714583 12.674022 0.0000037
drugE-drugD    5.58635  1.4850683  9.687632 0.0030633

For example, 1time and 2times aren't significantly different from each other (p = 0.138), whereas the difference between 1time and 4times is significant (p < .001).
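Each row of the table is the difference of two group means plus or minus a critical value from the studentized range distribution times a standard error. The sketch below rebuilds the 2times-1time row by hand with qtukey() and ptukey(), assuming the usual Tukey-Kramer form of the standard error, sqrt(MSE/2 * (1/n_i + 1/n_j)); names such as q_crit are illustrative.

```r
data(cholesterol, package = "multcomp")
fit <- aov(response ~ trt, data = cholesterol)

mse   <- sum(residuals(fit)^2) / df.residual(fit)   # Residuals Mean Sq
means <- tapply(cholesterol$response, cholesterol$trt, mean)
n     <- tapply(cholesterol$response, cholesterol$trt, length)
k     <- length(means)

# Difference of means for the 2times vs 1time comparison
d  <- means[["2times"]] - means[["1time"]]
se <- sqrt((mse / 2) * (1 / n[["2times"]] + 1 / n[["1time"]]))

# 95% family-wise interval and adjusted p-value from the studentized range distribution
q_crit <- qtukey(0.95, nmeans = k, df = df.residual(fit))
lwr <- d - q_crit * se
upr <- d + q_crit * se
p_adj <- ptukey(abs(d) / se, nmeans = k, df = df.residual(fit), lower.tail = FALSE)

cat(sprintf("diff = %.5f, lwr = %.5f, upr = %.5f, p adj = %.7f\n", d, lwr, upr, p_adj))
```

Because the design is balanced, every interval in the table has the same half-width (q_crit * se).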

Visualization:

> par(las=2)             # rotate the axis labels

> par(mar=c(5,8,4,2))    # increase the left margin so that the labels fit

> plot(TukeyHSD(fit))
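par() settings persist for the rest of the session, which affects every later plot. One common pattern, suggested here rather than taken from the book, is to keep the old values that par() returns and restore them after drawing.

```r
data(cholesterol, package = "multcomp")
fit <- aov(response ~ trt, data = cholesterol)

# par() returns the previous values of the settings it changes
old_par <- par(las = 2, mar = c(5, 8, 4, 2))
plot(TukeyHSD(fit))
par(old_par)  # restore the original axis-label orientation and margins
```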

Visualization with multcomp: glht() reproduces the Tukey HSD test and offers a different graphical representation of the results.

> par(mar=c(5,4,6,2))  # increase the top margin to fit the letter array

> tuk <- glht(fit, linfct=mcp(trt="Tukey"))

glht() provides a much more comprehensive set of methods for multiple mean comparisons, and it can be used with both linear models and generalized linear models.

> plot(cld(tuk, level=.05), col="lightgrey")  # significance level 0.05, i.e., 95 percent confidence

References:

http://www.inside-r.org/packages/cran/multcomp/docs/glht

http://www.stat.wmich.edu/wang/664/egs/Rmice.html

Reading the plot: groups that share a letter don't have significantly different means.

Taking the cholesterol-lowering drug in 5 mg doses four times a day was better than taking a 20 mg dose once per day.

The competitor drugD wasn't superior to this four-times-per-day regimen.

The competitor drugE was superior to both drugD and all three dosage strategies for our focus drug.
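The same glht object can also be inspected numerically. The sketch below prints the adjusted p-values and simultaneous confidence intervals, then extracts the compact letters behind the plot. It assumes the letters live in the mcletters$Letters element of the cld() result, which is how current multcomp versions store them as far as I know.

```r
library(multcomp)
data(cholesterol, package = "multcomp")
fit <- aov(response ~ trt, data = cholesterol)

tuk <- glht(fit, linfct = mcp(trt = "Tukey"))
print(summary(tuk))   # adjusted p-values for every pairwise contrast
print(confint(tuk))   # simultaneous 95% confidence intervals

# Compact letter display: groups sharing a letter are not significantly different
letters_by_group <- cld(tuk, level = 0.05)$mcletters$Letters
print(letters_by_group)
```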

Assumption: analysis of variance methods can be sensitive to the presence of outliers.

Step 5: Check for outliers

> outlierTest(fit)

No Studentized residuals with Bonferonni p < 0.05
Largest |rstudent|:
rstudent unadjusted p-value Bonferonni p
19 2.251149 0.029422 NA

There's no indication of outliers in the cholesterol data (the Bonferroni p is reported as NA when the adjusted value would exceed 1).
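What outlierTest() reports can be reproduced with base R: take the externally studentized residuals, find the largest in absolute value, compute a two-sided t p-value with one fewer degree of freedom than the residual df, and multiply by the number of observations (Bonferroni). The sketch assumes this is how car computes its columns; the NA mimics the output above when the adjusted value exceeds 1.

```r
data(cholesterol, package = "multcomp")
fit <- aov(response ~ trt, data = cholesterol)

rs   <- rstudent(fit)              # externally studentized residuals
i    <- which.max(abs(rs))         # observation with the largest |rstudent|
df_t <- df.residual(fit) - 1       # df for a studentized residual's t distribution

p_unadj    <- 2 * pt(abs(rs[[i]]), df = df_t, lower.tail = FALSE)
p_bonf_raw <- length(rs) * p_unadj                   # Bonferroni: multiply by number of tests
p_bonf     <- if (p_bonf_raw > 1) NA else p_bonf_raw  # report NA when it would exceed 1

cat("obs", names(rs)[i], " rstudent =", round(rs[[i]], 6),
    " unadjusted p =", signif(p_unadj, 5), " Bonferroni p =", p_bonf, "\n")
```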