One-way ANOVA
> install.packages("multcomp")
> library(multcomp)
> attach(cholesterol)
条件:因为 one-way ANOVA 的因变量需要满足正态分布,并且各组等方差
In a one-way ANOVA, the dependent variable is assumed to be normally distributed, and have equal variance in each group
Step-1:用QQPlot检查是否符合正态分布条件
use a Q-Q plot
to assess the normality assumption
> install.packages("car")
> library(car)
> qqPlot(lm(response~trt,data=cholesterol),simulate=TRUE,main="Q-Q Plot",labels=FALSE)
#data fall within the 95 percent confidence envelope,
#suggesting that the normality assumption has been met fairly well
Step-2: 用 ANOVA
> fit <- aov(response~trt)
> summary(fit)
Df Sum Sq Mean Sq F value Pr(>F)
trt 4 1351.4 337.8 32.43 9.82e-13 ***
Residuals 45 468.8 10.4
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
结论:
The ANOVA F test for treatment (trt) is significant (p < .0001), providing
evidence that the five treatments aren't all equally effective
可视化
> library(gplots)
>
plotmeans(response~trt,xlab="Treatment",ylab="Response",main="Mean Plot with 95% CI")
Step-3: 用其他test方法可得到相同结论
Bartlett’s test
> bartlett.test()
Fligner–Killeen test
Brown–Forsythe test
> bartlett.test(response~trt,data=cholesterol)
Bartlett test of homogeneity of variances
data: response by trt
Bartlett's K-squared = 0.57975, df = 4, p-value = 0.9653#indicates that the variances in the five groups don’t differ significantly
Step-4:看具体两两组间的差异
tell you which treatments
differ from one another
> TukeyHSD(fit)#TukeyHSD() function
provides a test of all pairwise differences between group means
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = response ~ trt)
$trt
diff lwr upr p adj
2times-1time 3.44300 -0.6582817 7.544282 0.1380949#for1timeand2timesaren’t
significantly different from each other (p = 0.138)
4times-1time 6.59281 2.4915283 10.694092 0.0003542#the difference between 1time and 4times is significantly different (p < .001)
drugD-1time 9.57920 5.4779183 13.680482 0.0000003
drugE-1time 15.16555 11.0642683 19.266832 0.0000000
4times-2times 3.14981 -0.9514717 7.251092 0.2050382
drugD-2times 6.13620 2.0349183 10.237482 0.0009611
drugE-2times 11.72255 7.6212683 15.823832 0.0000000
drugD-4times 2.98639 -1.1148917 7.087672 0.2512446
drugE-4times 8.57274 4.4714583 12.674022 0.0000037
drugE-drugD 5.58635 1.4850683 9.687632 0.0030633
可视化
> par(las=2)#rotatesthe axis labels
> par(mar=c(5,8,4,2))#increases the left margin area so that the labels fit
>
plot(TukeyHSD(fit))
可视化#reproduces the Tukey HSD test, along with a different graphical representation of the results
> par(mar=c(5,4,6,2))#increased the top margin to fit the letter array
> tuk <- glht(fit,linfct=mcp(trt="Tukey"))
#glht() provides a much more comprehensive set of methods for multiple mean comparisons
#you can use for both linear models and generalized linear models
> plot(cld(tuk,level=.05),col="lightgrey")#significance level to use 0.05, or 95percent confidence in this case
参考:
http://www.inside-r.org/packages/cran/multcomp/docs/glht
http://www.stat.wmich.edu/wang/664/egs/Rmice.html
图表说明:具有相同字母的两组间没有显著的不同
Groups that have the same letter don’t have significantly different means
taking the cholesterol-lowering drug in5 mg dosesfour times a day was better than taking a 20 mg dose once per day
The competitordrugD wasn’t superior to this four-times-per-day regimen
competitor drugE was superior to both drugD and all three dosage strategies for our focus drug
条件:因为 ANOVA 对outlier比较敏感
analysis of variance methodologies can be sensitive to the presence of outliers
Step-5: 所以要检查是否没有outlier
>
outlierTest(fit)
No Studentized residuals with Bonferonni p < 0.05
Largest |rstudent|:
rstudent unadjusted p-value Bonferonni p
19 2.251149 0.029422 NA
#there’s no indication of outliers in the cholesterol data (NA occurs when p > 1)
最后:
Taking the Q-Q plot, Bartlett’s test, and outlier test together, the data appear to fit the ANOVA model quite well.
总结:
1. 先用 qqPlot 看变量是否符合正态分布,car 包
2. 用 aov 再 summary 看多组间是否具有显著差异性,multcomp包
3. 用 ghlt 的 Tukey 看两两间的差异性
4. 用 outlierTest 看是否满足没有 outlier