NOTES
1 Intruduction of ANOVA
1.1 terminology
If we have 4 groupps of samples, with mean μ 1 , μ 2 , μ 3 , μ 4 \mu_1, \mu_2, \mu_3, \mu_4 μ1,μ2,μ3,μ4, respectively. If we use t-test to test whether the means of 4 population are equle, we have to repeat 6 times t-test.
If all tests are made at some specified significance level (the possibility of error I α \alpha α), the overall level of 6 tests together will be 1 − ( 1 − α ) 6 = 0.625 > > 0.05 1-(1-\alpha)^6 = 0.625 >> 0.05 1−(1−α)6=0.625>>0.05.
Generally speaking, with the increase of the times of carrying on the significant test, the significant level will decrease.
ANOVA (analysis of variance) is used to test equality of multiple overall mean value.
Suppose we want to compare the quality of different industries according to the number of complaints received.
Here, the object be tested (diferent industries) is defined as factor, and the proformence of the factor(the number of complaints received) is defined as treatment. One-way anova means there are only one factor.
1.2 Principles and basic ideas
- describe with plot
Using scatter plot to explore the data, and check the diference. - error separation
SST: Reflecting the error of all datas.
SSE: Reflecting the with-in group error.
SSA: Reflecting the group error. - error analysis
Analyse where the error comes from, with-in group or between group.
1.3 Assumption
- The populations are normal distributions.
- The populations have the same variance.
- observation is independent.
Typically:
H 0 : μ 1 = μ 2 = . . . = μ k H 1 : μ i ≠ μ j f o r s o m e p a i r ( i , j ) H_0: \mu_1=\mu_2=...=\mu_k \\ H_1: \mu_i \ne \mu_j\quad for\ some\ pair(i, j) H0:μ1=μ2=...=μkH1:μi=μjfor some pair(i,j)
Exrcise
Start by installing (if needed) and loading:
- car
- lmtest
- multcomp
One-way anova analysis is used when we want to see if the mean of a continuous variable differs between groups (i.e. between levels of a single categorical variable, a.k.a. factor). It is a generalisation of the t-test, and can be applied to more than two groups. The significance of the categorical variable depends on the relationship between the wit