1. Centering a data set
If we have a data set
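$X \in \mathbb{R}^{n \times p}$
(each row is a sample), then the vector of column means of this data set can be expressed as
$$\bar{X} = \frac{1}{n} X^T \mathbf{1}_n$$
where $\mathbf{1}_n$ is the $n$-vector of ones.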
So the centered data set is
$$X_c = X - \mathbf{1}_n \bar{X}^T = X - \frac{1}{n}\mathbf{1}_n \mathbf{1}_n^T X = \left(I - \frac{1}{n}\mathbf{1}_n \mathbf{1}_n^T\right) X$$
The matrix
$$C = I - \frac{1}{n}\mathbf{1}_n \mathbf{1}_n^T$$
is called the centering matrix.
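As a quick numerical check of the identity above, here is a minimal NumPy sketch. The helper name `centering_matrix` and the random test data are assumptions made for illustration; they do not come from the original post.

```python
import numpy as np

def centering_matrix(n: int) -> np.ndarray:
    """Return the n x n centering matrix I - (1/n) 1 1^T."""
    return np.eye(n) - np.ones((n, n)) / n

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))          # 6 samples (rows), 3 features (columns)
C = centering_matrix(X.shape[0])

X_c = C @ X                          # centering via the matrix product
X_c_direct = X - X.mean(axis=0)      # centering by subtracting column means

print(np.allclose(X_c, X_c_direct))  # both routes give the same centered data
print(np.allclose(C @ C, C))         # the centering matrix is idempotent
```

Forming the full matrix costs O(n^2) memory, so in practice subtracting the column means directly is usually preferable; the matrix form is mainly useful in derivations.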
2. Application
The following is a more involved decomposition that relies on a centering matrix.
Proof of Proposition 1 in http://blog.youkuaiyun.com/comeyan/article/details/50514596
Proof: First, we express the mean of the full data set in terms of the group means,
$$\bar{\mu} = \sum_{g=1}^{G} \frac{n_g}{N}\bar{\mu}_g = \frac{1}{N}\left(\sqrt{n_1}\,\bar{\mu}_1, \sqrt{n_2}\,\bar{\mu}_2, \cdots, \sqrt{n_G}\,\bar{\mu}_G\right)\left(\sqrt{n_1}, \sqrt{n_2}, \cdots, \sqrt{n_G}\right)^T$$
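Let
$K = \left(\sqrt{n_1}, \sqrt{n_2}, \cdots, \sqrt{n_G}\right)^T$;
then, for the terms appearing in the formula of the between-group covariance matrix, we have
$$
\begin{aligned}
&\left(\sqrt{n_1}\,\bar{\mu}_1 - \sqrt{n_1}\,\bar{\mu},\ \sqrt{n_2}\,\bar{\mu}_2 - \sqrt{n_2}\,\bar{\mu},\ \cdots,\ \sqrt{n_G}\,\bar{\mu}_G - \sqrt{n_G}\,\bar{\mu}\right) \\
&\quad = \left(\sqrt{n_1}\,\bar{\mu}_1, \sqrt{n_2}\,\bar{\mu}_2, \cdots, \sqrt{n_G}\,\bar{\mu}_G\right) - \bar{\mu}\left(\sqrt{n_1}, \sqrt{n_2}, \cdots, \sqrt{n_G}\right) \\
&\quad = \left(\sqrt{n_1}\,\bar{\mu}_1, \sqrt{n_2}\,\bar{\mu}_2, \cdots, \sqrt{n_G}\,\bar{\mu}_G\right)\left(I - \frac{1}{N} K K^T\right)
\end{aligned}
$$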
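Moreover, writing $\mathbf{1}_g \in \mathbb{R}^N$ for the indicator vector of group $g$ (with $X$ now the $N \times p$ data matrix),
$$\sqrt{n_g}\,\bar{\mu}_g = \sqrt{n_g}\,\frac{1}{n_g} X^T \mathbf{1}_g = \frac{1}{\sqrt{n_g}} X^T \mathbf{1}_g$$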
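Hence
$$
\begin{aligned}
\hat{\Sigma}_b &= \frac{1}{N}\sum_{g=1}^{G} n_g\,(\bar{\mu}_g - \bar{\mu})(\bar{\mu}_g - \bar{\mu})^T \\
&= \frac{1}{N}\sum_{g=1}^{G} \sqrt{n_g}\,(\bar{\mu}_g - \bar{\mu})\,\sqrt{n_g}\,(\bar{\mu}_g - \bar{\mu})^T \\
&= \frac{1}{N}\left(\sqrt{n_1}\,\bar{\mu}_1, \sqrt{n_2}\,\bar{\mu}_2, \cdots, \sqrt{n_G}\,\bar{\mu}_G\right)\left(I - \frac{1}{N} K K^T\right)\left(\sqrt{n_1}\,\bar{\mu}_1, \sqrt{n_2}\,\bar{\mu}_2, \cdots, \sqrt{n_G}\,\bar{\mu}_G\right)^T \\
&= \frac{1}{N} X^T \left(\frac{1}{\sqrt{n_g}}\mathbf{1}_g\right)_{N \times G}\left(I - \frac{1}{N} K K^T\right)\left(\frac{1}{\sqrt{n_g}}\mathbf{1}_g\right)_{N \times G}^T X \\
&= \frac{1}{N} X^T \left(\frac{1}{\sqrt{n_g}}\mathbf{1}_g\right)_{N \times G} C \left(\frac{1}{\sqrt{n_g}}\mathbf{1}_g\right)_{N \times G}^T X
\end{aligned}
$$
Here $\left(\frac{1}{\sqrt{n_g}}\mathbf{1}_g\right)_{N \times G}$ denotes the $N \times G$ matrix whose $g$th column is $\frac{1}{\sqrt{n_g}}\mathbf{1}_g$. The third equality uses the fact that $I - \frac{1}{N} K K^T$ is symmetric and idempotent, since $K^T K = N$.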
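The matrix form of the between-group covariance can be checked numerically. The sketch below is illustrative only: the group sizes, the random data, and the variable names (`E` for the scaled indicator matrix, `sigma_b_direct`, `sigma_b_matrix`) are assumptions, not part of the original proof.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = np.array([3, 5, 4])                  # n_1, ..., n_G (made-up group sizes)
G, N, p = len(sizes), int(sizes.sum()), 2
labels = np.repeat(np.arange(G), sizes)      # group label of each row of X
X = rng.normal(size=(N, p))                  # rows are samples

# Direct definition: (1/N) * sum_g n_g (mu_g - mu)(mu_g - mu)^T
mu = X.mean(axis=0)
group_means = np.stack([X[labels == g].mean(axis=0) for g in range(G)])
diffs = group_means - mu
sigma_b_direct = (diffs.T * sizes) @ diffs / N

# Matrix form: (1/N) X^T E (I - K K^T / N) E^T X, where column g of E
# is the indicator vector of group g scaled by 1 / sqrt(n_g)
E = (labels[:, None] == np.arange(G)).astype(float) / np.sqrt(sizes)
K = np.sqrt(sizes)
C = np.eye(G) - np.outer(K, K) / N
sigma_b_matrix = X.T @ E @ C @ E.T @ X / N

print(np.allclose(sigma_b_direct, sigma_b_matrix))
```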
Claim that $C = \tilde{H}^T \tilde{H}$, where $\tilde{H} \in \mathbb{R}^{(G-1) \times G}$ has orthonormal rows. That is to say,
$$C = I - \frac{1}{N} K K^T = \tilde{H}^T \tilde{H}$$
so $\left(\frac{1}{\sqrt{N}} K, \tilde{H}^T\right)$ is an orthogonal matrix. From
the theory of orthogonal contrasts for unbalanced data, the $G - 1$ orthogonal contrasts have the following form, for $r = 1, \cdots, G - 1$:
$$\delta_r = \sqrt{n_{r+1}}\left(\sum_{h=1}^{r} n_h\,(\bar{\mu}_h - \bar{\mu}_{r+1})\right)$$
Denote by $h_r$ the $r$th row of $\tilde{H}$. Then, from the definition of orthogonal contrasts, for some constant $C_r$,
$$X^T \left(\frac{1}{\sqrt{n_g}}\mathbf{1}_g\right)_{N \times G} h_r^T = C_r\,\delta_r$$
which can be rewritten as
$$\sum_{j=1}^{G} h_{rj}\sqrt{n_j}\,\bar{\mu}_j = C_r\sqrt{n_{r+1}}\left(\sum_{j=1}^{r} n_j\,(\bar{\mu}_j - \bar{\mu}_{r+1})\right)$$
Then
$$
\begin{aligned}
h_{rj}\sqrt{n_j} &= C_r\sqrt{n_{r+1}}\,n_j && \text{for } j = 1, 2, \cdots, r \\
h_{r,r+1}\sqrt{n_{r+1}} &= -C_r\sqrt{n_{r+1}}\sum_{t=1}^{r} n_t \\
h_{ri} &= 0 && \text{for } i > r + 1
\end{aligned}
$$
which gives
$$
\begin{aligned}
h_{rj} &= C_r\sqrt{n_{r+1}\,n_j} && \text{for } j = 1, 2, \cdots, r \\
h_{r,r+1} &= -C_r\sum_{t=1}^{r} n_t \\
h_{ri} &= 0 && \text{for } i > r + 1
\end{aligned}
$$
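To make $\|h_r\|_2 = 1$, set
$$C_r^2\left(\sum_{i=1}^{r} n_{r+1}\,n_i + \left(\sum_{j=1}^{r} n_j\right)^2\right) = 1$$
The bracket factors as $\left(\sum_{i=1}^{r} n_i\right)\left(\sum_{j=1}^{r+1} n_j\right)$, so
$$C_r = \frac{1}{\sqrt{\sum_{i=1}^{r} n_i \sum_{j=1}^{r+1} n_j}}$$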
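The construction of the rows $h_r$ can be sanity-checked numerically. The sketch below builds the matrix from the entry formulas above, taking $C_r$ with the square root implied by the normalization condition. The function name `contrast_matrix` and the example group sizes are assumptions chosen for illustration.

```python
import numpy as np

def contrast_matrix(sizes) -> np.ndarray:
    """Build the (G-1) x G matrix whose r-th row is h_r."""
    n = np.asarray(sizes, dtype=float)
    G = len(n)
    H = np.zeros((G - 1, G))
    for r in range(1, G):                            # r = 1, ..., G-1 as in the text
        head = n[:r].sum()                           # n_1 + ... + n_r
        c_r = 1.0 / np.sqrt(head * (head + n[r]))    # normalizing constant C_r
        H[r - 1, :r] = c_r * np.sqrt(n[r] * n[:r])   # h_{rj} for j <= r
        H[r - 1, r] = -c_r * head                    # h_{r,r+1}
    return H

sizes = [3, 5, 4]                                    # made-up group sizes
N = sum(sizes)
K = np.sqrt(np.array(sizes, dtype=float))
H = contrast_matrix(sizes)

print(np.allclose(H @ H.T, np.eye(len(sizes) - 1)))                   # orthonormal rows
print(np.allclose(H.T @ H, np.eye(len(sizes)) - np.outer(K, K) / N))  # H^T H = I - K K^T / N
print(np.allclose(H @ K, 0.0))                                        # rows orthogonal to K
```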
Now
$$X^T \left(\frac{1}{\sqrt{n_g}}\mathbf{1}_g\right)_{N \times G} h_r^T = \frac{\sqrt{n_{r+1}}}{\sqrt{\sum_{i=1}^{r} n_i \sum_{j=1}^{r+1} n_j}}\left(\sum_{h=1}^{r} n_h\,(\bar{\mu}_h - \bar{\mu}_{r+1})\right)$$
So
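$$
\begin{aligned}
\hat{\Sigma}_b &= \frac{1}{N} X^T \left(\frac{1}{\sqrt{n_g}}\mathbf{1}_g\right)_{N \times G} C \left(\frac{1}{\sqrt{n_g}}\mathbf{1}_g\right)_{N \times G}^T X \\
&= \frac{1}{N} X^T \left(\frac{1}{\sqrt{n_g}}\mathbf{1}_g\right)_{N \times G} \tilde{H}^T \tilde{H} \left(\frac{1}{\sqrt{n_g}}\mathbf{1}_g\right)_{N \times G}^T X \\
&= \Delta \Delta^T
\end{aligned}
$$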
where
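$$\Delta = \frac{1}{\sqrt{N}} X^T \left(\frac{1}{\sqrt{n_g}}\mathbf{1}_g\right)_{N \times G} \tilde{H}^T$$
and the $r$th column of $\Delta$ is
$$\Delta_r = \frac{\sqrt{n_{r+1}}}{\sqrt{N}\sqrt{\sum_{i=1}^{r} n_i \sum_{j=1}^{r+1} n_j}}\left(\sum_{h=1}^{r} n_h\,(\bar{\mu}_h - \bar{\mu}_{r+1})\right)$$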
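To see the final factorization at work, the following sketch compares $\Delta \Delta^T$ with the between-group covariance computed from its definition, and compares the columns of $\Delta$ with the closed-form expression for $\Delta_r$ (using the square-root normalization of $C_r$). The data, group sizes, and names such as `Delta_closed` and `delta_column` are hypothetical and serve only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
sizes = np.array([3.0, 5.0, 4.0])                  # made-up group sizes n_1..n_G
G, N = len(sizes), int(sizes.sum())
labels = np.repeat(np.arange(G), sizes.astype(int))
X = rng.normal(size=(N, 2))                        # rows are samples

# Rows h_r of the (G-1) x G contrast matrix
H = np.zeros((G - 1, G))
for r in range(1, G):
    head = sizes[:r].sum()
    c_r = 1.0 / np.sqrt(head * (head + sizes[r]))
    H[r - 1, :r] = c_r * np.sqrt(sizes[r] * sizes[:r])
    H[r - 1, r] = -c_r * head

# Delta = (1/sqrt(N)) X^T E H^T, with E the scaled indicator matrix
E = (labels[:, None] == np.arange(G)).astype(float) / np.sqrt(sizes)
Delta = X.T @ E @ H.T / np.sqrt(N)                 # shape p x (G-1)

# Between-group covariance from its definition
mu = X.mean(axis=0)
means = np.stack([X[labels == g].mean(axis=0) for g in range(G)])
sigma_b = ((means - mu).T * sizes) @ (means - mu) / N

def delta_column(r: int) -> np.ndarray:
    """Closed-form r-th column of Delta (r = 1, ..., G-1)."""
    head = sizes[:r].sum()
    scale = np.sqrt(sizes[r]) / np.sqrt(N * head * (head + sizes[r]))
    return scale * (sizes[:r, None] * (means[:r] - means[r])).sum(axis=0)

Delta_closed = np.stack([delta_column(r) for r in range(1, G)], axis=1)

print(np.allclose(Delta @ Delta.T, sigma_b))       # factorization of the covariance
print(np.allclose(Delta, Delta_closed))            # columns match the closed form
```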