1. Basic definitions
For a set of values $X_n = [x_1, x_2, ..., x_n]$, the mean is

$$\overline{X_n} = \frac{1}{n} \sum_{i=1}^n x_i$$

and the variance (population variance, dividing by n) is

$$\sigma_n^2 = \frac{1}{n}\sum_{i=1}^n (x_i-\overline{X_n})^2$$

Given a second set of values $Y_n = [y_1, y_2, ..., y_n]$, the covariance of the two is

$$cov(X_n, Y_n) = \frac{1}{n} \sum_{i=1}^n (x_i-\overline{X_n})(y_i-\overline{Y_n})$$
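For reference, here is a direct translation of these definitions into Python. It is a minimal sketch with illustrative function names and sample data. The variance and covariance need the mean before their main loop can run, so each makes two passes over the data:

```python
def mean(xs):
    # (1/n) * sum of x_i
    return sum(xs) / len(xs)


def variance(xs):
    # Population variance (divides by n), matching sigma_n^2 above.
    # First pass: the mean; second pass: the squared deviations.
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def covariance(xs, ys):
    # Population covariance of the paired values (x_i, y_i)
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 9.0]
print(mean(xs), variance(xs), covariance(xs, ys))  # 5.0 4.0 4.875
```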
2. Streaming computation
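Stream processing cannot keep the full data set, so these values must be computed in a single pass over the data. The mean is easy: one pass accumulates the sum and counts the number of values n. Computing the variance from its definition needs one pass to find the mean and a second pass to accumulate the squared deviations. That is two passes in total, so the brute-force approach does not meet the requirement. The same applies to the covariance. Below are incremental formulas that need only one pass over the data.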
Let $\overline{X_n}$ be the mean of $[x_1, x_2, ..., x_n]$, and let $\overline{X_{n-1}}$ be the mean of the first n-1 values.
Let $v_n = n\sigma_n^2$, i.e. n times the variance of the first n values, and let $v_{n-1} = (n-1)\sigma_{n-1}^2$, i.e. n-1 times the variance of the first n-1 values.
Suppose we know the mean $\overline{X_{n-1}}$ and $v_{n-1}$ of the first n-1 values. When the n-th value $x_n$ arrives, the new mean and variance can be computed as follows:
$$
\overline{X_n}=\overline{X_{n-1}}+\frac{x_n-\overline{X_{n-1}}}{n}\\
v_n=v_{n-1}+(x_n-\overline{X_{n-1}})(x_n-\overline{X_n})
$$

The variance itself is then $\sigma_n^2 = v_n / n$.

For the covariance, let $V_n$ be n times $cov(X_n, Y_n)$, i.e. n times the covariance of the first n pairs. It can be updated incrementally as follows:

$$
\begin{aligned}
V_n&=V_{n-1}+(x_n-\overline{X_n})(y_n-\overline{Y_{n-1}})\\
\text{or}&=V_{n-1}+(x_n-\overline{X_{n-1}})(y_n-\overline{Y_n})
\end{aligned}
$$

The right-hand sides of the two expressions are equal. The derivations of all these formulas are given at the end of the article.
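The one-pass update rules can be sketched as a small accumulator. This is a minimal Python sketch. The class name `RunningStats` and its attribute names are hypothetical. It stores $n$, the two running means, $v_n$ and $V_n$, and applies the second covariance form:

```python
class RunningStats:
    """Single-pass mean, variance and covariance of a stream of (x, y) pairs.

    Stores n, the running means, v = n * variance(x) and V = n * cov(x, y)
    (population convention, dividing by n).
    """

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.v = 0.0  # v_n = n * sigma_n^2 of the x values
        self.V = 0.0  # V_n = n * cov(X_n, Y_n)

    def add(self, x, y):
        self.n += 1
        dx_old = x - self.mean_x                    # x_n - mean(X_{n-1})
        self.mean_x += dx_old / self.n              # now mean(X_n)
        self.mean_y += (y - self.mean_y) / self.n   # now mean(Y_n)
        # v_n = v_{n-1} + (x_n - mean(X_{n-1})) * (x_n - mean(X_n))
        self.v += dx_old * (x - self.mean_x)
        # V_n = V_{n-1} + (x_n - mean(X_{n-1})) * (y_n - mean(Y_n))
        self.V += dx_old * (y - self.mean_y)

    def variance(self):
        return self.v / self.n  # ZeroDivisionError if nothing was added

    def covariance(self):
        return self.V / self.n


stats = RunningStats()
for x, y in zip([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0],
                [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 9.0]):
    stats.add(x, y)
print(stats.mean_x, stats.variance(), stats.covariance())  # close to 5.0 4.0 4.875
```

Each value is read exactly once. The old mean is captured in `dx_old` before it is overwritten, because both update rules need $\overline{X_{n-1}}$ and $\overline{X_n}$ at the same time.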
3. Aggregation
When the data volume is large and the computation has to be distributed and run in parallel, the intermediate results of several shards must be merged into the final result. Suppose there are two shards with n and m pairs respectively. The first shard is $[(x_{1,1}, y_{1,1}), (x_{1,2}, y_{1,2}), (x_{1,3}, y_{1,3}), ..., (x_{1,n}, y_{1,n})]$ and the second is $[(x_{2,1}, y_{2,1}), (x_{2,2}, y_{2,2}), (x_{2,3}, y_{2,3}), ..., (x_{2,m}, y_{2,m})]$.

Let $\overline{X_n}$ be the mean of $[x_{1,1}, x_{1,2}, x_{1,3}, ..., x_{1,n}]$, let $\overline{Y_n}$ be the mean of $[y_{1,1}, y_{1,2}, y_{1,3}, ..., y_{1,n}]$, and let $V_n$ be n times $cov(X_n, Y_n)$, i.e.

$$V_n = \sum_{i=1}^n (x_{1,i}-\overline{X_n})(y_{1,i}-\overline{Y_n})$$

Likewise, let $\overline{X_m}$ be the mean of $[x_{2,1}, x_{2,2}, x_{2,3}, ..., x_{2,m}]$, let $\overline{Y_m}$ be the mean of $[y_{2,1}, y_{2,2}, y_{2,3}, ..., y_{2,m}]$, and let $V_m$ be m times $cov(X_m, Y_m)$. The combined $\overline{X_{n+m}}$ and $V_{n+m}$ can then be computed as follows:

$$
\overline{X_{n+m}} = \frac{n\overline{X_n}+m\overline{X_m}}{n+m}\\
V_{n+m}=V_n+V_m+\frac{nm}{n+m}(\overline{X_n}-\overline{X_m})(\overline{Y_n}-\overline{Y_m})
$$

$\overline{Y_{n+m}}$ is obtained in the same way as $\overline{X_{n+m}}$.
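Merging two shard summaries can be sketched as follows. This is a minimal Python sketch. The `Partial` record, the `summarize` helper (which builds a shard summary directly from its pairs, for illustration only) and `merge` are hypothetical names. In a real system, each shard would produce its summary with the streaming update above.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Per-shard summary: count n, means of x and y, and V = n * cov(x, y)."""
    n: int
    mean_x: float
    mean_y: float
    V: float


def summarize(pairs):
    """Build a shard summary directly from its (x, y) pairs (illustration only)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    V = sum((x - mx) * (y - my) for x, y in pairs)
    return Partial(n, mx, my, V)


def merge(a, b):
    """Combine two summaries with the formulas for X_{n+m} and V_{n+m}."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n, m = a.n, b.n
    total = n + m
    mean_x = (n * a.mean_x + m * b.mean_x) / total
    mean_y = (n * a.mean_y + m * b.mean_y) / total
    # V_{n+m} = V_n + V_m + nm/(n+m) * (mean(X_n) - mean(X_m)) * (mean(Y_n) - mean(Y_m))
    V = a.V + b.V + n * m / total * (a.mean_x - b.mean_x) * (a.mean_y - b.mean_y)
    return Partial(total, mean_x, mean_y, V)


data = list(zip([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0],
                [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 9.0]))
merged = merge(summarize(data[:3]), summarize(data[3:]))
print(merged.n, merged.V / merged.n)  # close to 8 4.875
```

Merging two exact summaries gives the exact summary of their union. So `merge` can be applied pairwise in any grouping (up to floating-point rounding), for example as a tree reduction over many shards. Setting $y = x$ gives the corresponding variance merge, $v_{n+m} = v_n + v_m + \frac{nm}{n+m}(\overline{X_n}-\overline{X_m})^2$.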
4. Sliding windows
In window-function computations, each slide of the window adds a new value and may also drop the oldest one. The new value is added with the streaming formulas above. The old value is removed with the same formulas, rearranged. The mean, variance and covariance do not depend on the order of the data, so we may assume the value being removed is $x_n$ (paired with $y_n$ for the covariance). This gives:

$$
\overline{X_{n-1}}=\overline{X_n}-\frac{x_n-\overline{X_n}}{n-1}\\
v_{n-1}=v_n-(x_n-\overline{X_{n-1}})(x_n-\overline{X_n})
$$

$$
\begin{aligned}
V_{n-1}&=V_n-(x_n-\overline{X_n})(y_n-\overline{Y_{n-1}})\\
\text{or}&=V_n-(x_n-\overline{X_{n-1}})(y_n-\overline{Y_n})
\end{aligned}
$$

Compute $\overline{X_{n-1}}$ (and $\overline{Y_{n-1}}$) first, because the updates of $v_{n-1}$ and $V_{n-1}$ depend on them.
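A fixed-size sliding window can combine the add and remove rules. This is a minimal Python sketch. The class `SlidingStats`, its `size` parameter and the use of a `collections.deque` to remember which pair to evict are illustrative choices. Removal uses the first covariance form.

```python
from collections import deque


class SlidingStats:
    """Mean, variance and covariance over the most recent `size` (x, y) pairs."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.n = 0
        self.mean_x = self.mean_y = 0.0
        self.v = 0.0  # n * variance of x
        self.V = 0.0  # n * cov(x, y)

    def _add(self, x, y):
        self.n += 1
        dx_old = x - self.mean_x                  # x_n - mean(X_{n-1})
        self.mean_x += dx_old / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self.v += dx_old * (x - self.mean_x)
        self.V += dx_old * (y - self.mean_y)

    def _remove(self, x, y):
        if self.n == 1:
            # Removing the only element: back to the empty state
            self.n = 0
            self.mean_x = self.mean_y = self.v = self.V = 0.0
            return
        dx_new = x - self.mean_x                  # x_n - mean(X_n)
        self.n -= 1                               # self.n is now n - 1
        self.mean_x -= dx_new / self.n            # mean(X_{n-1})
        self.mean_y -= (y - self.mean_y) / self.n # mean(Y_{n-1})
        # v_{n-1} = v_n - (x_n - mean(X_{n-1})) * (x_n - mean(X_n))
        self.v -= (x - self.mean_x) * dx_new
        # V_{n-1} = V_n - (x_n - mean(X_n)) * (y_n - mean(Y_{n-1}))
        self.V -= dx_new * (y - self.mean_y)

    def push(self, x, y):
        self.window.append((x, y))
        self._add(x, y)
        if len(self.window) > self.size:
            self._remove(*self.window.popleft())


win = SlidingStats(size=4)
for x, y in zip([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0],
                [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 9.0]):
    win.push(x, y)
print(win.n, win.mean_x, win.v / win.n, win.V / win.n)  # close to 4 6.5 2.75 2.875
```

In floating point, long runs of additions and subtractions can accumulate rounding error. Periodically recomputing the statistics from the current window contents is one way to bound that error.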
5. Derivations
5.1 Incremental mean
$$
\begin{aligned}
\overline{X_n}&=\frac{1}{n} \sum_{i=1}^n x_i=\frac{1}{n} \left(x_n + \sum_{i=1}^{n-1} x_i\right) =\frac{1}{n} \left(x_n + (n-1)\overline{X_{n-1}}\right)\\
&=\frac{1}{n} \left(n\overline{X_{n-1}}+x_n-\overline{X_{n-1}}\right) =\overline{X_{n-1}}+\frac{x_n-\overline{X_{n-1}}}{n}
\end{aligned}
$$
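The recurrence can also be checked numerically. This minimal sketch uses Python's `fractions.Fraction` so that the comparison with the definition is exact rather than subject to floating-point rounding. The function name `incremental_means` and the sample values are illustrative.

```python
from fractions import Fraction


def incremental_means(xs):
    """Yield mean(X_1), mean(X_2), ... via mean(X_n) = mean(X_{n-1}) + (x_n - mean(X_{n-1})) / n."""
    mean = Fraction(0)
    for n, x in enumerate(xs, start=1):
        mean += (x - mean) / n
        yield mean


xs = [Fraction(v) for v in (3, 1, 4, 1, 5, 9, 2, 6)]
for n, m in enumerate(incremental_means(xs), start=1):
    # Exact arithmetic: the recurrence must agree with (1/n) * sum of the first n values
    assert m == sum(xs[:n]) / n
print(list(incremental_means(xs))[-1])  # 31/8
```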
5.2 Incremental variance
$$
\begin{aligned}
v_n&=\sum_{i=1}^n(x_i-\overline{X_n})^2=\sum_{i=1}^{n-1}(x_i-\overline{X_n})^2+(x_n-\overline{X_n})^2\\
&=\sum_{i=1}^{n-1}(x_i-\overline{X_{n-1}}+\overline{X_{n-1}}-\overline{X_n})^2+(x_n-\overline{X_n})^2\\
&=\sum_{i=1}^{n-1}(x_i-\overline{X_{n-1}})^2+2(\overline{X_{n-1}}-\overline{X_n})\sum_{i=1}^{n-1}(x_i-\overline{X_{n-1}})+(n-1)(\overline{X_{n-1}}-\overline{X_n})^2+(x_n-\overline{X_n})^2
\end{aligned}
$$

In the last line, the first term is exactly $v_{n-1}$, and the second term is 0, because
$$\sum_{i=1}^{n-1}(x_i-\overline{X_{n-1}})=\sum_{i=1}^{n-1}x_i-(n-1)\overline{X_{n-1}}=(n-1)\overline{X_{n-1}}-(n-1)\overline{X_{n-1}}=0$$

Therefore $v_n=v_{n-1}+(n-1)(\overline{X_{n-1}}-\overline{X_n})^2+(x_n-\overline{X_n})^2$.
To simplify the increment, note that

$$(n-1)(\overline{X_{n-1}}-\overline{X_n})=(n-1)\overline{X_{n-1}}-(n-1)\overline{X_n}=n\overline{X_n}-x_n-(n-1)\overline{X_n}=\overline{X_n}-x_n$$

where the second step uses $(n-1)\overline{X_{n-1}}=n\overline{X_n}-x_n$. Hence
$$
\begin{aligned}
&(n-1)(\overline{X_{n-1}}-\overline{X_n})^2+(x_n-\overline{X_n})^2\\
&=(n-1)(\overline{X_{n-1}}-\overline{X_n})(\overline{X_{n-1}}-\overline{X_n})+(x_n-\overline{X_n})^2\\
&=(\overline{X_n}-x_n)(\overline{X_{n-1}}-\overline{X_n})+(x_n-\overline{X_n})^2\\
&=(x_n-\overline{X_n})(\overline{X_n}-\overline{X_{n-1}})+(x_n-\overline{X_n})^2\\
&=(x_n-\overline{X_n})(x_n-\overline{X_n}+\overline{X_n}-\overline{X_{n-1}})\\
&=(x_n-\overline{X_n})(x_n-\overline{X_{n-1}})
\end{aligned}
$$

and therefore $v_n=v_{n-1}+(x_n-\overline{X_{n-1}})(x_n-\overline{X_n})$.
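Both the final recurrence and the intermediate identity $(n-1)(\overline{X_{n-1}}-\overline{X_n})=\overline{X_n}-x_n$ can be spot-checked with exact rational arithmetic. This is a minimal sketch. `n_times_var`, `check_variance_step` and the sample data are illustrative.

```python
from fractions import Fraction


def n_times_var(xs):
    """v_n = sum of (x_i - mean)^2, i.e. n times the population variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def check_variance_step(xs):
    """Check v_n = v_{n-1} + (x_n - mean(X_{n-1})) (x_n - mean(X_n)), for len(xs) >= 2."""
    n = len(xs)
    prev, x_n = xs[:-1], xs[-1]
    mean_prev = sum(prev) / (n - 1)
    mean_n = sum(xs) / n
    # Intermediate identity: (n-1)(mean(X_{n-1}) - mean(X_n)) = mean(X_n) - x_n
    if (n - 1) * (mean_prev - mean_n) != mean_n - x_n:
        return False
    return n_times_var(xs) == n_times_var(prev) + (x_n - mean_prev) * (x_n - mean_n)


data = [Fraction(v) for v in (3, 1, 4, 1, 5, 9, 2, 6)]
print(all(check_variance_step(data[:k]) for k in range(2, len(data) + 1)))  # True
```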
5.3 Incremental covariance
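The approach is the same as for the variance. (The variance is in fact a special case of the covariance, obtained by taking $Y_n = X_n$.)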
$$V_n=\sum_{i=1}^{n-1}(x_i-\overline{X_n})(y_i-\overline{Y_n})+(x_n-\overline{X_n})(y_n-\overline{Y_n})$$

Replace the means in the first term with $\overline{X_{n-1}}$ and $\overline{Y_{n-1}}$, starting with $\overline{X_n}$:
$$
\begin{aligned}
&\sum_{i=1}^{n-1}(x_i-\overline{X_n})(y_i-\overline{Y_n})\\
&=\sum_{i=1}^{n-1}(x_i-\overline{X_{n-1}}+\overline{X_{n-1}}-\overline{X_n})(y_i-\overline{Y_n})\\
&=\sum_{i=1}^{n-1}(x_i-\overline{X_{n-1}})(y_i-\overline{Y_n})+(\overline{X_{n-1}}-\overline{X_n})\sum_{i=1}^{n-1}(y_i-\overline{Y_n})\\
&=\sum_{i=1}^{n-1}(x_i-\overline{X_{n-1}})(y_i-\overline{Y_{n-1}}+\overline{Y_{n-1}}-\overline{Y_n})+(\overline{X_{n-1}}-\overline{X_n})\sum_{i=1}^{n-1}(y_i-\overline{Y_n})\\
&=\sum_{i=1}^{n-1}(x_i-\overline{X_{n-1}})(y_i-\overline{Y_{n-1}})+\sum_{i=1}^{n-1}(x_i-\overline{X_{n-1}})(\overline{Y_{n-1}}-\overline{Y_n})+(\overline{X_{n-1}}-\overline{X_n})\sum_{i=1}^{n-1}(y_i-\overline{Y_n})
\end{aligned}
$$
In the last line, the second term is 0, because $\sum_{i=1}^{n-1}(x_i-\overline{X_{n-1}})=\sum_{i=1}^{n-1}x_i-(n-1)\overline{X_{n-1}}=\sum_{i=1}^{n-1}x_i-\sum_{i=1}^{n-1}x_i=0$.
The third term simplifies to $(\overline{X_{n-1}}-\overline{X_n})(\overline{Y_n}-y_n)$,
because $\sum_{i=1}^{n-1}(y_i-\overline{Y_n})+y_n-\overline{Y_n}=\sum_{i=1}^n{y_i}-n\overline{Y_n}=0$,
so $\sum_{i=1}^{n-1}(y_i-\overline{Y_n})=\overline{Y_n}-y_n$.
Therefore

$$
\begin{aligned}
\sum_{i=1}^{n-1}(x_i-\overline{X_n})(y_i-\overline{Y_n})&=\sum_{i=1}^{n-1}(x_i-\overline{X_{n-1}})(y_i-\overline{Y_{n-1}})-(\overline{X_{n-1}}-\overline{X_n})(y_n-\overline{Y_n})\\
&=V_{n-1}+(\overline{X_n}-\overline{X_{n-1}})(y_n-\overline{Y_n})
\end{aligned}
$$

and so
$$
\begin{aligned}
V_n&=V_{n-1}+(\overline{X_n}-\overline{X_{n-1}})(y_n-\overline{Y_n})+(x_n-\overline{X_n})(y_n-\overline{Y_n})\\
&=V_{n-1}+(x_n-\overline{X_n}+\overline{X_n}-\overline{X_{n-1}})(y_n-\overline{Y_n})\\
&=V_{n-1}+(x_n-\overline{X_{n-1}})(y_n-\overline{Y_n})
\end{aligned}
$$

This is the second form. Swapping the roles of $x$ and $y$ throughout the derivation gives $V_n=V_{n-1}+(y_n-\overline{Y_{n-1}})(x_n-\overline{X_n})$, which is the first form.
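A quick exact check confirms that both incremental forms agree with each other and with the definition of $V_n$. This is a minimal sketch using `fractions.Fraction`. The helper names and sample data are illustrative.

```python
from fractions import Fraction


def mean(values):
    return sum(values) / len(values)


def n_times_cov(xs, ys):
    """V_n = sum of (x_i - mean(X)) (y_i - mean(Y)), i.e. n times the covariance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def check_cov_step(xs, ys):
    """Check both incremental forms of V_n against the definition, for len(xs) >= 2."""
    x_n, y_n = xs[-1], ys[-1]
    mx_prev, my_prev = mean(xs[:-1]), mean(ys[:-1])
    mx, my = mean(xs), mean(ys)
    v_prev = n_times_cov(xs[:-1], ys[:-1])                 # V_{n-1}
    form1 = v_prev + (x_n - mx) * (y_n - my_prev)          # first form
    form2 = v_prev + (x_n - mx_prev) * (y_n - my)          # second form
    return form1 == form2 == n_times_cov(xs, ys)


xs = [Fraction(v) for v in (3, 1, 4, 1, 5, 9, 2, 6)]
ys = [Fraction(v) for v in (2, 7, 1, 8, 2, 8, 1, 8)]
print(all(check_cov_step(xs[:k], ys[:k]) for k in range(2, len(xs) + 1)))  # True
```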
5.4 Merging covariances
$$
\begin{aligned}
V_n &= \sum_{i=1}^n (x_{1,i}-\overline{X_n})(y_{1,i}-\overline{Y_n})\\
V_m &= \sum_{i=1}^m(x_{2,i}-\overline{X_m})(y_{2,i}-\overline{Y_m})\\
V_{n+m} &= \sum_{i=1}^n(x_{1,i}-\overline{X_{n+m}})(y_{1,i}-\overline{Y_{n+m}})+\sum_{i=1}^m(x_{2,i}-\overline{X_{n+m}})(y_{2,i}-\overline{Y_{n+m}})
\end{aligned}
$$

Simplify the first sum in $V_{n+m}$:
$$
\begin{aligned}
&\sum_{i=1}^n(x_{1,i}-\overline{X_n}+\overline{X_n}-\overline{X_{n+m}})(y_{1,i}-\overline{Y_{n+m}})\\
&=\sum_{i=1}^n(x_{1,i}-\overline{X_n})(y_{1,i}-\overline{Y_{n+m}})+(\overline{X_n}-\overline{X_{n+m}})\sum_{i=1}^n(y_{1,i}-\overline{Y_{n+m}})\\
&=\sum_{i=1}^n(x_{1,i}-\overline{X_n})(y_{1,i}-\overline{Y_{n+m}})+(\overline{X_n}-\overline{X_{n+m}})(n\overline{Y_n}-n\overline{Y_{n+m}})\\
&=\sum_{i=1}^n(x_{1,i}-\overline{X_n})(y_{1,i}-\overline{Y_{n+m}})+n(\overline{X_n}-\overline{X_{n+m}})(\overline{Y_n}-\overline{Y_{n+m}})\\
&=\sum_{i=1}^n(x_{1,i}-\overline{X_n})(y_{1,i}-\overline{Y_n}+\overline{Y_n}-\overline{Y_{n+m}})+n(\overline{X_n}-\overline{X_{n+m}})(\overline{Y_n}-\overline{Y_{n+m}})\\
&=\sum_{i=1}^n(x_{1,i}-\overline{X_n})(y_{1,i}-\overline{Y_n})+(\overline{Y_n}-\overline{Y_{n+m}})\sum_{i=1}^n(x_{1,i}-\overline{X_n})+n(\overline{X_n}-\overline{X_{n+m}})(\overline{Y_n}-\overline{Y_{n+m}})\\
&=\sum_{i=1}^n(x_{1,i}-\overline{X_n})(y_{1,i}-\overline{Y_n})+n(\overline{X_n}-\overline{X_{n+m}})(\overline{Y_n}-\overline{Y_{n+m}})\\
&=V_n+n(\overline{X_n}-\overline{X_{n+m}})(\overline{Y_n}-\overline{Y_{n+m}})
\end{aligned}
$$

The second-to-last step uses $\sum_{i=1}^n(x_{1,i}-\overline{X_n})=0$. In the same way, the second sum in $V_{n+m}$ simplifies to
$$V_m+m(\overline{X_m}-\overline{X_{n+m}})(\overline{Y_m}-\overline{Y_{n+m}})$$
Note that $n(\overline{X_n}-\overline{X_{n+m}}) + m(\overline{X_m}-\overline{X_{n+m}}) = n\overline{X_n}+m\overline{X_m}-(n+m)\overline{X_{n+m}}=0$,
hence $-n(\overline{X_n}-\overline{X_{n+m}})=m(\overline{X_m}-\overline{X_{n+m}})$.
Therefore

$$
\begin{aligned}
V_{n+m}&=V_n+V_m+n(\overline{X_n}-\overline{X_{n+m}})(\overline{Y_n}-\overline{Y_{n+m}}-\overline{Y_m}+\overline{Y_{n+m}})\\
&=V_n+V_m+n(\overline{X_n}-\overline{X_{n+m}})(\overline{Y_n}-\overline{Y_m})
\end{aligned}
$$

Moreover,

$$
\begin{aligned}
n(\overline{X_n}-\overline{X_{n+m}})&=\frac{n}{n+m}\left((n+m)\overline{X_n}-(n+m)\overline{X_{n+m}}\right)\\
&=\frac{n}{n+m}\left(n\overline{X_n}+m\overline{X_n}-\sum_{i=1}^n{x_{1,i}}-\sum_{i=1}^m{x_{2,i}}\right)\\
&=\frac{n}{n+m}\left(\sum_{i=1}^n{x_{1,i}}+m\overline{X_n}-\sum_{i=1}^n{x_{1,i}}-m\overline{X_m}\right)\\
&=\frac{nm}{n+m}(\overline{X_n}-\overline{X_m})
\end{aligned}
$$

so that

$$V_{n+m}=V_n+V_m+\frac{nm}{n+m}(\overline{X_n}-\overline{X_m})(\overline{Y_n}-\overline{Y_m})$$
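The merge formula and the identity $n(\overline{X_n}-\overline{X_{n+m}})=\frac{nm}{n+m}(\overline{X_n}-\overline{X_m})$ can be checked exactly for every way of splitting a sample into two non-empty shards. This is a minimal sketch using `fractions.Fraction`. The helper names and data are illustrative.

```python
from fractions import Fraction


def mean(values):
    return sum(values) / len(values)


def n_times_cov(xs, ys):
    """V = sum of (x_i - mean(X)) (y_i - mean(Y))."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def check_merge(x1, y1, x2, y2):
    """Check V_{n+m} = V_n + V_m + nm/(n+m) (mean(X_n)-mean(X_m)) (mean(Y_n)-mean(Y_m))."""
    n, m = len(x1), len(x2)
    w = Fraction(n * m, n + m)
    # Intermediate identity: n (mean(X_n) - mean(X_{n+m})) = nm/(n+m) (mean(X_n) - mean(X_m))
    if n * (mean(x1) - mean(x1 + x2)) != w * (mean(x1) - mean(x2)):
        return False
    merged = (n_times_cov(x1, y1) + n_times_cov(x2, y2)
              + w * (mean(x1) - mean(x2)) * (mean(y1) - mean(y2)))
    return merged == n_times_cov(x1 + x2, y1 + y2)


xs = [Fraction(v) for v in (3, 1, 4, 1, 5, 9, 2, 6)]
ys = [Fraction(v) for v in (2, 7, 1, 8, 2, 8, 1, 8)]
print(all(check_merge(xs[:k], ys[:k], xs[k:], ys[k:]) for k in range(1, len(xs))))  # True
```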