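In "C++ Templates", expression templates are introduced as a way to make array operations both elegant and efficient. The book says metaprogramming is mainly suited to small, fixed-size arrays, while expression templates are suited to run-time operations on medium-sized and large arrays. In my tests, however, neither metaprogramming nor expression templates turned out to be as magical as the book makes them sound; perhaps my implementation is simply not good enough. Below, I test expression templates.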
Suppose we want to support the following operation:
Array<double> x(1000), y(1000);
...
x = 1.2 * x + x * y;
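To support it, Array must overload operator+ and operator*, each returning a temporary Array object. Written this way, the code above has two drawbacks:
1. It creates 3 temporary arrays, each with 1000 elements;
2. It reads 6000 doubles and writes 4000 doubles.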
Performance-wise, the following hand-written loop is far better, but it gives up the elegance of the array notation:
// single pass over the arrays, no temporaries
for ( std::size_t idx = 0; idx < x.size(); ++ idx ) {
x[idx] = 1.2*x[idx] + x[idx]*y[idx];
}
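Expression templates exist precisely to support the simple array notation while keeping the operations as efficient as the hand-written loop. The source code and test code have been uploaded as a download resource; here is the test code: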
int CNT = 10000000;
//test sarray
Integer i3(3), i4(4), i10(10);
SArray<Integer> a(CNT), b(CNT);
SArray<Integer> const& af = a;
SArray<Integer> const& bf = b;
clock_t s, e;
s = clock();
for ( int i = 0; i < CNT; ++ i )
a[i] = i3 * af[i] + bf[i]*af[i] + af[i]*i4 + bf[i]*i10;
e = clock();
cout << "for time: " << e-s << "ms " << " addcnt: "
<< int_addcnt << " mulcnt: " << int_mulcnt
<< " readcnt: " << SArray<Integer>::readcnt
<< " writecnt: " << SArray<Integer>::writecnt
<< " temperary: " << SArray<Integer>::sarraycnt << endl;
int_addcnt = 0;
int_mulcnt = 0;
SArray<Integer>::sarraycnt = 0;
SArray<Integer>::readcnt = 0;
SArray<Integer>::writecnt = 0;
SArray<Integer> inta(CNT), intb(CNT);
s = clock();
inta = i3*inta + intb*inta + inta*i4 + intb*i10;
e = clock();
cout << "SArray time: " << e-s << "ms " << " addcnt: "
<< int_addcnt << " mulcnt: " << int_mulcnt
<< " readcnt: " << SArray<Integer>::readcnt
<< " writecnt: " << SArray<Integer>::writecnt
<< " temperary: " << SArray<Integer>::sarraycnt << endl;
int_addcnt = 0;
int_mulcnt = 0;
SArray<Integer>::sarraycnt = 0;
SArray<Integer>::readcnt = 0;
SArray<Integer>::writecnt = 0;
Array<Integer, SArray<Integer> > ia(CNT), ib(CNT);
s = clock();
ia = i3*ia + ib*ia + ia*i4 + ib*i10;
e = clock();
cout << "Array time: " << e-s << "ms " << " addcnt: "
<< int_addcnt << " mulcnt: " << int_mulcnt
<< " readcnt: " << SArray<Integer>::readcnt
<< " writecnt: " << SArray<Integer>::writecnt
<< " temperary: " << SArray<Integer>::sarraycnt << endl;
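Results on Ubuntu 11.04 (the printed figures are the raw clock() tick difference e-s; glibc defines CLOCKS_PER_SEC as 1000000, so despite the "ms" label these values are microseconds, e.g. 900000 is about 0.9 s):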
for time: 900000ms addcnt: 30000000 mulcnt: 40000000 readcnt: 50000000 writecnt: 10000000 temperary: 2
SArray time: 3360000ms addcnt: 30000000 mulcnt: 40000000 readcnt: 110000000 writecnt: 70000000 temperary: 9
Array time: 14710000ms addcnt: 30000000 mulcnt: 40000000 readcnt: 50000000 writecnt: 10000000 temperary: 2
Results on Win7:
for time: 6637ms addcnt: 30000000 mulcnt: 40000000 readcnt: 50000000 writecnt: 10000000 temperary: 2
SArray time: 31399ms addcnt: 30000000 mulcnt: 40000000 readcnt: 110000000 writecnt: 70000000 temperary: 16
Array time: 174527ms addcnt: 30000000 mulcnt: 40000000 readcnt: 50000000 writecnt: 10000000 temperary: 2
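As the counters show, Array performs far fewer reads and writes than SArray, and it creates no temporary arrays beyond the two operands. Yet the final result floored me: Array is several times slower than SArray (about 4.4x on Ubuntu and 5.6x on Win7)! My guess is that because Array uses expression templates, it avoids temporary arrays and substantially reduces element accesses, but introduces a new problem: the sheer number of function calls ends up increasing the running time. I don't know whether this analysis holds up; I'd welcome guidance from the experts.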
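The next question, then, is: in what scenarios do expression templates actually deliver their best performance? Is an array of 10,000,000 elements still not large enough? I'm begging on my knees for the truth!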