Notes from a test run on 64-bit CentOS, recorded for future reference.
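The source splits the string comparison into two cases, depending on whether the length n is less than 4 or at least 4:
- If n is less than 4, the characters are compared one at a time in the ordinary way.
- If n is at least 4, the length is first divided by 4 and the loop is unrolled to compare four characters per iteration; the leftover characters (fewer than 4) are then compared in the ordinary way.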
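Comparing four characters per iteration is often linked to cache optimization. A cache line on x86-64 is typically 64 bytes rather than 4, though, so the more direct benefit of the grouping is that the loop counter is decremented and tested once per four characters instead of once per character.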
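To make the split concrete, here is a minimal sketch of the arithmetic the source performs on the length. The struct `split` and the function `split_length` are hypothetical names used only for illustration; the source computes the two values inline with `n >> 2` and `n & 3`.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the source: shows how a length n
   is divided into unrolled iterations and a one-by-one tail.  */
struct split
{
  size_t blocks;  /* iterations of the unrolled loop, 4 characters each */
  size_t tail;    /* characters left for the ordinary loop (0 to 3) */
};

static struct split
split_length (size_t n)
{
  struct split s;
  s.blocks = n >> 2;  /* same as n / 4 for an unsigned value */
  s.tail = n & 3;     /* same as n % 4 */
  return s;
}
```

For the length of 50 used in the test, this gives 12 unrolled iterations (48 characters) followed by 2 characters handled by the ordinary loop.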

Figure: the mapping between the cache and main memory.
/* In glibc, STRNCMP is a macro that defaults to strncmp.  */
int
STRNCMP (const char *s1, const char *s2, size_t n)
{
  unsigned char c1 = '\0';
  unsigned char c2 = '\0';

  if (n >= 4)
    {
      /* Number of full groups of four characters.  */
      size_t n4 = n >> 2;
      do
        {
          /* Loop body unrolled four times: stop at the first
             terminator or mismatch.  */
          c1 = (unsigned char) *s1++;
          c2 = (unsigned char) *s2++;
          if (c1 == '\0' || c1 != c2)
            return c1 - c2;
          c1 = (unsigned char) *s1++;
          c2 = (unsigned char) *s2++;
          if (c1 == '\0' || c1 != c2)
            return c1 - c2;
          c1 = (unsigned char) *s1++;
          c2 = (unsigned char) *s2++;
          if (c1 == '\0' || c1 != c2)
            return c1 - c2;
          c1 = (unsigned char) *s1++;
          c2 = (unsigned char) *s2++;
          if (c1 == '\0' || c1 != c2)
            return c1 - c2;
        } while (--n4 > 0);
      /* Characters left over after the groups of four.  */
      n &= 3;
    }

  /* Ordinary loop: short inputs and the leftover characters.  */
  while (n > 0)
    {
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0' || c1 != c2)
        return c1 - c2;
      n--;
    }

  return c1 - c2;
}
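For contrast, a plain version that handles one character per iteration might look like the sketch below. The name `naive_strncmp` is hypothetical and the function is not taken from the source; it only illustrates the "ordinary" handling with the same return convention (negative, zero, or positive).

```c
#include <stddef.h>

/* Hypothetical baseline, not from the source: compares at most n
   characters, one per loop iteration.  */
static int
naive_strncmp (const char *s1, const char *s2, size_t n)
{
  while (n > 0)
    {
      unsigned char c1 = (unsigned char) *s1++;
      unsigned char c2 = (unsigned char) *s2++;
      /* Stop at the end of s1 or at the first differing character.  */
      if (c1 == '\0' || c1 != c2)
        return c1 - c2;
      n--;
    }
  /* All n characters matched.  */
  return 0;
}
```

Here the counter `n` is decremented and tested once per character, whereas the unrolled loop in the source does this once per four characters.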
Method: compare strings of length 50, running each version 100000 times. Results:
real 0m0.001s
user 0m0.001s
sys 0m0.000s
real 0m0.006s
user 0m0.006s
sys 0m0.000s
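As the timings show, the comparison written the way the source does it is much faster (0.001s versus 0.006s of real time).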
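The real/user/sys figures above are in the format printed by the shell's `time` command. As an alternative, the following is a rough sketch of an in-process harness using `clock()`. The names `cmp_fn`, `count_equal`, and `time_compare` are hypothetical, and this is not the program that produced the numbers above; it only shows one way such a measurement could be set up.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Any function with the strncmp signature can be measured.  */
typedef int (*cmp_fn) (const char *, const char *, size_t);

/* Calls cmp (s1, s2, n) `iterations` times and returns how many calls
   reported equality.  The volatile counter keeps the compiler from
   discarding the loop.  */
static long
count_equal (cmp_fn cmp, const char *s1, const char *s2, size_t n,
             long iterations)
{
  volatile long equal = 0;
  for (long i = 0; i < iterations; i++)
    if (cmp (s1, s2, n) == 0)
      equal++;
  return equal;
}

/* Returns the CPU time in seconds spent in count_equal and stores
   the equality count in *equal_out when it is not NULL.  */
static double
time_compare (cmp_fn cmp, const char *s1, const char *s2, size_t n,
              long iterations, long *equal_out)
{
  clock_t start = clock ();
  long equal = count_equal (cmp, s1, s2, n, iterations);
  clock_t end = clock ();
  if (equal_out != NULL)
    *equal_out = equal;
  return (double) (end - start) / CLOCKS_PER_SEC;
}
```

A call such as `time_compare (strncmp, a, b, 50, 100000, &equal)` would time the library function; passing another function with the same signature times that one instead. With runs this short, the result is close to the timer's resolution, so a larger iteration count gives steadier figures.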
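This post records a test of an optimized string comparison function on 64-bit CentOS. For lengths of at least 4, the source compares four characters per loop iteration to improve efficiency. In the test, that version was markedly faster when running a large number of comparisons.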