1. strcmp
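strcmp compares two strings lexicographically; a return value of 0 means they are equal. The implementation matches what you would first think of: walk both strings in a loop and, at the first position where they differ (or where the first string ends), return the difference of the two characters.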
int
strcmp (p1, p2)
const char *p1;
const char *p2;
{
register const unsigned char *s1 = (const unsigned char *) p1;
register const unsigned char *s2 = (const unsigned char *) p2;
unsigned char c1, c2;
do
{
c1 = (unsigned char) *s1++;
c2 = (unsigned char) *s2++;
if (c1 == '\0')
return c1 - c2;
}
while (c1 == c2);
return c1 - c2;
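}
The code uses the register storage-class specifier, which hints to the compiler that the variable should be kept in a register; modern optimizing compilers generally make that decision on their own. Other than that, there is no special optimization here.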
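To make the return-value convention concrete, here is a minimal sketch of the same loop written with a modern prototype. The names my_strcmp, sign_of and check_my_strcmp are hypothetical and used only for illustration; this is not glibc's code. It also shows why the bytes are read as unsigned char: a byte such as 0xE9 then sorts after 'z' rather than before it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical name: an illustrative rewrite of the loop, not glibc's code. */
int my_strcmp(const char *p1, const char *p2)
{
    const unsigned char *s1 = (const unsigned char *) p1;
    const unsigned char *s2 = (const unsigned char *) p2;
    unsigned char c1, c2;

    do {
        c1 = *s1++;
        c2 = *s2++;
        if (c1 == '\0')      /* first string ended: stop here */
            break;
    } while (c1 == c2);

    /* Negative, zero or positive depending on the first differing byte. */
    return c1 - c2;
}

/* Reduce a result to -1, 0 or 1 so that only the sign is compared. */
static int sign_of(int v) { return (v > 0) - (v < 0); }

void check_my_strcmp(void)
{
    assert(my_strcmp("abc", "abc") == 0);
    assert(my_strcmp("abc", "abd") < 0);
    assert(my_strcmp("abc", "ab") > 0);
    /* Bytes are compared as unsigned char, so 0xE9 sorts after 'z'. */
    assert(my_strcmp("\xe9", "z") > 0);
    /* Only the sign is specified, so compare signs with the library. */
    assert(sign_of(my_strcmp("apple", "apricot"))
           == sign_of(strcmp("apple", "apricot")));
}
```

Only the sign of the result is portable: the C standard promises a negative, zero or positive value, not the exact difference, so callers should test `< 0`, `== 0` or `> 0`.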
2. strncmp
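This function is similar to strcmp and also compares two strings, but it looks at no more than the first n characters. Because the comparison is bounded, it is safer than strcmp when an input may not be null-terminated.
The simplest approach is to compare character by character until either string ends, a mismatch is found, or n characters have been compared.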
while (n > 0)
{
c1 = (unsigned char) *s1++;
c2 = (unsigned char) *s2++;
if (c1 == '\0' || c1 != c2)
return c1 - c2;
n--;
}
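The fragment above omits the declarations and the final return. A complete version might look like the following sketch; that it matches the my_strncmp measured later in the article is an assumption, and check_my_strncmp is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward, non-unrolled version of the loop shown above. */
int my_strncmp(const char *p1, const char *p2, size_t n)
{
    const unsigned char *s1 = (const unsigned char *) p1;
    const unsigned char *s2 = (const unsigned char *) p2;
    unsigned char c1 = '\0';
    unsigned char c2 = '\0';

    while (n > 0) {
        c1 = *s1++;
        c2 = *s2++;
        /* Stop at the end of the first string or at the first mismatch. */
        if (c1 == '\0' || c1 != c2)
            return c1 - c2;
        n--;
    }
    /* Either n was 0 or all n characters matched, so c1 == c2 here. */
    return c1 - c2;
}

void check_my_strncmp(void)
{
    assert(my_strncmp("abcdef", "abcxyz", 3) == 0);  /* first 3 match */
    assert(my_strncmp("abcdef", "abcxyz", 4) < 0);   /* 'd' < 'x' */
    assert(my_strncmp("abc", "abd", 0) == 0);        /* nothing compared */
    assert(my_strncmp("ab", "abc", 5) < 0);          /* shorter sorts first */
}
```

Initializing c1 and c2 to '\0' is what makes the n == 0 case return 0 without reading either string.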
Below is glibc's implementation of strncmp.
/* Compare no more than N characters of S1 and S2,
returning less than, equal to or greater than zero
if S1 is lexicographically less than, equal to or
greater than S2. */
int
strncmp (const char *s1, const char *s2, size_t n)
{
unsigned char c1 = '\0';
unsigned char c2 = '\0';
if (n >= 4)
{
size_t n4 = n >> 2;
do
{
c1 = (unsigned char) *s1++;
c2 = (unsigned char) *s2++;
if (c1 == '\0' || c1 != c2)
return c1 - c2;
c1 = (unsigned char) *s1++;
c2 = (unsigned char) *s2++;
if (c1 == '\0' || c1 != c2)
return c1 - c2;
c1 = (unsigned char) *s1++;
c2 = (unsigned char) *s2++;
if (c1 == '\0' || c1 != c2)
return c1 - c2;
c1 = (unsigned char) *s1++;
c2 = (unsigned char) *s2++;
if (c1 == '\0' || c1 != c2)
return c1 - c2;
} while (--n4 > 0);
n &= 3;
}
while (n > 0)
{
c1 = (unsigned char) *s1++;
c2 = (unsigned char) *s2++;
if (c1 == '\0' || c1 != c2)
return c1 - c2;
n--;
}
return c1 - c2;
}
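The heart of the algorithm is the big do loop. A small loop would do the job, so why copy the same four lines four times?
This is loop unrolling, and it is probably related to how the CPU pipeline handles branches: the loop-control test (--n4 > 0) and its backward jump run once per four characters instead of once per character, although each character still has its own early-exit check. The table below shows the results of running each function 100,000 times.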
| String length | strncmp run time (ms) | my_strncmp run time (ms) |
|---|---|---|
| 5 | 2.62 | 3.10 |
| 20 | 8.17 | 10.98 |
| 100 | 38.70 | 45.36 |
| 200 | 75.55 | 88.62 |
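The table shows that the unrolling helps to some extent: the glibc version takes about 15% less time than my_strncmp in most rows, and about 26% less at length 20.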
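The article does not show how the timings were taken. One way to run a comparable measurement is sketched below, assuming CPU time from clock() and two equal strings of the given length so that every character is compared. The names cmp_fn and time_compare are hypothetical. On many platforms the library's strncmp is a hand-tuned assembly routine rather than the C code listed above, so the numbers will not necessarily reproduce the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Any function with the strncmp signature can be timed. */
typedef int (*cmp_fn)(const char *, const char *, size_t);

/* Hypothetical helper: time `iters` calls of `cmp` on two equal strings of
   length `len`. Returns elapsed CPU time in milliseconds, or -1.0 if the
   buffers could not be allocated. */
double time_compare(cmp_fn cmp, size_t len, long iters)
{
    assert(cmp != NULL && iters > 0);

    char *a = malloc(len + 1);
    char *b = malloc(len + 1);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1.0;
    }
    memset(a, 'x', len);
    a[len] = '\0';
    memcpy(b, a, len + 1);

    volatile int sink = 0;   /* keeps the calls from being optimized away */
    clock_t start = clock();
    for (long i = 0; i < iters; i++)
        sink += cmp(a, b, len);
    clock_t end = clock();

    free(a);
    free(b);
    (void) sink;
    return 1000.0 * (double) (end - start) / CLOCKS_PER_SEC;
}
```

A call such as `time_compare(strncmp, 100, 100000)` would correspond to the length-100 row; passing a hand-written comparison function with the same signature gives the other column. Calling through a function pointer is a deliberate choice here, since it stops the compiler from replacing the call with a built-in.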
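In summary, this article walked through how the C library's strcmp and strncmp work and how they are optimized internally; the comparison experiment suggests that an optimization aimed at this specific scenario gives a measurable speedup.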