How the question came up
On a mailing list I subscribe to, I saw this question:
Hi, having coding in C for 3 years but I'm still not clear with this one.
Consider this code.
...
char *p;
unsigned int i = 0xcccccccc;
unsigned int j;
p = (char *) &i;
printf("%.2x %.2x %.2x %.2x\n", *p, p[1], p[2], p[3]);
memcpy(&j, p, sizeof(unsigned int));
printf("%x\n", j);
...
Output:
ffffffcc ffffffcc ffffffcc ffffffcc
0xcccccccc
My questions are:
1. Why it prints "ffffffcc ffffffcc ffffffcc ffffffcc"? (if p is
unsigned char* then it will print correctly "cc cc cc cc")
2. Why pointer to char p copied to j correctly, why not every member in p overflow? since it is a signed char.
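I don't think question 2 needs much discussion: memcpy copies the bytes exactly as they are and never converts them to an integer value, so viewing them as signed char causes neither overflow nor sign extension.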
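A minimal sketch of why question 2 is uneventful, assuming nothing beyond standard C: the helper `copy_through_char_ptr` is a hypothetical name that mirrors the quoted snippet. The bytes are reached through a plain char pointer, but memcpy only moves them; no promotion to int ever happens, so their signedness is irrelevant.

```c
#include <string.h>

/* Hypothetical helper mirroring the quoted code: view an unsigned int
 * through a plain char pointer, then copy its bytes into another
 * unsigned int. memcpy copies raw bytes; no integer promotion or sign
 * extension is involved, so the value survives unchanged. */
unsigned int copy_through_char_ptr(unsigned int i)
{
    const char *p = (const char *) &i;  /* byte view of i */
    unsigned int j;

    memcpy(&j, p, sizeof j);            /* byte-for-byte copy */
    return j;
}
```

Calling `copy_through_char_ptr(0xccccccccu)` gives back `0xccccccccu` regardless of whether plain char is signed.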
Question 1 seemed to involve a little-known printf behavior, so I ran an experiment myself:
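#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    char a = 0xcc;           /* plain char: signed on my machine, so a holds -52 */
    unsigned char b = 0xcc;  /* always 204 */
    char c = 0x0c;           /* top bit clear */
    unsigned char d = 0x0c;
    /* Each argument is promoted to int before printf receives it. */
    printf("%.2x\n", a);
    printf("%.2x\n", b);
    printf("%.2x\n", c);
    printf("%.2x\n", d);
    /* Read the same byte through differently typed pointers. The casts
       fix the pointer-signedness mismatch (char * vs unsigned char *). */
    char *p = &a;
    unsigned char *q = (unsigned char *) &a;
    printf("%.2x\n", *p);
    printf("%.2x\n", *q);
    p = (char *) &b;
    q = &b;
    printf("%.2x\n", *p);
    printf("%.2x\n", *q);
    exit(0);
}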
Output:
gcc emacs.c && ./a.out
ffffffcc
cc
0c
0c
ffffffcc
cc
ffffffcc
cc
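The first four lines are the key ones. Judging from the results alone, I concluded:
When a char is printed with %.2x and its most significant bit is 1 (0xcc is 11001100 in binary), the high bits are filled with 1s, giving ffffffcc. The .2 precision only sets a minimum of two digits; it never truncates.
When an unsigned char is printed, the high bits are not filled with 1s.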
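If the goal is the two-digit output the poster expected ("cc cc cc cc"), C99 offers two standard remedies, sketched below: cast the value to unsigned char before passing it, or use the hh length modifier, which tells printf to convert the (still promoted) argument back to unsigned char before printing. The helper names `format_byte_cast` and `format_byte_hh` and the 3-byte buffer are illustrative assumptions.

```c
#include <stdio.h>

/* Hypothetical helper: format one byte as two hex digits by casting to
 * unsigned char first. The cast keeps the value in 0..255, so the int it
 * is promoted to has no sign-extended high bits. buf must hold 3 chars. */
char *format_byte_cast(char c, char buf[3])
{
    snprintf(buf, 3, "%.2x", (unsigned char) c);
    return buf;
}

/* Hypothetical helper: same result via the C99 "hh" length modifier.
 * The argument is still promoted to int, but printf converts it to
 * unsigned char before printing. buf must hold 3 chars. */
char *format_byte_hh(char c, char buf[3])
{
    snprintf(buf, 3, "%.2hhx", c);
    return buf;
}
```

Either helper turns a char holding the byte 0xcc into "cc" and 0x0c into "0c". The cast is the more widely recognized idiom; %hhx avoids touching the argument.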
A guess at the underlying mechanism
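I think the mechanism is this:
printf is a variadic function, so a char argument is promoted to int at the call site, before printf ever sees it, and %x then prints that int. Converting a char to int extends its sign bit, which is why the printed int shows the f's.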
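The promotion can be observed without printf. The sketch below uses a hypothetical variadic function, `first_vararg`, that reads its first unnamed argument with `va_arg(ap, int)`. Reading it as int is the only correct choice: a char passed through `...` arrives already promoted to int, so `va_arg(ap, char)` would be undefined behavior. To keep the result independent of plain char's signedness, the usage below passes an explicit signed char and unsigned char.

```c
#include <stdarg.h>

/* Hypothetical variadic helper: returns its first variadic argument.
 * n is the last named parameter and only anchors va_start. Any char,
 * signed char or unsigned char passed through "..." has already been
 * promoted to int, so it must be fetched as int. */
int first_vararg(int n, ...)
{
    va_list ap;
    int value;

    va_start(ap, n);
    value = va_arg(ap, int);
    va_end(ap);
    return value;
}
```

A signed char holding -52 (byte 0xcc) comes out as the int -52, whose 32-bit two's complement pattern is ffffffcc; an unsigned char holding 0xcc comes out as 204, i.e. cc.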
To test this, I tried the following program:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    char c = 0xcc;           /* plain char (signed on my machine) */
    unsigned char d = 0xcc;
    int a = c;               /* sign-extended: -52 */
    int b = d;               /* zero-extended: 204 */
    printf("%x\n", a);
    printf("%x\n", b);
    exit(0);
}
Output:
gcc emacs2.c && ./a.out
ffffffcc
cc
The relevant part of the disassembly:
8048435:  c6 44 24 1f cc   movb   $0xcc,0x1f(%esp)
804843a:  c6 44 24 1e cc   movb   $0xcc,0x1e(%esp)
804843f:  0f be 44 24 1f   movsbl 0x1f(%esp),%eax
8048444:  89 44 24 18      mov    %eax,0x18(%esp)
8048448:  0f b6 44 24 1e   movzbl 0x1e(%esp),%eax
804844d:  89 44 24 14      mov    %eax,0x14(%esp)
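So converting char to int uses movsbl (load a byte and sign-extend it to 32 bits), while converting unsigned char to int uses movzbl (load a byte and zero-extend it).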
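What the two instructions do can be written out in portable C. The sketch below reproduces zero extension (movzbl) and sign extension (movsbl) of an 8-bit value using only unsigned arithmetic; the function names are hypothetical, and uint32_t stands in for the 32-bit register %eax.

```c
#include <stdint.h>

/* Zero extension, as movzbl does: the upper 24 bits become 0. */
uint32_t zero_extend8(uint8_t b)
{
    return (uint32_t) b;
}

/* Sign extension, as movsbl does: bit 7 of the byte is copied into
 * all of the upper 24 bits. */
uint32_t sign_extend8(uint8_t b)
{
    uint32_t x = b;
    if (x & 0x80u)          /* top bit of the byte set? */
        x |= 0xFFFFFF00u;   /* fill the high bits with 1s */
    return x;
}
```

For 0xcc these return 0xcc and 0xffffffcc, matching the cc and ffffffcc lines above; for 0x0c both return 0x0c, which is why the third and fourth lines of the first experiment agree.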
On converting char/unsigned char to int, the only explanation I found was this passage in K&R:
There is one subtle point about the conversion of characters to integers. The language does not specify
whether variables of type char are signed or unsigned quantities. When a char is converted to an int, can
it ever produce a negative integer? The answer varies from machine to machine, reflecting differences in
architecture. On some machines a char whose leftmost bit is 1 will be converted to a negative integer ("sign
extension"). On others, a char is promoted to an int by adding zeros at the left end, and thus is always
positive.
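Because the signedness of plain char varies between implementations, a program can check it at compile time through <limits.h>: CHAR_MIN is 0 when plain char is unsigned and equals SCHAR_MIN when it is signed. A minimal sketch follows; the function name is hypothetical. With GCC, the choice can also be overridden with the -fsigned-char and -funsigned-char options.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if plain char is signed on this
 * implementation, 0 if it is unsigned. CHAR_MIN is usable in #if. */
int plain_char_is_signed(void)
{
#if CHAR_MIN < 0
    return 1;
#else
    return 0;
#endif
}
```

On the machine used in the experiments above, the ffffffcc output implies this would return 1.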
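Searching the freely available C99 standard text, I found no explicit rule about converting char to int (there is a subtle discussion of unsigned char/short being converted to unsigned int, but it is unrelated to this question).
I don't know whether the full, paid C99 standard states an explicit rule for this.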
Final conclusions
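I found no explicit rule for converting [unsigned] char to int; going by K&R, the result is machine-dependent, because whether plain char is signed varies from machine to machine.
On my machine, the compiler fills the high bits using movzbl when converting unsigned char to int, and movsbl when converting char to int.
Converting to unsigned int behaves the same way, because the compiler still emits just movzbl or movsbl.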
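The same bit pattern for unsigned int also follows from the conversion rule itself, not only from the generated code: converting a negative value to an unsigned type adds one more than the type's maximum (C99 6.3.1.3), so -52 becomes 2^32 - 52 = 0xffffffcc for a 32-bit type. The sketch below uses uint32_t to pin the width and signed char to avoid depending on plain char's signedness; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: convert a signed char to a 32-bit unsigned value.
 * A negative input wraps modulo 2^32, which on a two's complement machine
 * is exactly the sign-extended bit pattern that movsbl produces. */
uint32_t widen_to_u32(signed char s)
{
    return (uint32_t) s;
}
```

For -52 (byte 0xcc) this yields 0xffffffcc; for 12 (byte 0x0c) it yields 0x0c.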