Converting [unsigned] char to int

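Article link: https://blog.youkuaiyun.com/tricky1997/article/details/7993672

Using several concrete C examples, this article looks at how char and unsigned char behave differently when converted to int, and how that difference affects the output of printf.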


How the question came up

I saw the following question on a mailing list I subscribe to:

Hi, having coding in C for 3 years but I'm still not clear with this one.
Consider this code.
...
char *p;
unsigned int i = 0xcccccccc;
unsigned int j;
p = (char *) &i;
printf("%.2x %.2x %.2x %.2x\n", *p, p[1], p[2], p[3]);
memcpy(&j, p, sizeof(unsigned int));
printf("%x\n", j);
...
Output:
ffffffcc ffffffcc ffffffcc ffffffcc
0xcccccccc

My questions are:
1. Why it prints "ffffffcc ffffffcc ffffffcc ffffffcc"? (if p is
unsigned char* then it will print correctly "cc cc cc cc")
2. Why pointer to char p copied to j correctly, why not every member
in p overflow? since it is a signed char.

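Question 2 needs little discussion; I see nothing surprising there. memcpy copies the bytes of the object as they are and never converts the individual elements to another type, so the signedness of char plays no role.

Question 1 touches a rarely noticed aspect of printf, so I ran an experiment myself: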

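#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        /* 0xcc does not fit in a signed char; the cast makes the
           implementation-defined conversion explicit. */
        char a = (char)0xcc;
        unsigned char b = 0xcc;
        char c = 0x0c;
        unsigned char d = 0x0c;
        /* Each argument is promoted to int before printf sees it. */
        printf("%.2x\n", a);
        printf("%.2x\n", b);
        printf("%.2x\n", c);
        printf("%.2x\n", d);

        /* View the same byte through both pointer types. */
        char *p = &a;
        unsigned char *q = (unsigned char *)&a;
        printf("%.2x\n", *p);
        printf("%.2x\n", *q);

        p = (char *)&b;
        q = &b;
        printf("%.2x\n", *p);
        printf("%.2x\n", *q);

        exit(0);
}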

The result:

gcc emacs.c && ./a.out
ffffffcc
cc
0c
0c
ffffffcc
cc
ffffffcc
cc

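The first four lines of output are the key ones. Judging by the results alone, the conclusions are:

When a char is printed with %.2x and its highest bit is 1 (0xcc is 11001100 in binary), the upper bits are filled with 1s.

When an unsigned char is printed, the upper bits are not filled in this way; they stay 0.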

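A guess at the underlying mechanism

I think the mechanism is as follows:

When a char is passed to printf for %x, the argument is first converted to a temporary int (arguments in the variable part of a call undergo the default argument promotions). When a signed char is converted to int, the sign bit is extended. So when that temporary int is printed, the f digits appear.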

To check this, I tried the following program:

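#include <stdio.h>
#include <stdlib.h>

int main(void) {
        char c = (char)0xcc;    /* explicit cast: 0xcc does not fit in a signed char */
        unsigned char d = 0xcc;
        int a = c;              /* sign-extended where plain char is signed */
        int b = d;              /* zero-extended */

        /* %x expects an unsigned int, so cast before printing. */
        printf("%x\n", (unsigned int)a);
        printf("%x\n", (unsigned int)b);
        exit(0);
}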

Output:

gcc emacs2.c && ./a.out
ffffffcc
cc

The relevant part of the generated assembly is:

 8048435:       c6 44 24 1f cc          movb   $0xcc,0x1f(%esp)
 804843a:       c6 44 24 1e cc          movb   $0xcc,0x1e(%esp)
 804843f:       0f be 44 24 1f          movsbl 0x1f(%esp),%eax
 8048444:       89 44 24 18             mov    %eax,0x18(%esp)
 8048448:       0f b6 44 24 1e          movzbl 0x1e(%esp),%eax
 804844d:       89 44 24 14             mov    %eax,0x14(%esp)

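So the compiler uses movsbl (move with sign extension, byte to long) when converting char to int, and movzbl (move with zero extension) when converting unsigned char to int.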

On the conversion of char/unsigned char to int, the only explanation I found was this passage in K&R:

There is one subtle point about the conversion of characters to integers. The language does not specify
whether variables of type char are signed or unsigned quantities. When a char is converted to an int, can
it ever produce a negative integer? The answer varies from machine to machine, reflecting differences in
architecture. On some machines a char whose leftmost bit is 1 will be converted to a negative integer (``sign
extension''). On others, a char is promoted to an int by adding zeros at the left end, and thus is always
positive.

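In the freely available C99 draft, I did not find a rule aimed specifically at char to int (there is a subtle discussion of unsigned char/short converting to unsigned int, but it is unrelated to this question). The general rules do cover the case, though: section 6.2.5 leaves it to the implementation whether plain char behaves like signed char or like unsigned char, and the integer promotions in section 6.3.1.1 preserve the value, so a negative char becomes a negative int.

I have not checked whether the paid, final text of the C99 standard says anything more explicit on this point.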

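Final conclusion

I did not find a rule aimed specifically at converting [unsigned] char to int. As K&R describes, the result depends on the machine, or more precisely on whether the implementation makes plain char signed or unsigned.

On my machine, the compiler fills the upper bits with movzbl when converting unsigned char to int and with movsbl when converting char to int.

Conversion to unsigned int behaves the same way, because at the assembly level it is still just mov[z|s]bl.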