Converting Strings to Numbers - CSDN Blog

Original article: https://blog.youkuaiyun.com/gogdizzy/article/details/6833892

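This article compares hand-written integer conversion code with library functions such as sscanf, strtol, and atoi. It shows that when the only goal is to convert integers, the hand-written code is faster. It also looks at the assembly the compiler generates for the multiply-by-10 step, which shows how effective the compiler's optimizations are.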


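At work I ran into the following situation. Server A sends a request to server B and receives the result, which takes 300 ms in total. Server B needs only 100 ms from receiving the request to finishing sending the response. Both servers are on an internal network, so 200 ms of network latency is implausible. Investigation showed the cause: the data was transmitted as strings and parsed with sscanf.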

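sscanf has to handle variadic arguments and accepts a much wider range of inputs than the strto* family, so it is slow. I used to think the scanf family was slow because of I/O; in hindsight that view was naive. Benchmarking confirmed that hand-written code aimed at the specific input is faster still. The test code is below. Timer is a timing class I wrote myself; it uses rdtsc to read the CPU cycle counter.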

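#include <stdio.h>
#include <stdlib.h>

// The author's own timing header (not a public library); its Timer class
// reads the CPU cycle counter with rdtsc.
#include "utility.d/timing.d/timing.h"

#define N 30000

// Digit lookup table: ptable[c] is the value of digit c, or -1 for any
// non-digit. ptable points into the middle of table so that it can be
// indexed by (signed char) values in -128..127.
int table[256];
int* ptable = table + 128;

void
init_table()
{
	// An int counter avoids the original char loops, which depended on char
	// being signed and on wrapping from 127 back to -128.
	for( int x = -128; x < 128; ++x )
		ptable[x] = ( '0' <= x && x <= '9' ) ? x - '0' : -1;
}

// Table-driven conversion: optional sign, then digits until the first
// character that maps to -1 (including the terminating '\0').
#define str2i_table( str, rst ) \
	do{ \
		rst = 0; \
		const char* s_ = str; \
		int neg_ = (*s_) == '-' ? ( ++s_, 1 ) : \
		           (*s_) == '+' ? ( ++s_, 0 ) : 0; \
		while( ptable[(signed char)*s_] >= 0 ) \
			rst = rst * 10 + ptable[(signed char)*s_++]; \
		if( neg_ ) rst = -rst; \
	}while(0)

// Direct conversion: optional sign, then digits until the first non-digit.
// Like str2i_table, it does not check for overflow.
inline int
str2l( const char* str )
{
	int sign = (*str) == '-' ? ( ++str, 1 ) :
	           (*str) == '+' ? ( ++str, 0 ) : 0;
	int rst = 0;
	// Shift-and-add variant (rst * 10 == (rst << 3) + (rst << 1)), discussed below:
	//while( '0' <= *str && *str <= '9' ) rst = (rst<<3) + (rst<<1) + ( *str++ - '0' );
	while( '0' <= *str && *str <= '9' ) rst = rst * 10 + ( *str++ - '0' );
	if( sign ) rst = -rst;
	return rst;
}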

int main()
{
	init_table();
	_UtilitY_::Timer  timer;                  // the author's rdtsc cycle timer
	char str[] = "\"count\":\"-123456\"";    // str + 9 skips the prefix "count":" and points at -123456
	int  arr[N];
	timer.start();
	for( size_t i = 0; i < N; ++i ) sscanf( str + 9, "%d", arr + i );
	timer.stop();
	printf( "sscanf cost : %lld %d\n", timer.get_ticks(), arr[0] );

	timer.start();
	for( size_t i = 0; i < N; ++i ) arr[i] = strtol( str + 9, NULL, 10 );
	timer.stop();
	printf( "strtol cost : %lld %d\n", timer.get_ticks(), arr[0] );

	timer.start();
	for( size_t i = 0; i < N; ++i ) arr[i] = atoi( str + 9 );
	timer.stop();
	printf( "atoi cost : %lld %d\n", timer.get_ticks(), arr[0] );

	timer.start();
	for( size_t i = 0; i < N; ++i ) arr[i] = str2l( str + 9 );
	timer.stop();
	printf( "str2l cost : %lld %d\n", timer.get_ticks(), arr[0] );

	timer.start();
	for( size_t i = 0; i < N; ++i ) str2i_table( str + 9, arr[i] );
	timer.stop();
	printf( "str2i_table cost : %lld %d\n", timer.get_ticks(), arr[0] );

	return 0;
}
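The listing above depends on the author's private header `utility.d/timing.d/timing.h`, so it will not compile as-is. The hypothetical `bench::Timer` below is a rough stand-in that offers the same `start()`/`stop()`/`get_ticks()` calls. It is an assumption, not the author's class. It reads `std::chrono::steady_clock` and reports nanoseconds instead of rdtsc cycles, so its numbers are not directly comparable with the figures below.

```cpp
#include <cassert>
#include <chrono>

namespace bench {

// Hypothetical stand-in for the author's private Timer class.
// The original reads CPU cycles with rdtsc; this one measures elapsed
// wall-clock time with std::chrono::steady_clock instead.
class Timer {
    using Clock = std::chrono::steady_clock;
    Clock::time_point begin_{};
    Clock::time_point end_{};

public:
    void start() { begin_ = Clock::now(); }
    void stop() { end_ = Clock::now(); }

    // Nanoseconds between the latest start() and stop().
    // Requires stop() to have been called after the latest start().
    long long get_ticks() const {
        assert(end_ >= begin_);
        return std::chrono::duration_cast<std::chrono::nanoseconds>(end_ - begin_).count();
    }
};

}  // namespace bench
```

To use it with the benchmark, replace the `#include "utility.d/timing.d/timing.h"` line with this code and `_UtilitY_::Timer` with `bench::Timer`. `get_ticks()` returns `long long`, which matches the `%lld` format in the `printf` calls.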

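With -O2 optimization, the output is as follows. Each cost is the rdtsc cycle count for all N = 30000 conversions.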

sscanf cost : 13705800 -123456
strtol cost : 2819204 -123456
atoi cost : 2692044 -123456
str2l cost : 735160 -123456
str2i_table cost : 828984 -123456

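So when only integers need converting, hand-written code is much faster. str2l is more than 18 times faster than sscanf and about 3.7 to 3.8 times faster than atoi and strtol. The table-lookup version str2i_table, however, is still about 13% slower than the direct conversion in str2l.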
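In the original scenario, the integers arrive inside strings such as `"count":"-123456"`, and the benchmark skips the prefix with a fixed offset (`str + 9`). The sketch below is a possible targeted replacement for the `sscanf` call. It assumes the field always appears as `"key":"<digits>"` with no whitespace or escapes. `parse_int` and `parse_after` are hypothetical names, and the code needs C++17 for `std::string_view` and `std::optional`. Like `str2l`, it does less work than `strtol`, which also skips leading whitespace, accepts other bases, and reports overflow through `errno`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper in the style of str2l: an optional sign followed by
// decimal digits, stopping at the first non-digit. No whitespace skipping
// and no overflow check.
inline int parse_int(std::string_view s) {
    std::size_t i = 0;
    bool neg = false;
    if (i < s.size() && (s[i] == '-' || s[i] == '+')) {
        neg = (s[i] == '-');
        ++i;
    }
    int v = 0;
    while (i < s.size() && '0' <= s[i] && s[i] <= '9') {
        v = v * 10 + (s[i] - '0');
        ++i;
    }
    return neg ? -v : v;
}

// Hypothetical helper: find `prefix` (e.g. "\"count\":\"") in `text` and parse
// the integer right after it. Returns std::nullopt if the prefix is absent.
// This is not a JSON parser: it assumes no whitespace or escapes in the field.
inline std::optional<int> parse_after(std::string_view text, std::string_view prefix) {
    const std::size_t pos = text.find(prefix);
    if (pos == std::string_view::npos) return std::nullopt;
    return parse_int(text.substr(pos + prefix.size()));
}
```

The trade-off is the same one the benchmark measures. The parser is fast because it accepts only one narrow format and performs no overflow check. Any input outside that format needs a more general parser.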

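I also wondered whether rst*10 could be optimized. The commented-out line in str2l replaces one multiplication with two shifts and an addition. Surprisingly, this version turned out slower. Apparently the compiler's optimizations go well beyond what I had imagined. So I compiled both versions to assembly and compared them.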

rst = rst * 10 + ( *str++ - '0' );
    leal    (%rcx,%rcx,4), %eax
            # %rcx holds rst, so this computes %eax = rst + rst * 4 = rst * 5
    movsbl  %dl,%edx
            # %dl holds *str; sign-extend it into %edx
    leal    -48(%rdx,%rax,2), %ecx
            # %ecx (holds rst) = %rdx (*str) + %rax (5 * rst) * 2 - 48 (that is, '0')

rst = (rst<<3) + (rst<<1) + ( *str++ - '0' );
    leal    (%rcx,%rcx), %eax
            # %rcx holds rst, so %eax = %rcx + %rcx = rst * 2
    movsbl  %dl,%edx
            # %dl holds *str; sign-extend it into %edx
    leal    -48(%rax,%rcx,8), %eax
            # %eax = %rax + %rcx * 8 - 48 = rst * 2 + rst * 8 - '0' = rst * 10 - '0'
    leal    (%rax,%rdx), %ecx
            # %ecx (holds rst) = %rax (rst * 10 - '0') + %rdx (sign-extended *str)

The shift version needs one more lea instruction per digit, which is why it is slower.
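The sketch below checks the arithmetic in the two listings. It models each `leal` as a plain function computing `base + index*scale + disp`. The helper `lea` is hypothetical, and unsigned arithmetic stands in for 32-bit register wrap-around. `step_mul` replays the two instructions the compiler emitted for `rst * 10`, and `step_shift` replays the three emitted for the shift version. Both yield `rst * 10 + (c - '0')`; the only difference is the instruction count.

```cpp
#include <cassert>

// Hypothetical model of x86 lea: base + index*scale + disp.
// Unsigned arithmetic wraps modulo 2^32 like a 32-bit register,
// so a negative displacement acts as a subtraction.
inline unsigned lea(unsigned base, unsigned index, unsigned scale, int disp) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);  // hardware limit
    return base + index * scale + static_cast<unsigned>(disp);
}

// rst = rst * 10 + (c - '0'): two lea instructions.
inline unsigned step_mul(unsigned rst, unsigned c) {
    unsigned eax = lea(rst, rst, 4, 0);   // leal (%rcx,%rcx,4), %eax   -> rst*5
    return lea(c, eax, 2, -48);           // leal -48(%rdx,%rax,2), %ecx -> c + rst*10 - 48
}

// rst = (rst<<3) + (rst<<1) + (c - '0'): three lea instructions.
inline unsigned step_shift(unsigned rst, unsigned c) {
    unsigned eax = lea(rst, rst, 1, 0);   // leal (%rcx,%rcx), %eax      -> rst*2
    eax = lea(eax, rst, 8, -48);          // leal -48(%rax,%rcx,8), %eax -> rst*10 - 48
    return lea(eax, c, 1, 0);             // leal (%rax,%rdx), %ecx      -> rst*10 - 48 + c
}
```

For example, `step_mul(12, '3')` and `step_shift(12, '3')` both give 123. The first version reaches it in two address computations and the second in three.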

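lea and mov both have one-byte opcodes. The difference is that lea only computes an address and never reads memory. So these two instructions are equivalent, whereas movl (%ebx), %eax would load from memory:

leal (%ebx), %eax  <==>  movl %ebx, %eax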

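However, lea loads an effective address computed as Base + Index*Scale + Displacement. A single instruction can therefore perform two additions and one multiplication, and this can be exploited for ordinary arithmetic. The multiplier Scale is restricted to 1, 2, 4, or 8. For example: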