Intel SSE instruction -- divsd

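This article walks through SSE instructions encountered while reverse engineering, focusing on divsd. It explains what the instruction does (double-precision floating-point division), notes that the instructions were emitted by gcc, and shows how to debug the code in GDB and inspect the contents of the xmm registers.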


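While reverse engineering a binary, I ran into a block of SSE instructions that gcc had generated. On x86-64, gcc performs scalar floating-point arithmetic with SSE2 by default. The -mno-sse option disables SSE code generation. However, on x86-64 gcc then typically reports an error for functions that pass or return float or double values, because the ABI passes them in xmm registers. With no source code available, I worked through the SSE instructions directly. Several of them appear together, so I analyze them one at a time.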

    347d:	66 0f ef c0          	pxor   %xmm0,%xmm0          // xmm0 = 0 (all 128 bits cleared)
    3481:	f2 48 0f 2a c0       	cvtsi2sd %rax,%xmm0         // low 64 bits of xmm0 = (double)rax = 45.0
    3486:	f2 0f 5e 44 24 48    	divsd  0x48(%rsp),%xmm0     // xmm0 = xmm0 / *(double *)(rsp+0x48) = 45.0/100.0 = 0.45
    348c:	f2 0f 5c c8          	subsd  %xmm0,%xmm1          // xmm1 = xmm1 - xmm0 = 29524.169999999998 - 0.45 = 29523.719999999998
    3490:	66 0f ef db          	pxor   %xmm3,%xmm3          // xmm3 = 0.0
    3494:	48 8d 45 01          	lea    0x1(%rbp),%rax       // rax = strchr_ret + 1
    3498:	83 3d a5 5f 00 00 00 	cmpl   $0x0,0x5fa5(%rip)        # 9444 <stderr@@GLIBC_2.2.5+0x24>   // compare gvar_9444 (0 in this run) with 0; the next three instructions do not modify the flags
    349f:	48 89 44 24 08       	mov    %rax,0x8(%rsp)       // *(rsp+0x8) = rax = strchr_ret + 1
    34a4:	f2 0f 5f d9          	maxsd  %xmm1,%xmm3          // xmm3 = max(xmm3, xmm1) = max(0.0, 29523.719999999998) = 29523.719999999998
    34a8:	f2 0f 11 5c 24 48    	movsd  %xmm3,0x48(%rsp)     // *(double *)(rsp+0x48) = 29523.719999999998
    34ae:	0f 84 26 02 00 00    	je     36da <__sprintf_chk@plt+0xf6a>   // if (gvar_9444 == 0) goto 36da; taken in this run

    3486:    f2 0f 5e 44 24 48        divsd  0x48(%rsp),%xmm0

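The divsd instruction:

DIVSD: Divide Scalar Double Precision Floating-Point Value.

divsd divides the double-precision floating-point value in the low 64 bits of the destination register by the double-precision value in the low 64 bits of the source operand. The source is an xmm register or a 64-bit memory location. The quotient goes to the low 64 bits of the destination, and the upper 64 bits of the destination are left unchanged. A memory operand is only 8 bytes wide and does not have to be aligned to 16 bytes.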

Debugging in GDB:

=> 0x000055555555747d:  66 0f ef c0     pxor   %xmm0,%xmm0

(gdb) p/f $xmm0
$1 = {v4_float = {-2.33146828e-17, 7.05728912, 0, 0}, v2_double = {36522.519999999997, 0}, v16_int8 = {61, 10, 
    -41, -93, 80, -43, -31, 64, 0, 0, 0, 0, 0, 0, 0, 0}, v8_int16 = {2621, -23593, -10928, 16609, 0, 0, 0, 0}, 
  v4_int32 = {-2.33146828e-17, 7.05728912, 0, 0}, v2_int64 = {36522.519999999997, 0}, 
  uint128 = 1.70422279711280534059e-4932}

(gdb) n
0x0000555555557481 in ?? ()
=> 0x0000555555557481:  f2 48 0f 2a c0  cvtsi2sd %rax,%xmm0

(gdb) p/f $xmm0
$2 = {v4_float = {0, 0, 0, 0}, v2_double = {0, 0}, v16_int8 = {0 <repeats 16 times>}, v8_int16 = {0, 0, 0, 0, 
    0, 0, 0, 0}, v4_int32 = {0, 0, 0, 0}, v2_int64 = {0, 0}, uint128 = 0}
(gdb) p/x $rax 
$3 = 0x2d

(gdb) n
0x0000555555557486 in ?? ()
=> 0x0000555555557486:  f2 0f 5e 44 24 48       divsd  0x48(%rsp),%xmm0

(gdb) p/f $xmm0
$4 = {v4_float = {0, 3.1015625, 0, 0}, v2_double = {45, 0}, v16_int8 = {0, 0, 0, 0, 0, -128, 70, 64, 0, 0, 0, 
    0, 0, 0, 0, 0}, v8_int16 = {0, 0, -32768, 16454, 0, 0, 0, 0}, v4_int32 = {0, 3.1015625, 0, 0}, v2_int64 = {
    45, 0}, uint128 = 1.68828510035211006466e-4932}
(gdb) x/fg $rsp+0x48
0x7fffffff9ad8: 100

(gdb) n
0x000055555555748c in ?? ()
=> 0x000055555555748c:  f2 0f 5c c8     subsd  %xmm0,%xmm1

(gdb) p/f $xmm0     
$5 = {v4_float = {-107374184, 1.7249999, 0, 0}, v2_double = {0.45000000000000001, 0}, v16_int8 = {-51, -52, 
    -52, -52, -52, -52, -36, 63, 0, 0, 0, 0, 0, 0, 0, 0}, v8_int16 = {-13107, -13108, -13108, 16348, 0, 0, 0, 
    0}, v4_int32 = {-107374184, 1.7249999, 0, 0}, v2_int64 = {0.45000000000000001, 0}, 
  uint128 = 1.67743993732028180901e-4932}

(gdb) p/f $xmm1
$6 = {v4_float = {-2.33146828e-17, 7.05728912, 0, 0}, v2_double = {36522.519999999997, 0}, v16_int8 = {61, 10, 
    -41, -93, 80, -43, -31, 64, 0, 0, 0, 0, 0, 0, 0, 0}, v8_int16 = {2621, -23593, -10928, 16609, 0, 0, 0, 0}, 
  v4_int32 = {-2.33146828e-17, 7.05728912, 0, 0}, v2_int64 = {36522.519999999997, 0}, 
  uint128 = 1.70422279711280534059e-4932}

(gdb) n
0x0000555555557490 in ?? ()
=> 0x0000555555557490:  66 0f ef db     pxor   %xmm3,%xmm3

(gdb) p/f $xmm0
$7 = {v4_float = {-107374184, 1.7249999, 0, 0}, v2_double = {0.45000000000000001, 0}, v16_int8 = {-51, -52, 
    -52, -52, -52, -52, -36, 63, 0, 0, 0, 0, 0, 0, 0, 0}, v8_int16 = {-13107, -13108, -13108, 16348, 0, 0, 0, 
    0}, v4_int32 = {-107374184, 1.7249999, 0, 0}, v2_int64 = {0.45000000000000001, 0}, 
  uint128 = 1.67743993732028180901e-4932}

(gdb) p/f $xmm1
$8 = {v4_float = {0.0587499999, 7.05728245, 0, 0}, v2_double = {36522.07, 0}, v16_int8 = {-41, -93, 112, 61, 
    66, -43, -31, 64, 0, 0, 0, 0, 0, 0, 0, 0}, v8_int16 = {-23593, 15728, -10942, 16609, 0, 0, 0, 0}, 
  v4_int32 = {0.0587499999, 7.05728245, 0, 0}, v2_int64 = {36522.07, 0}, uint128 = 1.7042227745681469421e-4932}

(gdb) n
0x0000555555557494 in ?? ()
=> 0x0000555555557494:  48 8d 45 01     lea    0x1(%rbp),%rax

(gdb) p/f $xmm3
$9 = {v4_float = {0, 0, 0, 0}, v2_double = {0, 0}, v16_int8 = {0 <repeats 16 times>}, v8_int16 = {0, 0, 0, 0, 
    0, 0, 0, 0}, v4_int32 = {0, 0, 0, 0}, v2_int64 = {0, 0}, uint128 = 0}

(gdb) n
0x0000555555557498 in ?? ()
=> 0x0000555555557498:  83 3d a5 5f 00 00 00    cmpl   $0x0,0x5fa5(%rip)        # 0x55555555d444
(gdb) 
0x000055555555749f in ?? ()
=> 0x000055555555749f:  48 89 44 24 08  mov    %rax,0x8(%rsp)
(gdb) 
0x00005555555574a4 in ?? ()
=> 0x00005555555574a4:  f2 0f 5f d9     maxsd  %xmm1,%xmm3

(gdb) p/f $xmm1
$10 = {v4_float = {0.0587499999, 7.05728245, 0, 0}, v2_double = {36522.07, 0}, v16_int8 = {-41, -93, 112, 61, 
    66, -43, -31, 64, 0, 0, 0, 0, 0, 0, 0, 0}, v8_int16 = {-23593, 15728, -10942, 16609, 0, 0, 0, 0}, 
  v4_int32 = {0.0587499999, 7.05728245, 0, 0}, v2_int64 = {36522.07, 0}, uint128 = 1.7042227745681469421e-4932}

(gdb) p/f $xmm3
$11 = {v4_float = {0, 0, 0, 0}, v2_double = {0, 0}, v16_int8 = {0 <repeats 16 times>}, v8_int16 = {0, 0, 0, 0, 
    0, 0, 0, 0}, v4_int32 = {0, 0, 0, 0}, v2_int64 = {0, 0}, uint128 = 0}

(gdb) n
0x00005555555574a8 in ?? ()
=> 0x00005555555574a8:  f2 0f 11 5c 24 48       movsd  %xmm3,0x48(%rsp)

(gdb) p/f $xmm3
$12 = {v4_float = {0.0587499999, 7.05728245, 0, 0}, v2_double = {36522.07, 0}, v16_int8 = {-41, -93, 112, 61, 
    66, -43, -31, 64, 0, 0, 0, 0, 0, 0, 0, 0}, v8_int16 = {-23593, 15728, -10942, 16609, 0, 0, 0, 0}, 
  v4_int32 = {0.0587499999, 7.05728245, 0, 0}, v2_int64 = {36522.07, 0}, uint128 = 1.7042227745681469421e-4932}

(gdb) n
0x00005555555574ae in ?? ()
=> 0x00005555555574ae:  0f 84 26 02 00 00       je     0x5555555576da

(gdb) x/fg $rsp+0x48
0x7fffffff9ad8: 36522.07

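Conclusion:

divsd is a division instruction. When stepping through it, watch the v2_double field of the xmm registers. The sequence divides, subtracts, clamps the result at 0.0 with maxsd, and then stores it to memory. Checking the values by hand at each step is enough to understand how the instructions behave.

What I have not figured out yet is the purpose of this SSE sequence.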

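### Using SSE to implement floating-point or integer division

SSE (Streaming SIMD Extensions) is a SIMD (Single Instruction, Multiple Data) instruction set introduced by Intel, in which a single instruction applies the same operation to several data elements. It supports many kinds of arithmetic, including addition, subtraction, multiplication and division[^2].

#### Floating-point division

For single-precision floating-point division, use `divss` or `divps`:

- **`DIVSS xmm1, xmm2/m32`**: divides the single-precision value in the low 32 bits of the destination by the low single-precision value of the source. The quotient is written to the low 32 bits of the destination; the other three elements are unchanged.
- **`DIVPS xmm1, xmm2/m128`**: divides four packed single-precision values element by element and stores the four quotients in the destination.

The following assembly uses `divss` to divide one floating-point value by another:

```assembly
movss xmm0, [operand1]   ; load the first operand into XMM0
movss xmm1, [operand2]   ; load the second operand into XMM1
divss xmm0, xmm1         ; XMM0 = XMM0 / XMM1
```

For double-precision values, use the corresponding `divsd` and `divpd` instructions:

- **`DIVSD xmm1, xmm2/m64`**
- **`DIVPD xmm1, xmm2/m128`**

For example, a simple double-precision division:

```assembly
movsd xmm0, qword ptr [numerator]     ; load the numerator into XMM0
movsd xmm1, qword ptr [denominator]   ; load the denominator into XMM1
divsd xmm0, xmm1                      ; XMM0 = XMM0 / XMM1
```

#### Integer division

SSE has no integer division instruction, and later extensions such as SSE4 do not add one either. Integer division is normally done with the general-purpose `div`/`idiv` instructions, which is simpler and more straightforward[^3].

When a large number of integers must be scaled by the same ratio with SIMD, one option is to convert them to a fixed-point representation and approximate the quotient with multiplications and shifts. That, however, goes beyond the SSE division instructions discussed here.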