ARM A53 Instruction Optimization Guide

natural assembly

  • no register dependency, no penalty
ld1     {v0.4s}, [x0], #16
fmla    v10.4s, v16.4s, v24.s[0]
fmla    v11.4s, v16.4s, v24.s[1]
fmla    v12.4s, v16.4s, v24.s[2]
fmla    v13.4s, v16.4s, v24.s[3]

A53

  • a 128-bit vector load cannot be dual-issued with fmla and costs 2 cycles
  • a 64-bit vector load cannot be dual-issued with fmla and costs 1 cycle
  • a 64-bit integer load can be dual-issued with fmla, no penalty
  • a pointer update can be dual-issued with fmla, no penalty
  • a 64-bit vector load and a 64-bit vector insert can be dual-issued with each other, no penalty
  • no vector load can be issued on the 4th cycle of an fmla (when it enters the accumulator pipeline)

practical guide

  • use 64-bit vector loads only
  • issue one vector load every three fmla
  • each 64-bit vector load takes 1 cycle and dual-issues with the previous interleaved 64-bit insert
  • load the remaining 64 bits into an integer register,
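
The rules in the A53 list suggest one possible way to split a 128-bit load around three fmla. This is a rough sketch, not a tuned kernel. The pointer register x0, the temporary x23, the destination registers v0/v1 and the fmla operands are placeholders chosen for illustration. The cycle notes in the comments only restate the penalties listed above and are not measurements.

```asm
// Sketch only: register names and fmla operands are illustrative assumptions.
ldr     d0, [x0]                    // 64-bit vector load, low half of v0: 1 cycle
fmla    v10.4s, v16.4s, v24.s[0]
ldr     x23, [x0, #8]               // 64-bit integer load, high half of v0: dual-issued with fmla
fmla    v11.4s, v16.4s, v24.s[1]
add     x0, x0, #16                 // pointer update: dual-issued with fmla
fmla    v12.4s, v16.4s, v24.s[2]
ldr     d1, [x0]                    // next 64-bit vector load, low half of v1: 1 cycle
ins     v0.d[1], x23                // insert high half, completing v0: dual-issued with the load above
fmla    v13.4s, v16.4s, v24.s[3]
```

In this arrangement v0 holds its full 128 bits only after the ins, so any fmla that reads v0 has to come after that point.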