【RUY】Learning TensorFlow's int8 quantized inference code

Original post · first published 2023-03-23 15:24:40 · last modified 2023-03-23 17:32:12

License: CC 4.0 BY-SA

Tags: #RUY #int8 #quantization #inference #HPC

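This post walks through building the ruy library, then runs and analyzes its example programs, such as ExampleMulFloat and ExampleMulFloatWithBiasAddAndClamp, to show how int8 quantized computation works, including zero-point handling and the implementation details of quantized multiplication. It also looks at the matrix-multiplication optimization strategies in the source code and at how inputs of different data types are handled.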

0. Preface

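I originally wanted to study the GEMM code to see how TensorFlow runs int8 inference efficiently, but then I came across the note "most of gemmlowp is superseded by ruy", so I spent some time studying ruy instead.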

1. Building the project

Clone the source and build it:

git clone https://github.com/google/ruy.git
cd ruy/

mkdir build
cd build/
cmake ..

The `cmake ..` step failed with the following error:

CMake Error at CMakeLists.txt:76 (message):
  This file does not exist:

  xxx/ruy/third_party/cpuinfo/CMakeLists.txt

  That typically means that the git submodules of the ruy repository haven't
  been checked out.

So I checked out the git submodules as well and re-ran the build:

git submodule update --init
cmake ..
make -j8

The build succeeded.

2. Running a demo

Run the bundled example program and look at what it prints:

./example/ruy_example_example 
Example Mul, float:
LHS:
1 2 
3 4 
RHS:
1 3 
2 4 
Result:
5 11 
11 25 

Example Mul, float with bias addition and clamp:
LHS:
1 2 
3 4 
RHS:
1 3 
2 4 
Result:
6 12 
11 15 

Example Mul, uint8 quantized with asymmetric zero points:
LHS:
124 125 
126 127 
RHS:
129 131 
130 132 
Result:
131 130 
126 129 

Example Mul, int8 quantized with per-channel multipliers
LHS:
1 2 
3 4 
RHS:
1 3 
2 4 
Result:
4 8 
2 4 

Example Mul, returning raw int32 accumulators:
LHS:
1 2 
3 4 
RHS:
1 3 
2 4 
Result:
5 11 
11 25 

Example Mul, int8 times int16 quantized with per-channel multipliers
LHS:
1 2 
3 4 
RHS:
1000 3000 
2000 4000 
Result:
3750 8250 
1719 3906
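The "returning raw int32 accumulators" case prints the same 5 11 / 11 25 as the float case. A naive reference loop suggests why: each int8 product is widened and summed in int32 and handed back as-is, with no multiplier, rounding or clamping applied. This is a minimal sketch, not ruy's kernel; the function name `NaiveMulInt8ToInt32` is hypothetical, and it assumes the inputs carry no zero-point offset and are stored the way the float example stores them (LHS row-major, RHS column-major).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper (not part of ruy): naive int8 x int8 -> int32 GEMM.
// lhs: rows x depth, stored row-major.
// rhs: depth x cols, stored column-major.
// Returns dst (rows x cols) in row-major order, as raw int32 accumulators.
std::vector<std::int32_t> NaiveMulInt8ToInt32(const std::vector<std::int8_t>& lhs,
                                              const std::vector<std::int8_t>& rhs,
                                              int rows, int depth, int cols) {
  assert(static_cast<int>(lhs.size()) == rows * depth);
  assert(static_cast<int>(rhs.size()) == depth * cols);
  std::vector<std::int32_t> dst(static_cast<std::size_t>(rows) * cols, 0);
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      std::int32_t acc = 0;
      for (int k = 0; k < depth; ++k) {
        // Widen to int32 before multiplying so both the product and the sum are int32.
        acc += static_cast<std::int32_t>(lhs[r * depth + k]) *
               static_cast<std::int32_t>(rhs[c * depth + k]);
      }
      // Raw accumulator: no rescaling, no rounding, no clamping.
      dst[r * cols + c] = acc;
    }
  }
  return dst;
}
```

For example, `NaiveMulInt8ToInt32({1, 2, 3, 4}, {1, 2, 3, 4}, 2, 2, 2)` returns {5, 11, 11, 25} in row-major order. The int32 accumulator is the point of the design: a single int8 × int8 product already needs up to 16 bits, so summing even a few of them can overflow a 16-bit accumulator.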

3. Analyzing the examples

3.1 ExampleMulFloat

Test code

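#include <iostream>

#include "ruy/ruy.h"

// The caller owns the ruy::Context and passes it in (see main below).
void ExampleMulFloat(ruy::Context *context) {
  const float lhs_data[] = {1, 2, 3, 4};
  const float rhs_data[] = {1, 2, 3, 4};
  float dst_data[4];

  // lhs is row-major: {1, 2, 3, 4} is read as [[1, 2], [3, 4]]
  ruy::Matrix<float> lhs;
  ruy::MakeSimpleLayout(2, 2, ruy::Order::kRowMajor, lhs.mutable_layout());
  lhs.set_data(lhs_data);
  // rhs is column-major: the same buffer is read as [[1, 3], [2, 4]]
  ruy::Matrix<float> rhs;
  ruy::MakeSimpleLayout(2, 2, ruy::Order::kColMajor, rhs.mutable_layout());
  rhs.set_data(rhs_data);
  // dst is column-major and writes into dst_data
  ruy::Matrix<float> dst;
  ruy::MakeSimpleLayout(2, 2, ruy::Order::kColMajor, dst.mutable_layout());
  dst.set_data(dst_data);

  // Default MulParams: plain dst = lhs * rhs, no bias and no effective clamping
  ruy::MulParams<float, float> mul_params;
  ruy::Mul(lhs, rhs, mul_params, context, &dst);

  std::cout << "Example Mul, float:\n";
  std::cout << "LHS:\n" << lhs;
  std::cout << "RHS:\n" << rhs;
  std::cout << "Result:\n" << dst << "\n";
}

int main() {
  ruy::Context context;
  ExampleMulFloat(&context);
}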

Output

Example Mul, float:
LHS:
1 2 
3 4 
RHS:
1 3 
2 4 
Result:
5 11 
11 25

Analysis

$\left[ \begin{matrix} 1 & 2 \\ 3 & 4 \end{matrix} \right] \left[ \begin{matrix} 1 & 3 \\ 2 & 4 \end{matrix} \right] = \left[ \begin{matrix} 5 & 11 \\ 11 & 25 \end{matrix} \right]$

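This is a plain floating-point computation. As far as I'm concerned, all that matters is that the result matches the hand calculation, so there is nothing here that needs special attention. The one detail worth noting is the storage order: lhs uses kRowMajor, so {1, 2, 3, 4} is read row by row as [[1, 2], [3, 4]], while rhs uses kColMajor, so the same buffer is read column by column as [[1, 3], [2, 4]], which is exactly the RHS that gets printed.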
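The example's layouts matter: lhs is created with kRowMajor and rhs with kColMajor, so one {1, 2, 3, 4} buffer becomes two different matrices. The sketch below checks this with plain C++. It is hypothetical (`StorageOrder`, `SimpleMatrixView` and `NaiveMul` are not ruy types) and only mimics what a simple dense layout means: row-major stores element (r, c) at offset r * cols + c, column-major at c * rows + r.

```cpp
#include <cassert>

// Hypothetical stand-ins (not ruy's types) for a simple dense layout.
enum class StorageOrder { kRowMajor, kColMajor };

struct SimpleMatrixView {
  const float* data;
  int rows;
  int cols;
  StorageOrder order;

  // Returns element (r, c), computing its offset from the storage order.
  float at(int r, int c) const {
    assert(r >= 0 && r < rows && c >= 0 && c < cols);
    return order == StorageOrder::kRowMajor ? data[r * cols + c]
                                            : data[c * rows + r];
  }
};

// Naive dst = lhs * rhs; dst is written column-major, like the example's dst.
void NaiveMul(const SimpleMatrixView& lhs, const SimpleMatrixView& rhs, float* dst) {
  assert(lhs.cols == rhs.rows);
  for (int r = 0; r < lhs.rows; ++r) {
    for (int c = 0; c < rhs.cols; ++c) {
      float sum = 0.0f;
      for (int k = 0; k < lhs.cols; ++k) {
        sum += lhs.at(r, k) * rhs.at(k, c);
      }
      dst[c * lhs.rows + r] = sum;
    }
  }
}
```

Viewing the buffer {1, 2, 3, 4} as a row-major lhs and a column-major rhs and calling `NaiveMul` fills the column-major dst with {5, 11, 11, 25}, which prints as 5 11 / 11 25, the same as ruy's result.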

3.2 ExampleMulFloatWithBiasAddAndClamp

Test code

#include <iostream>

#include "ruy/ruy.h"

// The caller owns the ruy::Context and passes it in (see main below).
void ExampleMulFloatWithBiasAddAndClamp(ruy::Context *context) {
  const float lhs_data[] = {1, 2, 3, 4};
  const float rhs_data[] = {1, 2, 3, 4};
  // One bias value per destination row
  const float bias_data[] = {1, 0};
  float dst_data[4];

  ruy::Matrix<float> lhs;
  ruy::MakeSimpleLayout(2, 2, ruy::Order::kRowMajor, lhs.mutable_layout());
  lhs.set_data(lhs_data);
  ruy::Matrix<float> rhs;
  ruy::MakeSimpleLayout(2, 2, ruy::Order::kColMajor, rhs.mutable_layout());
  rhs.set_data(rhs_data);
  ruy::Matrix<float> dst;
  ruy::MakeSimpleLayout(2, 2, ruy::Order::kColMajor, dst.mutable_layout());
  dst.set_data(dst_data);

  // dst = clamp(lhs * rhs + bias, 0, 15)
  ruy::MulParams<float, float> mul_params;
  mul_params.set_bias(bias_data);
  mul_params.set_clamp_min(0);
  mul_params.set_clamp_max(15);
  ruy::Mul(lhs, rhs, mul_params, context, &dst);

  std::cout << "Example Mul, float with bias addition and clamp:\n";
  std::cout << "LHS:\n" << lhs;
  std::cout << "RHS:\n" << rhs;
  std::cout << "Result:\n" << dst << "\n";
}

int main() {
  ruy::Context context;
  ExampleMulFloatWithBiasAddAndClamp(&context);
}

Output

Example Mul, float with bias addition and clamp:
LHS:
1 2 
3 4 
RHS:
1 3 
2 4 
Result:
6 12 
11 15

Analysis

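$\mathrm{clamp}\left( \left[ \begin{matrix} 1 & 2 \\ 3 & 4 \end{matrix} \right] \left[ \begin{matrix} 1 & 3 \\ 2 & 4 \end{matrix} \right] + \left[ \begin{matrix} 1 \\ 0 \end{matrix} \right], 0, 15 \right) = \mathrm{clamp}\left( \left[ \begin{matrix} 6 & 12 \\ 11 & 25 \end{matrix} \right], 0, 15 \right) = \left[ \begin{matrix} 6 & 12 \\ 11 & 15 \end{matrix} \right]$

The bias is added per row (row 0 gets +1, row 1 gets +0), which turns the product into [[6, 12], [11, 25]]. The clamp to [0, 15] then cuts 25 down to 15, which is why the printed result is 6 12 / 11 15.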
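The bias-and-clamp step after the matrix product is simple enough to reproduce by hand. The following is a minimal sketch, not ruy's implementation. `BiasAddAndClamp` is a hypothetical helper, and it assumes one bias value per destination row, which matches the printed result (row 0 gets +1, row 1 gets +0).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference (not ruy code) for the step applied after the matmul:
//   dst(r, c) = clamp(acc(r, c) + bias[r], clamp_min, clamp_max)
// acc and the returned dst are stored column-major with `rows` rows.
std::vector<float> BiasAddAndClamp(const std::vector<float>& acc, int rows,
                                   const std::vector<float>& bias,
                                   float clamp_min, float clamp_max) {
  assert(rows > 0 && static_cast<int>(bias.size()) == rows);  // one bias per row
  assert(acc.size() % static_cast<std::size_t>(rows) == 0);
  std::vector<float> dst(acc.size());
  for (std::size_t i = 0; i < acc.size(); ++i) {
    // Column-major storage: the row index cycles fastest.
    const std::size_t r = i % static_cast<std::size_t>(rows);
    const float biased = acc[i] + bias[r];
    dst[i] = std::min(std::max(biased, clamp_min), clamp_max);
  }
  return dst;
}
```

With the column-major accumulators {5, 11, 11, 25}, bias {1, 0} and the range [0, 15], this returns {6, 11, 12, 15}, which prints as 6 12 / 11 15, matching ruy's output.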