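0. Preface

I originally wanted to study GEMM code to see how TensorFlow runs int8 inference efficiently. While reading up on gemmlowp I came across the note that "most of gemmlowp is superseded by ruy", so I spent some time studying ruy instead.
1. Building the project
Clone the source and build it: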
git clone https://github.com/google/ruy.git
cd ruy/
mkdir build
cd build/
cmake ..
The cmake step failed with this error:
CMake Error at CMakeLists.txt:76 (message):
This file does not exist:
xxx/ruy/third_party/cpuinfo/CMakeLists.txt
That typically means that the git submodules of the ruy repository haven't
been checked out.
So I fetched the git submodules as well and built again:
git submodule update --init
cmake ..
make -j8
The build succeeded.
2. Running a demo
Run the bundled example to see what it prints:
./example/ruy_example_example
Example Mul, float:
LHS:
1 2
3 4
RHS:
1 3
2 4
Result:
5 11
11 25
Example Mul, float with bias addition and clamp:
LHS:
1 2
3 4
RHS:
1 3
2 4
Result:
6 12
11 15
Example Mul, uint8 quantized with asymmetric zero points:
LHS:
124 125
126 127
RHS:
129 131
130 132
Result:
131 130
126 129
Example Mul, int8 quantized with per-channel multipliers
LHS:
1 2
3 4
RHS:
1 3
2 4
Result:
4 8
2 4
Example Mul, returning raw int32 accumulators:
LHS:
1 2
3 4
RHS:
1 3
2 4
Result:
5 11
11 25
Example Mul, int8 times int16 quantized with per-channel multipliers
LHS:
1 2
3 4
RHS:
1000 3000
2000 4000
Result:
3750 8250
1719 3906
3. Example analysis
3.1 ExampleMulFloat
Test code
const float lhs_data[] = {1, 2, 3, 4};
const float rhs_data[] = {1, 2, 3, 4};
float dst_data[4];
ruy::Matrix<float> lhs;
ruy::MakeSimpleLayout(2, 2, ruy::Order::kRowMajor, lhs.mutable_layout());
lhs.set_data(lhs_data);
ruy::Matrix<float> rhs;
ruy::MakeSimpleLayout(2, 2, ruy::Order::kColMajor, rhs.mutable_layout());
rhs.set_data(rhs_data);
ruy::Matrix<float> dst;
ruy::MakeSimpleLayout(2, 2, ruy::Order::kColMajor, dst.mutable_layout());
dst.set_data(dst_data);
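// Default MulParams: no bias and no clamping.
ruy::MulParams<float, float> mul_params;
// context is the ruy::Context* passed in by the caller of this example.
ruy::Mul(lhs, rhs, mul_params, context, &dst);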
std::cout << "Example Mul, float:\n";
std::cout << "LHS:\n" << lhs;
std::cout << "RHS:\n" << rhs;
std::cout << "Result:\n" << dst << "\n";
Output
Example Mul, float:
LHS:
1 2
3 4
RHS:
1 3
2 4
Result:
5 11
11 25
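Analysis
$$ \left[ \begin{matrix} 1 & 2 \\ 3 & 4 \end{matrix} \right] \left[ \begin{matrix} 1 & 3 \\ 2 & 4 \end{matrix} \right] = \left[ \begin{matrix} 5 & 11 \\ 11 & 25 \end{matrix} \right] $$
Note that rhs_data is stored in column-major order, so {1, 2, 3, 4} is the matrix with rows (1, 3) and (2, 4); dst is column-major as well. This is a plain floating-point computation. As far as I am concerned it is enough that the result matches, and there is nothing here that needs special attention.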
3.2 ExampleMulFloatWithBiasAddAndClamp
Test code
const float lhs_data[] = {1, 2, 3, 4};
const float rhs_data[] = {1, 2, 3, 4};
const float bias_data[] = {1, 0};
float dst_data[4];
ruy::Matrix<float> lhs;
ruy::MakeSimpleLayout(2, 2, ruy::Order::kRowMajor, lhs.mutable_layout());
lhs.set_data(lhs_data);
ruy::Matrix<float> rhs;
ruy::MakeSimpleLayout(2, 2, ruy::Order::kColMajor, rhs.mutable_layout());
rhs.set_data(rhs_data);
ruy::Matrix<float> dst;
ruy::MakeSimpleLayout(2, 2, ruy::Order::kColMajor, dst.mutable_layout());
dst.set_data(dst_data);
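ruy::MulParams<float, float> mul_params;
// One bias value per row of the destination matrix.
mul_params.set_bias(bias_data);
// Clamp every output value to the range [0, 15].
mul_params.set_clamp_min(0);
mul_params.set_clamp_max(15);
ruy::Mul(lhs, rhs, mul_params, context, &dst);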
std::cout << "Example Mul, float with bias addition and clamp:\n";
std::cout << "LHS:\n" << lhs;
std::cout << "RHS:\n" << rhs;
std::cout << "Result:\n" << dst << "\n";
Output
Example Mul, float with bias addition and clamp:
LHS:
1 2
3 4
RHS:
1 3
2 4
Result:
6 12
11 15
Analysis
$$ \left[ \begin{matrix} 1 & 2 \\ 3 & 4 \end{matrix} \right] \left[ \begin{matrix} 1 & 3 \\ 2 & 4 \end{matrix} \right] + \left[ \begin{matrix} 1 \\ 0 \end{matrix} \right] = \left[ \begin{matrix} 6 & 12 \\ 11 & 25 \end{matrix} \right] $$
The bias is added to each row of the product. Clamping to [0, 15] then turns 25 into 15, which gives the printed result.
