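Model Quantization 4 - INT8 Quantization 1: Mapping FP32 Tensors (Model Weights or Model Inputs) to INT8 Tensors [absmax quantization (symmetric), zero-point quantization (asymmetric)] [Zero-point quantization is slightly better than absmax, but also more expensive to compute]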

Article link: https://blog.youkuaiyun.com/u013250861/article/details/139884610


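This article covers two quantization techniques:

a symmetric technique, absolute-maximum (absmax) quantization
an asymmetric technique, zero-point quantization

In both cases, the goal is to map an FP32 tensor $\mathbf{X}$ (the original weights) to an INT8 tensor $\mathbf{X}_{\text{quant}}$ (the quantized weights).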

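I. Quantization methods

Whether to use symmetric or asymmetric quantization depends on the application and on how the data is distributed.

When the data is roughly symmetric and centered on 0, symmetric quantization is the simpler choice and usually works well.
When the data is skewed or clearly shifted away from 0, asymmetric quantization gives better quantization precision, which may translate into better model quality.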

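1. absmax quantization (symmetric)

Symmetric quantization is a simple method that maps floating-point numbers onto evenly spaced integers. Positive and negative numbers share the same quantization range and step size (the quantization granularity). As a result, 0 is always quantized to 0 (or to a fixed integer value if the whole quantization range has been shifted), and the mapping is symmetric about 0, as shown in the figure below.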
[Figure: symmetric quantization]

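Advantages:

Simple to implement and easy to understand.
Works well for data distributions centered on 0.

Disadvantages:

May not make full use of the quantized value range, especially when the data is not distributed symmetrically about 0.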
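The wasted range is easy to see with an all-positive input. The following is a minimal sketch in plain Python, assuming a made-up input of 100 evenly spaced positive values; the helper name `absmax_quantize_list` is hypothetical and not part of the article's code.

```python
def absmax_quantize_list(values):
    """Symmetric absmax quantization of a list of floats to integer codes."""
    scale = 127 / max(abs(v) for v in values)
    # Python's round() rounds halves to even, like torch.round and np.round.
    return [round(v * scale) for v in values]

# Hypothetical all-positive input, e.g. activations after a ReLU.
values = [i / 100 for i in range(1, 101)]  # 0.01, 0.02, ..., 1.00
codes = absmax_quantize_list(values)

# Smallest and largest code actually produced.
print(min(codes), max(codes))
```

Because no input is negative, none of the codes from -127 to -1 is ever produced, so roughly half of the INT8 range goes unused.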

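With absmax quantization, each original number is divided by the absolute maximum of the tensor and multiplied by 127, which maps the inputs into the range [-127, 127].

To recover the original FP32 values, the INT8 number is divided by the scale factor, accepting some loss of precision due to rounding.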

$\text{scale\_factor}=\frac{127}{\max|\mathbf{X}|}$

$\begin{align*} &\mathbf{X}_{\text{quant}} = \text{round}\Bigl(\text{scale\_factor} \cdot \mathbf{X}\Bigr) \\ &\mathbf{X}_{\text{dequant}} = \cfrac{\mathbf{X}_{\text{quant}}}{\text{scale\_factor}} \end{align*}$

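For example, suppose the absolute maximum is 3.2 (for trained weights this is easy to read off, but for input values it has to be obtained by sampling representative data at run time).

A weight of 0.1 is quantized to $\text{round}(0.1 \times \cfrac{127}{3.2}) = \text{round}(3.96875) = 4$ . Dequantizing it gives $4 \times \cfrac{3.2}{127} = 0.1008$ , so the error is $0.1008 - 0.1 = 0.0008$ .
A weight of 0.09 is quantized to $\text{round}(0.09 \times \cfrac{127}{3.2}) = \text{round}(3.571875) = 4$ . Dequantizing it gives $4 \times \cfrac{3.2}{127} = 0.1008$ , so the error is $0.1008 - 0.09 = 0.0108$ .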
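The arithmetic of this example can be checked with a few lines of plain Python. This is only a sketch: the value 3.2 is the assumed absolute maximum from the example, and `quantize` / `dequantize` are hypothetical helper names.

```python
abs_max = 3.2            # assumed absolute maximum of the tensor
scale = 127 / abs_max    # 39.6875

def quantize(x):
    """Map a float to its integer code."""
    return round(x * scale)

def dequantize(q):
    """Map an integer code back to a float."""
    return q / scale

for w in (0.1, 0.09):
    q = quantize(w)
    w_hat = dequantize(q)
    # weight, integer code, restored value, absolute error
    print(w, q, round(w_hat, 4), round(abs(w_hat - w), 4))
```

Both weights land on the same integer code, so after dequantization they can no longer be told apart. That is the precision loss caused by rounding.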

The corresponding Python implementation:

import torch

def absmax_quantize(X):
    # Calculate scale
    scale = 127 / torch.max(torch.abs(X))

    # Quantize
    X_quant = (scale * X).round()

    # Dequantize
    X_dequant = X_quant / scale

    return X_quant.to(torch.int8), X_dequant

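2. Zero-point quantization (asymmetric)

Asymmetric quantization allows the minimum and maximum of the quantized range to sit asymmetrically around 0, so the positive and negative sides can cover different ranges. The actual minimum and maximum of the data are used to determine the quantization parameters, so the quantized integers cover the range of the original data more tightly, as shown in the figure below.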

[Figure: asymmetric quantization]
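Advantages:

Adapts better to the actual distribution of the data, especially when it is not symmetric about 0.
Uses the quantized value range more effectively, which improves the precision of the quantized representation.

Disadvantages:

More complex to implement, and extra quantization parameters must be stored (the zero point, or equivalently the minimum and maximum of the data).
The computation is somewhat more involved, because the quantization offset has to be accounted for.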
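One place where the offset shows up is in a dot product of two quantized vectors. The following is a rough sketch in plain Python, not taken from the article: the function names, vectors, scales, and zero points are all made up for illustration.

```python
def dot_symmetric(qa, qb, scale_a, scale_b):
    """Dot product of two absmax-quantized vectors (zero point is 0)."""
    acc = sum(a * b for a, b in zip(qa, qb))   # one integer accumulation
    return acc / (scale_a * scale_b)           # single rescale at the end

def dot_asymmetric(qa, qb, scale_a, scale_b, zp_a, zp_b):
    """Dot product of two zero-point-quantized vectors.

    Expanding sum((a - zp_a) * (b - zp_b)) gives the plain integer dot
    product plus three correction terms that involve the zero points.
    """
    n = len(qa)
    acc = sum(a * b for a, b in zip(qa, qb))
    acc -= zp_b * sum(qa)
    acc -= zp_a * sum(qb)
    acc += n * zp_a * zp_b
    return acc / (scale_a * scale_b)

# Hypothetical quantized vectors and parameters, for illustration only.
qa, qb = [-128, 0, 127, 5], [3, -7, 100, -128]
scale_a, scale_b, zp_a, zp_b = 41.13, 268.6, -5, -134

# Reference: subtract the zero points element by element.
reference = sum((a - zp_a) * (b - zp_b) for a, b in zip(qa, qb)) / (scale_a * scale_b)
result = dot_asymmetric(qa, qb, scale_a, scale_b, zp_a, zp_b)
print(result, reference)
```

With both zero points equal to 0 the correction terms vanish and the asymmetric version reduces to the symmetric one, which is why absmax is cheaper to compute.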

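Zero-point quantization can handle asymmetric input distributions, which is very useful, for example, for the output of a ReLU function (positive values only).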
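As a rough illustration of the ReLU case, the sketch below quantizes and dequantizes the same non-negative values with both schemes and compares the worst-case error. It uses plain Python; the input values and the helper names are assumptions made for this example.

```python
def absmax_roundtrip(values):
    """Quantize with absmax, then dequantize."""
    scale = 127 / max(abs(v) for v in values)
    return [round(v * scale) / scale for v in values]

def zeropoint_roundtrip(values):
    """Quantize with a zero point, then dequantize."""
    lo, hi = min(values), max(values)
    scale = 255 / (hi - lo)
    zeropoint = -round(scale * lo) - 128
    restored = []
    for v in values:
        q = max(-128, min(127, round(v * scale + zeropoint)))  # clip to int8
        restored.append((q - zeropoint) / scale)
    return restored

def max_abs_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))

# Hypothetical ReLU-like output: 1001 values evenly spaced in [0, 1].
values = [i / 1000 for i in range(0, 1001)]
err_absmax = max_abs_error(values, absmax_roundtrip(values))
err_zeropoint = max_abs_error(values, zeropoint_roundtrip(values))
print(err_absmax, err_zeropoint)
```

On this input the zero-point scheme spreads all 256 codes over [0, 1] while absmax only uses the non-negative half, so its step size, and with it the worst-case error, is about half as large.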

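The input values are first scaled by the total number of steps (255) divided by the difference between the maximum and minimum values. The distribution is then shifted by the zero point so that it maps onto the range [-128, 127] (note the one extra value compared with absmax). First, we compute the scale factor and the zero point: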

$\begin{align*} &\text{scale\_factor} = \frac{255}{\max(\mathbf{X}) - \min(\mathbf{X})} \\ &\text{zeropoint} = - \text{round}(\text{scale\_factor} \cdot \min(\mathbf{X})) - 128 \end{align*}$

We can then use these two values to quantize or dequantize the weights:

$\begin{align*} &\mathbf{X}_{\text{quant}} = \text{round}\bigg(\text{scale\_factor} \cdot \mathbf{X} + \text{zeropoint} \bigg) \\ &\mathbf{X}_{\text{dequant}} = \frac{\mathbf{X}_{\text{quant}} - \text{zeropoint}}{\text{scale\_factor}} \end{align*}$

For example, suppose the maximum is 3.2 and the minimum is -3.0 (for trained weights these are easy to read off, but for input values they have to be obtained by sampling representative data at run time).

We get $\text{scale\_factor}=\cfrac{255}{3.2 - (-3.0)} = 41.13$ and $\text{zeropoint}=-\text{round}(41.13 \cdot (-3.0)) - 128 = 123 - 128 = -5$ ,

so the weight 0.1 from before is quantized to $\text{round}(\text{scale\_factor} \cdot \mathbf{X} + \text{zeropoint} )=\text{round}(41.13 \cdot 0.1 - 5) = -1$ . This is very different from the value obtained with absmax (4). Dequantizing gives $\cfrac{\mathbf{X}_{\text{quant}} - \text{zeropoint}}{\text{scale\_factor}}= \cfrac{-1 - (-5)}{41.13}=0.09725$ , so the error is $0.1 - 0.09725 = 0.00275$ .
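The numbers in this example can be reproduced with a short plain-Python sketch. The minimum -3.0 and maximum 3.2 are the assumed values from the example, and `quantize` / `dequantize` are hypothetical helper names.

```python
x_min, x_max = -3.0, 3.2                  # assumed tensor minimum and maximum
scale = 255 / (x_max - x_min)             # about 41.13
zeropoint = -round(scale * x_min) - 128   # integer offset

def quantize(x):
    """Scale, shift by the zero point, round, and clip to the int8 range."""
    q = round(scale * x + zeropoint)
    return max(-128, min(127, q))

def dequantize(q):
    """Undo the shift and the scaling."""
    return (q - zeropoint) / scale

q = quantize(0.1)
w_hat = dequantize(q)
print(round(scale, 2), zeropoint, q, round(w_hat, 5))

# The two ends of the input range land on the two ends of the int8 range.
print(quantize(x_min), quantize(x_max))
```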


def zeropoint_quantize(X):
    # Calculate value range (denominator)
    x_range = torch.max(X) - torch.min(X)
    x_range = 1 if x_range == 0 else x_range

    # Calculate scale
    scale = 255 / x_range

    # Shift by zero-point
    zeropoint = (-scale * torch.min(X) - 128).round()

    # Scale and round the inputs
    X_quant = torch.clip((X * scale + zeropoint).round(), -128, 127)

    # Dequantize
    X_dequant = (X_quant - zeropoint) / scale

    return X_quant.to(torch.int8), X_dequant

II. Code

import numpy as np
from scipy.spatial.distance import cosine

# Fix the random seed
np.random.seed(42)

# Create a (4, 6) tensor of random floats
tensor_float32 = np.random.rand(4, 6).astype(np.float32)

# Print the original tensor
print("Original tensor:")
print(tensor_float32)

print("="*100)

# ==========================================================================
# 1. Symmetric quantization (absmax)
# ==========================================================================
# Absolute maximum of the tensor
abs_max = np.max(np.abs(tensor_float32))

# Quantization parameter
scale = 127 / abs_max  # 127 is the largest positive value int8 can hold

# Quantization function
def quantize_absmax(value, scale):
    return np.round(value * scale).astype(np.int8)

# Dequantization function
def dequantize_absmax(value, scale):
    return value / scale


# Quantize the tensor
tensor_int8_absmax = quantize_absmax(tensor_float32, scale)
print("Tensor after symmetric (absmax) quantization")
print(tensor_int8_absmax)

# Dequantize the tensor to compute the error
tensor_dequantized_absmax = dequantize_absmax(tensor_int8_absmax, scale)

# Element-wise relative error
error_absmax = np.abs((tensor_float32 - tensor_dequantized_absmax)/tensor_float32)

# Print the quantization error
print("Symmetric quantization error:")
print(error_absmax)

# Mean and maximum of the error
mean_error_absmax = np.mean(error_absmax)
max_error_absmax = np.max(error_absmax)

print("Mean symmetric quantization error:", mean_error_absmax)
print("Max symmetric quantization error:", max_error_absmax)

cosine_similarity_absmax = 1 - cosine(tensor_float32.flatten(), tensor_dequantized_absmax.flatten())
print("Cosine Similarity (AbsMax):", cosine_similarity_absmax)
print("="*100)


# ==========================================================================
# 2. Asymmetric quantization (zero-point)
# ==========================================================================
# Maximum and minimum of the tensor
max_val = np.max(tensor_float32)
min_val = np.min(tensor_float32)

# Quantization parameters
scale = 255 / (max_val - min_val)
# Use a Python int: the zero point can fall outside the int8 range,
# so casting it to np.int8 risks overflow
zero_point = -int(np.round(scale * min_val)) - 128

# Quantization function
def quantize_zeropoint(value, scale, zero_point):
    # Clip to the int8 range; the result is left as float here
    return np.clip(np.round(scale * value + zero_point), -128, 127)

# Dequantization function
def dequantize_zeropoint(value, scale, zero_point):
    return (value - zero_point) / scale

# Quantize the tensor
tensor_int8_zeropoint = quantize_zeropoint(tensor_float32, scale, zero_point)
print("Tensor after asymmetric (zero-point) quantization")
print(tensor_int8_zeropoint)

# Dequantize the tensor to compute the error
tensor_dequantized_zeropoint = dequantize_zeropoint(tensor_int8_zeropoint, scale, zero_point)

# Element-wise relative error
error_zeropoint = np.abs((tensor_float32 - tensor_dequantized_zeropoint) / tensor_float32)

# Print the quantization error
print("Asymmetric quantization error:")
print(error_zeropoint)

# Mean and maximum of the error
mean_error_zeropoint = np.mean(error_zeropoint)
max_error_zeropoint = np.max(error_zeropoint)

print("Mean asymmetric quantization error:", mean_error_zeropoint)
print("Max asymmetric quantization error:", max_error_zeropoint)

cosine_similarity_zeropoint = 1 - cosine(tensor_float32.flatten(), tensor_dequantized_zeropoint.flatten())
print("Cosine Similarity (Zeropoint):", cosine_similarity_zeropoint)

Printed output:

Original tensor:
[[0.37454012 0.9507143  0.7319939  0.5986585  0.15601864 0.15599452]
 [0.05808361 0.8661761  0.601115   0.7080726  0.02058449 0.96990985]
 [0.83244264 0.21233912 0.18182497 0.1834045  0.30424225 0.52475643]
 [0.43194503 0.29122913 0.6118529  0.13949387 0.29214466 0.36636186]]
====================================================================================================
Tensor after symmetric (absmax) quantization
[[ 49 124  96  78  20  20]
 [  8 113  79  93   3 127]
 [109  28  24  24  40  69]
 [ 57  38  80  18  38  48]]
Symmetric quantization error:
[[8.62218402e-04 3.90832411e-03 1.59329944e-03 4.95414379e-03  2.10034924e-02 2.08520879e-02]
 [5.18747141e-02 3.67763231e-03 3.68441800e-03 3.07361161e-03  1.13034713e-01 0.00000000e+00]
 [3.98603840e-07 7.06074937e-03 8.05765315e-03 6.24060281e-04  4.07952213e-03 4.19711207e-03]
 [7.79923860e-03 3.50199634e-03 1.44814496e-03 1.45263046e-02  6.62483686e-03 5.95703693e-04]]
Mean symmetric quantization error: 0.011959765636609115
Max symmetric quantization error: 0.11303471252593807
Cosine Similarity (AbsMax): 0.9999917065271415
====================================================================================================
Tensor after asymmetric (zero-point) quantization
[[ -33.  121.   63.   27.  -92.  -92.]
 [-118.   99.   27.   56. -128.  127.]
 [  90.  -77.  -85.  -85.  -52.    7.]
 [ -18.  -56.   30.  -97.  -56.  -36.]]
Asymmetric quantization error:
[[0.00391725 0.00146097 0.00192137 0.00120173 0.00218448 0.00233947]
 [0.02551319 0.00143924 0.00288973 0.00103405 0.08514041 0.00180693]
 [0.00177129 0.00064513 0.00326912 0.00537135 0.00338869 0.00031372]
 [0.0002201  0.00290922 0.00213518 0.01253547 0.00603392 0.00415738]]
Mean asymmetric quantization error: 0.007233308
Max asymmetric quantization error: 0.08514041
Cosine Similarity (Zeropoint): 0.9999973773956299
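
One way to read these numbers is to compare the step size of the two schemes, that is, the distance between neighbouring representable values after dequantization (1 / scale). The sketch below is an interpretation rather than part of the original experiment; it only uses the smallest and largest entries of the tensor printed above.

```python
# Smallest and largest entries of the original tensor printed above.
x_min, x_max = 0.02058449, 0.96990985

# Step size = 1 / scale for each scheme.
step_absmax = max(abs(x_min), abs(x_max)) / 127
step_zeropoint = (x_max - x_min) / 255

print(step_absmax, step_zeropoint, step_absmax / step_zeropoint)
```

Absent clipping, the rounding error of a value is at most half a step. Since all entries here are positive, the zero-point step is roughly half the absmax step, which is consistent with the lower mean and maximum error reported for the asymmetric scheme.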