// Quantization parameters, determining the mapping of quantized values
// to real values (i.e. determining how quantized values are mathematically
// interpreted).
//
// The correspondence is as follows:
//
// real_value = scale * (quantized_value - zero_point);
//
// In other words, zero_point designates which quantized value corresponds to
// the real 0 value, and scale designates the difference between the real values
// corresponding to consecutive quantized values differing by 1.
struct QuantizationParams {
  int32 zero_point = 0;
  double scale = 0.;
};
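
To make the mapping in the comment above concrete, here is a minimal sketch of converting between quantized and real values. The helper names `Dequantize` and `Quantize` are hypothetical and are not taken from the TensorFlow sources; the struct is repeated with standard fixed-width types (`std::int32_t` in place of TensorFlow's `int32`) so that the snippet compiles on its own. The 8-bit range [0, 255] and round-to-nearest are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Same layout as the struct above, using standard fixed-width types
// so the sketch does not depend on TensorFlow headers.
struct QuantizationParams {
  std::int32_t zero_point = 0;
  double scale = 0.;
};

// Hypothetical helper: real_value = scale * (quantized_value - zero_point).
inline double Dequantize(std::uint8_t quantized_value,
                         const QuantizationParams& params) {
  return params.scale *
         (static_cast<std::int32_t>(quantized_value) - params.zero_point);
}

// Hypothetical helper: the inverse mapping, rounded to the nearest integer
// and clamped to the uint8 range [0, 255] (an assumption of this sketch).
inline std::uint8_t Quantize(double real_value,
                             const QuantizationParams& params) {
  // The default scale of 0 would cause a division by zero.
  assert(params.scale > 0.);
  const double q = std::round(real_value / params.scale) + params.zero_point;
  return static_cast<std::uint8_t>(std::min(255.0, std::max(0.0, q)));
}
```

For example, with `zero_point = 128` and `scale = 0.5`, the quantized value 128 represents the real value 0, and each step of 1 in the quantized value changes the real value by 0.5, so 130 represents 1.0.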

inline void FullyConnected(const uint8* input_data, const Dims<4>& input_dims,
                           int32 input_offset, const uint8* filter_data,
                           const Dims<4>& filter_dims, int32 filter_offset,
                           const int32* bias_data, const Dims<4>& bias_dims,
                           int32 output_offset, int32 output_multiplier,
                           int output_shift, int32 output_activation_min,
                           int32 output_activation_max, uint8* output_data,
                           const Dims<4>& output_dims,
                           gemmlowp::GemmContext* gemm_context) {
  (void)gemm_context;

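This article takes an in-depth look at TensorFlow's quantization techniques. It covers core concepts such as QuantizationParams and FullyConnected, as well as the use of key functions such as GetOrComputeMinMax and quantize_v2. By walking through the relevant code in op_def_library.py, it shows how TensorFlow's quantization API works.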



