While attempting static quantization of Stable Diffusion with ONNX Runtime, I hit this error: Load model from /tmp/ort.quant.x__uutdn/augmented_model.onnx failed:Protobuf parsing failed. I asked several large AI models about it repeatedly, and they suggested:
1) check the ONNX model;
2) check that the input data types and names match the model;
3) reduce the number of inputs;
4) upgrade onnx and onnxruntime, and bump protobuf from 4.25.3 to 6.31.1;
5) add extra_options={"external_data_format": True}, which was close, but still not the right answer.
I also tried a few things myself:
1) Quantizing the decoder model directly succeeds;
2) Changing the opset version when exporting to ONNX: no change;
3) Trying onnxsim and onnxsim_large_model: ran into other problems;
4) Finally discovering that ONNX models larger than 2 GB get split into smaller external-data files, and that passing use_external_data_format=True during quantization makes everything run normally (a quick check for this is sketched just below).
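
A quick way to tell whether a model has crossed the 2 GB line is to load only its structure and see which initializers point to external data. This is a minimal sketch; the relative model_path is hypothetical, standing in for the diffusion.onnx export used below:

import os
import onnx
from onnx.external_data_helper import uses_external_data

model_path = "data_check_onnx/diffusion.onnx"  # hypothetical path to your export

# load_external_data=False reads only the graph structure and tensor
# metadata, not the multi-GB weights themselves
model = onnx.load(model_path, load_external_data=False)
external = [t.name for t in model.graph.initializer if uses_external_data(t)]
print(f"{len(external)} initializers stored outside the main protobuf file")
print(f"main file size: {os.path.getsize(model_path) / 2**30:.2f} GiB")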
Original setup
from onnxruntime.quantization import CalibrationDataReader, quantize_static, QuantType, QuantFormat
import torch
import numpy as np
import onnx

model_fp32 = "/home/xxx/ProjectStableDiffusion/stable-diffusion-pytorch/data_check_onnx/diffusion.onnx"
model_quant = "/home/xxx/ProjectStableDiffusion/stable-diffusion-pytorch/data_check_onnx/quantized_diffusion.onnx"

str_list = ["a"]
calibration_dataset = []

def evaluate_diffusion():
    # Build the calibration set from tensors dumped by the PyTorch pipeline
    for j, str_name in enumerate(str_list):
        context = torch.load("data_check/context" + str_name + ".pt")
        print(j, str_name)
        for i in range(1):
            input_latents = torch.load("data_check/input_latents_before" + str_name + str(i) + ".pt")
            time_embedding = torch.load("data_check/time_embedding" + str_name + str(i) + ".pt")
            calibration_dataset.append({"input_latents": input_latents.cpu().numpy(),
                                        "context": context.cpu().numpy(),
                                        "time_embedding": time_embedding.cpu().numpy()})
            print(i, "finished")

class MultiInputDataReader(CalibrationDataReader):
    def __init__(self, dataset):
        self.dataset = iter(dataset)

    def get_next(self):
        return next(self.dataset, None)

onnx_model = onnx.load(model_fp32)
# Pass the path (not the loaded proto) so the checker can handle a >2GB model
onnx.checker.check_model(model_fp32, full_check=True)
for inp in onnx_model.graph.input:
    print(f"Input Name: {inp.name}, Shape: {inp.type.tensor_type.shape.dim}, Type: {inp.type.tensor_type.elem_type}")
del onnx_model

evaluate_diffusion()
data_reader = MultiInputDataReader(calibration_dataset)
for key, value in calibration_dataset[0].items():
    print(f"Key '{key}' array shape: {value.shape}")

quantize_static(
    model_input=model_fp32,
    model_output=model_quant,
    calibration_data_reader=data_reader,
    # quant_format=QuantFormat.QOperator,  # quantization format (QOperator or QDQ)
    quant_format=QuantFormat.QDQ,        # quantization format (QOperator or QDQ)
    activation_type=QuantType.QUInt8,    # activation quantization type (INT8/UINT8)
    weight_type=QuantType.QUInt8         # weight quantization type
)
Running the code above produces the following error:
Exception has occurred: InvalidProtobuf
[ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from /tmp/ort.quant.x__uutdn/augmented_model.onnx failed:Protobuf parsing failed.
  File "/home/xxx/ProjectStableDiffusion/stable-diffusion-pytorch/quantize_onnx_diffusion.py", line 44, in <module>
    model_input=model_fp32,
    model_output=model_quant,
    calibration_data_reader=data_reader,
    # quant_format=QuantFormat.QOperator,  # quantization format (QOperator or QDQ)
    quant_format=QuantFormat.QDQ,        # quantization format (QOperator or QDQ)
    activation_type=QuantType.QUInt8,    # activation quantization type (INT8/UINT8)
    weight_type=QuantType.QUInt8         # weight quantization type
)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from /tmp/ort.quant.x__uutdn/augmented_model.onnx failed:Protobuf parsing failed.
Key fix
Protobuf cannot serialize a single message larger than 2 GB. quantize_static writes an augmented calibration model to a temp directory; with all weights embedded inline that file blows past the limit and then fails to parse. Passing use_external_data_format=True stores the weights in separate files so the main .onnx stays within the limit:
quantize_static(
    model_input=model_fp32,
    model_output=model_quant,
    calibration_data_reader=data_reader,
    # quant_format=QuantFormat.QOperator,  # quantization format (QOperator or QDQ)
    quant_format=QuantFormat.QDQ,        # quantization format (QOperator or QDQ)
    activation_type=QuantType.QUInt8,    # activation quantization type (INT8/UINT8)
    weight_type=QuantType.QUInt8,        # weight quantization type
    use_external_data_format=True,       # store weights as external data; avoids the 2 GB protobuf limit
)
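
To confirm the fix worked end to end, a minimal sanity check (reusing model_quant and calibration_dataset from the script above; the CPU provider is just an assumption for the check) is to open the quantized model in an InferenceSession and run one calibration sample through it:

import onnxruntime as ort

# The external-data files written next to model_quant must stay in the
# same directory so the session can resolve them
sess = ort.InferenceSession(model_quant, providers=["CPUExecutionProvider"])
outputs = sess.run(None, calibration_dataset[0])
print([o.shape for o in outputs])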