The Python version of the model gives good inference results, but after exporting it to ONNX and parsing/converting it with TensorRT into an engine (with FP16 quantization), the results were terrible: countless incorrect targets with very low scores. My guess was that after FP16 quantization, some values inside the network overflowed because of the reduced precision. After carefully reviewing the code, I forced the highly suspicious modules, together with the layers directly feeding into and out of them, to be excluded from FP16 quantization:
std::set<nvinfer1::ILayer*> backwardLayers;
std::set<nvinfer1::ILayer*> forwardLayers;
std::map<nvinfer1::ITensor*, nvinfer1::ILayer*> tensorToProducer = buildTensorProducerMap(network);
std::map<nvinfer1::ITensor*, std::vector<nvinfer1::ILayer*>> tensorToConsumer = buildTensorConsumerMap(network);
for (int i = 126; i < 179; i++) {
    nvinfer1::ILayer* layer = network->getLayer(i);
    std::cout << "layer " << i << " " << layer->getName() << std::endl;
    setLayerAndPredecessorsToFP32(layer, tensorToProducer, backwardLayers, 3);
    setLayerAndSuccessorsToFP32(layer, tensorToConsumer, forwardLayers, 3);
}
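The helper functions referenced above (buildTensorProducerMap, setLayerAndPredecessorsToFP32 and their consumer-side counterparts) are not shown in the post. Below is a minimal sketch of what they might look like, an assumption rather than the original implementation: it maps each tensor to its producing layer, then pins a layer plus a few levels of its predecessors to FP32 via setPrecision() and setOutputType(); the forward/consumer direction mirrors this logic.

// Sketch only: hypothetical reconstruction of the helpers used in the snippet above.
#include <NvInfer.h>
#include <map>
#include <set>

// Map every tensor in the network to the layer that produces it.
std::map<nvinfer1::ITensor*, nvinfer1::ILayer*>
buildTensorProducerMap(nvinfer1::INetworkDefinition* network) {
    std::map<nvinfer1::ITensor*, nvinfer1::ILayer*> producer;
    for (int i = 0; i < network->getNbLayers(); i++) {
        nvinfer1::ILayer* layer = network->getLayer(i);
        for (int j = 0; j < layer->getNbOutputs(); j++) {
            producer[layer->getOutput(j)] = layer;
        }
    }
    return producer;
}

// Pin a layer and up to `depth` levels of its predecessors to FP32.
void setLayerAndPredecessorsToFP32(
    nvinfer1::ILayer* layer,
    const std::map<nvinfer1::ITensor*, nvinfer1::ILayer*>& producer,
    std::set<nvinfer1::ILayer*>& visited,
    int depth) {
    if (layer == nullptr || depth < 0 || visited.count(layer)) {
        return;
    }
    visited.insert(layer);
    layer->setPrecision(nvinfer1::DataType::kFLOAT);
    for (int j = 0; j < layer->getNbOutputs(); j++) {
        layer->setOutputType(j, nvinfer1::DataType::kFLOAT);
    }
    for (int j = 0; j < layer->getNbInputs(); j++) {
        nvinfer1::ITensor* input = layer->getInput(j);
        if (input == nullptr) continue;
        auto it = producer.find(input);
        if (it != producer.end()) {
            setLayerAndPredecessorsToFP32(it->second, producer, visited, depth - 1);
        }
    }
}

These per-layer constraints only take effect when the builder config enables FP16 together with the precision-constraint flag (BuilderFlag::kOBEY_PRECISION_CONSTRAINTS on recent TensorRT versions, kSTRICT_TYPES on older ones).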
Then I rebuilt the engine, but no matter how I modified the code that excludes layers from quantization, the conversion always failed with the same error:
[NVINFER LOG]: 1: [verify_output_type] Mismatched type for tensor (Unnamed Layer* 153) [Shuffle]_output', f16 vs. expected type:f32.
[NVINFER LOG]: 1: [codeGenerator.cpp::compileGraph::894] Error Code 1: Myelin ([verify_output_type] Mismatched type for tensor (Unnamed Layer* 153) [Shuffle]_output', f16 vs. expected type:f32.)
The message says that the tensor at layer 153 is expected to be FP32 but is actually FP16. At first I couldn't see why an FP16 tensor still appeared even though I had forced FP32. With no better idea, I fell back on the brute-force approach and dumped the network structure:
(152) UNKNOWN name=onnx::Add_892
    output name=(Unnamed Layer* 152) [Constant]_output, shape=[], dtype=FP32
(153) Shuffle name=(Unnamed Layer* 153) [Shuffle]
    input  name=(Unnamed Layer* 152) [Constant]_output, shape=[], dtype=FP32
    output name=(Unnamed Layer* 153) [Shuffle]_output, shape=[1x1x1x1], dtype=FP32
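The dump above came from simply iterating over the network's layers and printing each layer's inputs and outputs; the following is a rough sketch of such a dump routine (an illustration, not the author's original code; layer-type names and shape printing are simplified).

// Sketch only: print each layer with the names and data types of its tensors.
#include <NvInfer.h>
#include <iostream>

static const char* dtypeName(nvinfer1::DataType t) {
    switch (t) {
        case nvinfer1::DataType::kFLOAT: return "FP32";
        case nvinfer1::DataType::kHALF:  return "FP16";
        case nvinfer1::DataType::kINT8:  return "INT8";
        case nvinfer1::DataType::kINT32: return "INT32";
        default:                         return "OTHER";
    }
}

void dumpNetwork(nvinfer1::INetworkDefinition* network) {
    for (int i = 0; i < network->getNbLayers(); i++) {
        nvinfer1::ILayer* layer = network->getLayer(i);
        std::cout << "(" << i << ") type=" << static_cast<int>(layer->getType())
                  << " name=" << layer->getName() << std::endl;
        for (int j = 0; j < layer->getNbInputs(); j++) {
            nvinfer1::ITensor* t = layer->getInput(j);
            if (t == nullptr) continue;
            std::cout << "    input  name=" << t->getName()
                      << ", dtype=" << dtypeName(t->getType()) << std::endl;
        }
        for (int j = 0; j < layer->getNbOutputs(); j++) {
            nvinfer1::ITensor* t = layer->getOutput(j);
            std::cout << "    output name=" << t->getName()
                      << ", dtype=" << dtypeName(t->getType()) << std::endl;
        }
    }
}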
From the network dump above, the input of the failing layer 153 comes from layer 152, which is a Constant, so a tensor that stays FP16 here can only be related to that Constant. Going back to the ONNX model, I found this Constant is a very small value, 9.999999974752427e-7 (i.e., 1e-6 stored as float32). That reminded me that the corresponding Python code adds 1e-6 to the tensor to avoid division by zero. So it was this constant input that TensorRT kept quantizing to FP16; blocking the quantization with the layer's setPrecision() and setOutputType() had no effect, and I also experimented with the following code to force the constant back to FP32, which did not work either:
void convertConstantLayerToFP32(nvinfer1::IConstantLayer* constLayer) {
    nvinfer1::Weights weights = constLayer->getWeights();
    if (weights.type == nvinfer1::DataType::kFLOAT) {
        std::cout << "  Constant already FP32" << std::endl;
        return;
    }
    if (weights.type == nvinfer1::DataType::kHALF) {
        std::cout << "  Converting constant from FP16 to FP32" << std::endl;
        int64_t count = weights.count;
        // Not freed here: TensorRT only keeps a pointer to the Weights memory,
        // so the buffer must stay valid until the engine build finishes.
        float* fp32Values = new float[count];
        // FP16 is just uint16_t in memory; fp16_to_fp32() is a user-provided
        // helper that widens an IEEE half to float.
        const uint16_t* fp16Values = static_cast<const uint16_t*>(weights.values);
        for (int64_t i = 0; i < count; i++) {
            fp32Values[i] = fp16_to_fp32(fp16Values[i]);
        }
        nvinfer1::Weights fp32Weights;
        fp32Weights.type = nvinfer1::DataType::kFLOAT;
        fp32Weights.values = fp32Values;
        fp32Weights.count = count;
        constLayer->setWeights(fp32Weights);
    }
}
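For context, applying convertConstantLayerToFP32() across the network would look roughly like this (a hypothetical call site, not the original code):

// Sketch only: scan the network for constant layers and try to widen them.
for (int i = 0; i < network->getNbLayers(); i++) {
    nvinfer1::ILayer* layer = network->getLayer(i);
    if (layer->getType() == nvinfer1::LayerType::kCONSTANT) {
        convertConstantLayerToFP32(static_cast<nvinfer1::IConstantLayer*>(layer));
    }
}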
In the end I simply removed that small epsilon from the Python code, re-exported the ONNX model, and the engine conversion then went through without problems. My impression is that the constant-handling code inside TensorRT is hard-coded for this case. With that module excluded from FP16 quantization, the converted engine's inference results are basically on par with the original Python model.
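For reference, the per-layer FP32 pinning described in this post only takes effect when the builder is asked to obey precision constraints while building in FP16 mode. A minimal sketch of the builder configuration this workflow assumes (flag names depend on the TensorRT version; kOBEY_PRECISION_CONSTRAINTS superseded the older kSTRICT_TYPES):

// Sketch only: builder flags assumed by the per-layer precision pinning above.
nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
config->setFlag(nvinfer1::BuilderFlag::kFP16);                       // allow FP16 kernels
config->setFlag(nvinfer1::BuilderFlag::kOBEY_PRECISION_CONSTRAINTS); // honor setPrecision()/setOutputType()
nvinfer1::IHostMemory* serialized = builder->buildSerializedNetwork(*network, *config);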