Techniques for High-Performance Optimization in C++ Deep Learning Deployment

Latest recommended article published on 2024-10-01 06:57:32

zsffuture

Licensed under CC 4.0 BY-SA

Categories: c++, cuda, deep learning. Tags: deep learning, c++, computer vision

Article link: https://blog.youkuaiyun.com/weixin_42398658/article/details/125018367

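This article covers key techniques for efficient image preprocessing when deploying deep learning models in C++, with the goal of improving the performance of computer vision applications.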

High-performance image preprocessing

    int input_batch = 1;
    int input_channel = 3;
    int input_height = 224;
    int input_width = 224;
    int input_numel = input_batch * input_channel * input_height * input_width;
    float* input_data_host = nullptr;
    float* input_data_device = nullptr;
    checkRuntime(cudaMallocHost(&input_data_host, input_numel * sizeof(float)));
    checkRuntime(cudaMalloc(&input_data_device, input_numel * sizeof(float)));

    ///////////////////////////////////////////////////
    // image to float
    auto image = cv::imread("dog.jpg");
    float mean[] = {0.406, 0.456, 0.485};
    float std[]  = {0.225, 0.224, 0.229};

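    // Corresponds to the preprocessing done on the PyTorch side
    cv::resize(image, image, cv::Size(input_width, input_height));
    // OpenCV loads image data as BGR, while the model expects RGB, so one conversion is needed
    // The BGR image is stored interleaved: BGRBGRBGR... -> BBBB....GGGG....RRRR....
    // Raw pointers are used here to convert the layout and normalize the data in the same pass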
    int image_area = image.cols * image.rows;
    unsigned char* pimage = image.d