Using Sophgo bmcv for Image Preprocessing

优雅的潮叭

Last modified 2025-03-11 13:49:05

Tags: opencv, edge computing, YOLO

First published 2025-03-07 17:34:23

Article link: https://blog.youkuaiyun.com/Done_for_me/article/details/146096883

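Preface

In an earlier article on adapting models to Sophgo hardware, I covered porting yolov5 to the Sophgo SE7 box and gave preprocessing code for yolov5. That code did its preprocessing through native opencv interfaces, which inevitably adds load on the CPU. The sophon-opencv shipped with the box is not really meant to be used through the native interfaces; it leans toward bmcv instead. With bmcv, some preprocessing operations can run on the inference card, which reduces CPU and memory load to some extent.

1. Converting cv::Mat to bm_image

The bmcv APIs are all built around bm_image; one bm_image object corresponds to one image. The bm_image struct is defined as follows: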

struct bm_image {
    int width;
    int height;
    bm_image_format_ext image_format;
    bm_data_format_ext data_type;
    bm_image_private* image_private;
};

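bm_image is similar to opencv's Mat, and the two can be converted into each other.
Method 1:
Create a new Mat, which allocates device memory by default, copy the data into it once with copyTo() so that it lives in device memory, then convert this Mat to a bm_image.
Method 2:
Create the bm_image directly, then use bm_image_copy_host_to_device to copy the data in Mat.data into the bm_image's device memory.
Interfaces related to method 1: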

void Mat::create(int height, int width, int total, int _type, const size_t* _steps, void* _data, unsigned long addr, int fd, int id = 0)

void bmcv::uploadMat(Mat &mat)

There is also copyTo(), whose signature I could not find.
bm_status_t bmcv::toBMI(Mat &m, bm_image *image, bool update = true)

Method 1 is too cumbersome, so I used method 2:

1. Create the bm_image objects
bm_image bmimg;
bm_image m_resized_imgs;

bm_image_create(m_bmContext->handle(), img.rows, img.cols, FORMAT_BGR_PACKED, DATA_TYPE_EXT_1N_BYTE, &bmimg);
bm_image_create(m_bmContext->handle(), input_h, input_w, FORMAT_RGB_PLANAR, DATA_TYPE_EXT_1N_BYTE, &m_resized_imgs);

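bmimg is the original image and m_resized_imgs is the converted image.

FORMAT_BGR_PACKED means BGR in packed layout; DATA_TYPE_EXT_1N_BYTE means the data type is uint8.

PACKED means the pixel data is interleaved in memory: the b, g, r values of the first pixel, then the b, g, r values of the second pixel, and so on, as one flat array.

PLANAR means the channels are stored one after another as separate planes: all values of one channel, then all values of the next. This is the layout needed once preprocessing is done; with FORMAT_RGB_PLANAR the planes come in r, g, b order.
The source image uses the packed layout because that is how an image read by opencv is stored in memory, and in bgr order.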
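To make the two layouts concrete, here is a minimal CPU-only sketch of the conversion from packed BGR to planar RGB. It is only a reference for what the layouts mean; it is not how bmcv does the work on the device, and the function name bgr_packed_to_rgb_planar is made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an interleaved BGR buffer (B0 G0 R0 B1 G1 R1 ...) into a planar
// RGB buffer (all R values, then all G values, then all B values).
// Hypothetical helper for illustration; bmcv performs this on the device.
std::vector<std::uint8_t> bgr_packed_to_rgb_planar(
    const std::vector<std::uint8_t>& packed, int width, int height) {
  const std::size_t pixels = static_cast<std::size_t>(width) * height;
  assert(packed.size() == pixels * 3);
  std::vector<std::uint8_t> planar(pixels * 3);
  for (std::size_t p = 0; p < pixels; ++p) {
    planar[0 * pixels + p] = packed[p * 3 + 2];  // R plane
    planar[1 * pixels + p] = packed[p * 3 + 1];  // G plane
    planar[2 * pixels + p] = packed[p * 3 + 0];  // B plane
  }
  return planar;
}
```

For a 2x1 image with packed bytes {1,2,3, 4,5,6}, the planar result holds the two R values first, then the two G values, then the two B values.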

2. Copy the Mat into the bm_image
void *buffer[1]={static_cast<void*>(img.data)};
auto ret=bm_image_copy_host_to_device(bmimg,buffer);

The bm_image_copy_host_to_device interface is:
bm_status_t bm_image_copy_host_to_device(
       bm_image image,
       void* buffers[]
);

The function takes a void** argument; buffer holds the start address of the Mat's data.
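Passing img.data directly relies on the Mat's rows being stored back to back. A Mat that is a ROI of a larger image, for example, has gaps between rows (cv::Mat::isContinuous() returns false in that case). The sketch below shows in plain C++ what packing such strided rows into one contiguous buffer looks like. It assumes the copy reads width * height * channels consecutive bytes, which I have not verified against the SDK; pack_rows and row_stride are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy `height` rows of `width * channels` bytes each from a source whose
// rows start every `row_stride` bytes into one tightly packed buffer.
// Hypothetical helper; with OpenCV, Mat::clone() gives a continuous copy.
std::vector<std::uint8_t> pack_rows(const std::uint8_t* src, int width,
                                    int height, int channels,
                                    std::size_t row_stride) {
  const std::size_t row_bytes = static_cast<std::size_t>(width) * channels;
  assert(src != nullptr && row_stride >= row_bytes);
  std::vector<std::uint8_t> out(row_bytes * height);
  for (int y = 0; y < height; ++y) {
    std::memcpy(out.data() + y * row_bytes, src + y * row_stride, row_bytes);
  }
  return out;
}
```

The packed buffer's data() pointer could then be placed in the buffer array instead of img.data.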

3. Compute resize and padding
// Scale factor along each axis; the smaller one keeps the whole image inside the model input.
float r_w = (float)input_w / bmimg.width;
float r_h = (float)input_h / bmimg.height;
bool isAlignWidth = r_h > r_w;  // true: the width fills the input, padding goes above and below
float ratio = isAlignWidth ? r_w : r_h;

bmcv_padding_atrr_t padding_attr;
memset(&padding_attr, 0, sizeof(padding_attr));
padding_attr.dst_crop_sty = 0;
padding_attr.dst_crop_stx = 0;
padding_attr.padding_b = 114;
padding_attr.padding_g = 114;
padding_attr.padding_r = 114;
padding_attr.if_memset = 1;
if (isAlignWidth) {
  padding_attr.dst_crop_h = bmimg.height * ratio;
  padding_attr.dst_crop_w = input_w;

  // Center vertically.
  int ty1 = (int)((input_h - padding_attr.dst_crop_h) / 2);
  padding_attr.dst_crop_sty = ty1;
  padding_attr.dst_crop_stx = 0;
} else {
  padding_attr.dst_crop_h = input_h;
  padding_attr.dst_crop_w = bmimg.width * ratio;

  // Center horizontally.
  int tx1 = (int)((input_w - padding_attr.dst_crop_w) / 2);
  padding_attr.dst_crop_sty = 0;
  padding_attr.dst_crop_stx = tx1;
}

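ratio is the smaller of the two scale factors (model input width over image width, model input height over image height), so the resized image fits inside the model input. padding_attr is a struct: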

typedef struct bmcv_padding_atrr_s {
    unsigned int    dst_crop_stx;
    unsigned int    dst_crop_sty;
    unsigned int    dst_crop_w;
    unsigned int    dst_crop_h;
    unsigned char padding_r;
    unsigned char padding_g;
    unsigned char padding_b;
    int           if_memset;
} bmcv_padding_atrr_t;

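dst_crop_stx and dst_crop_sty: offset of the top-left corner of the target sub-image relative to the origin (top-left corner) of the dst image;
dst_crop_w and dst_crop_h: width and height of the target sub-image after resize;
padding_r, padding_g, padding_b: if the dst image is in RGB format, the padding value for each channel; effective when if_memset=1. For a GRAY image, set all three to the same value;
if_memset: whether the API should memset the dst image with the per-channel padding values internally; only RGB and GRAY images are supported.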
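The geometry part of this step can be checked without the device. The following is a minimal sketch of the same letterbox calculation in plain C++; LetterboxGeometry is a hypothetical stand-in that mirrors only the four dst_crop_* fields of bmcv_padding_atrr_t, and compute_letterbox is a made-up name.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the geometry fields of bmcv_padding_atrr_t.
struct LetterboxGeometry {
  unsigned int dst_crop_stx;
  unsigned int dst_crop_sty;
  unsigned int dst_crop_w;
  unsigned int dst_crop_h;
};

// Fit a src_w x src_h image into input_w x input_h while keeping the
// aspect ratio, and center it; the remaining area is left for padding.
LetterboxGeometry compute_letterbox(int src_w, int src_h,
                                    int input_w, int input_h) {
  assert(src_w > 0 && src_h > 0 && input_w > 0 && input_h > 0);
  const float r_w = static_cast<float>(input_w) / src_w;
  const float r_h = static_cast<float>(input_h) / src_h;
  const float ratio = std::min(r_w, r_h);
  LetterboxGeometry g{};
  if (r_h > r_w) {  // width is the limiting side
    g.dst_crop_w = static_cast<unsigned int>(input_w);
    g.dst_crop_h = static_cast<unsigned int>(src_h * ratio);
  } else {          // height is the limiting side
    g.dst_crop_w = static_cast<unsigned int>(src_w * ratio);
    g.dst_crop_h = static_cast<unsigned int>(input_h);
  }
  g.dst_crop_stx = (input_w - g.dst_crop_w) / 2;
  g.dst_crop_sty = (input_h - g.dst_crop_h) / 2;
  return g;
}
```

For a 1280x720 frame and a 640x640 model input, this gives a 640x360 sub-image placed at offset (0, 140), so 140 rows of padding sit above and below it.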

4. Convert
bmcv_rect_t crop_rect{0, 0, bmimg.width, bmimg.height};
ret = bmcv_image_vpp_convert_padding(m_bmContext->handle(), batch_size_, bmimg, &m_resized_imgs,
    &padding_attr, &crop_rect, BMCV_INTER_LINEAR);

crop_rect holds the coordinates and size of each target sub-image; here it covers the whole source image. The bm1684x supports BMCV_INTER_NEAREST and BMCV_INTER_LINEAR; the default is bilinear interpolation.

5. Set up the input memory and associate the converted image with it
bm_device_mem_t input_dev_mem;
bm_image_get_contiguous_device_mem(batch_size_, &m_resized_imgs, &input_dev_mem);
m_input_tensor->set_device_mem(&input_dev_mem);
m_input_tensor->set_shape_by_dim(0, batch_size_);

6. Release the image memory
bm_image_destroy(bmimg);
bm_image_destroy(m_resized_imgs);

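2. Inference

Example code:

int ret = m_bmNetwork->forward();

if (ret)
{
  std::cout << "inference failed\n";
}

Inference is run by calling forward() directly on the network handle.

Process the inference output; batch_size_ is assumed to be 1 for now.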
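m_output_tensor = m_bmNetwork->outputTensor(0);
float *ptr = (float *)m_output_tensor->get_cpu_data();
// Each detection occupies 7 floats; ptr[r] is the batch index and ptr[r + 2] the score.
int l = 0;  // start of the current batch's records
int r = 0;  // read cursor
for (int i = 0; i < batch_size_; i++) {
  while (r < output_count) {
    if (*(ptr + r) == i && *(ptr + r + 2) != 0) {
      r += 7;  // valid record of batch i, keep scanning
    } else {
      // First record that is empty or belongs to another batch: copy what batch i has so far.
      memcpy(data_ptr_raw.data() + i * (output_count / batch_size_), ptr + l, (r - l) * sizeof(float));
      l = r;
      break;
    }
  }
}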
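The loop above depends on the layout of the output buffer. Judging from the indices used in this article's code, each detection appears to be 7 consecutive floats: batch index, class id, score, center x, center y, width, height, with a score of 0 marking an unused slot. Under that assumption, here is a simpler way to group records per batch. It is a different formulation from the memcpy loop (it skips empty slots instead of stopping at the first one), and Detection and split_by_batch are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One record of the assumed 7-float layout.
struct Detection {
  int batch_id;
  int class_id;
  float score;
  float cx, cy, w, h;
};

// Group the valid records of a flat output buffer by batch index.
std::vector<std::vector<Detection>> split_by_batch(const float* data,
                                                   std::size_t float_count,
                                                   int batch_size) {
  constexpr std::size_t kStep = 7;  // floats per detection (assumed)
  assert(data != nullptr && batch_size > 0);
  std::vector<std::vector<Detection>> per_batch(batch_size);
  for (std::size_t off = 0; off + kStep <= float_count; off += kStep) {
    const float* d = data + off;
    const int b = static_cast<int>(d[0]);
    if (d[2] == 0.0f) continue;              // unused slot
    if (b < 0 || b >= batch_size) continue;  // batch index out of range
    per_batch[b].push_back(
        {b, static_cast<int>(d[1]), d[2], d[3], d[4], d[5], d[6]});
  }
  return per_batch;
}
```

With batch_size_ equal to 1, the result has a single inner vector holding every detection with a nonzero score.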

3. Post-processing

const float *net_output = output;

  //int MAX_NUM_BOX = 200;
  //int step = 7;
  auto range_0_1 = [](float num) { return std::max(.0f, std::min(1.0f, num)); };

  const int img_w = img.cols;
  const int img_h = img.rows;
  const float model_input_w = model_size_[2];
  const float model_input_h = model_size_[3];

  float scaling_factors = std::min(1.0 * model_input_w / img_w, 1.0 * model_input_h / img_h);

  int scaled_w = scaling_factors * img_w;
  int scaled_h = scaling_factors * img_h;

   int box_step = 7;

  std::vector<std::shared_ptr<CNInferObject>> temp;
  auto obj_origen = std::make_shared<CNInferObject>();
    obj_origen->bbox.x = 0;
    obj_origen->bbox.y = 0;
    obj_origen->bbox.w =img.cols;
    obj_origen->bbox.h = img.rows;
  temp.push_back(obj_origen);


 for (int box_idx = 0; box_idx < 200; ++box_idx) {
   if(net_output[box_idx * box_step+2]==0||net_output[box_idx * box_step+2]>1)
     continue;

   float right = net_output[box_idx * box_step + 5];
   float bottom = net_output[box_idx * box_step + 6];
   float left = net_output[box_idx * box_step + 3]-right/2;
   float top = net_output[box_idx * box_step + 4]-bottom/2;
   right=right+left;
   bottom=top+bottom;

   // rectify
   left = (left - (model_input_w - scaled_w) / 2) / scaled_w;
   right = (right - (model_input_w - scaled_w) / 2) / scaled_w;
   top = (top - (model_input_h - scaled_h) / 2) / scaled_h;
   bottom = (bottom - (model_input_h - scaled_h) / 2) / scaled_h;
   left = range_0_1(left);
   right = range_0_1(right);
   top = range_0_1(top);
   bottom = range_0_1(bottom);



    auto obj = std::make_shared<CNInferObject>();
    int id_ = static_cast<int>(net_output[ box_idx * box_step + 1]);
    obj->id = std::to_string(id_);
    obj->score = net_output[ box_idx * box_step + 2];

        if(threshold.size()>id_)
    {
      if (obj->score<threshold[id_])
        continue;
    }
    else
    {
      if(obj->score<*std::min_element(std::begin(threshold), std::end(threshold)))
        continue;
    }

    int re_lo_fa = 0;
    obj->bbox.x = left;
    obj->bbox.y = top;
    obj->bbox.w = std::min(1.0f - obj->bbox.x, right - left);
    obj->bbox.h = std::min(1.0f - obj->bbox.y, bottom - top);
    // Convert to absolute coordinates
    obj->bbox.x = std::max(obj->bbox.x * img.cols, 1.0f);
    obj->bbox.y = std::max(obj->bbox.y * img.rows, 1.0f);
    obj->bbox.w = std::max(obj->bbox.w * img.cols, 1.0f);
    obj->bbox.h = std::max(obj->bbox.h * img.rows, 1.0f);
    obj->id_fa_lo = re_lo_fa;
    std::string name = model_name_;
    obj->model_name = name;
    // DLOG(INFO) << "obj->bbox.x " << obj->bbox.x << " "
    //     << "obj->bbox.y " << obj->bbox.y << " "
    //     << "obj->bbox.w " << obj->bbox.w << " "
    //     << "obj->bbox.h " << obj->bbox.h << " "
    //     <<"score"<<obj->score;
    if (obj->bbox.x <= 0 || obj->bbox.y <= 0 || obj->bbox.h <= 0 || obj->bbox.w <= 0)
      continue;
    obj->id_lo = i;
    temp.emplace_back(obj);
   
    
  }
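The coordinate correction in the loop above undoes the letterbox from the preprocessing step. Isolated from the surrounding framework types, it can be sketched as below. Box and unletterbox are hypothetical names, and the box is assumed to arrive as center x, center y, width, height in model-input pixels, as the indices 3 to 6 in the code suggest.

```cpp
#include <algorithm>
#include <cassert>

// Top-left corner plus size, in pixels of the original image.
struct Box {
  float x, y, w, h;
};

// Map a box from letterboxed model-input coordinates back to the original
// image: remove the padding offset, normalize by the scaled image size,
// clamp to [0, 1], then scale to absolute pixels.
Box unletterbox(float cx, float cy, float bw, float bh,
                int img_w, int img_h, int model_w, int model_h) {
  assert(img_w > 0 && img_h > 0 && model_w > 0 && model_h > 0);
  const float scale = std::min(static_cast<float>(model_w) / img_w,
                               static_cast<float>(model_h) / img_h);
  const int scaled_w = static_cast<int>(scale * img_w);
  const int scaled_h = static_cast<int>(scale * img_h);
  const float pad_x = (model_w - scaled_w) / 2.0f;
  const float pad_y = (model_h - scaled_h) / 2.0f;
  auto clamp01 = [](float v) { return std::max(0.0f, std::min(1.0f, v)); };
  const float left = clamp01((cx - bw / 2 - pad_x) / scaled_w);
  const float right = clamp01((cx + bw / 2 - pad_x) / scaled_w);
  const float top = clamp01((cy - bh / 2 - pad_y) / scaled_h);
  const float bottom = clamp01((cy + bh / 2 - pad_y) / scaled_h);
  return Box{left * img_w, top * img_h,
             (right - left) * img_w, (bottom - top) * img_h};
}
```

For a 1280x720 image and a 640x640 model input, a 100x100 box centered at (320, 320) in model coordinates maps to a 200x200 box at roughly (540, 260) in the original image.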