[实训题目EmoProfo]视频捕获优化——深度学习框架高效接口实现(一)

本文介绍基于C语言的深度学习框架Darknet的特点及优化实践,包括使用云相机思想处理图片、支持多摄像头及图片存储流程。通过分析Darknet的demo程序,详细解释其图像加载、网络预测、结果输出等过程。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

视频捕获优化——深度学习框架高效接口实现(一)


这一周工作的展开是根据上面提到的优化思路来进行的:

  • 使用云相机思想,将图片首先发送至一个专用服务器,这个服务器对图片进行简单处理,之后再由后端完成api的调用
  • 在拥有服务器的基础上,添加对多摄像头的支持
    • 将收到的图片先存储到映射到内存的目录中
    • 打开图片,完成处理、存储到永久存储

依照这些思路,我们首先需要选择一个高效的深度学习框架。经过讨论,我们最终决定使用darknet来实现我们的设想。

darknet特点简介

这是一个比较新的深度学习框架,完全由C语言编写,几乎没有依赖项。但是问题是这个框架虽然提供了Python接口,但是效率低下,不适合在高强度的工作下使用。

项目作者充满恶意,在代码中一行有价值的注释都没有,索性他的编码风格易于接受,简单清晰,命名方式符合规范,比较容易理解。我对他的代码进行了简单的阅读,提取出了框架中几个关键的地方

  • image结构体
    • 这个结构体名字非常的清楚,就是图片,不像其他深度学习框架一样起一个让人无法理解的名字(如:Blob、Tensor等),内容也非常简单:图片的长宽高通道数+图片数据的缓冲区(float)格式
    • 图片数据的存储方式:使用float格式存储数据,根据阅读源码,了解到这里的float是进行过归一化的。另外图片的映射方式也比较独特,一般的图形处理库都是按照列、行、通道的顺序映射,而image类是按照通道、列、行的顺序映射的,这样方便了向量向网络中的输入
  • network结构体
    • 网络的整体架构,定义了大量的超参数和网络层的指针 ,是识别、训练函数的操作基础
  • layer结构体
    • 定义了层的所有相关动态以及静态参数,参与识别和训练的相关计算
  • detection结构体
    • 存储了一次检测的结果,包含了目标框的位置和置信度

除了上述代码层面的内容外,还不得不提这个框架所生成/使用的一系列文件:

  • cfg文件:描述了一个网络的定义,那一层都有那些特点,等同于caffe框架的prototxt
  • weights文件:权值文件,一些二进制的权值列表,按照对应cfg的说明加载到内存当中就能形成一个完整的模型并完成识别和相关的训练工作
  • data文件:对数据集的描述,含有与数据集相关的文件的路径
    • train:训练集的文件名列表
    • valid:验证集的文件列表
    • names:相关类别的名称

demo程序解读

整个框架由于是C语言编写,没有面向对象,很难通过直接阅读框架源码看清每个结构的功能。所以我决定采取分析demo代码的方式来理解这个框架

这个demo的用法比较简单,包括了darknet的所有功能(网络训练、识别、测试等),方法就是在调用程序的时候给出选项参数,然后制定需要的文件、选择的gpu就可以了

int main(int argc, char **argv)
{
    //test_resize("data/bad.jpg");
    //test_box();
    //test_convolutional_layer();
    
    //首先是参数处理,如果参数不合法,将会打印用法并直接结束
    if(argc < 2){
        fprintf(stderr, "usage: %s <function>\n", argv[0]);
        return 0;
    }
    gpu_index = find_int_arg(argc, argv, "-i", 0);
    if(find_arg(argc, argv, "-nogpu")) {
        gpu_index = -1;
    }

#ifndef GPU
    gpu_index = -1;
#else
    //如果使用了gpu,需要根据参数完成gpu的选择
    if(gpu_index >= 0){
        cuda_set_device(gpu_index);
    }
#endif

    if (0 == strcmp(argv[1], "average")){
        average(argc, argv);
    } else if (0 == strcmp(argv[1], "yolo")){
        run_yolo(argc, argv);
    } else if (0 == strcmp(argv[1], "super")){
        run_super(argc, argv);
    } else if (0 == strcmp(argv[1], "lsd")){
        run_lsd(argc, argv);
    } else if (0 == strcmp(argv[1], "detector")){
        //识别方法的入口
        run_detector(argc, argv);
    } else if (0 == strcmp(argv[1], "detect")){
        //根据官方说明,这是简化的识别方法入口
        float thresh = find_float_arg(argc, argv, "-thresh", .5);
        char *filename = (argc > 4) ? argv[4]: 0;
        char *outfile = find_char_arg(argc, argv, "-out", 0);
        int fullscreen = find_arg(argc, argv, "-fullscreen");
        test_detector("cfg/coco.data", argv[2], argv[3], filename, thresh, .5, outfile, fullscreen);
    } else if (0 == strcmp(argv[1], "cifar")){
        run_cifar(argc, argv);
    } else if (0 == strcmp(argv[1], "go")){
        run_go(argc, argv);
    } else if (0 == strcmp(argv[1], "rnn")){
        run_char_rnn(argc, argv);
    } else if (0 == strcmp(argv[1], "coco")){
        run_coco(argc, argv);
    } else if (0 == strcmp(argv[1], "classify")){
        predict_classifier("cfg/imagenet1k.data", argv[2], argv[3], argv[4], 5);
    } else if (0 == strcmp(argv[1], "classifier")){
        run_classifier(argc, argv);
    } else if (0 == strcmp(argv[1], "regressor")){
        run_regressor(argc, argv);
    } else if (0 == strcmp(argv[1], "segmenter")){
        run_segmenter(argc, argv);
    } else if (0 == strcmp(argv[1], "art")){
        run_art(argc, argv);
    } else if (0 == strcmp(argv[1], "tag")){
        run_tag(argc, argv);
    } else if (0 == strcmp(argv[1], "3d")){
        composite_3d(argv[2], argv[3], argv[4], (argc > 5) ? atof(argv[5]) : 0);
    } else if (0 == strcmp(argv[1], "test")){
        test_resize(argv[2]);
    } else if (0 == strcmp(argv[1], "captcha")){
        run_captcha(argc, argv);
    } else if (0 == strcmp(argv[1], "nightmare")){
        run_nightmare(argc, argv);
    } else if (0 == strcmp(argv[1], "rgbgr")){
        rgbgr_net(argv[2], argv[3], argv[4]);
    } else if (0 == strcmp(argv[1], "reset")){
        reset_normalize_net(argv[2], argv[3], argv[4]);
    } else if (0 == strcmp(argv[1], "denormalize")){
        denormalize_net(argv[2], argv[3], argv[4]);
    } else if (0 == strcmp(argv[1], "statistics")){
        statistics_net(argv[2], argv[3]);
    } else if (0 == strcmp(argv[1], "normalize")){
        normalize_net(argv[2], argv[3], argv[4]);
    } else if (0 == strcmp(argv[1], "rescale")){
        rescale_net(argv[2], argv[3], argv[4]);
    } else if (0 == strcmp(argv[1], "ops")){
        operations(argv[2]);
    } else if (0 == strcmp(argv[1], "speed")){
        speed(argv[2], (argc > 3 && argv[3]) ? atoi(argv[3]) : 0);
    } else if (0 == strcmp(argv[1], "oneoff")){
        oneoff(argv[2], argv[3], argv[4]);
    } else if (0 == strcmp(argv[1], "oneoff2")){
        oneoff2(argv[2], argv[3], argv[4], atoi(argv[5]));
    } else if (0 == strcmp(argv[1], "print")){
        print_weights(argv[2], argv[3], atoi(argv[4]));
    } else if (0 == strcmp(argv[1], "partial")){
        partial(argv[2], argv[3], argv[4], atoi(argv[5]));
    } else if (0 == strcmp(argv[1], "average")){
        average(argc, argv);
    } else if (0 == strcmp(argv[1], "visualize")){
        visualize(argv[2], (argc > 3) ? argv[3] : 0);
    } else if (0 == strcmp(argv[1], "mkimg")){
        mkimg(argv[2], argv[3], atoi(argv[4]), atoi(argv[5]), atoi(argv[6]), argv[7]);
    } else if (0 == strcmp(argv[1], "imtest")){
        test_resize(argv[2]);
    } else {
        fprintf(stderr, "Not an option: %s\n", argv[1]);
    }
    return 0;
}

鉴于我们的应用,在这里我们只对识别方法进行研究,run_detector和test_detector都定义在detector.c中

首先看一下接口最简单的run_detector函数

void run_detector(int argc, char **argv)
{
    //这里直接把命令行参数传了进来,继续参数处理工作
    char *prefix = find_char_arg(argc, argv, "-prefix", 0);
    float thresh = find_float_arg(argc, argv, "-thresh", .5);	//获取阈值
    float hier_thresh = find_float_arg(argc, argv, "-hier", .5);
    int cam_index = find_int_arg(argc, argv, "-c", 0);	//获取输入设备
    int frame_skip = find_int_arg(argc, argv, "-s", 0);
    int avg = find_int_arg(argc, argv, "-avg", 3);
    if(argc < 4){
        fprintf(stderr, "usage: %s %s [train/test/valid] [cfg] [weights (optional)]\n", argv[0], argv[1]);
        return;
    }
    //设置gpu
    char *gpu_list = find_char_arg(argc, argv, "-gpus", 0);
    char *outfile = find_char_arg(argc, argv, "-out", 0);
    int *gpus = 0;
    int gpu = 0;
    int ngpus = 0;
    if(gpu_list){
        printf("%s\n", gpu_list);
        int len = strlen(gpu_list);
        ngpus = 1;
        int i;
        for(i = 0; i < len; ++i){
            if (gpu_list[i] == ',') ++ngpus;
        }
        gpus = calloc(ngpus, sizeof(int));
        for(i = 0; i < ngpus; ++i){
            gpus[i] = atoi(gpu_list);
            gpu_list = strchr(gpu_list, ',')+1;
        }
    } else {
        gpu = gpu_index;
        gpus = &gpu;
        ngpus = 1;
    }
	//设置结果显示
    int clear = find_arg(argc, argv, "-clear");
    int fullscreen = find_arg(argc, argv, "-fullscreen");
    int width = find_int_arg(argc, argv, "-w", 0);
    int height = find_int_arg(argc, argv, "-h", 0);
    int fps = find_int_arg(argc, argv, "-fps", 0);
    //int class = find_int_arg(argc, argv, "-class", 0);
	//确定网络模型相关的文件路径
    char *datacfg = argv[3];
    char *cfg = argv[4];
    char *weights = (argc > 5) ? argv[5] : 0;
    char *filename = (argc > 6) ? argv[6]: 0;
	//根据参数选择执行不同的函数
    if(0==strcmp(argv[2], "test")) test_detector(datacfg, cfg, weights, filename, thresh, hier_thresh, outfile, fullscreen);
    else if(0==strcmp(argv[2], "train")) train_detector(datacfg, cfg, weights, gpus, ngpus, clear);
    else if(0==strcmp(argv[2], "valid")) validate_detector(datacfg, cfg, weights, outfile);
    else if(0==strcmp(argv[2], "valid2")) validate_detector_flip(datacfg, cfg, weights, outfile);
    else if(0==strcmp(argv[2], "recall")) validate_detector_recall(cfg, weights);
    else if(0==strcmp(argv[2], "demo")) {
        list *options = read_data_cfg(datacfg);
        int classes = option_find_int(options, "classes", 20);
        char *name_list = option_find_str(options, "names", "data/names.list");
        char **names = get_labels(name_list);
        demo(cfg, weights, thresh, cam_index, filename, names, classes, frame_skip, prefix, avg, hier_thresh, width, height, fps, fullscreen);
    }
    //else if(0==strcmp(argv[2], "extract")) extract_detector(datacfg, cfg, weights, cam_index, filename, class, thresh, frame_skip);
    //else if(0==strcmp(argv[2], "censor")) censor_detector(datacfg, cfg, weights, cam_index, filename, class, thresh, frame_skip);
}

这个函数仅仅是继续处理参数,并没有直接写出相关逻辑。前面main函数中有一个参数分支直接调用了test_detector,这里的test分支同样调用了test_detector分支,我们下面就看看test_detector做了什么。

void test_detector(char *datacfg, char *cfgfile, char *weightfile, char *filename, float thresh, float hier_thresh, char *outfile, int fullscreen)
{
    list *options = read_data_cfg(datacfg);	//读入data文件
    char *name_list = option_find_str(options, "names", "data/names.list");	//从读入的data文件内容中找到name文件并读入
    char **names = get_labels(name_list);	//处理读入的name

    image **alphabet = load_alphabet();	//加载字母表
    network *net = load_network(cfgfile, weightfile, 0);	//加载权值文件,形成网络模型
    set_batch_network(net, 1);	//给net中的每一个layer结构体设置batch大小
    srand(2222222);
    double time;
    char buff[256];
    char *input = buff;
    float nms=.45;
    while(1){	//主循环开始,处理用户输入的一系列文件
        if(filename){	//获取用户制定的文件名
            strncpy(input, filename, 256);
        } else {
            printf("Enter Image Path: ");
            fflush(stdout);
            input = fgets(input, 256, stdin);
            if(!input) return;
            strtok(input, "\n");
        }
        image im = load_image_color(input,0,0);	//加载文件名
        image sized = letterbox_image(im, net->w, net->h);	//对图片进行处理,在比例不变的情况下,处理成网络模型输入层的大小
        //image sized = resize_image(im, net->w, net->h);
        //image sized2 = resize_max(im, net->w);
        //image sized = crop_image(sized2, -((net->w - sized2.w)/2), -((net->h - sized2.h)/2), net->w, net->h);
        //resize_network(net, sized.w, sized.h);
        layer l = net->layers[net->n-1];


        float *X = sized.data;
        time=what_time_is_it_now();
        network_predict(net, X);	//将处理过的图片输入到网络,完成识别
        printf("%s: Predicted in %f seconds.\n", input, what_time_is_it_now()-time);
        int nboxes = 0;
        detection *dets = get_network_boxes(net, im.w, im.h, thresh, hier_thresh, 0, 1, &nboxes);	//获取识别结果
        //printf("%d\n", nboxes);
        //if (nms) do_nms_obj(boxes, probs, l.w*l.h*l.n, l.classes, nms);
        if (nms) do_nms_sort(dets, nboxes, l.classes, nms);	//使用nms作为阈值,去除检测结果中的重复目标
        draw_detections(im, dets, nboxes, thresh, names, alphabet, l.classes);	//绘制识别结果到一张新的图片
        free_detections(dets, nboxes);	//释放识别结果
        if(outfile){	//将结果图片保存
            save_image(im, outfile);
        }
        else{
            save_image(im, "predictions");
#ifdef OPENCV	//如果使用了opencv,可以直接将图片显示出来
            cvNamedWindow("predictions", CV_WINDOW_NORMAL); 
            if(fullscreen){
                cvSetWindowProperty("predictions", CV_WND_PROP_FULLSCREEN, CV_WINDOW_FULLSCREEN);
            }
            show_image(im, "predictions");
            cvWaitKey(0);
            cvDestroyAllWindows();
#endif
        }

        free_image(im);
        free_image(sized);
        if (filename) break;
    }
}

得出上面注释,是有根据的,下面我们仔细分析整个过程中的关键函数

  • load_alphabet

    image **load_alphabet()
    {
        int i, j;
        const int nsize = 8;
        image **alphabets = calloc(nsize, sizeof(image));
        for(j = 0; j < nsize; ++j){
            alphabets[j] = calloc(128, sizeof(image));
            for(i = 32; i < 127; ++i){
                char buff[256];
                sprintf(buff, "data/labels/%d_%d.png", i, j);
                alphabets[j][i] = load_image_color(buff, 0, 0);
            }
        }
        return alphabets;
    }
    

    很显然,得到的是一个图片数组,这些图片全部来自与data/labels/路径,我们可以看一下这个路径下的内容

Alphabet_pic

加载这样的图片原因很简单——输出的图上有目标框和目标的类别,这些图片就是用来拼写目标类别的

  • load_network

    network *load_network(char *cfg, char *weights, int clear)
    {
        network *net = parse_network_cfg(cfg);
        if(weights && weights[0] != 0){
            load_weights(net, weights);
        }
        if(clear) (*net->seen) = 0;
        return net;
    }
    

    这个过程首先按照权值文件内容构建了网络模型,然后调用了load_weights加载了权值文件

    其中parse_network_cfg的定义在parser.c中

    network *parse_network_cfg(char *filename)
    {
        list *sections = read_cfg(filename);
        node *n = sections->front;
        if(!n) error("Config file has no sections");
        network *net = make_network(sections->size - 1);
        net->gpu_index = gpu_index;
        size_params params;
    
        section *s = (section *)n->val;
        list *options = s->options;
        if(!is_network(s)) error("First section must be [net] or [network]");
        parse_net_options(options, net);
    
        params.h = net->h;
        params.w = net->w;
        params.c = net->c;
        params.inputs = net->inputs;
        params.batch = net->batch;
        params.time_steps = net->time_steps;
        params.net = net;
    
        size_t workspace_size = 0;
        n = n->next;
        int count = 0;
        free_section(s);
        fprintf(stderr, "layer     filters    size              input                output\n");
        while(n){
            params.index = count;
            fprintf(stderr, "%5d ", count);
            s = (section *)n->val;
            options = s->options;
            layer l = {0};
            LAYER_TYPE lt = string_to_layer_type(s->type);
            if(lt == CONVOLUTIONAL){
                l = parse_convolutional(options, params);
            }else if(lt == DECONVOLUTIONAL){
                l = parse_deconvolutional(options, params);
            }else if(lt == LOCAL){
                l = parse_local(options, params);
            }else if(lt == ACTIVE){
                l = parse_activation(options, params);
            }else if(lt == LOGXENT){
                l = parse_logistic(options, params);
            }else if(lt == L2NORM){
                l = parse_l2norm(options, params);
            }else if(lt == RNN){
                l = parse_rnn(options, params);
            }else if(lt == GRU){
                l = parse_gru(options, params);
            }else if (lt == LSTM) {
                l = parse_lstm(options, params);
            }else if(lt == CRNN){
                l = parse_crnn(options, params);
            }else if(lt == CONNECTED){
                l = parse_connected(options, params);
            }else if(lt == CROP){
                l = parse_crop(options, params);
            }else if(lt == COST){
                l = parse_cost(options, params);
            }else if(lt == REGION){
                l = parse_region(options, params);
            }else if(lt == YOLO){
                l = parse_yolo(options, params);
            }else if(lt == DETECTION){
                l = parse_detection(options, params);
            }else if(lt == SOFTMAX){
                l = parse_softmax(options, params);
                net->hierarchy = l.softmax_tree;
            }else if(lt == NORMALIZATION){
                l = parse_normalization(options, params);
            }else if(lt == BATCHNORM){
                l = parse_batchnorm(options, params);
            }else if(lt == MAXPOOL){
                l = parse_maxpool(options, params);
            }else if(lt == REORG){
                l = parse_reorg(options, params);
            }else if(lt == AVGPOOL){
                l = parse_avgpool(options, params);
            }else if(lt == ROUTE){
                l = parse_route(options, params, net);
            }else if(lt == UPSAMPLE){
                l = parse_upsample(options, params, net);
            }else if(lt == SHORTCUT){
                l = parse_shortcut(options, params, net);
            }else if(lt == DROPOUT){
                l = parse_dropout(options, params);
                l.output = net->layers[count-1].output;
                l.delta = net->layers[count-1].delta;
    #ifdef GPU
                l.output_gpu = net->layers[count-1].output_gpu;
                l.delta_gpu = net->layers[count-1].delta_gpu;
    #endif
            }else{
                fprintf(stderr, "Type not recognized: %s\n", s->type);
            }
            l.clip = net->clip;
            l.truth = option_find_int_quiet(options, "truth", 0);
            l.onlyforward = option_find_int_quiet(options, "onlyforward", 0);
            l.stopbackward = option_find_int_quiet(options, "stopbackward", 0);
            l.dontsave = option_find_int_quiet(options, "dontsave", 0);
            l.dontload = option_find_int_quiet(options, "dontload", 0);
            l.dontloadscales = option_find_int_quiet(options, "dontloadscales", 0);
            l.learning_rate_scale = option_find_float_quiet(options, "learning_rate", 1);
            l.smooth = option_find_float_quiet(options, "smooth", 0);
            option_unused(options);
            net->layers[count] = l;
            if (l.workspace_size > workspace_size) workspace_size = l.workspace_size;
            free_section(s);
            n = n->next;
            ++count;
            if(n){
                params.h = l.out_h;
                params.w = l.out_w;
                params.c = l.out_c;
                params.inputs = l.outputs;
            }
        }
        free_list(sections);
        layer out = get_network_output_layer(net);
        net->outputs = out.outputs;
        net->truths = out.outputs;
        if(net->layers[net->n-1].truths) net->truths = net->layers[net->n-1].truths;
        net->output = out.output;
        net->input = calloc(net->inputs*net->batch, sizeof(float));
        net->truth = calloc(net->truths*net->batch, sizeof(float));
    #ifdef GPU
        net->output_gpu = out.output_gpu;
        net->input_gpu = cuda_make_array(net->input, net->inputs*net->batch);
        net->truth_gpu = cuda_make_array(net->truth, net->truths*net->batch);
    #endif
        if(workspace_size){
            //printf("%ld\n", workspace_size);
    #ifdef GPU
            if(gpu_index >= 0){
                net->workspace = cuda_make_array(0, (workspace_size-1)/sizeof(float)+1);
            }else {
                net->workspace = calloc(1, workspace_size);
            }
    #else
            net->workspace = calloc(1, workspace_size);
    #endif
        }
        return net;
    }
    

    我们看到,这个函数的定义非常长,因为layer结构体的成员量就非常巨大,需要给每一个成员进行初始化就需要大量代码。

    整个函数最主要的部分就是一个循环,这个循环中,网络层依次构建,并被加入到网络中

    之后,是load_weights函数

    void load_weights(network *net, char *filename)
    {
        load_weights_upto(net, filename, 0, net->n);
    }
    
    void load_weights_upto(network *net, char *filename, int start, int cutoff)
    {
    #ifdef GPU
        if(net->gpu_index >= 0){
            cuda_set_device(net->gpu_index);
        }
    #endif
        fprintf(stderr, "Loading weights from %s...", filename);
        fflush(stdout);
        FILE *fp = fopen(filename, "rb");
        if(!fp) file_error(filename);
    
        int major;
        int minor;
        int revision;
        fread(&major, sizeof(int), 1, fp);
        fread(&minor, sizeof(int), 1, fp);
        fread(&revision, sizeof(int), 1, fp);
        if ((major*10 + minor) >= 2 && major < 1000 && minor < 1000){
            fread(net->seen, sizeof(size_t), 1, fp);
        } else {
            int iseen = 0;
            fread(&iseen, sizeof(int), 1, fp);
            *net->seen = iseen;
        }
        int transpose = (major > 1000) || (minor > 1000);
    
        int i;
        for(i = start; i < net->n && i < cutoff; ++i){
            layer l = net->layers[i];
            if (l.dontload) continue;
            if(l.type == CONVOLUTIONAL || l.type == DECONVOLUTIONAL){
                load_convolutional_weights(l, fp);
            }
            if(l.type == CONNECTED){
                load_connected_weights(l, fp, transpose);
            }
            if(l.type == BATCHNORM){
                load_batchnorm_weights(l, fp);
            }
            if(l.type == CRNN){
                load_convolutional_weights(*(l.input_layer), fp);
                load_convolutional_weights(*(l.self_layer), fp);
                load_convolutional_weights(*(l.output_layer), fp);
            }
            if(l.type == RNN){
                load_connected_weights(*(l.input_layer), fp, transpose);
                load_connected_weights(*(l.self_layer), fp, transpose);
                load_connected_weights(*(l.output_layer), fp, transpose);
            }
            if (l.type == LSTM) {
                load_connected_weights(*(l.wi), fp, transpose);
                load_connected_weights(*(l.wf), fp, transpose);
                load_connected_weights(*(l.wo), fp, transpose);
                load_connected_weights(*(l.wg), fp, transpose);
                load_connected_weights(*(l.ui), fp, transpose);
                load_connected_weights(*(l.uf), fp, transpose);
                load_connected_weights(*(l.uo), fp, transpose);
                load_connected_weights(*(l.ug), fp, transpose);
            }
            if (l.type == GRU) {
                if(1){
                    load_connected_weights(*(l.wz), fp, transpose);
                    load_connected_weights(*(l.wr), fp, transpose);
                    load_connected_weights(*(l.wh), fp, transpose);
                    load_connected_weights(*(l.uz), fp, transpose);
                    load_connected_weights(*(l.ur), fp, transpose);
                    load_connected_weights(*(l.uh), fp, transpose);
                }else{
                    load_connected_weights(*(l.reset_layer), fp, transpose);
                    load_connected_weights(*(l.update_layer), fp, transpose);
                    load_connected_weights(*(l.state_layer), fp, transpose);
                }
            }
            if(l.type == LOCAL){
                int locations = l.out_w*l.out_h;
                int size = l.size*l.size*l.c*l.n*locations;
                fread(l.biases, sizeof(float), l.outputs, fp);
                fread(l.weights, sizeof(float), size, fp);
    #ifdef GPU
                if(gpu_index >= 0){
                    push_local_layer(l);
                }
    #endif
            }
        }
        fprintf(stderr, "Done!\n");
        fclose(fp);
    }
    

    这个函数的套路和上面的parse_network_cfg很像,根据每层的定义,从权值文件中逐层加载权值

  • set_batch_network

    void set_batch_network(network *net, int b)
    {
        net->batch = b;
        int i;
        for(i = 0; i < net->n; ++i){
            net->layers[i].batch = b;
    #ifdef CUDNN
            if(net->layers[i].type == CONVOLUTIONAL){
                cudnn_convolutional_setup(net->layers + i);
            }
            if(net->layers[i].type == DECONVOLUTIONAL){
                layer *l = net->layers + i;
                cudnnSetTensor4dDescriptor(l->dstTensorDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, 1, l->out_c, l->out_h, l->out_w);
                cudnnSetTensor4dDescriptor(l->normTensorDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, 1, l->out_c, 1, 1); 
            }
    #endif
        }
    }
    

    这个函数非常的清楚,就是把每个层的batch大小设好,如果使用cudnn的话,则需要更多操作

  • load_image_color

    image load_image_color(char *filename, int w, int h)
    {
        return load_image(filename, w, h, 3);
    }
    ...
    image load_image(char *filename, int w, int h, int c) //这里可以搞个多线程
    {
    #ifdef OPENCV
        image out = load_image_cv(filename, c);
    #else
        image out = load_image_stb(filename, c);
    #endif
    
        if((h && w) && (h != out.h || w != out.w)){
            image resized = resize_image(out, w, h);
            free_image(out);
            out = resized;
        }
        return out;
    }
    ...
    image load_image_cv(char *filename, int channels)
    {
        IplImage* src = 0;
        int flag = -1;
        if (channels == 0) flag = -1;
        else if (channels == 1) flag = 0;
        else if (channels == 3) flag = 1;
        else {
            fprintf(stderr, "OpenCV can't force load with %d channels\n", channels);
        }
    
        if( (src = cvLoadImage(filename, flag)) == 0 )
        {
            fprintf(stderr, "Cannot load image \"%s\"\n", filename);
            char buff[256];
            sprintf(buff, "echo %s >> bad.list", filename);
            system(buff);
            return make_image(10,10,3);
            //exit(0);
        }
        image out = ipl_to_image(src);
        cvReleaseImage(&src);
        rgbgr_image(out);	//opencv的通道顺序是rgb,但是本框架使用的是bgr
        return out;
    }
    ...
    image ipl_to_image(IplImage* src)
    {
        int h = src->height;
        int w = src->width;
        int c = src->nChannels;
        image out = make_image(w, h, c);
        ipl_into_image(src, out);
        return out;
    }
    ...
    void ipl_into_image(IplImage* src, image im) //for循环还行。。。
    {
        unsigned char *data = (unsigned char *)src->imageData;
        int h = src->height;
        int w = src->width;
        int c = src->nChannels;
        int step = src->widthStep;
        int i, j, k;
    
        for(i = 0; i < h; ++i){
            for(k= 0; k < c; ++k){
                for(j = 0; j < w; ++j){
                    im.data[k*w*h + i*w + j] = data[i*step + j*c + k]/255.;
                }
            }
        }
    }
    
    

    上面的一系列函数完成的都是将读取图像,转化成image结构的函数。我们将使用opencv,所以图像的加载过程我们选择opencv系列函数,可以看出,加载过程经历了如下阶段

    • 使用opencv读入图片
    • 逐个像素地完成缓冲区的格式转换
    • 设置新image对象的长宽通道
    • 将获得的image结构体在有必要的情况下调整大小
    • 返回最终的image结构体
  • network_predict

    float *network_predict(network *net, float *input)
    {
        network orig = *net;
        net->input = input;
        net->truth = 0;
        net->train = 0;
        net->delta = 0;
        forward_network(net);
        float *out = net->output;
        *net = orig;
        return out;
    }
    ...
    void forward_network(network *netp)
    {
    #ifdef GPU
        if(netp->gpu_index >= 0){
            forward_network_gpu(netp);   
            return;
        }
    #endif
        network net = *netp;
        int i;
        for(i = 0; i < net.n; ++i){
            net.index = i;
            layer l = net.layers[i];
            if(l.delta){
                fill_cpu(l.outputs * l.batch, 0, l.delta, 1);
            }
            l.forward(l, net);
            net.input = l.output;
            if(l.truth) {
                net.truth = l.output;
            }
        }
        calc_network_cost(netp);
    }
    
    

    这个过程就是通过定义好的每一层的forward函数完成前想传播,这个函数作为layer结构体的一个指针,每一种层都有自己相应的函数定义,这里不再展示。这里是cpu的算法,gpu部分类似。

  • get_network_boxes

    detection *get_network_boxes(network *net, int w, int h, float thresh, float hier, int *map, int relative, int *num)
    {
        detection *dets = make_network_boxes(net, thresh, num);
        fill_network_boxes(net, w, h, thresh, hier, map, relative, dets);
        return dets;
    }
    ...
    void fill_network_boxes(network *net, int w, int h, float thresh, float hier, int *map, int relative, detection *dets)
    {
        int j;
        for(j = 0; j < net->n; ++j){
            layer l = net->layers[j];
            if(l.type == YOLO){
                int count = get_yolo_detections(l, w, h, net->w, net->h, thresh, map, relative, dets);
                dets += count;
            }
            if(l.type == REGION){
                get_region_detections(l, w, h, net->w, net->h, thresh, map, hier, relative, dets);
                dets += l.w*l.h*l.n;
            }
            if(l.type == DETECTION){
                get_detection_detections(l, w, h, thresh, dets);
                dets += l.w*l.h*l.n;
            }
        }
    }
    
    //detection_layer.c
    void get_detection_detections(layer l, int w, int h, float thresh, detection *dets)
    {
        int i,j,n;
        float *predictions = l.output;
        //int per_cell = 5*num+classes;
        for (i = 0; i < l.side*l.side; ++i){
            int row = i / l.side;
            int col = i % l.side;
            for(n = 0; n < l.n; ++n){
                int index = i*l.n + n;
                int p_index = l.side*l.side*l.classes + i*l.n + n;
                float scale = predictions[p_index];
                int box_index = l.side*l.side*(l.classes + l.n) + (i*l.n + n)*4;
                box b;
                b.x = (predictions[box_index + 0] + col) / l.side * w;
                b.y = (predictions[box_index + 1] + row) / l.side * h;
                b.w = pow(predictions[box_index + 2], (l.sqrt?2:1)) * w;
                b.h = pow(predictions[box_index + 3], (l.sqrt?2:1)) * h;
                dets[index].bbox = b;
                dets[index].objectness = scale;
                for(j = 0; j < l.classes; ++j){
                    int class_index = i*l.classes;
                    float prob = scale*predictions[class_index+j];
                    dets[index].prob[j] = (prob > thresh) ? prob : 0;
                }
            }
        }
    }
    

    这一系列函数就是从网络的最后一层detection层取出结果,装入新构建的detection结构体中。通过获取数据的过程,我们可以看出来,网络前向传播后能够得到一系列框和每一个框在每一个类别的置信度,装入过程中用到了thresh,如果一个框在一个类别上的置信度不足thresh,那么这个置信度将被过滤掉

  • do_nms_sort

    //这个函数有极大的优化空间,虽然说用了快排,但是去除重复的过程非常浪费时间
    void do_nms_sort(detection *dets, int total, int classes, float thresh)
    {
        int i, j, k;
        k = total-1;
        //扫描所有结果,将objectness=0的detection结构放到数组最后,不参与计算(但参与资源回收)
        for(i = 0; i <= k; ++i){
            if(dets[i].objectness == 0){
                detection swap = dets[i];
                dets[i] = dets[k];
                dets[k] = swap;
                --k;
                --i;
            }
        }
        total = k+1;
    
        for(k = 0; k < classes; ++k){
        	//按照每一个类别进行排序
            for(i = 0; i < total; ++i){
                dets[i].sort_class = k;
            }
            qsort(dets, total, sizeof(detection), nms_comparator);
            //去除结果中的重复目标
            for(i = 0; i < total; ++i){
                if(dets[i].prob[k] == 0) continue;
                box a = dets[i].bbox;
                for(j = i+1; j < total; ++j){
                    box b = dets[j].bbox;
                    if (box_iou(a, b) > thresh){	//重复目标的判断原则:iou > thres
                        dets[j].prob[k] = 0;
                    }
                }
            }
        }
    }
    

    这个函数的功能比较复杂,目的是判断两个目标框中是否是同一个实际目标,判断标准是两个框的iou,阈值是thresh。过程如下

    • 按照每个类别的置信度,对所有detection结构体进行排序
    • 将置信度高的结构体与置信度低的所有疑似同类目标框的iou进行比较,如果iou超过阈值,那么认为两个框框中的是一个实际目标,那么后面置信度低的框被废弃

    试运行结果

    命令行参数

    ./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
    

    结果图

结论

这个框架虽然通过C语言编写具有了相关速度优势,但是某些外围函数的性能惨不忍睹,考虑到我们项目的需求,后期还需要对这一部分代码进行相关优化

另外,我们希望使用python调用darknet完成我们的目标检测工作(人、人脸),darknet也提供了python接口,但是是通过cdll库完成的调用,只是将所有函数进行了简单的封装,使用复杂,性能低下,所以下一步的另一个目标是通过python扩展的方式,用C语言重新实现一套足够简单的python接口 ​

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值