YOLO --- Installing and Using YOLOv3 with OpenCV

This article walks through installing YOLOv3 with OpenCV 3.4.2 on Ubuntu, covering environment setup, dependency installation, the code itself, and solutions to common problems. It is aimed at developers who want to use deep-learning-based object detection in computer vision projects.

YOLOv3 + OpenCV 3.4.2 installation notes

wp20180930

 

Contents

1. Environment Requirements

(1) Checking the Python version

(2) Checking the OpenCV version

2. File Downloads

3. Testing on Your Own Data

4. Problems and Solutions

(1) ImportError: No module named 'cv2' (Python 3)

(2) Ubuntu: coexistence of and switching between Python 2 and Python 3

(3) Reinstalling OpenCV 3.4.2

(4) libjasper-dev cannot be installed when installing OpenCV on Ubuntu 18.04

(5) CMake Error: The source directory ... when installing OpenCV on Ubuntu 18.04

5. Full Source Code

(1) object_detection_yolo.py

(2) object_detection_yolo.cpp

(3) yolo_test3.py

 

Main text

Note: this write-up assumes an already trained model and only runs it on test data to look at the results. The steps below record my own walkthrough. Main references:

1. https://blog.youkuaiyun.com/ling_xiobai/article/details/82082614

2. https://blog.youkuaiyun.com/haoqimao_hard/article/details/82081285

3. https://hk.saowen.com/a/8c0f58aa3914c3bef46fb29eb40c77522b25fd7c0672fc9eadb2b3fdc2a8fbfb

 

1. Environment Requirements

The tests in this article were run on Ubuntu (CPU only) with OpenCV 3.4.2 or newer and Python 3. Set up a matching environment if needed.

Whether you use Python 2.7+ or Python 3+, you will need apt-get to install the packages and libraries that OpenCV depends on. Before starting the installation, decide which Python version to target; each has its pros and cons, and for this purpose there is no real difference, so pick whichever you prefer: if Python 3+ feels comfortable, choose Python 3+; if you are used to Python 2.7+, install the Python 2.7+ version. However, if you normally use Python for CS-related work such as machine learning, data mining, NLP, or deep learning, you may (at least at the time of writing) lean towards Python 2.7, because most libraries in those areas, such as NumPy, SciPy, and scikit-learn, target Python 2.7+; the community is steadily migrating to Python 3+, but some packages still only work reliably on Python 2.7+.

(1) Checking the Python version

On my Ubuntu 18.04 system, both Python 2 and Python 3 were already installed for earlier work, so I needed to be able to switch between them freely; see the notes below or search online for details.

To check which Python versions are installed:

Method 1: $ ls /usr/bin/python*

Method 2: $ python2

          $ python3

 
  

 

(2) Checking the OpenCV version

Installing Python usually pulls in some OpenCV-related dependencies. To find out whether OpenCV is installed and which version it is, run in a terminal: pkg-config --modversion opencv

To check whether Python can use OpenCV, open a Python interpreter (python2 or python3) and run import cv2; if it executes without errors and returns to the ">>>" prompt, the Python binding for OpenCV is available.

The result shown here was obtained after reinstalling OpenCV 3.4.2 (see section 4, Problems and Solutions, below); at that point everything worked.
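For example, a successful check in the interpreter looks roughly like this (assuming the 3.4.2 bindings are the ones picked up by python3):

$ python3
>>> import cv2
>>> print(cv2.__version__)
3.4.2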

 
  

2. File Downloads

You need to download the yolov3.weights weight file, the yolov3.cfg network configuration file, coco.names, a test image (xxx.jpg) and a test video (xxx.mp4), plus object_detection_yolo.cpp, object_detection_yolo.py and the other scripts.

Download links:

1. https://github.com/JackKoLing/opencv_deeplearning_practice/tree/master/pracice3_opencv_yolov3

2. https://pan.baidu.com/s/12tI6iKTzdwYdJSxgBiyayQ#list/path=%2F&parentPath=%2F, password: gfg1
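For reference, the model files can also be fetched directly from the upstream Darknet project (a sketch; it assumes the pjreddie.com and GitHub mirrors are still reachable):

$ wget https://pjreddie.com/media/files/yolov3.weights
$ wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg
$ wget https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names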

 
  

3. Testing on Your Own Data

After step 2, run the following commands:

$ cd /home/wp/opencv_DL/opencv3.4.2_yolov3

$ python3 object_detection_yolo.py --image=bird.jpg

$ python3 object_detection_yolo.py --video=run.mp4

After the commands finish you can see the results, which are also saved in the same directory as:

bird_yolo_out_py.jpg and run_yolo_out_py.avi

Since video detection is fairly slow, a small improvement is to sample the video (process roughly one frame out of every two); the modified script yolo_test3.py gives a modest speedup (a minimal sketch of the idea follows the command below).

$ python3 yolo_test3.py --video=run.mp4
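A minimal frame-skipping sketch of that idea (assumptions: the same cv2-based pipeline as object_detection_yolo.py in section 5; detect_and_draw is a hypothetical stand-in for the blobFromImage / net.forward / postprocess steps):

import cv2 as cv

cap = cv.VideoCapture("run.mp4")
frame_id = 0
while True:
    hasFrame, frame = cap.read()
    if not hasFrame:
        break
    frame_id += 1
    if frame_id % 2 == 0:        # process only every other frame
        continue
    # detect_and_draw(frame)     # hypothetical: blobFromImage -> net.setInput -> net.forward -> postprocess
    cv.imshow("frame-skip demo", frame)
    if cv.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv.destroyAllWindows()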

4. Problems and Solutions

Problems encountered while setting up the YOLO + OpenCV + Python environment, and how they were solved.

(1) ImportError: No module named 'cv2' (Python 3)

The problem is similar to https://stackoverflow.com/questions/45643650/importerror-no-module-named-cv2-python3, but the solutions suggested there did not work for me. I downloaded the OpenCV 3.4.2 source package and reinstalled and reconfigured it myself, after which everything worked; however, pkg-config --modversion opencv still reports 3.2.0, for reasons I have not identified.
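One way to narrow this down (a hedged sketch: pkg-config reads the system-wide opencv.pc file, which may describe a different installation than the one the Python binding was built from) is to ask the cv2 module itself where it comes from:

$ python3
>>> import cv2
>>> cv2.__version__
'3.4.2'
>>> cv2.__file__          # shows which installation python3 actually imports; the path will vary by system
'/usr/local/lib/python3.6/dist-packages/cv2.cpython-36m-x86_64-linux-gnu.so'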

 
  

(2) Ubuntu: coexistence of and switching between Python 2 and Python 3

See https://blog.youkuaiyun.com/kan2016/article/details/81639292 and https://www.cnblogs.com/hwlong/p/9216653.html

(2.1) If Python is not installed yet, install pip (Anaconda also works) and then Python:

Step 1: install pip on Ubuntu.

# 1. Update the system packages

sudo apt-get update

sudo apt-get upgrade

# 2. Install pip

sudo apt-get install python-pip

# 3. Check that pip installed successfully

pip -V

Next, install Python:

$ sudo apt install python      # installs Python 2 (Python 3 is already on the system)

$ sudo apt install python-pip   # pip for Python 2, invoked as pip

$ sudo apt install python3-pip  # pip for Python 3, invoked as pip3

Then check that Python installed successfully:

$ python --version

$ python3 --version

 
  

(2.2) Switching the Python version on Ubuntu

We can use update-alternatives to change the Python version for the whole system. See https://blog.youkuaiyun.com/cym_lmy/article/details/78315139 and https://www.cnblogs.com/hwlong/p/9216653.html (the latter has good step-by-step screenshots). Distributions based on Ubuntu or Debian normally support this.

First, list all registered Python alternatives:

$ sudo update-alternatives --list python

update-alternatives: error: no alternatives for python

If you see the error above, the Python versions have not yet been registered with update-alternatives. To fix this, add python2.7 and python3.6 to the alternatives list.

Open a terminal and run the following two commands:

$ sudo update-alternatives --install /usr/bin/python python /usr/bin/python2 1

$ sudo update-alternatives --install /usr/bin/python python /usr/bin/python3 2

To switch the default python later, just run:

$ sudo update-alternatives --config python

Then select the Python version you want by typing its number and pressing Enter.
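The selection prompt looks roughly like this (paths, priorities, and the current choice depend on your system):

There are 2 choices for the alternative python (providing /usr/bin/python).

  Selection    Path              Priority   Status
------------------------------------------------------------
* 0            /usr/bin/python3   2         auto mode
  1            /usr/bin/python2   1         manual mode
  2            /usr/bin/python3   2         manual mode

Press <enter> to keep the current choice[*], or type selection number: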

Next, run in a terminal:

$ python

If everything went well, python should now point to the default Python 3.

Finally, removing an alternative: once a given Python version no longer exists on the system, it can be removed from the update-alternatives list. For example, to remove python2.7:

$ sudo update-alternatives --remove python /usr/bin/python2.7

update-alternatives: removing manually selected alternative - switching python to auto mode

update-alternatives: using /usr/bin/python3.4 to provide /usr/bin/python (python) in auto mode

 

 
  

 

(3) Reinstalling OpenCV 3.4.2

For having OpenCV 2 and OpenCV 3 coexist on Ubuntu, see

https://blog.youkuaiyun.com/Hansry/article/details/75309906 and https://blog.youkuaiyun.com/liuxiaodong400/article/details/81089058

Here I reinstalled and configured OpenCV 3.4.2 for Python 3 myself.

Step 1: download the OpenCV source.

All OpenCV releases are available at https://opencv.org/releases.html (official site). Click "Sources" to download the source archive; I used version 3.4.2.

Step 2: extract the OpenCV source.

Go to the directory containing the downloaded archive and extract it:

$ unzip opencv-3.4.2.zip

 

Open a terminal in the extracted folder, then create a build directory and enter it:

mkdir build

cd build

 

Step 3: install the OpenCV dependencies.

This step can also be done before step 1 or step 2.

$ sudo apt-get update

$ sudo apt-get upgrade

$ sudo apt-get install build-essential

$ sudo apt-get install cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev

$ sudo apt-get install python-dev python-numpy libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev libjasper-dev libdc1394-22-dev # packages needed for image handling

$ sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev liblapacke-dev

$ sudo apt-get install libxvidcore-dev libx264-dev # packages needed for video handling

$ sudo apt-get install libatlas-base-dev gfortran # optimizations for OpenCV

$ sudo apt-get install ffmpeg

An explanation of each command is available at https://blog.youkuaiyun.com/abcsunl/article/details/63686496

 

Step 4: configure the OpenCV build with CMake.

First, install CMake. If CMake is already installed, skip this step.

Following https://www.cnblogs.com/TooyLee/p/6052387.html, install it as follows:

Preparation: download cmake-3.11.4.tar.gz from the official site (https://cmake.org/download/), and pay attention to which package you download: the extracted folder must contain a bootstrap file (several of the packages I tried did not contain it; it turned out I had downloaded the wrong file, and the topmost source tarball is the right one). Then:

 
  

1. Extract the archive with tar -xvf cmake-3.11.4.tar.gz and adjust the permissions with chmod -R 777 cmake-3.11.4

2. Check whether gcc and g++ are installed; if not, install them with sudo apt-get install build-essential (or run sudo apt-get install gcc and sudo apt-get install g++ directly)

3. Enter the source directory: cd cmake-3.11.4

4. Run sudo ./bootstrap

5. Run sudo make

6. Run sudo make install

7. Run cmake --version; if it prints the CMake version, the installation succeeded.

 
  

Next, configure the OpenCV build. The following command (adapted from a reference configuration) was tried first:

cmake -D CMAKE_BUILD_TYPE=RELEASE \

    -D CMAKE_INSTALL_PREFIX=/home/wp/opencv3.4.2/install \

    -D INSTALL_PYTHON_EXAMPLES=ON \

    -D INSTALL_C_EXAMPLES=OFF \

    -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib-3.2.0/modules \

    -D PYTHON3_EXECUTABLE=/usr/bin/python3 \

    -D PYTHON_INCLUDE_DIR=/usr/include/python3.6 \

    -D PYTHON_LIBRARY=/usr/lib/x86_64-linux-gnu/libpython3.6m.so \

    -D PYTHON3_NUMPY_INCLUDE_DIRS=/usr/local/lib/python3.6/dist-packages/numpy/core/include \

    -D WITH_TBB=ON \

    -D WITH_V4L=ON \

    -D WITH_QT=ON \    

    -D WITH_GTK=ON \

    -D WITH_OPENGL=ON \

    -DBUILD_EXAMPLES=ON ..

However, this command would not run: it immediately failed with a file-path-not-found error. The fix used here was to remove the space after each -D, which solved the problem. The command actually entered was:

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local -DPYTHON3_EXECUTABLE=/usr/bin/python3 -DPYTHON_INCLUDE_DIR=/usr/include/python3.6 -DPYTHON_LIBRARY=/usr/lib/x86_64-linux-gnu/libpython3.6m.so -DPYTHON3_NUMPY_INCLUDE_DIRS=/usr/local/lib/python3.6/dist-packages/numpy/core/include ..

After a short wait, the configuration completes.
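Before building, it is worth checking the "Python 3" section of the summary that cmake prints at the end of configuration; it should point at the interpreter, library, and numpy include path passed on the command line (roughly as below; exact paths and versions depend on the system):

--   Python 3:
--     Interpreter:   /usr/bin/python3
--     Libraries:     /usr/lib/x86_64-linux-gnu/libpython3.6m.so
--     numpy:         /usr/local/lib/python3.6/dist-packages/numpy/core/include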

 

Step 5: compile OpenCV.

  $ cd build

 $ sudo make -j8

$ sudo make install

Wait for the build to finish.

 

Step 6: test OpenCV.

After installation, reboot the machine.

If importing the cv2 module still fails after that, run:

$ sudo pip install opencv-python

Method 1: open a Python console and check the OpenCV version:

import cv2

cv2.__version__

If it was installed correctly, this prints 3.4.2.

Method 2: create a file test.py with the following content:

import cv2

if __name__ == '__main__':

    print(cv2.__version__)

 
  

(4) libjasper-dev cannot be installed when installing the OpenCV dependencies on Ubuntu 18.04

See https://blog.youkuaiyun.com/weixin_41053564/article/details/81254410 for the fix.

When installing the OpenCV dependencies on Ubuntu 18.04, installing the libjasper-dev package with:

$ sudo apt-get install libjasper-dev

fails with: E: Unable to locate package libjasper-dev

It can be solved as follows:

$ sudo add-apt-repository "deb http://security.ubuntu.com/ubuntu xenial-security main"

$ sudo apt update

$ sudo apt install libjasper-dev

(If that still does not work, use $ sudo apt install libjasper1 libjasper-dev instead.)

This resolves the problem; libjasper1 is a dependency of libjasper-dev.

 

(5) CMake Error: The source directory ... when installing OpenCV on Ubuntu 18.04

When building OpenCV on Ubuntu, CMake may stop with "CMake Error: The source directory ... does not exist". The fix used here was to remove the space after each -D option and run the command on a single line, as shown in problem (3) above. (Stray whitespace after a backslash line continuation in a multi-line cmake invocation can also truncate the command before the final ".." source argument and trigger the same error.) See https://blog.youkuaiyun.com/sparkexpert/article/details/70941449 and https://blog.youkuaiyun.com/wangleiwavesharp/article/details/80610529.

 

5. Full Source Code

(5.1) object_detection_yolo.py

=============object_detection_yolo.py===========

# A detailed explanation of each step is available at: https://www.learnopencv.com/deep-learning-based-object-detection-using-yolov3-with-opencv-python-c/

# https://hk.saowen.com/a/8c0f58aa3914c3bef46fb29eb40c77522b25fd7c0672fc9eadb2b3fdc2a8fbfb

# This code is written at BigVision LLC. It is based on the OpenCV project. It is subject to the license terms in the LICENSE file found in this distribution and at http://opencv.org/license.html

 

# Usage example:  python3 object_detection_yolo.py --video=run.mp4

#                 python3 object_detection_yolo.py --image=bird.jpg

 

import cv2 as cv

import argparse

import sys

import numpy as np

import os.path

 

# Initialize the parameters

confThreshold = 0.5  #Confidence threshold

nmsThreshold = 0.4   #Non-maximum suppression threshold

inpWidth = 416       #Width of network's input image

inpHeight = 416      #Height of network's input image

 

parser = argparse.ArgumentParser(description='Object Detection using YOLO in OPENCV')

parser.add_argument('--image', help='Path to image file.')

parser.add_argument('--video', help='Path to video file.')

args = parser.parse_args()

        

# Load names of classes

classesFile = "coco.names";

classes = None

with open(classesFile, 'rt') as f:

    classes = f.read().rstrip('\n').split('\n')

 

# Give the configuration and weight files for the model and load the network using them.

modelConfiguration = "yolov3.cfg";

modelWeights = "yolov3.weights";

 

net = cv.dnn.readNetFromDarknet(modelConfiguration, modelWeights)

net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)

net.setPreferableTarget(cv.dnn.DNN_TARGET_CPU)

 

# Get the names of the output layers

def getOutputsNames(net):

    # Get the names of all the layers in the network

    layersNames = net.getLayerNames()

    # Get the names of the output layers, i.e. the layers with unconnected outputs

    return [layersNames[i[0] - 1] for i in net.getUnconnectedOutLayers()]

 

# Draw the predicted bounding box

def drawPred(classId, conf, left, top, right, bottom):

    # Draw a bounding box.

    cv.rectangle(frame, (left, top), (right, bottom), (255, 178, 50), 3)

    

    label = '%.2f' % conf

        

    # Get the label for the class name and its confidence

    if classes:

        assert(classId < len(classes))

        label = '%s:%s' % (classes[classId], label)

 

    #Display the label at the top of the bounding box

    labelSize, baseLine = cv.getTextSize(label, cv.FONT_HERSHEY_SIMPLEX, 0.5, 1)

    top = max(top, labelSize[1])

    cv.rectangle(frame, (left, top - round(1.5*labelSize[1])), (left + round(1.5*labelSize[0]), top + baseLine), (255, 255, 255), cv.FILLED)

    cv.putText(frame, label, (left, top), cv.FONT_HERSHEY_SIMPLEX, 0.75, (0,0,0), 1)

 

# Remove the bounding boxes with low confidence using non-maxima suppression

def postprocess(frame, outs):

    frameHeight = frame.shape[0]

    frameWidth = frame.shape[1]

 

    # Scan through all the bounding boxes output from the network and keep only the

    # ones with high confidence scores. Assign the box's class label as the class with the highest score.

    classIds = []

    confidences = []

    boxes = []

    for out in outs:

        for detection in out:

            scores = detection[5:]

            classId = np.argmax(scores)

            confidence = scores[classId]

            if confidence > confThreshold:

                center_x = int(detection[0] * frameWidth)

                center_y = int(detection[1] * frameHeight)

                width = int(detection[2] * frameWidth)

                height = int(detection[3] * frameHeight)

                left = int(center_x - width / 2)

                top = int(center_y - height / 2)

                classIds.append(classId)

                confidences.append(float(confidence))

                boxes.append([left, top, width, height])

 

    # Perform non maximum suppression to eliminate redundant overlapping boxes with

    # lower confidences.

    indices = cv.dnn.NMSBoxes(boxes, confidences, confThreshold, nmsThreshold)

    for i in indices:

        i = i[0]

        box = boxes[i]

        left = box[0]

        top = box[1]

        width = box[2]

        height = box[3]

        drawPred(classIds[i], confidences[i], left, top, left + width, top + height)

 

# Process inputs

winName = 'Deep learning object detection in OpenCV'

cv.namedWindow(winName, cv.WINDOW_NORMAL)

 

outputFile = "yolo_out_py.avi"

if (args.image):

    # Open the image file

    if not os.path.isfile(args.image):

        print("Input image file ", args.image, " doesn't exist")

        sys.exit(1)

    cap = cv.VideoCapture(args.image)

    outputFile = args.image[:-4]+'_yolo_out_py.jpg'

elif (args.video):

    # Open the video file

    if not os.path.isfile(args.video):

        print("Input video file ", args.video, " doesn't exist")

        sys.exit(1)

    cap = cv.VideoCapture(args.video)

    outputFile = args.video[:-4]+'_yolo_out_py.avi'

else:

    # Webcam input

    cap = cv.VideoCapture(0)

 

# Get the video writer initialized to save the output video

if (not args.image):

    vid_writer = cv.VideoWriter(outputFile, cv.VideoWriter_fourcc('M','J','P','G'), 30, (round(cap.get(cv.CAP_PROP_FRAME_WIDTH)),round(cap.get(cv.CAP_PROP_FRAME_HEIGHT))))

 

while cv.waitKey(1) < 0:

    

    # get frame from the video

    hasFrame, frame = cap.read()

    

    # Stop the program if reached end of video

    if not hasFrame:

        print("Done processing !!!")

        print("Output file is stored as ", outputFile)

        cv.waitKey(3000)

        break

 

    # Create a 4D blob from a frame.

    blob = cv.dnn.blobFromImage(frame, 1/255, (inpWidth, inpHeight), [0,0,0], 1, crop=False)

 

    # Sets the input to the network

    net.setInput(blob)

 

    # Runs the forward pass to get output of the output layers

    outs = net.forward(getOutputsNames(net))

 

    # Remove the bounding boxes with low confidence

    postprocess(frame, outs)

 

    # Put efficiency information. The function getPerfProfile returns the overall time for inference(t) and the timings for each of the layers(in layersTimes)

    t, _ = net.getPerfProfile()

    label = 'Inference time: %.2f ms' % (t * 1000.0 / cv.getTickFrequency())

    cv.putText(frame, label, (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255))

 

    # Write the frame with the detection boxes

    if (args.image):

        cv.imwrite(outputFile, frame.astype(np.uint8));

    else:

        vid_writer.write(frame.astype(np.uint8))

 

    cv.imshow(winName, frame)

===============================END============================

 

(5.2) object_detection_yolo.cpp

=============object_detection_yolo.cpp=============

// This code is written at BigVision LLC. It is based on the OpenCV project. It is subject to the license terms in the LICENSE file found in this distribution and at http://opencv.org/license.html

 

// Usage example:  ./object_detection_yolo.out --video=run.mp4

//                 ./object_detection_yolo.out --image=bird.jpg

#include <fstream>

#include <sstream>

#include <iostream>

 

#include <opencv2/dnn.hpp>

#include <opencv2/imgproc.hpp>

#include <opencv2/highgui.hpp>

 

const char* keys =

"{help h usage ? | | Usage examples: \n\t\t./object_detection_yolo.out --image=dog.jpg \n\t\t./object_detection_yolo.out --video=run_sm.mp4}"

"{image i        |<none>| input image   }"

"{video v       |<none>| input video   }"

;

using namespace cv;

using namespace dnn;

using namespace std;

 

// Initialize the parameters

float confThreshold = 0.5; // Confidence threshold

float nmsThreshold = 0.4;  // Non-maximum suppression threshold

int inpWidth = 416;  // Width of network's input image

int inpHeight = 416; // Height of network's input image

vector<string> classes;

 

// Remove the bounding boxes with low confidence using non-maxima suppression

void postprocess(Mat& frame, const vector<Mat>& out);

 

// Draw the predicted bounding box

void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame);

 

// Get the names of the output layers

vector<String> getOutputsNames(const Net& net);

 

int main(int argc, char** argv)

{

    CommandLineParser parser(argc, argv, keys);

    parser.about("Use this script to run object detection using YOLO3 in OpenCV.");

    if (parser.has("help"))

    {

        parser.printMessage();

        return 0;

    }

    // Load names of classes

    string classesFile = "coco.names";

    ifstream ifs(classesFile.c_str());

    string line;

    while (getline(ifs, line)) classes.push_back(line);

    

    // Give the configuration and weight files for the model

    String modelConfiguration = "yolov3.cfg";

    String modelWeights = "yolov3.weights";

 

    // Load the network

    Net net = readNetFromDarknet(modelConfiguration, modelWeights);

    net.setPreferableBackend(DNN_BACKEND_OPENCV);

    net.setPreferableTarget(DNN_TARGET_CPU);

    

    // Open a video file or an image file or a camera stream.

    string str, outputFile;

    VideoCapture cap;

    VideoWriter video;

    Mat frame, blob;

    

    try {

        

        outputFile = "yolo_out_cpp.avi";

        if (parser.has("image"))

        {

            // Open the image file

            str = parser.get<String>("image");

            ifstream ifile(str);

            if (!ifile) throw("error");

            cap.open(str);

            str.replace(str.end()-4, str.end(), "_yolo_out_cpp.jpg");

            outputFile = str;

        }

        else if (parser.has("video"))

        {

            // Open the video file

            str = parser.get<String>("video");

            ifstream ifile(str);

            if (!ifile) throw("error");

            cap.open(str);

            str.replace(str.end()-4, str.end(), "_yolo_out_cpp.avi");

            outputFile = str;

        }

        // Open the webcam

        else cap.open(parser.get<int>("device"));

        

    }

    catch(...) {

        cout << "Could not open the input image/video stream" << endl;

        return 0;

    }

    

    // Get the video writer initialized to save the output video

    if (!parser.has("image")) {

        video.open(outputFile, VideoWriter::fourcc('M','J','P','G'), 28, Size(cap.get(CAP_PROP_FRAME_WIDTH), cap.get(CAP_PROP_FRAME_HEIGHT)));

    }

    

    // Create a window

    static const string kWinName = "Deep learning object detection in OpenCV";

    namedWindow(kWinName, WINDOW_NORMAL);

 

    // Process frames.

    while (waitKey(1) < 0)

    {

        // get frame from the video

        cap >> frame;

 

        // Stop the program if reached end of video

        if (frame.empty()) {

            cout << "Done processing !!!" << endl;

            cout << "Output file is stored as " << outputFile << endl;

            waitKey(3000);

            break;

        }

        // Create a 4D blob from a frame.

        blobFromImage(frame, blob, 1/255.0, cvSize(inpWidth, inpHeight), Scalar(0,0,0), true, false);

        

        //Sets the input to the network

        net.setInput(blob);

        

        // Runs the forward pass to get output of the output layers

        vector<Mat> outs;

        net.forward(outs, getOutputsNames(net));

        

        // Remove the bounding boxes with low confidence

        postprocess(frame, outs);

        

        // Put efficiency information. The function getPerfProfile returns the overall time for inference(t) and the timings for each of the layers(in layersTimes)

        vector<double> layersTimes;

        double freq = getTickFrequency() / 1000;

        double t = net.getPerfProfile(layersTimes) / freq;

        string label = format("Inference time for a frame : %.2f ms", t);

        putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 255));

        

        // Write the frame with the detection boxes

        Mat detectedFrame;

        frame.convertTo(detectedFrame, CV_8U);

        if (parser.has("image")) imwrite(outputFile, detectedFrame);

        else video.write(detectedFrame);

        

        imshow(kWinName, frame);

        

    }

    

    cap.release();

    if (!parser.has("image")) video.release();

 

    return 0;

}

 

// Remove the bounding boxes with low confidence using non-maxima suppression

void postprocess(Mat& frame, const vector<Mat>& outs)

{

    vector<int> classIds;

    vector<float> confidences;

    vector<Rect> boxes;

    

    for (size_t i = 0; i < outs.size(); ++i)

    {

        // Scan through all the bounding boxes output from the network and keep only the

        // ones with high confidence scores. Assign the box's class label as the class

        // with the highest score for the box.

        float* data = (float*)outs[i].data;

        for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols)

        {

            Mat scores = outs[i].row(j).colRange(5, outs[i].cols);

            Point classIdPoint;

            double confidence;

            // Get the value and location of the maximum score

            minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);

            if (confidence > confThreshold)

            {

                int centerX = (int)(data[0] * frame.cols);

                int centerY = (int)(data[1] * frame.rows);

                int width = (int)(data[2] * frame.cols);

                int height = (int)(data[3] * frame.rows);

                int left = centerX - width / 2;

                int top = centerY - height / 2;

                

                classIds.push_back(classIdPoint.x);

                confidences.push_back((float)confidence);

                boxes.push_back(Rect(left, top, width, height));

            }

        }

    }

    

    // Perform non maximum suppression to eliminate redundant overlapping boxes with

    // lower confidences

    vector<int> indices;

    NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);

    for (size_t i = 0; i < indices.size(); ++i)

    {

        int idx = indices[i];

        Rect box = boxes[idx];

        drawPred(classIds[idx], confidences[idx], box.x, box.y,

                 box.x + box.width, box.y + box.height, frame);

    }

}

 

// Draw the predicted bounding box

void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame)

{

    //Draw a rectangle displaying the bounding box

    rectangle(frame, Point(left, top), Point(right, bottom), Scalar(0, 0, 255));

    

    //Get the label for the class name and its confidence

    string label = format("%.2f", conf);

    if (!classes.empty())

    {

        CV_Assert(classId < (int)classes.size());

        label = classes[classId] + ":" + label;

    }

    

    //Display the label at the top of the bounding box

    int baseLine;

    Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);

    top = max(top, labelSize.height);

    putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(255,255,255));

}

 

// Get the names of the output layers

vector<String> getOutputsNames(const Net& net)

{

    static vector<String> names;

    if (names.empty())

    {

        //Get the indices of the output layers, i.e. the layers with unconnected outputs

        vector<int> outLayers = net.getUnconnectedOutLayers();

        

        //get the names of all the layers in the network

        vector<String> layersNames = net.getLayerNames();

        

        // Get the names of the output layers in names

        names.resize(outLayers.size());

        for (size_t i = 0; i < outLayers.size(); ++i)

        names[i] = layersNames[outLayers[i] - 1];

    }

    return names;

}

==========================END=========================

 

(5.3) yolo_test3.py (note: as listed here, this script walks a local image folder, ./车辆图片, rather than reading the --video argument)

=====================yolo_test3.py==============================

## -*- coding: utf-8 -*-

# This code is written at BigVision LLC. It is based on the OpenCV project. It is subject to the license terms in the LICENSE file found in this distribution and at http://opencv.org/license.html

 

# Usage example:  python3 object_detection_yolo.py --video=run.mp4

#                 python3 object_detection_yolo.py --image=bird.jpg

 

import cv2 as cv

import argparse

import sys

import numpy as np

import os.path

import os

import time

 

# Initialize the parameters

confThreshold = 0.5  #Confidence threshold

nmsThreshold = 0.4   #Non-maximum suppression threshold

inpWidth = 416       #Width of network's input image

inpHeight = 416      #Height of network's input image

 

parser = argparse.ArgumentParser(description='Object Detection using YOLO in OPENCV')

parser.add_argument('--image', help='Path to image file.')

parser.add_argument('--video', help='Path to video file.')

args = parser.parse_args()

        

# Load names of classes

classesFile = "coco.names";

classes = None

with open(classesFile, 'rt') as f:

    classes = f.read().rstrip('\n').split('\n')

 

# Give the configuration and weight files for the model and load the network using them.

modelConfiguration = "yolov3.cfg";

modelWeights = "yolov3.weights";

 

net = cv.dnn.readNetFromDarknet(modelConfiguration, modelWeights)

net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)

net.setPreferableTarget(cv.dnn.DNN_TARGET_CPU)

 

# Get the names of the output layers

def getOutputsNames(net):

    # Get the names of all the layers in the network

    layersNames = net.getLayerNames()

    # Get the names of the output layers, i.e. the layers with unconnected outputs

    return [layersNames[i[0] - 1] for i in net.getUnconnectedOutLayers()]

 

# Draw the predicted bounding box

def drawPred(classId, conf, left, top, right, bottom):

    # Draw a bounding box.

    cv.rectangle(frame, (left, top), (right, bottom), (255, 178, 50), 2)

    

    label = '%.2f' % conf

        

    # Get the label for the class name and its confidence

    if classes:

        assert(classId < len(classes))

        label = '%s:%s' % (classes[classId], label)

 

    #Display the label at the top of the bounding box

    labelSize, baseLine = cv.getTextSize(label, cv.FONT_HERSHEY_SIMPLEX, 0.5, 1)

    top = max(top, labelSize[1])

    #cv.rectangle(frame, (left, top - round(1.5*labelSize[1])), (left + round(1.5*labelSize[0]), top + baseLine), (255, 255, 255), cv.FILLED)

    #cv.putText(frame, label, (left, top), cv.FONT_HERSHEY_SIMPLEX, 0.75, (0,0,0), 1)

 

# Remove the bounding boxes with low confidence using non-maxima suppression

def postprocess(frame, outs):

    frameHeight = frame.shape[0]

    frameWidth = frame.shape[1]

 

    # Scan through all the bounding boxes output from the network and keep only the

    # ones with high confidence scores. Assign the box's class label as the class with the highest score.

    classIds = []

    confidences = []

    boxes = []

    for out in outs:

        for detection in out:

            scores = detection[5:]

            classId = np.argmax(scores)

            confidence = scores[classId]

            if confidence > confThreshold:

                center_x = int(detection[0] * frameWidth)

                center_y = int(detection[1] * frameHeight)

                width = int(detection[2] * frameWidth)

                height = int(detection[3] * frameHeight)

                left = int(center_x - width / 2)

                top = int(center_y - height / 2)

                classIds.append(classId)

                confidences.append(float(confidence))

                boxes.append([left, top, width, height])

 

    # Perform non maximum suppression to eliminate redundant overlapping boxes with

    # lower confidences.

    indices = cv.dnn.NMSBoxes(boxes, confidences, confThreshold, nmsThreshold)

 

    for i in indices:

        i = i[0]

        box = boxes[i]

        left = box[0]

        top = box[1]

        width = box[2]

        height = box[3]

        drawPred(classIds[i], confidences[i], left, top, left + width, top + height)

 

# Process inputs

winName = 'Deep learning object detection in OpenCV'

cv.namedWindow(winName, cv.WINDOW_NORMAL)

 

outputFile = "yolo_out_py.jpg"

 

pic_number = 1

 

g = os.walk(r"./车辆图片")  # iterate over the local image folder (车辆图片 = "vehicle images")

 

for path,dir_list,file_list in g:  

    for file_name in file_list:

 

        time.sleep(2)  

        path_name = os.path.join(path, file_name)

        print(path_name)

        print(file_name)

 

        frame = cv.imread(path_name)

        print(frame.shape)

 

        outputFile = str(pic_number) + '_yolo_out_py.jpg'

        pic_number += 1

 

     # Create a 4D blob from a frame.

        blob = cv.dnn.blobFromImage(frame, 1/255, (inpWidth, inpHeight), [0,0,0], 1, crop=False)

 

    # Sets the input to the network

        net.setInput(blob)

 

     # Runs the forward pass to get output of the output layers

        outs = net.forward(getOutputsNames(net))

 

     # Remove the bounding boxes with low confidence

        postprocess(frame, outs)

 

     # Put efficiency information. The function getPerfProfile returns the overall time for inference(t) and the timings for each of the layers(in layersTimes)

        t, _ = net.getPerfProfile()

        label = 'Inference time: %.2f ms' % (t * 1000.0 / cv.getTickFrequency())

        cv.putText(frame, label, (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255))

 

     # Write the frame with the detection boxes

        cv.imwrite(outputFile, frame.astype(np.uint8));

        cv.imshow(winName, frame)

=======================================END=================================================

Reposted from: https://www.cnblogs.com/carle-09/p/9752600.html
