最近有个项目需要做视觉自动化处理的工具,最后选用的软件为python,刚好这个机会进行系统学习。短时间学习,需要快速开发,所以记录要点步骤,防止忘记。
链接:
开源 python 应用 开发(一)python、pip、pyAutogui、python opencv安装-优快云博客
开源 python 应用 开发(二)基于pyautogui、open cv 视觉识别的工具自动化-优快云博客
开源 python 应用 开发(三)python语法介绍-优快云博客
开源 python 应用 开发(四)python文件和系统综合应用-优快云博客
推荐链接:
开源 Arkts 鸿蒙应用 开发(一)工程文件分析-优快云博客
开源 Arkts 鸿蒙应用 开发(二)封装库.har制作和应用-优快云博客
开源 Arkts 鸿蒙应用 开发(三)Arkts的介绍-优快云博客
开源 Arkts 鸿蒙应用 开发(四)布局和常用控件-优快云博客
开源 Arkts 鸿蒙应用 开发(五)控件组成和复杂控件-优快云博客
推荐链接:
开源 java android app 开发(一)开发环境的搭建-优快云博客
开源 java android app 开发(二)工程文件结构-优快云博客
开源 java android app 开发(三)GUI界面布局和常用组件-优快云博客
开源 java android app 开发(四)GUI界面重要组件-优快云博客
开源 java android app 开发(五)文件和数据库存储-优快云博客
开源 java android app 开发(六)多媒体使用-优快云博客
开源 java android app 开发(七)通讯之Tcp和Http-优快云博客
开源 java android app 开发(八)通讯之Mqtt和Ble-优快云博客
开源 java android app 开发(九)后台之线程和服务-优快云博客
开源 java android app 开发(十)广播机制-优快云博客
开源 java android app 开发(十一)调试、发布-优快云博客
开源 java android app 开发(十二)封库.aar-优快云博客
推荐链接:
开源C# .net mvc 开发(一)WEB搭建_c#部署web程序-优快云博客
开源 C# .net mvc 开发(二)网站快速搭建_c#网站开发-优快云博客
开源 C# .net mvc 开发(三)WEB内外网访问(VS发布、IIS配置网站、花生壳外网穿刺访问)_c# mvc 域名下不可訪問內網,內網下可以訪問域名-优快云博客
开源 C# .net mvc 开发(四)工程结构、页面提交以及显示_c#工程结构-优快云博客
开源 C# .net mvc 开发(五)常用代码快速开发_c# mvc开发-优快云博客
本章节内容如下:实现了使用 YOLOv3 (You Only Look Once version 3) 深度学习模型进行目标检测的功能。识别了香蕉、手机、夜景中的汽车和行人,第一次玩感觉挺有意思的。对图片有些要求自己更换图片,最好选高清大图。
1. YOLOv3 模型
2. 主要函数
3. 所有代码
4. 效果演示
一、YOLOv3 模型
使用预训练的 YOLOv3 模型(需要 yolov3.cfg 和 yolov3.weights 文件)
基于 COCO 数据集(80 个类别)
这个代码需要3个文件,分别是yolov3.cfg、yolov3.weights、coco.names网上很容易能搜到。
更换对应的 .cfg文件和.weights文件
链接:YOLO: Real-Time Object Detection
yolov3.cfg文件
[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=16
width=608
height=608
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
# Downsample
[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
# Downsample
[convolutional]
batch_normalize=1
filters=128
size=3
stride=2
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
# Downsample
[convolutional]
batch_normalize=1
filters=256
size=3
stride=2
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
# Downsample
[convolutional]
batch_normalize=1
filters=512
size=3
stride=2
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
# Downsample
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=2
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[shortcut]
from=-3
activation=linear
######################
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear
[yolo]
mask = 6,7,8
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
[route]
layers = -4
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = -1, 61
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear
[yolo]
mask = 3,4,5
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
[route]
layers = -4
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = -1, 36
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear
[yolo]
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
coco.names文件
person
bicycle
car
motorbike
aeroplane
bus
train
truck
boat
traffic light
fire hydrant
stop sign
parking meter
bench
bird
cat
dog
horse
sheep
cow
elephant
bear
zebra
giraffe
backpack
umbrella
handbag
tie
suitcase
frisbee
skis
snowboard
sports ball
kite
baseball bat
baseball glove
skateboard
surfboard
tennis racket
bottle
wine glass
cup
fork
knife
spoon
bowl
banana
apple
sandwich
orange
broccoli
carrot
hot dog
pizza
donut
cake
chair
sofa
pottedplant
bed
diningtable
toilet
tvmonitor
laptop
mouse
remote
keyboard
cell phone
microwave
oven
toaster
sink
refrigerator
book
clock
vase
scissors
teddy bear
hair drier
toothbrush
二、主要函数
2.1 load_yolo() 函数
功能:加载 YOLOv3 模型和相关文件
def load_yolo():
# 获取当前目录
current_dir = os.path.dirname(os.path.abspath(__file__))
# 构建完整路径
cfg_path = os.path.join(current_dir, "yolov3.cfg")
weights_path = os.path.join(current_dir, "yolov3.weights")
names_path = os.path.join(current_dir, "coco.names")
# 检查文件是否存在
if not all(os.path.exists(f) for f in [cfg_path, weights_path, names_path]):
raise FileNotFoundError("缺少YOLO模型文件,请确保yolov3.cfg、yolov3.weights和coco.names在脚本目录下")
# 加载网络
net = cv2.dnn.readNet(weights_path, cfg_path)
with open(names_path, "r") as f:
classes = [line.strip() for line in f.readlines()]
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
return net, classes, output_layers
2.2 detect_objects(img, net, output_layers) 函数
功能:对输入图像进行目标检测
def detect_objects(img, net, output_layers):
# 从图像创建blob
blob = cv2.dnn.blobFromImage(img, scalefactor=1/255.0, size=(416, 416),
swapRB=True, crop=False)
# 设置输入并进行前向传播
net.setInput(blob)
outputs = net.forward(output_layers)
return outputs
2.3 get_box_dimensions(outputs, height, width) 函数
功能:处理网络输出,提取边界框信息
def get_box_dimensions(outputs, height, width):
boxes = []
confidences = []
class_ids = []
for output in outputs:
for detection in output:
scores = detection[5:]
class_id = np.argmax(scores)
confidence = scores[class_id]
if confidence > 0.5: # 置信度阈值
# 计算边界框坐标
center_x = int(detection[0] * width)
center_y = int(detection[1] * height)
w = int(detection[2] * width)
h = int(detection[3] * height)
# 矩形左上角坐标
x = int(center_x - w / 2)
y = int(center_y - h / 2)
boxes.append([x, y, w, h])
confidences.append(float(confidence))
class_ids.append(class_id)
return boxes, confidences, class_ids
2.4 draw_labels(boxes, confidences, class_ids, classes, img) 函数
功能:在图像上绘制检测结果
def draw_labels(boxes, confidences, class_ids, classes, img):
# 应用非极大值抑制
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
# 设置颜色
colors = np.random.uniform(0, 255, size=(len(classes), 3))
if len(indexes) > 0:
for i in indexes.flatten():
x, y, w, h = boxes[i]
label = str(classes[class_ids[i]])
confidence = str(round(confidences[i], 2))
color = colors[class_ids[i]]
# 绘制边界框和标签
cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
cv2.putText(img, f"{label} {confidence}", (x, y - 5),
cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
return img
2.5 object_detection(image_path) 函数
功能:主函数,整合整个检测流程
def object_detection(image_path):
try:
net, classes, output_layers = load_yolo()
img = cv2.imread(image_path)
if img is None:
raise FileNotFoundError(f"无法加载图像: {image_path}")
height, width = img.shape[:2]
outputs = detect_objects(img, net, output_layers)
boxes, confidences, class_ids = get_box_dimensions(outputs, height, width)
img = draw_labels(boxes, confidences, class_ids, classes, img)
cv2.imshow("Object Detection", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
except Exception as e:
print(f"发生错误: {str(e)}")
三、所有代码
import cv2
import numpy as np
import os
def load_yolo():
# 获取当前目录
current_dir = os.path.dirname(os.path.abspath(__file__))
# 构建完整路径
cfg_path = os.path.join(current_dir, "yolov3.cfg")
weights_path = os.path.join(current_dir, "yolov3.weights")
names_path = os.path.join(current_dir, "coco.names")
# 检查文件是否存在
if not all(os.path.exists(f) for f in [cfg_path, weights_path, names_path]):
raise FileNotFoundError("缺少YOLO模型文件,请确保yolov3.cfg、yolov3.weights和coco.names在脚本目录下")
# 加载网络
net = cv2.dnn.readNet(weights_path, cfg_path)
with open(names_path, "r") as f:
classes = [line.strip() for line in f.readlines()]
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
return net, classes, output_layers
def detect_objects(img, net, output_layers):
# 从图像创建blob
blob = cv2.dnn.blobFromImage(img, scalefactor=1/255.0, size=(416, 416),
swapRB=True, crop=False)
# 设置输入并进行前向传播
net.setInput(blob)
outputs = net.forward(output_layers)
return outputs
def get_box_dimensions(outputs, height, width):
boxes = []
confidences = []
class_ids = []
for output in outputs:
for detection in output:
scores = detection[5:]
class_id = np.argmax(scores)
confidence = scores[class_id]
if confidence > 0.5: # 置信度阈值
# 计算边界框坐标
center_x = int(detection[0] * width)
center_y = int(detection[1] * height)
w = int(detection[2] * width)
h = int(detection[3] * height)
# 矩形左上角坐标
x = int(center_x - w / 2)
y = int(center_y - h / 2)
boxes.append([x, y, w, h])
confidences.append(float(confidence))
class_ids.append(class_id)
return boxes, confidences, class_ids
def draw_labels(boxes, confidences, class_ids, classes, img):
# 应用非极大值抑制
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
# 设置颜色
colors = np.random.uniform(0, 255, size=(len(classes), 3))
if len(indexes) > 0:
for i in indexes.flatten():
x, y, w, h = boxes[i]
label = str(classes[class_ids[i]])
confidence = str(round(confidences[i], 2))
color = colors[class_ids[i]]
# 绘制边界框和标签
cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
cv2.putText(img, f"{label} {confidence}", (x, y - 5),
cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
return img
def object_detection(image_path):
try:
net, classes, output_layers = load_yolo()
img = cv2.imread(image_path)
if img is None:
raise FileNotFoundError(f"无法加载图像: {image_path}")
height, width = img.shape[:2]
outputs = detect_objects(img, net, output_layers)
boxes, confidences, class_ids = get_box_dimensions(outputs, height, width)
img = draw_labels(boxes, confidences, class_ids, classes, img)
cv2.imshow("Object Detection", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
except Exception as e:
print(f"发生错误: {str(e)}")
if __name__ == "__main__":
# 使用示例
image_path = "myimg.jpg" # 替换为你的图片路径
object_detection(image_path)
四、效果演示
4.1 识别香蕉
4.2 识别手机
4.3 识别夜景中的汽车和人