OpenCV and AI Deep Learning | Faster R-CNN Object Detection with PyTorch

Latest recommended article published 2025-09-14 09:02:29

License: CC 4.0 BY-SA

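This article comes from the WeChat official account "OpenCV与AI深度学习" and is shared for academic purposes only; it will be removed if it infringes any rights.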

Original article: 基于PyTorch实现Faster RCNN目标检测 (Faster R-CNN Object Detection with PyTorch)

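We will build a simple pipeline that detects objects in an image. It uses a pre-trained Faster R-CNN model from torchvision with a ResNet-50 backbone and a Feature Pyramid Network (FPN). This architecture is widely used for object detection tasks.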

torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

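Model breakdown:

torchvision.models.detection -> the torchvision module that provides pre-trained object detection models.
fasterrcnn -> refers to Faster R-CNN (Faster Region-based Convolutional Neural Network), a widely used two-stage object detection model.
resnet50 -> the model's backbone, which extracts features from the input image. ResNet-50 is a 50-layer convolutional neural network commonly used as a feature extractor.
fpn -> an enhancement to the backbone that combines features from several backbone layers, so the model can handle objects at different scales.
pretrained=True -> loads weights pre-trained on the COCO (Common Objects in Context) dataset, which contains 80 object categories.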

Import the required libraries

# Import PyTorch core library
import torch

# Import torchvision library for computer vision tasks
import torchvision

# Import transforms module from torchvision for image preprocessing
from torchvision import transforms as T

# Import NumPy for numerical operations and array manipulations
import numpy as np

# Import Python Imaging Library (PIL) for working with images
from PIL import Image

# Import OpenCV for image processing and computer vision tasks
import cv2

# Import cv2_imshow from Google Colab patches to display images in Colab
from google.colab.patches import cv2_imshow

Load the model and switch to evaluation mode

# Load the pre-trained Faster R-CNN model with a ResNet-50 backbone and Feature Pyramid Network (FPN) from torchvision
# This model is designed for object detection tasks and is pre-trained on the COCO dataset
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# Set the model to evaluation mode
# This switches layers such as dropout and batch normalization to their inference behavior
# and makes the detection model return predictions instead of training losses
model.eval()
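To see what evaluation mode changes, here is a minimal sketch on a single dropout layer rather than on the detector itself. The layer, the input, and the seed are made up for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)  # make the random dropout mask reproducible

layer = nn.Dropout(p=0.5)
x = torch.ones(8)

layer.train()   # training mode: elements are zeroed at random, the rest scaled by 1/(1-p)
train_out = layer(x)

layer.eval()    # evaluation mode: dropout passes the input through unchanged
eval_out = layer(x)

print(train_out)
print(eval_out)
```

In evaluation mode the output equals the input, which is why inference results are deterministic once `model.eval()` has been called.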

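Download an image from the web (optional)

To download an image, find one online, copy its address, and paste it into the command below.

# Download the image for prediction (shell command run from a Colab/Jupyter cell)
!wget 'https://cdn.sanity.io/images/uqxwe2qj/production/4ee9fb18bdc214aefebf7859557a6611125c3841-760x426.png?q=80&auto=format&fit=clip&w=760'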

Load the target image

# Open the image file at the given path with PIL.
# .convert('RGB') forces a 3-channel RGB image, which keeps the input consistent
# for models that expect 3-channel color images.
# Note: wget appends '.1' when a file with the same name already exists;
# adjust the path so that it matches the file you actually downloaded.
image_path = '/content/4ee9fb18bdc214aefebf7859557a6611125c3841-760x426.png?q=80&auto=format&fit=clip&w=760.1'
Ig = Image.open(image_path).convert('RGB')  # Ig now holds the RGB image

Convert the image to a PyTorch tensor

# Define a transformation to convert the image into a PyTorch tensor
# T.ToTensor() scales 8-bit pixel values to the range [0, 1]
# and converts a PIL image (or numpy array) of shape H x W x C to a float tensor of shape C x H x W.
transform = T.ToTensor()

# Apply the transformation to the previously loaded image (Ig)
# The result (img) is now a PyTorch tensor representing the image,
# which can be directly fed into a deep learning model for processing.
img = transform(Ig)
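The effect of `T.ToTensor()` on an 8-bit RGB image can be approximated in NumPy as follows. This is a sketch of the idea, not torchvision's actual implementation, and the tiny 1x2 image is made up for illustration.

```python
import numpy as np

# A made-up 1x2 RGB image in height x width x channel (HWC) layout, 8-bit values
hwc = np.array([[[0, 128, 255], [255, 0, 0]]], dtype=np.uint8)

# Move channels first (CHW) and scale the values to [0, 1]
chw = hwc.transpose(2, 0, 1).astype(np.float32) / 255.0

print(hwc.shape, "->", chw.shape)
print(chw.min(), chw.max())
```

The channel-first layout and the [0, 1] range are what the torchvision detection models expect as input.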

Disable gradient computation for more efficient inference

# Disable gradient calculation to improve efficiency during inference
# `torch.no_grad()` ensures no gradients are computed or stored,
# reducing memory usage and speeding up the process since we're not training the model.
with torch.no_grad():
    # Pass the tensor image (wrapped in a list) into the pre-trained model
    # The model processes the image and generates predictions, such as bounding boxes, labels, and scores.
    pred = model([img])
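The effect of `torch.no_grad()` can be checked on a toy computation. The tensor `w` below is a made-up stand-in for model parameters.

```python
import torch

# A made-up "parameter" that normally tracks gradients
w = torch.ones(3, requires_grad=True)

# Outside no_grad: the result is attached to the autograd graph
y_train = (w * 2).sum()

# Inside no_grad: no graph is recorded, so less memory is used
with torch.no_grad():
    y_infer = (w * 2).sum()

print(y_train.requires_grad, y_infer.requires_grad)
```

Both results hold the same value; only the one computed outside the `no_grad` block could be used for backpropagation.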

Inspect the prediction keys (boxes, labels, scores)

# Extract and display the keys from the prediction dictionary
# `pred[0]` represents the predictions for the first image in the batch
# The keys provide information about the types of outputs, such as bounding boxes, labels, and scores.
pred_keys = pred[0].keys()

# Output the keys to understand the structure of the predictions
print(pred_keys)

Store the boxes, labels, and scores in variables

# Extract and store the bounding boxes, labels, and scores from the prediction result
# `pred[0]` contains the prediction results for the first image in the batch

# `pred[0]['boxes']`: Extracts the bounding box coordinates for detected objects
# `pred[0]['labels']`: Extracts the class labels (IDs) of the detected objects
# `pred[0]['scores']`: Extracts the confidence scores for each detection

bboxes = pred[0]['boxes']   # Bounding box coordinates [x_min, y_min, x_max, y_max]
labels = pred[0]['labels']  # Class labels for each detected object
scores = pred[0]['scores']  # Confidence scores for each prediction

Define the confidence score threshold

# Define the confidence score threshold and count the high-confidence detections
# `scores > 0.9` selects the predictions with confidence greater than 0.9

# `torch.argwhere(scores > 0.9)`: Finds the indices of the scores that are greater than 0.9
# `.shape[0]`: Returns the number of indices (i.e., the number of high-confidence detections)
# torchvision's detection models return detections sorted by descending score,
# so the high-confidence detections are the first `num` entries

num = torch.argwhere(scores > 0.9).shape[0]  # Number of detections with confidence > 0.9
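An alternative that does not rely on the detections being sorted by score is to filter all three tensors with a boolean mask. The sketch below uses made-up tensors in place of `pred[0]`; the values are illustrative only and deliberately unsorted.

```python
import torch

# Made-up predictions standing in for pred[0]
scores = torch.tensor([0.98, 0.40, 0.95, 0.10])
labels = torch.tensor([1, 3, 18, 62])
bboxes = torch.tensor([
    [10.0, 20.0, 110.0, 220.0],
    [30.0, 40.0, 90.0, 100.0],
    [200.0, 50.0, 320.0, 180.0],
    [5.0, 5.0, 15.0, 15.0],
])

keep = scores > 0.9          # boolean mask, one entry per detection
kept_bboxes = bboxes[keep]
kept_labels = labels[keep]
kept_scores = scores[keep]

print(kept_labels.tolist())
```

With the mask applied, a later loop can iterate over `kept_bboxes` and `kept_labels` directly instead of over the first `num` entries.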

Define the list of COCO dataset labels

# List of object class names based on the COCO dataset labels
# These are the names of the classes the Faster R-CNN model can recognize, such as people, vehicles, animals, etc.

coco_names = [
    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat",
    "traffic light", "fire hydrant", "street sign", "stop sign", "parking meter", "bench", "bird",
    "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "hat", "backpack",
    "umbrella", "shoe", "eye glasses", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
    "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket",
    "bottle", "plate", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
    "potted plant", "bed", "mirror", "dining table", "window", "desk", "toilet", "door", "tv", "laptop",
    "mouse", "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator",
    "blender", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush", "hair brush"
]
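Because the model's class IDs start at 1 and the list starts at index 0, the lookup needs an offset of one. A small helper along these lines makes the mapping explicit and also guards against IDs below 1, which a bare `names[label - 1]` would silently map to the end of the list. `label_to_name` is a hypothetical name, and the short list is only an excerpt for illustration.

```python
def label_to_name(label_id, names):
    """Map a 1-based class ID to its name, or "Unknown" if it is out of range."""
    index = int(label_id) - 1
    if 0 <= index < len(names):
        return names[index]
    return "Unknown"


# Excerpt of the label list, for illustration only
names_excerpt = ["person", "bicycle", "car"]

print(label_to_name(1, names_excerpt))
print(label_to_name(0, names_excerpt))
print(label_to_name(99, names_excerpt))
```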

Read the image with OpenCV and draw the detections

# Read the image using OpenCV
igg = cv2.imread('/content/4ee9fb18bdc214aefebf7859557a6611125c3841-760x426.png?q=80&auto=format&fit=clip&w=760.1')

# Loop through each detection (based on the number of predictions with score > 0.9)
for i in range(num):
    # Extract the bounding box coordinates and convert them to integers
    x1, y1, x2, y2 = bboxes[i].numpy().astype('int')

    # Look up the class name; subtract 1 because class IDs start at 1
    try:
        class_name = coco_names[labels[i].item() - 1]
    except IndexError:
        # If the label falls outside the list, mark the detection as "Unknown"
        class_name = "Unknown"

    # Draw a rectangle around the detected object with green color and thickness of 2
    igg = cv2.rectangle(igg, (x1, y1), (x2, y2), (0, 255, 0), 2)

    # Put the class name above the rectangle in red color, font size .5, and thickness 1
    igg = cv2.putText(igg, class_name, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1, cv2.LINE_AA)

Display the image with the detected objects and labels

# Show the image with detected objects and labels
cv2_imshow(igg)

Source code:

https://github.com/bmyadav91/object-detection-using-faster-rcnn-pytorch/blob/main/Obj
