Training Faster R-CNN from Scratch

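When training Faster R-CNN with the TensorFlow Object Detection (TFOD) API, a sensible directory structure and correctly written configuration files are essential. The sections below cover both in detail.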

1. Setting Up the Directory Structure

Inside the LISA dataset directory, create the following directories for experiments and data storage:

$ cd lisa
$ mkdir records experiments
$ mkdir experiments/training experiments/evaluation experiments/exported_model

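Each directory serves the following purpose:
- records directory: stores three important files.
  - training.record: the serialized image dataset used for training, containing the images, bounding boxes, and labels.
  - testing.record: the images, bounding boxes, and labels used for testing.
  - classes.pbtxt: a plain-text file mapping each class label name to its unique integer ID.
- experiments directory: holds all files needed for a training experiment, in three subdirectories.
  - training: stores the pipeline configuration file that tells the TFOD API how to train the model, the pre-trained model used for fine-tuning, and any model checkpoints created during training.
  - evaluation: stores the logs generated by the evaluation script that ships with the TFOD API; these can be plotted with TensorBoard to visualize training progress.
  - exported_model: stores the final frozen-weights model exported after training completes.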
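
The same layout can also be created from Python, which is convenient when the setup has to be repeated on several machines. The following is a minimal sketch, not part of the original project: the function name make_experiment_dirs is hypothetical, and the demo writes into a temporary directory rather than a real lisa folder.

```python
import os
import tempfile


def make_experiment_dirs(base_path):
    """Create the records/ and experiments/ layout under base_path."""
    subdirs = [
        "records",
        os.path.join("experiments", "training"),
        os.path.join("experiments", "evaluation"),
        os.path.join("experiments", "exported_model"),
    ]
    created = []
    for sub in subdirs:
        path = os.path.join(base_path, sub)
        # exist_ok=True makes the call safe to repeat
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created


# Demo in a throwaway directory (assumption: replace with your dataset path)
with tempfile.TemporaryDirectory() as tmp:
    for path in make_experiment_dirs(os.path.join(tmp, "lisa")):
        print(os.path.relpath(path, tmp))
```

Because os.makedirs creates intermediate directories, the experiments parent does not need to be created separately.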

2. Configuration Files

2.1 The setup.sh file

#!/bin/sh
export PYTHONPATH=$PYTHONPATH:/home/adrian/models/research:/home/adrian/models/research/slim

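This file updates the PYTHONPATH variable to include the TFOD API import paths. Change the paths to wherever you cloned the TFOD API repository on your own system, and keep the export statement on a single line. Each time you use the TFOD API, run the following command in your shell: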

$ source setup.sh

This ensures that Python can import the TFOD API packages.
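
A quick way to confirm that the paths took effect is to ask Python whether it can locate the packages without importing them. This is a small sketch under assumptions: module_available is a hypothetical helper, and nets is assumed to be one of the packages that the research/slim path makes importable.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located on sys.path."""
    # find_spec returns None when a top-level module cannot be found
    return importlib.util.find_spec(name) is not None


# object_detection comes from models/research; nets is assumed to come
# from models/research/slim
for name in ("object_detection", "nets"):
    status = "found" if module_available(name) else "NOT found"
    print("{}: {}".format(name, status))
```

If either package is reported as not found, the shell running Python most likely has not sourced setup.sh yet.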

2.2 The lisa_config.py configuration file

# import the necessary packages
import os

# initialize the base path for the LISA dataset
BASE_PATH = "lisa"

# build the path to the annotations file
ANNOT_PATH = os.path.sep.join([BASE_PATH, "allAnnotations.csv"])

# build the path to the output training and testing record files,
# along with the class labels file
TRAIN_RECORD = os.path.sep.join([BASE_PATH,
                                 "records/training.record"])
TEST_RECORD = os.path.sep.join([BASE_PATH,
                                "records/testing.record"])
CLASSES_FILE = os.path.sep.join([BASE_PATH,
                                 "records/classes.pbtxt"])

# initialize the test split size
TEST_SIZE = 0.25

# initialize the class labels dictionary
CLASSES = {"pedestrianCrossing": 1, "signalAhead": 2, "stop": 3}

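This configuration file defines the base path of the LISA dataset, the path to the annotations file, the output paths for the training and testing record files, and the path to the class labels file. It also sets the test split to 25% of the images. For training Faster R-CNN we focus on only three classes: pedestrian crossing, signal ahead, and stop signs.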
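
The CLASSES dictionary is later written out as classes.pbtxt, so it is worth checking before any records are built. The sketch below renders the dictionary as label-map text and rejects IDs that would cause trouble; build_label_map is a hypothetical helper, and the rule that IDs start at 1 reflects the TFOD API convention of reserving 0 for the background class.

```python
def build_label_map(classes):
    """Render a {name: id} dictionary as label-map (pbtxt) text."""
    ids = list(classes.values())
    # assumption: ID 0 is reserved for background, so real classes start at 1
    if any(not isinstance(i, int) or i < 1 for i in ids):
        raise ValueError("class IDs must be integers >= 1")
    if len(set(ids)) != len(ids):
        raise ValueError("class IDs must be unique")

    items = []
    for name, class_id in sorted(classes.items(), key=lambda kv: kv[1]):
        # plain ASCII quotes are required around the name
        items.append("item {\n\tid: %d\n\tname: '%s'\n}\n" % (class_id, name))
    return "".join(items)


CLASSES = {"pedestrianCrossing": 1, "signalAhead": 2, "stop": 3}
print(build_label_map(CLASSES))
```

Sorting by ID only affects the order of the items in the file, not their meaning.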

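3. The TensorFlow Annotation Class

To use the TFOD API we need to build a dataset of images and their associated bounding boxes. To keep the build script tidy and the code reusable, we create a TFAnnotation class that encapsulates encoding an object detection data point in TensorFlow format.

3.1 Importing the required packages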

# import the necessary packages
from object_detection.utils.dataset_util import bytes_list_feature
from object_detection.utils.dataset_util import float_list_feature
from object_detection.utils.dataset_util import int64_list_feature
from object_detection.utils.dataset_util import int64_feature
from object_detection.utils.dataset_util import bytes_feature

3.2 Defining the TFAnnotation class

class TFAnnotation:
    def __init__(self):
        # initialize the bounding box + label lists
        self.xMins = []
        self.xMaxs = []
        self.yMins = []
        self.yMaxs = []
        self.textLabels = []
        self.classes = []
        self.difficult = []

        # initialize additional variables, including the image
        # itself, spatial dimensions, encoding, and filename
        self.image = None
        self.width = None
        self.height = None
        self.encoding = None
        self.filename = None

    def build(self):
        # encode the attributes using their respective TensorFlow
        # encoding function
        w = int64_feature(self.width)
        h = int64_feature(self.height)
        filename = bytes_feature(self.filename.encode("utf8"))
        encoding = bytes_feature(self.encoding.encode("utf8"))
        image = bytes_feature(self.image)
        xMins = float_list_feature(self.xMins)
        xMaxs = float_list_feature(self.xMaxs)
        yMins = float_list_feature(self.yMins)
        yMaxs = float_list_feature(self.yMaxs)
        textLabels = bytes_list_feature(self.textLabels)
        classes = int64_list_feature(self.classes)
        difficult = int64_list_feature(self.difficult)

        # construct the TensorFlow-compatible data dictionary
        data = {
            "image/height": h,
            "image/width": w,
            "image/filename": filename,
            "image/source_id": filename,
            "image/encoded": image,
            "image/format": encoding,
            "image/object/bbox/xmin": xMins,
            "image/object/bbox/xmax": xMaxs,
            "image/object/bbox/ymin": yMins,
            "image/object/bbox/ymax": yMaxs,
            "image/object/class/text": textLabels,
            "image/object/class/label": classes,
            "image/object/difficult": difficult,
        }

        # return the data dictionary
        return data

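The class's __init__ method initializes the bounding box and label lists along with the image-related attributes; the build method encodes these attributes in TensorFlow format and assembles the data dictionary.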
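
Since build() encodes whatever the attributes contain, a forgotten attribute only surfaces as an error inside the encoding call (for example, calling .encode on a filename that is still None). A hedged sketch of a pre-flight check follows; check_annotation is a hypothetical helper that is not part of the original class, and the demo uses a SimpleNamespace stand-in so that it runs without the TFOD API installed.

```python
from types import SimpleNamespace

# the seven per-box lists that must stay aligned index by index
BOX_FIELDS = ("xMins", "xMaxs", "yMins", "yMaxs",
              "textLabels", "classes", "difficult")
IMAGE_FIELDS = ("image", "width", "height", "encoding", "filename")


def check_annotation(annot):
    """Return a list of problems found in a TFAnnotation-like object."""
    problems = []
    lengths = {name: len(getattr(annot, name)) for name in BOX_FIELDS}
    if len(set(lengths.values())) > 1:
        problems.append("per-box lists differ in length: %r" % lengths)
    for name in IMAGE_FIELDS:
        if getattr(annot, name) is None:
            problems.append("%s is not set" % name)
    return problems


# stand-in object with the same attribute names as TFAnnotation
annot = SimpleNamespace(
    xMins=[0.1], xMaxs=[0.4], yMins=[0.2], yMaxs=[0.5],
    textLabels=[b"stop"], classes=[3], difficult=[],
    image=b"...", width=640, height=480, encoding="png", filename=None)
for problem in check_annotation(annot):
    print("[WARN]", problem)
```

Calling such a check just before build() keeps malformed data points out of the record file.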

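4. Building the LISA + TensorFlow Dataset

To train a network with the TFOD API, the images and annotations must be converted to TensorFlow record format. The code for build_lisa_records.py follows. Note that it uses TensorFlow 1.x names (tf.python_io, tf.gfile, tf.app), which TensorFlow 2.x only exposes under tf.compat.v1: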

# import the necessary packages
from config import lisa_config as config
from pyimagesearch.utils.tfannotation import TFAnnotation
from sklearn.model_selection import train_test_split
from PIL import Image
import tensorflow as tf
import os

def main(_):
    # open the classes output file
    f = open(config.CLASSES_FILE, "w")

    # loop over the classes
    for (k, v) in config.CLASSES.items():
        # construct the class information and write to file
        item = ("item {\n"
                "\tid: " + str(v) + "\n"
                "\tname: '" + k + "'\n"
                "}\n")
        f.write(item)

    # close the output classes file
    f.close()

    # initialize a data dictionary used to map each image filename
    # to all bounding boxes associated with the image, then load
    # the contents of the annotations file
    D = {}
    rows = open(config.ANNOT_PATH).read().strip().split("\n")

    # loop over the individual rows, skipping the header
    for row in rows[1:]:
        # break the row into components
        row = row.split(",")[0].split(";")
        (imagePath, label, startX, startY, endX, endY, _) = row
        (startX, startY) = (float(startX), float(startY))
        (endX, endY) = (float(endX), float(endY))

        # if we are not interested in the label, ignore it
        if label not in config.CLASSES:
            continue

        # build the path to the input image, then grab any other
        # bounding boxes + labels associated with the image
        p = os.path.sep.join([config.BASE_PATH, imagePath])
        b = D.get(p, [])

        # build a tuple consisting of the label and bounding box,
        # then update the list and store it in the dictionary
        b.append((label, (startX, startY, endX, endY)))
        D[p] = b

    # create training and testing splits from our data dictionary
    (trainKeys, testKeys) = train_test_split(list(D.keys()),
                                             test_size=config.TEST_SIZE, random_state=42)

    # initialize the data split files
    datasets = [
        ("train", trainKeys, config.TRAIN_RECORD),
        ("test", testKeys, config.TEST_RECORD)
    ]

    # loop over the datasets
    for (dType, keys, outputPath) in datasets:
        # initialize the TensorFlow writer and initialize the total
        # number of examples written to file
        print("[INFO] processing '{}'...".format(dType))
        writer = tf.python_io.TFRecordWriter(outputPath)
        total = 0

        # loop over all the keys in the current set
        for k in keys:
            # load the input image from disk as a TensorFlow object
            encoded = tf.gfile.GFile(k, "rb").read()
            encoded = bytes(encoded)

            # load the image from disk again, this time as a PIL
            # object
            pilImage = Image.open(k)
            (w, h) = pilImage.size[:2]

            # parse the filename and encoding from the input path
            filename = k.split(os.path.sep)[-1]
            encoding = filename[filename.rfind(".") + 1:]

            # initialize the annotation object used to store
            # information regarding the bounding box + labels
            tfAnnot = TFAnnotation()
            tfAnnot.image = encoded
            tfAnnot.encoding = encoding
            tfAnnot.filename = filename
            tfAnnot.width = w
            tfAnnot.height = h

            # loop over the bounding boxes + labels associated with
            # the image
            for (label, (startX, startY, endX, endY)) in D[k]:
                # TensorFlow assumes all bounding boxes are in the
                # range [0, 1] so we need to scale them
                xMin = startX / w
                xMax = endX / w
                yMin = startY / h
                yMax = endY / h

                # update the bounding boxes + labels lists
                tfAnnot.xMins.append(xMin)
                tfAnnot.xMaxs.append(xMax)
                tfAnnot.yMins.append(yMin)
                tfAnnot.yMaxs.append(yMax)
                tfAnnot.textLabels.append(label.encode("utf8"))
                tfAnnot.classes.append(config.CLASSES[label])
                tfAnnot.difficult.append(0)

                # increment the total number of examples
                total += 1

            # encode the data point attributes using the TensorFlow
            # helper functions
            features = tf.train.Features(feature=tfAnnot.build())
            example = tf.train.Example(features=features)

            # add the example to the writer
            writer.write(example.SerializeToString())

        # close the writer and print diagnostic information to the
        # user
        writer.close()
        print("[INFO] {} examples saved for '{}'".format(total,
                                                         dType))

# check to see if the main thread should be started
if __name__ == "__main__":
    tf.app.run()

Run the following command to build the LISA record files:

$ time python build_lisa_records.py

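Example output (the counts are bounding boxes rather than images, because total is incremented once per box):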

[INFO] processing 'train'...
[INFO] 2876 examples saved for 'train'
[INFO] processing 'test'...
[INFO] 955 examples saved for 'test'
real    0m4.879s
user    0m3.117s
sys     0m2.580s
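
The annotation-parsing step of build_lisa_records.py can be tried in isolation before running the full script. The sketch below reuses the same split logic on a few rows; the rows themselves are hypothetical and only mimic the semicolon-separated layout the script expects, group_boxes_by_image is an illustrative name, and the BASE_PATH prefix added by the script is left out for brevity.

```python
def group_boxes_by_image(rows, classes):
    """Map each image path to its list of (label, box) tuples."""
    boxes = {}
    for row in rows:
        # keep the text before the first comma, then split on semicolons
        fields = row.split(",")[0].split(";")
        (image_path, label, start_x, start_y, end_x, end_y, _) = fields

        # ignore labels we are not training on
        if label not in classes:
            continue

        box = tuple(float(v) for v in (start_x, start_y, end_x, end_y))
        boxes.setdefault(image_path, []).append((label, box))
    return boxes


# hypothetical rows, not taken from the real annotations file
ROWS = [
    "vid0/frame_001.png;stop;862;104;916;158;0,0;vid0.avi;1",
    "vid0/frame_001.png;signalAhead;10;20;40;60;0,0;vid0.avi;1",
    "vid0/frame_002.png;speedLimit35;5;5;25;25;0,0;vid0.avi;2",
]
CLASSES = {"pedestrianCrossing": 1, "signalAhead": 2, "stop": 3}

grouped = group_boxes_by_image(ROWS, CLASSES)
for image_path, entries in grouped.items():
    print(image_path, entries)
```

Grouping by image path is what later lets the split operate on whole images instead of individual boxes.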

The whole workflow as a Mermaid flowchart:

graph LR
    A[Set up directory structure] --> B[Write configuration files]
    B --> C[Create TFAnnotation class]
    C --> D[Build LISA + TensorFlow dataset]
    D --> E[Run build_lisa_records.py]

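With these steps complete, the preparation for Faster R-CNN training is done: the directory structure is in place, the configuration files are written, the annotation class is created, and the dataset is built. This lays the groundwork for the training itself.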

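Training Faster R-CNN from Scratch (continued)

5. The Steps in Detail

The previous sections completed the preparation for Faster R-CNN training. This section looks at each step more closely to explain its purpose and how it works.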

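5.1 Why the directory structure matters

5.2 The role of the configuration files

- setup.sh: updates the PYTHONPATH variable so that Python can import the TFOD API modules. Without it, the object_detection imports fail when the scripts run. To apply it:
  1. Open setup.sh and change the paths to where you cloned the TFOD API repository on your own system.
  2. Run source setup.sh in the terminal so the setting takes effect in the current shell.
- lisa_config.py: defines the paths and parameters for the LISA dataset, including the dataset base path, the annotations file path, the output record file paths, and the class labels. The later data processing and model training steps rely on these settings.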

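5.3 How the TFAnnotation class works

The main purpose of the TFAnnotation class is to package an object detection data point in TensorFlow format. It works in three steps:
1. Initialize attributes: __init__ initializes the bounding box and label lists and the image-related attributes such as width, height, encoding format, and filename.
2. Encode attributes: build uses the encoding helpers from the TFOD API's dataset_util module, such as int64_feature and bytes_feature, to encode each attribute as the matching feature type.
3. Build the data dictionary: the encoded attributes are combined into a dictionary whose keys and values match the format the TFOD API expects for a single data point.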

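5.4 The workflow for building the LISA + TensorFlow dataset

Building the dataset breaks down into these main steps:
1. Generate the class labels file: loop over the class labels defined in the configuration file and write each one to classes.pbtxt in label-map format.
2. Read the annotations file: parse each row into an image path, a label, and bounding box coordinates, and store them in a dictionary keyed by image path so that every bounding box and label is associated with its image.
3. Split into training and testing sets: use train_test_split to split the image paths, which guarantees that all bounding boxes of one image land in the same set.
4. Process each set: for both the training and testing sets, do the following:
   - Read the encoded image file as raw bytes, then open it with PIL to get its width and height.
   - Initialize a TFAnnotation object and set its image attributes.
   - Loop over all bounding boxes and labels of the image, normalize the coordinates to the range [0, 1], and add them to the TFAnnotation object.
   - Wrap the output of TFAnnotation.build() in a tf.train.Example and write it to the record file with TFRecordWriter.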
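
The normalization in step 4 is simple division, but it is also the place where bad annotations slip through unnoticed. The following sketch adds range and ordering checks that the original script does not perform; normalize_box is a hypothetical helper, and the pixel values in the demo are made up for illustration.

```python
def normalize_box(start_x, start_y, end_x, end_y, width, height):
    """Scale pixel coordinates to [0, 1]; returns (xMin, xMax, yMin, yMax)."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")

    # x coordinates are divided by the width, y coordinates by the height
    x_min, x_max = start_x / width, end_x / width
    y_min, y_max = start_y / height, end_y / height

    # extra checks (not in the original script): stay inside the image
    for value in (x_min, x_max, y_min, y_max):
        if not 0.0 <= value <= 1.0:
            raise ValueError("box extends outside the image")
    # extra check: the box must have positive area
    if x_min >= x_max or y_min >= y_max:
        raise ValueError("box has zero or negative size")

    return (x_min, x_max, y_min, y_max)


# made-up box on a 1280x960 image
print(normalize_box(862, 104, 916, 158, 1280, 960))
```

Whether to raise, clip, or skip an offending box is a design choice; raising is used here so that problems are noticed early.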

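6. Common Problems and Solutions

The following problems come up often in practice:

| Problem | Solution |
| --- | --- |
| Import error when running `build_lisa_records.py` | Check that `PYTHONPATH` in `setup.sh` is set correctly so that Python can find the TFOD API modules, then run `source setup.sh` to update the environment. |
| Out of memory during training | Reduce the batch size or use a smaller pre-trained model. Distributing training across several devices can also lower the memory each device needs for its share of the batch. |
| Generated record file is empty | Check that the annotations file is in the expected format so that every image path and bounding box parses correctly, and check that the paths in the configuration file point to the right files. |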
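
For the empty-record case, it helps to count how many annotation rows would actually survive the filters in the build script. This is a rough diagnostic sketch, not part of the original code: summarize_annotations and the category names are hypothetical, and the sample rows are invented.

```python
import os
from collections import Counter


def summarize_annotations(rows, classes, base_path):
    """Count usable rows and the reasons other rows would be dropped."""
    summary = Counter()
    for row in rows:
        fields = row.split(",")[0].split(";")
        # the build script unpacks exactly seven fields per row
        if len(fields) != 7:
            summary["malformed"] += 1
            continue
        image_path, label = fields[0], fields[1]
        if label not in classes:
            summary["skipped_label"] += 1
            continue
        if not os.path.exists(os.path.join(base_path, image_path)):
            summary["missing_image"] += 1
            continue
        summary["usable"] += 1
    return summary


# invented rows; in practice read them from the annotations file
rows = [
    "vid0/frame_001.png;stop;862;104;916;158;0,0;vid0.avi;1",
    "vid0/frame_002.png;speedLimit35;5;5;25;25;0,0;vid0.avi;2",
    "not a valid row",
]
print(dict(summarize_annotations(rows, {"stop": 3}, "lisa")))
```

A usable count of zero points at the annotations file or the class names; a high missing_image count points at BASE_PATH.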

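7. Summary and Outlook

The steps above complete the preparation for training Faster R-CNN from scratch. From setting up the directory structure to writing the configuration files to building the dataset, each stage matters, and together they give the training run a solid foundation.

Future work could further optimize training, for example by tuning the model's hyperparameters or trying different pre-trained models to improve performance. The trained model can also be applied to real-world scenarios such as traffic sign detection and object tracking.

The full list of steps in the workflow:
1. Set up the directory structure, creating the required directories and subdirectories.
2. Configure setup.sh and lisa_config.py so that Python can import the TFOD API modules and the dataset paths and parameters are defined.
3. Create the TFAnnotation class to encapsulate encoding an object detection data point in TensorFlow format.
4. Write the build_lisa_records.py script to convert the images and annotations to TensorFlow record files.
5. Run build_lisa_records.py to generate the training and testing record files.
6. Check the generated record files to confirm their contents are correct.
7. Prepare the pre-trained model and the training configuration file, then start training Faster R-CNN.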
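
A first, coarse version of the check in step 6 is to confirm that the three output files exist and are not empty. The sketch below does only that and does not parse the records; check_outputs is a hypothetical helper, and the paths mirror the ones defined in lisa_config.py.

```python
import os


def check_outputs(paths):
    """Return {path: size in bytes}, with None for files that are missing."""
    report = {}
    for path in paths:
        report[path] = os.path.getsize(path) if os.path.isfile(path) else None
    return report


# same relative paths as TRAIN_RECORD, TEST_RECORD, and CLASSES_FILE
EXPECTED = [
    os.path.join("lisa", "records", "training.record"),
    os.path.join("lisa", "records", "testing.record"),
    os.path.join("lisa", "records", "classes.pbtxt"),
]

for path, size in check_outputs(EXPECTED).items():
    if size is None:
        status = "missing"
    elif size == 0:
        status = "empty"
    else:
        status = "{} bytes".format(size)
    print("{}: {}".format(path, status))
```

A non-zero size does not prove the records are valid, but a missing or empty file is a clear sign that the build step needs to be rerun.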

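Hopefully this article gives you a clearer picture of how to train Faster R-CNN from scratch and serves as a useful reference for your own object detection research and practice.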