SSD: Single Shot Detector for Text Detection in Natural Scenes

Original post: http://blog.youkuaiyun.com/u010167269/article/details/52851667

Correction to the original post: the name in the XML files must match the name in the labelmap.prototxt file. Change both to text, or both to object.

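Preface

In an earlier post, "Paper Reading: SSD: Single Shot MultiBox Detector", I covered this recent object detection algorithm.

Since SSD is designed to detect objects, can it also be used to detect text in natural scene images? The answer is certainly yes.

Inspired by the ssd-plate_detection work of solace_hyh from Zhejiang University, this post records my own process of applying SSD to text detection.

All of the code is on GitHub: https://github.com/chenxinpeng/SSD_scene-text-detection (the code quality is not great, so pointers from experts are welcome ^_^).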


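Preparing and converting the dataset

The ICDAR 2011 training set contains 229 images. I split it into two parts of 159 and 70 images: the former is used for training, the latter for testing during training.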
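The post does not say how the 159/70 split was chosen. As a minimal sketch, assuming a seeded random split is acceptable, it could be done as below. The function name `split_dataset`, the seed, and the stand-in image names are all hypothetical and not taken from the original code.

```python
import random


def split_dataset(names, n_train=159, seed=0):
    """Shuffle image names reproducibly and split them into train / test lists."""
    names = sorted(names)        # fixed starting order, independent of listing order
    rng = random.Random(seed)    # hypothetical seed, used only for repeatability
    rng.shuffle(names)
    return names[:n_train], names[n_train:]


# Stand-in names; the real ones would come from the ICDAR 2011 image folder.
all_names = [str(i) for i in range(100, 329)]  # 229 names
train_names, test_names = split_dataset(all_names)
print(len(train_names), len(test_names))  # 159 70
```

Fixing the seed keeps the split identical across runs, so the test images never leak into the training list when the script is rerun.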

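Next, these images need to be converted to lmdb format for Caffe training, and the text-region labels need to be converted to Pascal VOC XML format.

Converting the ground truth to Pascal VOC XML files

First, convert the gt_**.txt label files provided by ICDAR 2011 to Pascal VOC XML format.

Start with the original gt_**.txt format, taking one original image (100.jpg) as the example.

Its ground truth file is: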

<code class="hljs bash">158,128,412,182,"Footpath"
442,128,501,170,"To"
393,198,488,240,"and"
63,200,363,242,"Colchester"
71,271,383,313,"Greenstead"</code>

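The ground truth file format is: xmin, ymin, xmax, ymax, label. Also note the coordinate system: the origin is at the top-left corner of the image, with x increasing to the right and y increasing downward.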
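To make the line format concrete, here is a small sketch that parses one ground truth line into integer coordinates and a label. The helper name `parse_gt_line` is hypothetical. The sketch assumes pixel coordinates with the origin at the top-left corner of the image, so xmax - xmin is the box width and ymax - ymin is the box height.

```python
def parse_gt_line(line):
    """Parse 'xmin,ymin,xmax,ymax,"label"' into four ints and the label text."""
    # Split on the first four commas only, so a label that itself
    # contains a comma stays in one piece.
    xmin, ymin, xmax, ymax, label = line.strip().split(',', 4)
    return int(xmin), int(ymin), int(xmax), int(ymax), label.strip().strip('"')


box = parse_gt_line('158,128,412,182,"Footpath"')
print(box)  # (158, 128, 412, 182, 'Footpath')

xmin, ymin, xmax, ymax, _ = box
print(xmax - xmin, ymax - ymin)  # 254 54
```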

The code that converts the ground truth txt files to Pascal VOC XML format is as follows:

<code class="language-python hljs">#!/usr/bin/env python

import glob
import os

from PIL import Image

# Directory holding the ICDAR images
src_img_dir = "/media/chenxp/Datadisk/ocr_dataset/ICDAR2011/train-textloc"
# Directory holding the ground truth txt files of the ICDAR images
src_txt_dir = "/media/chenxp/Datadisk/ocr_dataset/ICDAR2011/train-textloc"

img_lists = glob.glob(src_img_dir + '/*.jpg')

# Image names without directory or extension, e.g. 100.jpg -> 100
img_names = [os.path.splitext(os.path.basename(item))[0] for item in img_lists]

for img in img_names:
    im = Image.open(src_img_dir + '/' + img + '.jpg')
    width, height = im.size

    # open the corresponding txt file
    with open(src_txt_dir + '/gt_' + img + '.txt') as gt_file:
        gt = gt_file.read().splitlines()

    # write the xml file; open(..., 'w') creates it, so os.mknod is not needed
    with open(src_txt_dir + '/' + img + '.xml', 'w') as xml_file:
        xml_file.write('<annotation>\n')
        xml_file.write('    <folder>VOC2007</folder>\n')
        xml_file.write('    <filename>' + img + '.jpg' + '</filename>\n')
        xml_file.write('    <size>\n')
        xml_file.write('        <width>' + str(width) + '</width>\n')
        xml_file.write('        <height>' + str(height) + '</height>\n')
        xml_file.write('        <depth>3</depth>\n')  # assumes 3-channel images
        xml_file.write('    </size>\n')

        # write each text region into the xml file
        for img_each_label in gt:
            if not img_each_label.strip():
                continue  # skip blank lines
            spt = img_each_label.split(',')
            xml_file.write('    <object>\n')
            xml_file.write('        <name>text</name>\n')
            xml_file.write('        <pose>Unspecified</pose>\n')
            xml_file.write('        <truncated>0</truncated>\n')
            xml_file.write('        <difficult>0</difficult>\n')
            xml_file.write('        <bndbox>\n')
            xml_file.write('            <xmin>' + spt[0].strip() + '</xmin>\n')
            xml_file.write('            <ymin>' + spt[1].strip() + '</ymin>\n')
            xml_file.write('            <xmax>' + spt[2].strip() + '</xmax>\n')
            xml_file.write('            <ymax>' + spt[3].strip() + '</ymax>\n')
            xml_file.write('        </bndbox>\n')
            xml_file.write('    </object>\n')

        xml_file.write('</annotation>')</code>

Running the code above produces XML files like the following. For the same 100.jpg example image, the conversion result is:

<code class="language-xml hljs"><annotation>
    <folder>VOC2007</folder>
    <filename>100.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <object>
    ......
</annotation></code>

The XML files generated by the code above are stored in the same directory as the image files.
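Before going further, it can help to sanity-check the generated annotations. The sketch below parses one annotation with the standard library's `xml.etree.ElementTree` and reports objects whose name is not in an allowed set or whose box falls outside the image. The function `check_annotation` and the `allowed_names` default are hypothetical; the allowed names should be whatever labelmap.prototxt uses, as the correction note at the top of this post requires.

```python
import xml.etree.ElementTree as ET


def check_annotation(xml_text, allowed_names=("text",)):
    """Return a list of problems found in one Pascal VOC style annotation."""
    root = ET.fromstring(xml_text)
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    problems = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if name not in allowed_names:
            problems.append("unexpected name: %s" % name)
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        if not (0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height):
            problems.append("box outside image: %s" % ((xmin, ymin, xmax, ymax),))
    return problems


# A one-object annotation in the same layout the conversion script writes.
sample = """<annotation>
    <folder>VOC2007</folder>
    <filename>100.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <object>
        <name>text</name>
        <bndbox>
            <xmin>158</xmin>
            <ymin>128</ymin>
            <xmax>412</xmax>
            <ymax>182</ymax>
        </bndbox>
    </object>
</annotation>"""

print(check_annotation(sample))  # []
```

To check real files, the same function can be fed the contents of each generated .xml file.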

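Generating the list files of training image and XML label locations

In this step, following what SSD training requires, the location of each image and of its corresponding XML file is written to a txt file that is read during training. One file is named trainval.txt and the other test.txt. The format is as follows: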

<code class="hljs">scenetext/JPEGImages/106.jpg scenetext/Annotations/106.xml
scenetext/JPEGImages/203.jpg scenetext/Annotations/203.xml
scenetext/JPEGImages/258.jpg scenetext/Annotations/258.xml
scenetext/JPEGImages/122.jpg scenetext/Annotations/122.xml
scenetext/JPEGImages/103.jpg scenetext/Annotations/103.xml
scenetext/JPEGImages/213.jpg scenetext/Annotations/213.xml
scenetext/JPEGImages/149.jpg scenetext/Annotations/149.xml
......</code>
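Each line of such a list file pairs one image path with one annotation path, separated by a space. As a rough sketch, the hypothetical helper `check_list_line` below confirms that a line has exactly two fields and that both point at the same base name. It only inspects the text of the line and does not touch the file system.

```python
import os


def check_list_line(line):
    """Return True if a list line pairs an image with the same-named XML file."""
    parts = line.split()
    if len(parts) != 2:
        return False
    img_path, xml_path = parts
    img_stem, img_ext = os.path.splitext(os.path.basename(img_path))
    xml_stem, xml_ext = os.path.splitext(os.path.basename(xml_path))
    return img_ext == ".jpg" and xml_ext == ".xml" and img_stem == xml_stem


print(check_list_line("scenetext/JPEGImages/106.jpg scenetext/Annotations/106.xml"))  # True
print(check_list_line("scenetext/JPEGImages/106.jpg scenetext/Annotations/203.xml"))  # False
```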

The code that generates these files is as follows:

<code class="language-python hljs  has-numbering" style="display: block; padding: 0px; background: transparent; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal;"><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">#! /usr/bin/python</span>

<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> os, sys
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> glob

trainval_dir = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"/home/chenxp/data/VOCdevkit/scenetext/trainval"</span>
test_dir = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"/home/chenxp/data/VOCdevkit/scenetext/test"</span>

trainval_img_lists = glob.glob(trainval_dir + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'/*.jpg'</span>)
trainval_img_names = []
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> item <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> trainval_img_lists:
    temp1, temp2 = os.path.splitext(os.path.basename(item))
    trainval_img_names.append(temp1)

test_img_lists = glob.glob(test_dir + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'/*.jpg'</span>)
test_img_names = []
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> item <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> test_img_lists:
    temp1, temp2 = os.path.splitext(os.path.basename(item))
    test_img_names.append(temp1)

dist_img_dir = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"scenetext/JPEGImages"</span>
dist_anno_dir = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"scenetext/Annotations"</span>

trainval_fd = open(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"/home/chenxp/caffe/data/scenetext/trainval.txt"</span>, <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'w'</span>)
test_fd = open(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"/home/chenxp/caffe/data/scenetext/test.txt"</span>, <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'w'</span>)

<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> item <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> trainval_img_names:
    trainval_fd.write(dist_img_dir + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'/'</span> + str(item) + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'.jpg'</span> + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">' '</span> + dist_anno_dir + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'/'</span> + str(item) + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'.xml\n'</span>)

<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> item <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> test_img_names:
    test_fd.write(dist_img_dir + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'/'</span> + str(item) + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'.jpg'</span> + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">' '</span> + dist_anno_dir + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'/'</span> + str(item) + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'.xml\n'</span>)</code>
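The list-file format written by the script above (one image path and one annotation path per line, separated by a space) can be sketched as a small pure function. This is only an illustration: the name make_list_lines and the sorting of names are my own assumptions and are not part of the original script.

```python
def make_list_lines(image_names,
                    img_dir="scenetext/JPEGImages",
                    anno_dir="scenetext/Annotations"):
    """Return one 'image_path annotation_path' line per image name.

    Hypothetical helper: names are sorted so the output is reproducible.
    """
    lines = []
    for name in sorted(image_names):
        img_path = "{}/{}.jpg".format(img_dir, name)
        anno_path = "{}/{}.xml".format(anno_dir, name)
        lines.append("{} {}".format(img_path, anno_path))
    return lines


if __name__ == "__main__":
    # Example names taken from the test_name_size.txt sample in this post.
    for line in make_list_lines(["203", "106"]):
        print(line)
```

Keeping the line construction separate from the file writing makes it easy to inspect a few lines before generating trainval.txt and test.txt.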

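Generating the test name size text file

At this step, SSD also needs a file named test_name_size.txt, which records the name, height, and width of each training and test image. Its contents look like this: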

<code class="hljs lasso has-numbering" style="display: block; padding: 0px; background: transparent; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal;"><span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">106</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">480</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">640</span>
<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">203</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">480</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">640</span>
<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">258</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">480</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">640</span>
<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">318</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">480</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">640</span>
<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">122</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">480</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">640</span>
<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">103</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">480</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">640</span>
<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">320</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">640</span> <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">480</span>
<span class="hljs-attribute" style="box-sizing: border-box;">...</span><span class="hljs-attribute" style="box-sizing: border-box;">...</span></code>

The code that generates this text file is as follows:

<code class="language-python hljs  has-numbering" style="display: block; padding: 0px; background: transparent; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal;"><span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">#! /usr/bin/python</span>

<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> os, sys
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> glob
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> PIL <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">import</span> Image

img_dir = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"/home/chenxp/data/VOCdevkit/scenetext/JPEGImages"</span>

img_lists = glob.glob(img_dir + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'/*.jpg'</span>)

test_name_size = open(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'/home/chenxp/caffe/data/scenetext/test_name_size.txt'</span>, <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'w'</span>)

<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> item <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> img_lists:
    img = Image.open(item)
    width, height = img.size
    temp1, temp2 = os.path.splitext(os.path.basename(item))
    test_name_size.write(temp1 + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">' '</span> + str(height) + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">' '</span> + str(width) + <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'\n'</span>)</code>
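Because PIL's img.size returns (width, height) while the file stores height before width, the two columns are easy to swap by accident. A minimal sketch of a parser for checking the generated file is shown here; the function name parse_name_size is hypothetical and the sample lines are copied from the excerpt above.

```python
def parse_name_size(text):
    """Parse 'name height width' lines into (name, height, width) tuples."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, height, width = parts
        entries.append((name, int(height), int(width)))
    return entries


if __name__ == "__main__":
    sample = "106 480 640\n320 640 480\n"
    for name, height, width in parse_name_size(sample):
        print("{}: height={} width={}".format(name, height, width))
```

Comparing a few parsed entries against the real image dimensions is a quick way to confirm the column order before training.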

Preparing the label map file (labelmap)

This prototxt file records the mapping between each label and its name. Its contents are as follows:

<code class="language-prototxt hljs css has-numbering" style="display: block; padding: 0px; background: transparent; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal;"><span class="hljs-tag" style="color: rgb(0, 0, 0); box-sizing: border-box;">item</span> <span class="hljs-rules" style="box-sizing: border-box;">{
  <span class="hljs-rule" style="box-sizing: border-box;"><span class="hljs-attribute" style="box-sizing: border-box;">name</span>:<span class="hljs-value" style="box-sizing: border-box; color: rgb(0, 102, 102);"> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"none_of_the_above"</span>
  label: <span class="hljs-number" style="box-sizing: border-box;">0</span>
  display_name: <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"background"</span>
</span></span></span>}
<span class="hljs-tag" style="color: rgb(0, 0, 0); box-sizing: border-box;">item</span> <span class="hljs-rules" style="box-sizing: border-box;">{
  <span class="hljs-rule" style="box-sizing: border-box;"><span class="hljs-attribute" style="box-sizing: border-box;">name</span>:<span class="hljs-value" style="box-sizing: border-box; color: rgb(0, 102, 102);"> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"object"</span>
  label: <span class="hljs-number" style="box-sizing: border-box;">1</span>
  display_name: <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"text"</span>
</span></span></span>}</code>

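I renamed my prototxt file to labelmap_voc.prototxt. Note that the create_data_scenetext.sh script in the next section reads labelmap_voc_scenetext.prototxt through its mapfile variable, so the file name on disk has to match whatever that variable points to.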
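As the correction note at the top of this post says, the name used in each XML annotation must match a name in the labelmap. A rough way to check this is sketched below. The helper names are hypothetical, and the regex is a simplification that only handles labelmaps written in the simple layout shown above, not prototxt in general.

```python
import re
import xml.etree.ElementTree as ET


def labelmap_names(prototxt_text):
    """Return the set of `name` values declared in a labelmap prototxt."""
    # The lookbehind skips display_name: "..." entries.
    pattern = r'(?<![A-Za-z_])name:\s*"([^"]+)"'
    return set(re.findall(pattern, prototxt_text))


def unknown_object_names(xml_text, known_names):
    """Return object names in a VOC XML string that the labelmap lacks."""
    root = ET.fromstring(xml_text)
    found = set(obj.findtext("name") for obj in root.iter("object"))
    return sorted(n for n in found if n is not None and n not in known_names)


if __name__ == "__main__":
    labelmap = 'item {\n  name: "object"\n  label: 1\n  display_name: "text"\n}\n'
    xml_text = "<annotation><object><name>text</name></object></annotation>"
    print(unknown_object_names(xml_text, labelmap_names(labelmap)))
```

In the usage example the annotation uses "text" while the labelmap declares "object", so the mismatch is reported. Running a check like this over all annotation files before building the lmdb can catch the naming error early.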


Generating the lmdb database

Once the text files above are ready, place them in the following location:

<code class="language-bash hljs  has-numbering" style="display: block; padding: 0px; background: transparent; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal;">/home/chenxp/caffe/data/scenetext</code>

Next, modify the create_data.sh script that ships with the SSD source code and run it (I renamed the file to create_data_scenetext.sh):

<code class="language-bash hljs  has-numbering" style="display: block; padding: 0px; background: transparent; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal;">cur_dir=$(<span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">cd</span> $( dirname <span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">${BASH_SOURCE[0]}</span> ) && <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">pwd</span> )
root_dir=<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$cur_dir</span>/../..

<span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">cd</span> <span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$root_dir</span>

redo=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>
data_root_dir=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$HOME</span>/data/VOCdevkit"</span>
dataset_name=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"scenetext"</span>
mapfile=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$root_dir</span>/data/<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$dataset_name</span>/labelmap_voc_scenetext.prototxt"</span>
anno_<span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">type</span>=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"detection"</span>
db=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"lmdb"</span>
min_dim=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
max_dim=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
width=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>
height=<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>

extra_cmd=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"--encode-type=jpg --encoded"</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">if</span> [ <span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$redo</span> ]
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">then</span>
  extra_cmd=<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$extra_cmd</span> --redo"</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">fi</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">for</span> subset <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span> test trainval
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">do</span>
  python <span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$root_dir</span>/scripts/create_annoset.py --anno-type=<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$anno_type</span> --label-map-file=<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$mapfile</span> \
  --min-dim=<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$min_dim</span> --max-dim=<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$max_dim</span> --resize-width=<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$width</span> --resize-height=<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$height</span> \
  --check-label <span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$extra_cmd</span> <span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$data_root_dir</span> <span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$root_dir</span>/data/<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$dataset_name</span>/<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$subset</span>.txt \
  <span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$data_root_dir</span>/<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$dataset_name</span>/<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$db</span>/<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$dataset_name</span><span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"_"</span><span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$subset</span><span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"_"</span><span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$db</span> examples/<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">$dataset_name</span>
<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">done</span></code>

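The bash script above automatically converts the ICDAR 2011 images and their corresponding labels into lmdb files. The output location is determined by the script; in my case the files are at: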

<code class="language-bash hljs  has-numbering" style="display: block; padding: 0px; background: transparent; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal;">/home/chenxp/caffe/examples/scenetext_trainval_lmdb
/home/chenxp/caffe/examples/scenetext_test_lmdb</code>
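To see which paths the script hands to create_annoset.py for each subset, the two trailing arguments can be expanded by hand. The sketch below only mirrors the string substitution in create_data_scenetext.sh. The function name annoset_paths is hypothetical, and the resulting paths depend on your own data_root_dir, so they may differ from the locations listed above.

```python
import posixpath


def annoset_paths(data_root_dir, dataset_name, subset, db="lmdb"):
    """Expand the last two arguments the script passes to create_annoset.py.

    Returns (database directory, example directory), mirroring
    $data_root_dir/$dataset_name/$db/${dataset_name}_${subset}_${db}
    and examples/$dataset_name from the shell script.
    """
    db_name = "{}_{}_{}".format(dataset_name, subset, db)
    out_dir = posixpath.join(data_root_dir, dataset_name, db, db_name)
    example_dir = posixpath.join("examples", dataset_name)
    return out_dir, example_dir


if __name__ == "__main__":
    for subset in ("test", "trainval"):
        print(annoset_paths("/home/chenxp/data/VOCdevkit", "scenetext", subset))
```

Printing these paths before running the script is a cheap way to confirm where the databases will be written.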


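Training the model

Applying SSD to your own detection task requires fine-tuning a pretrained network.

Specifically, load the VGG_ILSVRC_16_layers_fc_reduced.caffemodel provided by the SSD authors and continue training from this pretrained model on our own data.

After downloading it, place it in the following directory: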

<code class="language-bash hljs  has-numbering" style="display: block; padding: 0px; background: transparent; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal;">/home/chenxp/caffe/models/VGGNet</code>

Next, modify the training script provided by the authors, ssd_pascal.py. This script automatically creates the following files needed for training:

  • deploy.prototxt
  • solver.prototxt
  • train.prototxt
  • test.prototxt

We need to modify the following places to suit our own setup:

<code class="language-python hljs  has-numbering" style="display: block; padding: 0px; background: transparent; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal;"># Modify the job name if you want.
job_name = "SSD_{}".format(resize)
# The name of the model. Modify it if you want.
model_name = "VGG_VOC0712_{}".format(job_name)

# Directory which stores the model .prototxt file.
save_dir = "models/VGGNet/VOC0712/{}".format(job_name)
# Directory which stores the snapshot of models.
snapshot_dir = "models/VGGNet/VOC0712/{}".format(job_name)
# Directory which stores the job script and log file.
job_dir = "jobs/VGGNet/VOC0712/{}".format(job_name)
# Directory which stores the detection results.
output_result_dir = "{}/data/VOCdevkit/results/VOC2007/{}/Main".format(os.environ['HOME'], job_name)

# model definition files.
train_net_file = "{}/train.prototxt".format(save_dir)
test_net_file = "{}/test.prototxt".format(save_dir)
deploy_net_file = "{}/deploy.prototxt".format(save_dir)
solver_file = "{}/solver.prototxt".format(save_dir)
# snapshot prefix.
snapshot_prefix = "{}/{}".format(snapshot_dir, model_name)
# job script path.
job_file = "{}/{}.sh".format(job_dir, model_name)

# Stores the test image names and sizes. Created by data/VOC0712/create_list.sh
name_size_file = "data/VOC0712/test_name_size.txt"
# The pretrained model. We use the Fully convolutional reduced (atrous) VGGNet.
pretrain_model = "models/VGGNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel"
# Stores LabelMapItem.
label_map_file = "data/VOC0712/labelmap_voc.prototxt"

num_classes = 21

num_test_image = 4952</code>
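For the scene-text task, the edited values might look like the sketch below. It is only an illustration under stated assumptions: the `scenetext` directory names, the 300x300 input size, and the list and label map locations are hypothetical and not taken from the original script. `num_classes = 2` counts the single text class plus background, and `num_test_image = 70` matches the 70 ICDAR 2011 images held out for testing.

```python
# Hypothetical dataset tag; the lmdb directories above use "scenetext".
dataset = "scenetext"
# Assumed input size, written the way the job name expects it.
resize = "300x300"

job_name = "SSD_{}".format(resize)
model_name = "VGG_{}_{}".format(dataset, job_name)

# Assumed directory layout, mirroring the VOC0712 defaults.
save_dir = "models/VGGNet/{}/{}".format(dataset, job_name)
snapshot_dir = save_dir
job_dir = "jobs/VGGNet/{}/{}".format(dataset, job_name)

# Assumed locations of the image list and the label map.
name_size_file = "data/{}/test_name_size.txt".format(dataset)
label_map_file = "data/{}/labelmap_voc.prototxt".format(dataset)

# The pretrained model path is unchanged from the default script.
pretrain_model = "models/VGGNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel"

num_classes = 2      # one "text" class plus background
num_test_image = 70  # size of the held-out test split

print(model_name, save_dir, num_classes, num_test_image)
```

The class name in the label map has to match the name used in the XML annotations, as noted in the correction at the top of this post.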


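My training parameters

A few other things also need to be changed, such as the training parameters. If you start with the default solver.prototxt parameters from the authors' ssd_pascal.py, the following happens: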

[Image: training log]

After running for a while, the loss becomes nan: training diverges instead of converging.
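A divergence like this can be caught automatically by scanning the training log. The sketch below assumes log lines that contain `Iteration <n>` followed by `loss = <value>`, which is how Caffe solvers usually report progress. `first_nan_iteration` is a hypothetical helper and the sample lines are made up for illustration.

```python
import math
import re

# Assumed log format, e.g. "Iteration 120, loss = 7.31".
LOSS_RE = re.compile(r"Iteration (\d+).*?loss = (\S+)")


def first_nan_iteration(lines):
    """Return the first iteration whose loss is nan or inf, else None."""
    for line in lines:
        match = LOSS_RE.search(line)
        if not match:
            continue
        try:
            loss = float(match.group(2))
        except ValueError:
            # Skip lines whose loss field is not a plain number.
            continue
        if math.isnan(loss) or math.isinf(loss):
            return int(match.group(1))
    return None


# Made-up sample lines, not taken from a real run.
sample_log = [
    "Iteration 0, loss = 23.9",
    "Iteration 10, loss = 15.2",
    "Iteration 20, loss = nan",
]
print(first_nan_iteration(sample_log))
```

Stopping a run as soon as this returns a value saves time compared with waiting for the whole schedule to finish.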

After some debugging, I settled on the following solver.prototxt setting, which converged in my runs:

<code class="language-prototxt hljs http has-numbering" style="display: block; padding: 0px; background: transparent; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal;">base_lr: 0.0001</code>

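The remaining parameters can be set as you see fit. The learning rate must be small: with the original value of 0.001, training diverges.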
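One way to apply this change is to rewrite the `base_lr` line in the generated solver.prototxt text. This is a minimal sketch: `set_base_lr` is a hypothetical helper and the sample solver text is illustrative, not the actual file from this experiment. Since ssd_pascal.py regenerates solver.prototxt when it runs, changing the value inside the script may be the more durable option.

```python
import re


def set_base_lr(solver_text, new_lr):
    """Return solver text with the value of its base_lr field replaced.

    new_lr is passed as a string so it is written exactly as given.
    """
    pattern = re.compile(r"^(\s*base_lr\s*:\s*)\S+", flags=re.MULTILINE)
    # A function replacement avoids backreference clashes with digits.
    updated, count = pattern.subn(lambda m: m.group(1) + new_lr, solver_text)
    if count == 0:
        raise ValueError("no base_lr field found")
    return updated


# Illustrative solver fragment, not the full generated file.
sample = 'base_lr: 0.001\nlr_policy: "multistep"\n'
print(set_base_lr(sample, "0.0001"))
```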

Training finished:

[Image: log output at the end of training]

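As you can see, the final test accuracy is 0.776573, so SSD seems to work reasonably well here.

I uploaded my trained model to cloud storage: http://share.weiyun.com/1c544de66be06ea04774fd11e820a780 (password: ERid5Y)

It is needed in the testing stage that follows.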


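Predicting with the trained model

The SSD authors have also written the prediction code for us; we only need to change the parameters.

Open ~/caffe/examples/ssd_detect.ipynb with Jupyter Notebook. This is the notebook the authors wrote for running detection with a trained caffemodel.

Point it at the caffemodel and deploy.prototxt; see my uploaded code for the details.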
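The post-processing step of such a detection notebook can be sketched without Caffe. The example assumes each detection is a row `[image_id, label, confidence, xmin, ymin, xmax, ymax]` with coordinates normalized to [0, 1], which is the layout SSD's detection output is generally described as using. `keep_boxes`, the 0.6 threshold, and the sample rows are illustrative assumptions, not the author's code.

```python
def keep_boxes(detections, width, height, threshold=0.6):
    """Drop low-confidence detections and convert boxes to pixel coordinates."""
    boxes = []
    for _image_id, label, conf, xmin, ymin, xmax, ymax in detections:
        if conf < threshold:
            continue
        boxes.append((
            int(label),
            conf,
            int(round(xmin * width)),
            int(round(ymin * height)),
            int(round(xmax * width)),
            int(round(ymax * height)),
        ))
    return boxes


# Made-up detections for a 640x480 image: one confident, one weak.
sample = [
    [0, 1, 0.95, 0.25, 0.5, 0.75, 0.75],
    [0, 1, 0.30, 0.10, 0.1, 0.20, 0.20],
]
for box in keep_boxes(sample, width=640, height=480):
    print(box)
```

Raising the threshold trades recall for fewer false text boxes, so it is worth tuning on the held-out images.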

Testing on a few images gives the following results:

[Image: detection result]

[Image: detection result]

[Image: detection result]


References

  1. ECCV 2016 paper: "SSD: Single Shot MultiBox Detector"
  2. SSD source code
  3. SSD-plate_detection from solace_hyh
  4. Training the SSD framework on your own dataset