Mobile Artificial Intelligence Projects (Part 2)

Source: annas-archive.org/md5/24329c3bf9f3c672f8b5b2fa4cd1802e

Translator: 飞龙

License: CC BY-NC-SA 4.0

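Chapter 4: Building a Machine Vision Mobile App to Classify Flower Species

In this chapter, we will use the theory from the previous chapters to build a mobile application that classifies specific species of flowers. When you point your phone's camera at a flower and take a picture, the app analyzes the image and makes its best guess at the species. This is where we put our understanding of convolutional neural networks (CNNs) into practice. We will also learn more about using TensorFlow and tools such as TensorBoard. Before diving in, a few preliminaries.

This chapter uses some terms that may not be familiar to everyone, so let's make sure we agree on what they mean.

In this chapter, we will cover the following topics:

CoreML versus TensorFlow Lite
What is MobileNet
Datasets for image classification
Creating your own image dataset
Building a model with TensorFlow
Running TensorBoard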

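CoreML versus TensorFlow Lite

At the time of writing, two efforts are under way in machine learning to improve the mobile AI experience. Rather than shipping AI or ML processing off to the cloud and data centers, the faster option is to process the data on the device itself. To make that possible, the model must already be trained, which means it may not have been trained for exactly the purpose you have in mind.

In this space, Apple's effort (iOS) is called Core ML and Google's (Android) is called TensorFlow Lite. Let's look briefly at each.

CoreML

Apple's CoreML framework supports a wide range of neural network types, which lets developers experiment with different designs while building an app. Camera and microphone data are just two of the inputs that can feed areas such as image recognition and natural language processing. Several pretrained models are available for developers to use directly and adapt as needed.

TensorFlow Lite

TensorFlow Lite is the on-device version of TensorFlow, meaning it is designed to run on your mobile device. At the time of writing it is still in prerelease, so a direct comparison with CoreML is difficult; we will have to wait and see what the final release offers. For now, it is enough to know that there are two options for on-device AI and machine learning on mobile.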

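What is MobileNet?

Before going further, let's talk about a term you will hear a lot in this chapter: MobileNets. What is a MobileNet? In short, it is an architecture designed specifically for mobile and embedded vision applications. On such devices, the compute available for this kind of task is limited, so there is a real need for something better suited than the solutions used in desktop environments.

The MobileNet architecture was proposed by Google. In brief:

It uses depthwise separable convolutions. Compared with a network that uses ordinary convolutions, this significantly reduces the number of parameters, which results in what is called a lightweight deep neural network.
A depthwise convolution followed by a pointwise convolution replaces the normal convolution.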
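To make the parameter saving concrete, here is a minimal sketch that counts the weights of one standard convolution and of its depthwise separable replacement. The layer sizes (a 3 x 3 kernel, 32 input channels, 64 output channels) are illustrative assumptions, not values taken from MobileNet itself, and biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


# Illustrative layer sizes (assumed for this example only)
standard = standard_conv_params(3, 32, 64)
separable = depthwise_separable_params(3, 32, 64)
print(standard, separable)  # → 18432 2336
```

For this layer the separable version needs roughly one eighth of the weights, which is where the "lightweight" label comes from.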

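To keep things simple, we have split the rest of this chapter into the following two parts:

Image classification datasets: in this section, we explore the datasets available for image classification (all of them available online). We also discuss how to create our own dataset when necessary.
Building a model with TensorFlow: in this section, we use TensorFlow to train our classification model, starting from a pretrained model called MobileNet. MobileNets are a family of mobile-first computer vision models for TensorFlow, designed to maximize accuracy while respecting the limited resources available on a device.
In addition, we look at how to convert the output model to the .tflite format, which can be used on other mobile or embedded devices. TFLite stands for TensorFlow Lite. You can learn more about TensorFlow Lite through any internet search engine.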

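Image classification datasets

For our flower classification example, we will use an image dataset from the Visual Geometry Group (VGG) at the University of Oxford. The datasets are available at www.robots.ox.ac.uk/~vgg/data/.

VGG is the group behind the VGG family of networks. Pretrained models such as VGG16 and VGG19 were built by this group, and its entry in the 2014 ImageNet challenge (ILSVRC) won the localization task and placed second in classification. The group used these datasets to train and evaluate the models it built.

The flower datasets are in the fine-grained recognition section of the page, alongside the texture and pet datasets. Click "Flower Category Datasets", or go directly to VGG's flower datasets at www.robots.ox.ac.uk/~vgg/data/flowers/.

Here you will find two datasets, one with 17 categories of flowers and one with 102. Choose one based on how easy it is to use for the tutorial or on the processing capacity you have available.

A larger dataset means longer training and longer data processing before training, so choose carefully.

Here is a subset of the images you will find there. As you can see, the folder names match exactly the ones we will use later in this chapter: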

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/390e5a15-74f2-47e8-801f-91e7d469466f.png

In addition to the images mentioned above, here are some more links you can use if you need image data for similar classification purposes in the future:

CVonline datasets: homepages.inf.ed.ac.uk/rbf/CVonline/Imagedbase.htm
CVpapers datasets: www.cvpapers.com/datasets.html
Image datasets: wiki.fast.ai/index.php/Image_Datasets
Deep learning datasets: deeplearning.net/datasets/
COCO dataset: cocodataset.org/#home
ImageNet dataset: www.image-net.org/
Open Images dataset: storage.googleapis.com/openimages/web/index.html
Kaggle datasets: www.kaggle.com/datasets?sortBy=relevance&group=featured&search=image
Open datasets: skymind.ai/wiki/open-datasets
Wikipedia: en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research#Object_detection_and_recognition

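Creating your own image dataset with Google Images

Suppose that, for whatever reason, we need to determine what kind of dog is in a picture, but we have no images on hand. What can we do? Perhaps the easiest approach is to open Google Chrome and search for images online.

Take the Doberman as an example. Open Google Chrome and search for images of doberman, as follows:

Search for Doberman images: the search returns the following results: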

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/49739228-8671-4818-9de8-7aaadbb7b33a.png

Open the JavaScript console: you can reach the JavaScript console from the menu in the top-right corner of Chrome:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/95960d28-7610-4dc0-9354-dd2383ef520a.png

Click More tools, then select Developer tools:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/21a212a0-9ecb-4c29-b2c0-c3ffff13b4c6.png

Make sure the Console tab is selected, as shown here:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/741be916-0f66-4f48-b82f-2c6a18247ca8.png

Use JavaScript: keep scrolling down until you think you have enough images for your use case. When you are done, go back to the Console tab in Developer tools, then copy and paste the following script:

// Load jQuery into the page from the JavaScript console
var script = document.createElement('script');
script.src = "https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js";
// Run the rest only after jQuery has finished loading
script.onload = function() {
  // Get the URLs: each result stores its original image URL under "ou".
  // These class names reflect the Google Images markup at the time of writing.
  var urls = jQuery('.rg_di .rg_meta').map(function() { return JSON.parse(jQuery(this).text()).ou; });
  // Write the URLs, one per line, to a file
  var textToSave = urls.toArray().join('\n');
  var hiddenElement = document.createElement('a');
  hiddenElement.href = 'data:attachment/text,' + encodeURI(textToSave);
  hiddenElement.target = '_blank';
  hiddenElement.download = 'urls.txt';
  hiddenElement.click();
};
document.getElementsByTagName('head')[0].appendChild(script);

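This snippet collects the URLs of all the images and saves them to a file called urls.txt in your default Downloads directory.

Use Python to download the images: now we will use Python to read the image URLs from urls.txt and download all the images into a folder: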

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/bfeb9d0e-f904-4a4b-9ab9-f98a96ce1561.png

This can be done easily with the following steps:

Open a Python notebook, then copy and paste the following code to download the images:

# We will start by importing the required packages
from imutils import paths
import argparse
import requests
import cv2
import os

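After the imports, construct the argument parser and then parse the arguments:

ap = argparse.ArgumentParser()
ap.add_argument("-u", "--urls", required=True,
    help="path to file containing image URLs")
ap.add_argument("-o", "--output", required=True,
    help="path to output directory of images")
args = vars(ap.parse_args())

The next step reads the list of URLs from the input file and keeps a count of the images downloaded so far:

rows = open(args["urls"]).read().strip().split("\n")
total = 0
# Make sure the output directory exists before writing into it
os.makedirs(args["output"], exist_ok=True)
# Loop over the URLs
for url in rows:
    try:
        # Try downloading the image
        r = requests.get(url, timeout=60)
        # Save the image to disk
        p = os.path.sep.join([args["output"], "{}.jpg".format(
            str(total).zfill(8))])
        f = open(p, "wb")
        f.write(r.content)
        f.close()
        # Update the counter
        print("[INFO] downloaded: {}".format(p))
        total += 1

Any exception raised during the download needs to be handled, so that one bad URL does not stop the whole run:

    except Exception:
        # Report the URL, since p is not set if the request itself failed
        print("[INFO] error downloading {}...skipping".format(url))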

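Next, loop over the paths of the images we just downloaded:

for imagePath in paths.list_images(args["output"]):

For each image, start by assuming that it should not be deleted:

    delete = False

Then try to load the image:

    try:
        image = cv2.imread(imagePath)

If the image comes back as None, it was not loaded properly and should be deleted from disk:

        if image is None:
            delete = True

If OpenCV raises an exception while loading the image, the file is corrupt and should be deleted as well:

    except Exception:
        print("Except")
        delete = True

Finally, check the flag and delete the image if it was set:

    if delete:
        print("[INFO] deleting {}".format(imagePath))
        os.remove(imagePath)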

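When you are done, download this notebook as a Python file and name it image_download.py. Make sure the urls.txt file is in the same folder as the Python file you just created. This is important.
Next, we need to run the Python file we just created. We do this from the command line, as follows (make sure your PATH variable points to your Python installation):

python image_download.py --urls urls.txt --output Doberman

After you run this command, the images are downloaded into a folder named Doberman. When it finishes, you should see all the Doberman images you viewed in Google Chrome, similar to the following: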

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/06057770-7f65-4faf-9b99-c9c6f2bdbb0c.png

Choose the folder where you want to save the images, as follows:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/8e2f30d2-cac4-4cba-8228-62723d73a617.png

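That's it. We now have a folder full of Doberman images. The same approach can be used to create a folder for any other category.

Some of the images in the Google results may not be what you want. Make sure you go through the images and remove any unwanted ones.

An alternative way to create a custom dataset from video

Sometimes the images we find on the internet do not meet our needs, or we cannot find any at all. This can be due to the uniqueness of the data, the use case at hand, copyright restrictions, the required resolution, and so on. In that case, an alternative is to record a video of the object you need, extract the frames that meet your requirements, and save each frame as a separate image. How do we go about that?

Suppose we have a skin condition that we cannot find any information about online, and we need to classify it. To do that, we need images of the condition. So we record a video of the affected skin and save it to a file. For the sake of discussion, let's assume the video is saved as myvideo.mp4.

Once that is done, we can use the following Python script to break the video into images and save them to a folder. Here is the full code, starting with the imports: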

import sys
import argparse
import os
import cv2
import numpy as np
print(cv2.__version__)

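This function takes the path to the video file, samples frames at the configured interval, and saves each one as an image in the output folder:

def extractImages(pathIn, pathOut, interval_ms=10):
    os.makedirs(pathOut, exist_ok=True)
    vidcap = cv2.VideoCapture(pathIn)
    fps = vidcap.get(cv2.CAP_PROP_FPS)
    frame_count = vidcap.get(cv2.CAP_PROP_FRAME_COUNT)
    # Video length in milliseconds, so that we stop once we pass the last frame
    duration_ms = (frame_count / fps) * 1000 if fps > 0 else 0
    count = 0
    while count * interval_ms < duration_ms:
        vidcap.set(cv2.CAP_PROP_POS_MSEC, count * interval_ms)  # Adjust frequency of frames here
        success, image = vidcap.read()
        print('Read a new frame: ', success)
        if not success:
            break
        cv2.imwrite(os.path.join(pathOut, "frame{:d}.jpg".format(count)), image)  # save frame as JPEG file
        count = count + 1
    vidcap.release()

pathIn = "myvideo.mp4"
pathOut = "frames"
extractImages(pathIn, pathOut)

As described, this saves frames of the video, at the configured interval, into the frames folder under the current directory. Once the script has run, you have your image dataset and can pick the images you need.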

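Building a model with TensorFlow

Now that we have seen several ways to get the images we need, or to create our own when none are available, we will use TensorFlow to build a classification model for our flower use case:

Create the folder structure: first, let's create the folder structure our flower classification use case needs. Create a main folder called image_classification. Inside it, create two folders: images and tf_files. The images folder holds the images needed for training, and the tf_files folder holds all the TensorFlow-specific files generated at runtime.
Download the images: next, we need to download the images for our use case. For the flowers example, our images come from the VGG datasets page we discussed earlier.

Feel free to use your own dataset, but make sure each category has its own folder. Place the downloaded image dataset inside the images folder.

For example, the complete folder structure looks like this: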

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/2f389c5e-de53-43c8-b807-d68aacb9c7aa.png
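If you prefer to create these folders from a script rather than by hand, the following minimal sketch does so with the standard library. The helper name create_project_layout is made up for this example; only the folder names image_classification, images, and tf_files come from the text.

```python
import os


def create_project_layout(root="image_classification"):
    """Create the images/ and tf_files/ folders under the main project folder."""
    created = []
    for sub in ("images", "tf_files"):
        path = os.path.join(root, sub)
        os.makedirs(path, exist_ok=True)  # does nothing if the folder already exists
        created.append(path)
    return created


if __name__ == "__main__":
    for folder in create_project_layout():
        print("ready:", folder)
```

Each flower category then goes into its own subfolder of images, for example images/daisy, matching the one-folder-per-class rule above.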

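Create the Python script: in this step, we create the TensorFlow code needed to build the model. Create a Python file called retrain.py in the main image_classification folder.

Once that is done, copy in the code blocks that follow. We break the process into several steps so that we can describe what is happening.

The script starts with the following imports and constants: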

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import collections
from datetime import datetime
import hashlib
import os.path
import random
import re
import sys
import tarfile
import numpy as np
from six.moves import urllib
import tensorflow as tf
from tensorflow.python.framework import graph_util
from tensorflow.python.framework import tensor_shape
from tensorflow.python.platform import gfile
from tensorflow.python.util import compat
FLAGS = None
MAX_NUM_IMAGES_PER_CLASS = 2 ** 27 - 1 # ~134M

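Next, we need to prepare the images so that they can be used for training, validation, and testing. The following excerpt is the start of the create_image_lists function:

def create_image_lists(image_dir, testing_percentage, validation_percentage):
    result = collections.OrderedDict()
    sub_dirs = [
        os.path.join(image_dir, item)
        for item in gfile.ListDirectory(image_dir)]
    sub_dirs = sorted(item for item in sub_dirs
                      if gfile.IsDirectory(item))
    for sub_dir in sub_dirs:
        # (the loop body is in the full retrain.py)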

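The first thing we do is retrieve the images from the directory path where they are stored. We will use them to build the model graph with the model you downloaded and installed earlier.

The next step is to initialize the bottlenecks by creating what are called bottleneck files. Bottleneck is an informal term for the layer just before the final output layer that does the actual classification. (TensorFlow Hub calls this an image feature vector.) This layer has been trained to output a set of values that is good enough for the classifier to distinguish between all the classes it has been asked to recognize. That means it has to be a meaningful and compact summary of the image, because it has to contain enough information for the classifier to make a good choice from a very small set of values.

It is important that every image has bottleneck values. If they are not available for an image, we have to create them, because they will be needed later during training. Caching these values is strongly recommended: every image is reused many times during training and computing each bottleneck takes a significant amount of time, so caching the values on disk avoids repeated computation and speeds things up. By default, bottlenecks are stored in the /tmp/bottleneck directory (unless a different directory is specified as an argument).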
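The caching idea can be sketched in a few lines. This is not the code from retrain.py: the function name get_or_create_bottleneck and the compute_fn callback (standing in for "run the image through the network up to the bottleneck layer") are assumptions made for illustration.

```python
import os


def get_or_create_bottleneck(image_path, cache_dir, compute_fn):
    """Return cached bottleneck values for an image, computing them only once."""
    os.makedirs(cache_dir, exist_ok=True)
    # The cache file is keyed on the image file name
    cache_path = os.path.join(cache_dir, os.path.basename(image_path) + ".txt")
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return [float(x) for x in f.read().split(",")]
    values = compute_fn(image_path)  # the expensive step
    with open(cache_path, "w") as f:
        f.write(",".join(str(v) for v in values))
    return values
```

On the first call for an image, compute_fn runs and its result is written to disk; every later call for the same file name reads the stored values instead.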

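When we retrieve bottleneck values, we look them up by the image file name stored in the cache. If distortions have been applied to the images, retrieving the bottleneck values can be difficult. The biggest downside of enabling distortions is that the bottleneck cache is no longer useful, because input images are never reused exactly. This directly lengthens the training process, so we strongly recommend enabling distortions only once you have a model you are reasonably happy with. If you run into problems, the GitHub repository for this book provides a way to get bottleneck values for distorted images.

Note that we first materialize the distorted image data as a NumPy array.

Next, we need to run inference on the image. This requires a trained object detection model and is done using two memory copies.

Our next step is to distort the images. Distortions such as cropping, scaling, and brightness are given as percentages, and these values control how much of each distortion is applied to each image. It is reasonable to start with a value of 5 or 10 for each distortion and then experiment to see which ones help the model and which do not.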
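As a rough illustration of what such percentages mean, the sketch below turns them into random multipliers. The mapping used here (a crop margin of 1 + crop/100, a random extra scale of up to 1 + scale/100, and a brightness factor within plus or minus brightness/100 of 1) is an assumption modeled on how retraining scripts commonly interpret these flags; the text itself does not spell it out, and sample_distortion is a made-up helper name.

```python
import random


def sample_distortion(random_crop=10, random_scale=10, random_brightness=10, rng=random):
    """Draw one set of distortion multipliers from percentage settings (assumed mapping)."""
    margin_scale = 1.0 + random_crop / 100.0                     # room left for a random crop
    resize_scale = rng.uniform(1.0, 1.0 + random_scale / 100.0)  # random extra enlargement
    brightness = rng.uniform(1.0 - random_brightness / 100.0,
                             1.0 + random_brightness / 100.0)    # pixel value multiplier
    return {"enlarge_by": margin_scale * resize_scale, "brightness": brightness}


if __name__ == "__main__":
    print(sample_distortion())
```

With all three settings at 10, each image is enlarged by a factor between 1.1 and about 1.21 before cropping, and its brightness is scaled by a factor between 0.9 and 1.1.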

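Next, we need to summarize our model in terms of accuracy and loss, and we will use the TensorBoard visualization tool for the analysis. In case you have not come across it, TensorBoard is a suite of visualization tools that ships with TensorFlow. It lets you visualize your TensorFlow graph, plot variables as the graph executes, and display additional data such as images that pass through the graph. Here is an example of a TensorBoard dashboard: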

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/be6eec08-7056-441f-b2e3-5263c485e200.png

Our next step is to save the model to a file and set up a directory path where the TensorBoard summaries are written.

At this point we should point out the create_model_info function, which returns the model information. In the following example, we handle the MobileNet and Inception_v3 architectures. You will see shortly how any other architecture name is handled:

def create_model_info(architecture):
    architecture = architecture.lower()
    if architecture == 'inception_v3':
        # pylint: disable=line-too-long
        data_url = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz'
        # pylint: enable=line-too-long
        bottleneck_tensor_name = 'pool_3/_reshape:0'
        bottleneck_tensor_size = 2048
        input_width = 299
        input_height = 299
        input_depth = 3
        resized_input_tensor_name = 'Mul:0'
        model_file_name = 'classify_image_graph_def.pb'
        input_mean = 128
        input_std = 128
    elif architecture.startswith('mobilenet_'):
        parts = architecture.split('_')
        if len(parts) != 3 and len(parts) != 4:
            tf.logging.error("Couldn't understand architecture name '%s'",
                             architecture)
            return None
        version_string = parts[1]
        if (version_string != '1.0' and version_string != '0.75' and
                version_string != '0.50' and version_string != '0.25'):
            tf.logging.error(
                """The Mobilenet version should be '1.0', '0.75', '0.50', or '0.25',
  but found '%s' for architecture '%s'""",
                version_string, architecture)
            return None
        size_string = parts[2]
        if (size_string != '224' and size_string != '192' and
                size_string != '160' and size_string != '128'):
            tf.logging.error(
                """The Mobilenet input size should be '224', '192', '160', or '128',
 but found '%s' for architecture '%s'""",
                size_string, architecture)
            return None
        if len(parts) == 3:
            is_quantized = False

If the name has three parts, the model is not quantized. Otherwise, the else branch that follows checks that the fourth part is the suffix quantized and logs an error if it is anything else. The rest of the MobileNet branch builds the download URL and tensor names for MobileNet version 1. The final else handles a name that is neither MobileNet nor Inception_V3: it logs an error and raises a ValueError:

        else:
            if parts[3] != 'quantized':
                tf.logging.error(
                    "Couldn't understand architecture suffix '%s' for '%s'", parts[3],
                    architecture)
                return None
            is_quantized = True
        data_url = 'http://download.tensorflow.org/models/mobilenet_v1_'
        data_url += version_string + '_' + size_string + '_frozen.tgz'
        bottleneck_tensor_name = 'MobilenetV1/Predictions/Reshape:0'
        bottleneck_tensor_size = 1001
        input_width = int(size_string)
        input_height = int(size_string)
        input_depth = 3
        resized_input_tensor_name = 'input:0'
        if is_quantized:
            model_base_name = 'quantized_graph.pb'
        else:
            model_base_name = 'frozen_graph.pb'
        model_dir_name = 'mobilenet_v1_' + version_string + '_' + size_string
        model_file_name = os.path.join(model_dir_name, model_base_name)
        input_mean = 127.5
        input_std = 127.5
    else:
        tf.logging.error("Couldn't understand architecture name '%s'", architecture)
        raise ValueError('Unknown architecture', architecture)
    return {
        'data_url': data_url,
        'bottleneck_tensor_name': bottleneck_tensor_name,
        'bottleneck_tensor_size': bottleneck_tensor_size,
        'input_width': input_width,
        'input_height': input_height,
        'input_depth': input_depth,
        'resized_input_tensor_name': resized_input_tensor_name,
        'model_file_name': model_file_name,
        'input_mean': input_mean,
        'input_std': input_std,
    }

Another important point is that we need to decode the JPEG data of each image before the model can use it. The following add_jpeg_decoding function is a complete snippet that does this by calling tf.image.decode_jpeg, then resizes the image and normalizes its pixel values:

def add_jpeg_decoding(input_width, input_height, input_depth, input_mean,
                      input_std):
    jpeg_data = tf.placeholder(tf.string, name='DecodeJPGInput')
    decoded_image = tf.image.decode_jpeg(jpeg_data, channels=input_depth)
    decoded_image_as_float = tf.cast(decoded_image, dtype=tf.float32)
    decoded_image_4d = tf.expand_dims(decoded_image_as_float, 0)
    resize_shape = tf.stack([input_height, input_width])
    resize_shape_as_int = tf.cast(resize_shape, dtype=tf.int32)
    resized_image = tf.image.resize_bilinear(decoded_image_4d,
                                             resize_shape_as_int)
    offset_image = tf.subtract(resized_image, input_mean)
    mul_image = tf.multiply(offset_image, 1.0 / input_std)
    return jpeg_data, mul_image
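The last two lines of the function subtract input_mean and divide by input_std. The following plain-Python sketch (no TensorFlow needed; normalize_pixel is a name made up for this example) shows what that arithmetic does to a single pixel value with the MobileNet settings of 127.5 and 127.5:

```python
def normalize_pixel(value, input_mean=127.5, input_std=127.5):
    """Apply the same offset-and-scale step as add_jpeg_decoding to one pixel value."""
    return (value - input_mean) / input_std


# The darkest, middle, and brightest 8-bit pixel values
print(normalize_pixel(0), normalize_pixel(127.5), normalize_pixel(255))  # → -1.0 0.0 1.0
```

So the 0 to 255 range of a decoded JPEG is mapped onto -1 to 1. With the Inception_v3 settings of 128 and 128, the range becomes -1 to just under 1.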

Here is the start of our main function. Essentially, it does the following:

Sets our logging level to INFO
Prepares the file system for use
Creates our model information
Downloads and extracts our data

def main(_):
    tf.logging.set_verbosity(tf.logging.INFO)
    prepare_file_system()
    model_info = create_model_info(FLAGS.architecture)
    if not model_info:
        tf.logging.error('Did not recognize architecture flag')
        return -1
    maybe_download_and_extract(model_info['data_url'])
    graph, bottleneck_tensor, resized_image_tensor = (
        create_model_graph(model_info))
    image_lists = create_image_lists(FLAGS.image_dir, FLAGS.testing_percentage,
                                     FLAGS.validation_percentage)

The complete retrain.py file is available for download with the resources that accompany this book.

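Running TensorBoard

To run TensorBoard, use the following command:

tensorboard --logdir=path/to/log-directory

Here, logdir points to the directory where the serialized data is stored. If that directory contains subdirectories that also hold serialized data, TensorBoard visualizes the data from all of those runs. Once TensorBoard is running, go to localhost:6006 in your browser to view TensorBoard and its data.

For readers who want to dig deeper into TensorBoard, see the following tutorial: www.tensorflow.org/tensorboard/r1/summaries.

Summary

We covered a lot in this short chapter. We started by looking at the datasets available for image classification, and at how to obtain or create images when we cannot find any that meet our requirements. We then split the chapter into two parts. In the first, we learned how to create our own image dataset. In the second, we learned how to build a model with TensorFlow.

In the next chapter, we will extend our TensorFlow knowledge further by using various TensorFlow libraries to build a machine learning model that predicts body damage to a car.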

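Chapter 5: Building an ML Model to Predict Car Damage Using TensorFlow

In this chapter, we will build a system that uses transfer learning to assess the extent of damage to a vehicle from photographs. Such a solution would help reduce the cost of insurance claims and simplify the process for vehicle owners. If the system is implemented properly, then in an ideal scenario the user uploads a set of photos of the damaged vehicle, the photos go through damage assessment, and the insurance claim is processed automatically.

Many risks and challenges stand in the way of a perfect solution for this use case. To begin with, a vehicle can be damaged under many unknown conditions. We know nothing about the outdoor environment, the surrounding objects, the lighting in the area, or the condition of the vehicle before the incident. Getting past all these hurdles and finding a common solution is challenging, and it is a common problem in any computer-vision-based scenario.

In this chapter, we will cover the following topics:

Transfer learning basics
Image dataset collection
Setting up a web application
Training our own TensorFlow model
Building a web application that consumes the model

Transfer learning basics

To implement the car damage prediction system, we will build our own TensorFlow-based machine learning (ML) model on a vehicle dataset. Modern recognition models have millions of parameters. Training a new model from scratch takes a lot of time and data, as well as hundreds of graphics processing units (GPUs) or tensor processing units (TPUs) running for hours.

Transfer learning makes this task easier by taking an existing, already-trained model and retraining our own classifier on top of it. In our example, we will use the feature extraction capabilities of the MobileNet model. Even if we cannot reach 100% accuracy, this works well in many cases, especially on a phone, where we do not have heavy resources. We can comfortably train this model on a typical laptop in a few hours, even without a GPU. The model here was built on a MacBook Pro with a 2.6 GHz Intel i5 processor and 8 GB of RAM.

Transfer learning is one of the most popular approaches in deep learning: a model developed for one task is reused as the starting point for a model on a different task. In computer vision or natural language processing (NLP) tasks, a pretrained model is a sensible first step when compute resources and time are very limited.

In a typical computer vision problem, a neural network tries to detect edges in its early layers, shapes in its middle layers, and more task-specific features in its final layers. With transfer learning, we reuse the early and middle layers and retrain only the final layers.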
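The "freeze the early layers, retrain only the last one" idea can be shown with a toy example in plain Python. Everything here is an illustrative assumption: frozen_features stands in for the pretrained early and middle layers (it is never updated), and the only thing training changes is the weights of a small logistic final layer.

```python
import math


def frozen_features(x):
    """Stand-in for pretrained early/middle layers; never updated during training."""
    return [x, x * x]


def train_final_layer(samples, labels, steps=2000, lr=0.1):
    """Fit only the final layer (weights and bias) with gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(steps):
        for x, y in zip(samples, labels):
            feats = frozen_features(x)
            z = sum(w * f for w, f in zip(weights, feats)) + bias
            pred = 1.0 / (1.0 + math.exp(-z))  # sigmoid output
            error = pred - y                   # gradient of the log loss w.r.t. z
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias


def predict(x, weights, bias):
    feats = frozen_features(x)
    z = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1 if z > 0 else 0


# Toy task: label 1 when |x| > 1, label 0 otherwise
samples = [-2.0, -1.5, -0.3, 0.0, 0.3, 1.5, 2.0]
labels = [1, 1, 0, 0, 0, 1, 1]
weights, bias = train_final_layer(samples, labels)
print(predict(1.8, weights, bias), predict(0.2, weights, bias))
```

Because the features are fixed, training touches only three numbers, which is why retraining a final layer is so much cheaper than training a whole network.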

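For example, a model trained to recognize apples can be reused to detect water bottles. The early layers have already learned to recognize objects, so we only need to retrain the last few layers. That way, our model learns what distinguishes a water bottle from other objects. The following figure illustrates the process: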

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/72e3df71-eb01-438a-b39c-4ccd737203f9.png

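Normally we need a lot of data to train a model, but most of the time we do not have enough relevant data. This is where transfer learning comes in: it lets you train a model with very little data.

If your previous classifier was developed and trained with TensorFlow, you can reuse the same model and retrain some of its layers for the new classifier. This works, but only if the features learned from the old task are fairly general. For example, you cannot take a model developed for a text classifier and use it directly for an image classification task. In addition, the input data size must be the same for both models. If the sizes do not match, we need to add a preprocessing step that resizes the input data.

Approaches to transfer learning

Let's look at the different approaches to transfer learning. They may go by different names, but the concepts stay the same:

Using a pretrained model: many pretrained models are available to cover your basic deep learning research needs. In this book, we have used many pretrained models and derived our results from them.
Training a model for reuse: suppose you want to solve problem A but do not have enough data to get there. There is, however, a related problem B with plenty of data. In that case, we can develop a model for problem B and use it as the starting point for problem A. Whether to reuse all the layers or only some of them depends on the type of problem we are solving.
Feature extraction: with deep learning, we can extract the features of a dataset automatically, whereas traditionally features are handcrafted by developers. A neural network can learn which features to pass on and which to drop. For example, we use only the early layers to obtain a good representation of the features, and we do not use the output layer, because it may be too specific to one particular task. We simply feed the data into the network and use one of the intermediate layers as the output.

With that, we are ready to start building our model with TensorFlow.

Building the TensorFlow model

Building your own custom model is a step-by-step process. To begin with, we will use TensorFlow Hub to feed images through a pretrained model.

To learn more about TensorFlow Hub, see www.tensorflow.org/hub.

Installing TensorFlow

At the time of writing, TensorFlow r1.13 has been released and 2.0.0 is available as an alpha, but we will use the stable release. TensorFlow Hub depends on the TensorFlow library, and both can be installed with pip, as follows:

$ pip install tensorflow
$ pip install tensorflow-hub

Once the tensorflow library is installed, we need to collect our image dataset before training begins. There are a number of things to consider before we start training.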

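Training images

In this section, we collect the images and organize them into folders by category.

Some common guidelines for choosing your own image dataset are as follows:

First, you need at least 100 photos for each category of image you want to recognize. The model's accuracy generally improves as the number of images in the dataset grows.
Make sure the images in your set are representative of what users will submit. For example, if all your training images were taken indoors against a plain white background, but users need to recognize objects against distracting backgrounds (outdoors, with colorful surroundings), the extra images will not improve accuracy.
Choose images with a variety of backgrounds. For example, if you pick only images with two background colors, your predictions will lean toward those two colors rather than the object in the image.
Try to split large categories into smaller subcategories. For example, use "cat", "dog", or "tiger" instead of "animal".
Make sure every input image contains the object you want to recognize. For example, in an app that recognizes dogs, we would not use pictures of cars, buildings, or mountains as input images. In that case, it is better to have a separate classifier for unrecognizable images.
Make sure the images are labeled correctly. For example, a picture labeled as jasmine may show the whole plant or have a person in the background. The accuracy of our algorithm suffers when the input images contain distracting objects.
Make sure the images meet licensing requirements. Suppose you take some food pictures from Google Images: when collecting images for training, check that they are licensed for reuse. You can do this by entering your keywords in a Google Images search and filtering by usage rights that allow reuse. Click the Tools button below the search bar to find this option.

For this chapter, we collected some images from the internet for teaching purposes. The details are discussed in the next section.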

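Building our own model

Here, we will use TensorFlow to build our own machine learning model that analyzes the extent of damage to a vehicle. We need to choose the dataset carefully, because it plays a crucial role in the damage assessment phase. These are the steps we will follow to build the model:

Find an image dataset of damaged vehicles.
Categorize the images by the extent of damage. First, we need to establish that the object in the picture is actually a car. For that, we need two sets of images, one with cars and one without. Then we need three more categories for the damage level of the car: high, medium, and low. Make sure each category has at least 1,000 images. Once the dataset is ready, we can start training the model.
Train our model using TensorFlow.
Build a web application that analyzes the extent of damage to the vehicle.
Update the results.

Retraining with our own images

We will now use the retrain.py script, which sits in our project directory.

Download the script with curl and run it as follows:

mkdir -p ~/Chapter5/images
cd ~/Chapter5
curl -LO https://github.com/tensorflow/hub/raw/master/examples/image_retraining/retrain.py
python retrain.py --image_dir ./images/

Before training starts, there are several parameters that must be passed to the training script and that are worth reviewing.

Once the dataset is ready, we need to work on improving the results. One way to do that is to change the number of steps in the learning process.

The simplest way to do this is with the following flag:

--how_many_training_steps=4000

As the number of steps goes up, accuracy improves more slowly, and beyond a certain point it stops improving altogether. Experiment to find out what works best for you.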

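Architecture

MobileNet is a small, low-power, low-latency model designed to meet the constraints of mobile devices. In our application, we picked the following architecture from the MobileNet family as one of the parameters, as shown in the following code, to get a better accuracy benchmark when building the model:

--architecture="mobilenet_v2_1.4_224"

The power consumption and latency of a network grow with its number of multiply-accumulates (MACs), which counts fused multiplication and addition operations, as shown here: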

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/6155f1ac-e985-414e-b7db-c94dd8fa8b2d.png

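You can download the models from github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet.

Distortions

We can improve the results by feeding in difficult input images during training. Training images can be generated by randomly cropping, adjusting the brightness of, and deforming the originals. This helps produce an effective training dataset.

However, enabling distortions has a downside: the bottleneck cache is no longer useful. Input images are never reused exactly, so training takes longer. There are several ways to enable distortions, as follows:

--random_crop
--random_scale
--random_brightness

This is not useful in every case. In a digit classification system, for example, it does not help, because flipping and distorting the images produces inputs that make no sense for the possible outputs.

Hyperparameters

We can try a few more parameters to see whether they help improve the results.

Specify them in the form given in the following list. The hyperparameters are explained as follows:

--learning_rate: this parameter controls the size of the updates to the final layer during training. If the value is small, training takes longer. A smaller value does not always improve accuracy, however.
--train_batch_size: this parameter controls how many images are examined during training to estimate each update to the final layer. Once the images are ready, the script splits them into three different sets, the largest of which is used for training. The split is mainly there to stop the model from picking up unnecessary patterns in the input images. If a model is trained on images with a particular background pattern, it will not give correct results on images with a new background, because it has memorized unnecessary information from the inputs. This is known as overfitting.
--testing_percentage and --validation_percentage flags: to help avoid overfitting, we keep 80% of the data in the main training set. Another 10% of the data is used for validation during training, and the final 10% is used to test the model.
--validation_batch_size: this parameter sets how many images are used for each validation measurement. With a small batch, you will see the validation accuracy fluctuate from one iteration to the next.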
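One way to produce such an 80/10/10 split is to derive it from a hash of each file name, so that an image always lands in the same set even when more files are added later. The sketch below is written under that assumption; assign_split is a name made up for this example, and MAX_NUM_IMAGES_PER_CLASS simply reuses the 2 ** 27 - 1 constant defined in the Chapter 4 script.

```python
import hashlib

MAX_NUM_IMAGES_PER_CLASS = 2 ** 27 - 1  # same value as the constant in the Chapter 4 script


def assign_split(file_name, testing_percentage=10, validation_percentage=10):
    """Deterministically assign a file to training, validation, or testing."""
    digest = hashlib.sha1(file_name.encode("utf-8")).hexdigest()
    # Map the hash onto a value between 0 and 100
    percentage_hash = ((int(digest, 16) % (MAX_NUM_IMAGES_PER_CLASS + 1)) *
                       (100.0 / MAX_NUM_IMAGES_PER_CLASS))
    if percentage_hash < validation_percentage:
        return "validation"
    if percentage_hash < testing_percentage + validation_percentage:
        return "testing"
    return "training"


if __name__ == "__main__":
    counts = {"training": 0, "validation": 0, "testing": 0}
    for i in range(1000):
        counts[assign_split("car_{}.jpg".format(i))] += 1
    print(counts)
```

Because the assignment depends only on the file name, rerunning the script never moves an image from the test set into the training set, which keeps the reported test accuracy honest.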

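If you are new to this, you can run with the default values without changing any of these parameters. Let's start building our model. For that, we need training image data.

Image dataset collection

For our experiment, we need datasets of cars in good condition and in damaged condition. If you have a data source that complies with the relevant privacy policies, that is a great place to start. Otherwise, we need to find some way to obtain a dataset to build our model on. Several public datasets are available. If there is no existing reference for a similar data model, we need to build our own dataset; this can be time-consuming, but it is an important step toward better results.

We will use a simple Python script to download images from Google. Just make sure you filter for reusable images. We do not encourage using images with non-reusable licenses.

We will first pull and save images from Google with a Python script, and then use a library to do the same job. This is one of the most basic steps in building any machine learning model.

We will use a Python library called Beautiful Soup to scrape images from the internet.

Introduction to Beautiful Soup

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It is very useful in projects that involve scraping. With it, we can navigate, search, and modify HTML and XML files.

The library parses whatever you give it and does the tree traversal for you. You can ask it to find all the links whose URLs match google.com, all the links with the class bold, or all the table headers that contain bold text.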
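The three queries just mentioned can be tried on a small document. The HTML snippet below is invented for the example; the calls used (BeautifulSoup, find_all, find, get_text) are standard Beautiful Soup 4 methods, and the built-in html.parser is used so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# A tiny made-up page to query
html = """
<html><body>
  <a class="bold" href="https://google.com/maps">Maps</a>
  <a href="https://example.org/">Example</a>
  <table><tr><th><b>Name</b></th><th>Plain</th></tr></table>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# All links whose URL matches google.com
google_links = [a["href"] for a in soup.find_all("a", href=True)
                if "google.com" in a["href"]]

# All links with the class "bold"
bold_links = [a.get_text() for a in soup.find_all("a", class_="bold")]

# All table headers that contain bold text
bold_headers = [th.get_text() for th in soup.find_all("th") if th.find("b")]

print(google_links, bold_links, bold_headers)
```

Each query is a single line, which is the main appeal of the library for scraping work.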

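Several features make it useful, as follows:

Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree. It is a toolkit for dissecting a document and extracting what you need, so you write less application code.
Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You do not have to think about encodings unless the document does not specify one and Beautiful Soup cannot detect one. In that case, you just specify the original encoding.
Beautiful Soup sits on top of popular Python parsers such as lxml (lxml.de/) and html5lib (github.com/html5lib/), which lets you try different parsing strategies or trade speed for flexibility.
Beautiful Soup saves you time by extracting the information you need, which makes your job easier.

Here is a simple version of the code:

import argparse
import json
import itertools
import logging
import re
import os
import uuid
import sys
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup

# The logger will be useful for your debugging needs
def configure_logging():
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter('[%(asctime)s %(levelname)s %(module)s]: %(message)s'))
    logger.addHandler(handler)
    return logger

logger = configure_logging()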

Set the user agent to avoid a 403 error code:

REQUEST_HEADER = {
    'User-Agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"}

def get_soup(url, header):
    response = urlopen(Request(url, headers=header))
    return BeautifulSoup(response, 'html.parser')

# build the Google Images search URL for a query
def get_query_url(query):
    return "https://www.google.co.in/search?q=%s&source=lnms&tbm=isch" % query

# pull out specific data through navigating into source data tree
def extract_images_from_soup(soup):
    image_elements = soup.find_all("div", {"class": "rg_meta"})
    metadata_dicts = (json.loads(e.text) for e in image_elements)
    link_type_records = ((d["ou"], d["ity"]) for d in metadata_dicts)
    return link_type_records

Pass in the number of images you want to extract. By default, Google provides 100 images:

def extract_images(query, num_images):
    url = get_query_url(query)
    logger.info("Souping")
    soup = get_soup(url, REQUEST_HEADER)
    logger.info("Extracting image urls")
    link_type_records = extract_images_from_soup(soup)
    return itertools.islice(link_type_records, num_images)

def get_raw_image(url):
    req = Request(url, headers=REQUEST_HEADER)
    resp = urlopen(req)
    return resp.read()

Save each downloaded image with its extension, as shown in the following code block:

def save_image(raw_image, image_type, save_directory):
    extension = image_type if image_type else 'jpg'
    file_name = str(uuid.uuid4().hex) + "." + extension
    save_path = os.path.join(save_directory, file_name)
    with open(save_path, 'wb+') as image_file:
        image_file.write(raw_image)

def download_images_to_dir(images, save_directory, num_images):
    for i, (url, image_type) in enumerate(images):
        try:
            logger.info("Making request (%d/%d): %s", i, num_images, url)
            raw_image = get_raw_image(url)
            save_image(raw_image, image_type, save_directory)
        except Exception as e:
            logger.exception(e)

def run(query, save_directory, num_images=100):
    query = '+'.join(query.split())
    logger.info("Extracting image links")
    images = extract_images(query, num_images)
    logger.info("Downloading images")
    download_images_to_dir(images, save_directory, num_images)
    logger.info("Finished")

# main method to initiate the scraper
def main():
    parser = argparse.ArgumentParser(description='Scrape Google images')
    # change the search term here
    parser.add_argument('-s', '--search', default='apple', type=str, help='search term')

Change the number-of-images parameter here. By default it is set to 1, as shown in the following code:

    parser.add_argument('-n', '--num_images', default=1, type=int, help='num images to save')
    # change path according to your need
    parser.add_argument('-d', '--directory', default='/Users/karthikeyan/Downloads/', type=str, help='save directory')
    args = parser.parse_args()
    run(args.search, args.directory, args.num_images)

if __name__ == '__main__':
    main()

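Save the script as a Python file, then run the code with the following command:

python imageScrapper.py --search "alien" --num_images 10 --directory "/Users/Karthikeyan/Downloads"

For Google image scraping, it is better to use a library with more configurable options. We will use github.com/hardikvasa/google-images-download.

This is a command-line Python program that searches Google Images for keywords or key phrases and optionally downloads the images to your computer. You can also invoke the script from another Python file.

It is a small, ready-to-run program. If you only want to download up to 100 images per keyword, it needs no dependencies. If you want more than 100 images per keyword, you need to install the Selenium library along with ChromeDriver. Detailed instructions are provided in the Troubleshooting section.

If you prefer to install from the command line, use the following commands:

$ git clone https://github.com/hardikvasa/google-images-download.git
$ cd google-images-download && sudo python setup.py install

Alternatively, you can install the library with pip:

$ pip install google_images_download

If you installed via pip or from the command-line interface (CLI), use the following command:

$ googleimagesdownload [Arguments...]

If you downloaded it from the github.com UI, unzip the downloaded file, go to the google_images_download directory, and use one of the following commands:

$ python3 google_images_download.py [Arguments...]
$ python google_images_download.py [Arguments...]

If you want to use the library from another Python file, use the following code:

from google_images_download import google_images_download
response = google_images_download.googleimagesdownload()
absolute_image_paths = response.download({<Arguments...>})

You can pass the arguments directly on the command line, as in the examples that follow, or through a config file.

A config file can carry more than one record. The following sample consists of two sets of records. The code iterates over each record and downloads images according to the arguments passed.

The following is a sample config file: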

{
    "Records": [
        {
            "keywords": "apple",
            "limit": 55,
            "color": "red",
            "print_urls": true
        },
        {
            "keywords": "oranges",
            "limit": 105,
            "size": "large",
            "print_urls": true
        }
    ]
}

Examples

If you are calling the library from another Python file, the following is the sample code from the library's documentation:

from google_images_download import google_images_download  # importing the library

response = google_images_download.googleimagesdownload()  # class instantiation

arguments = {"keywords":"apple, beach, cat","limit":30,"print_urls":True}  # creating list of arguments
paths = response.download(arguments)  # passing the arguments to the function
print(paths)  # printing absolute paths of the downloaded images

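If you are passing arguments through a config file, simply pass the config_file argument with the name of your JSON file:

$ googleimagesdownload -cf example.json

Here is a simple example that uses the keywords and limit arguments:

$ googleimagesdownload --keywords "apple, beach, cat" --limit 20

Suffix keywords let you specify words to append to the main keyword. For example, if the keyword is car and the suffix keywords are yellow, blue, and green, the program first searches for yellow cars, then blue cars, and then green cars:

$ googleimagesdownload -k "car" -sk 'yellow,blue,green' -l 10

To use the short form of the command, use the following:

$ googleimagesdownload -k "apple, beach, cat" -l 20

To download images with a specific image extension or format, use the following:

$ googleimagesdownload --keywords "logo" --format svg

To apply a color filter to the images, use the following:

$ googleimagesdownload -k "playground" -l 20 -co red

To search for images with non-English keywords, use the following: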

$ googleimagesdownload -k "<https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/5a19f409-e391-4710-bbeb-9b051bbcb914.png>" -l 5

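To download images from a Google Images link, use the following:

$ googleimagesdownload -k "sample" -u <google images page URL>

To save images in a specific main directory (instead of Downloads), use the following:

$ googleimagesdownload -k "boat" -o "boat_new"

To download the single image at an image URL, use the following:

$ googleimagesdownload --keywords "baloons" --single_image <URL of the images>

To download images with size and type constraints, use the following:

$ googleimagesdownload --keywords "baloons" --size medium --type animated

To download images with specific usage rights, use the following:

$ googleimagesdownload --keywords "universe" --usage_rights labeled-for-reuse

To download images with a specific color type, use the following:

$ googleimagesdownload --keywords "flowers" --color_type black-and-white

To download images with a specific aspect ratio, use the following:

$ googleimagesdownload --keywords "universe" --aspect_ratio panoramic

To download images similar to the one at an image URL you provide (that is, a reverse image search), use the following:

$ googleimagesdownload -si <image url> -l 10

To download images from a specific website or domain for a given keyword, use the following:

$ googleimagesdownload --keywords "universe" --specific_site google.com

The images are downloaded into their own subdirectories inside the main directory (either the one you provided or Downloads), within the folder you are in.

Now we need to start preparing our dataset.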

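Dataset preparation

We need to build four different datasets. For car damage detection, we consider every possible input: a car in good condition, a car with varying degrees of damage, or an image that has nothing to do with cars.

We will proceed as shown in the following screenshot: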

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/3e462e68-395a-4695-822d-8bfb139e9001.png

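The following command retrieves the dataset for identifying heavily damaged cars:

googleimagesdownload -k "heavily damaged car" -sk 'red,blue,white,black,green,brown,pink,yellow' -l 500

Here are some sample images captured for identifying heavily damaged red cars: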

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/bd32cffc-c38c-44d6-a8c1-063d1ca5623a.png

Here are some sample images captured of heavily damaged blue cars:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/35715172-9404-41d5-9fa0-fcd066f68962.png

We also have another set of images, of cars with minor damage:

googleimagesdownload -k "car dent" -sk 'red,blue,white,black,green,brown,pink,yellow' -l 500

Here are some sample images captured of red cars with dents:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/d7311d72-0cb5-46c0-8ff6-be07fa51b7e1.png

Here are some sample images captured of blue cars with dents:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/e734ec97-4ad5-46ba-a45e-bb5070a9b8c3.png

The following command retrieves a dataset of ordinary cars without any damage:

googleimagesdownload -k "car" -l 500

Here are some sample images captured of red cars:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/1f2f86a4-57d8-488f-b6c5-527c5e76b905.png

Here are some sample images captured of blue cars:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/a0f7a6b5-c7c8-4c31-b52d-4aa8d34ca420.png

The following command retrieves random objects that are not cars:

googleimagesdownload -k "bike,flight,home,road,tv" -l 500

Here are some sample images captured of bikes:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/1190176f-354a-409a-955b-3d1cde7e9751.png

Here are some sample images captured for the flight keyword:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/5707db31-bdf6-4b65-8f52-28af4fbb343f.png

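Once each dataset has 500 images, training can begin. Under ideal conditions, each dataset should have at least 1,000 images.

The main problem we face is removing noisy data. For our example, we do this by hand. The following are some sample images that count as noisy data: they do not provide valid input, so they cannot be used to build the data model: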

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/c882ff8e-5bae-45bf-8cab-56de998bc6e6.png

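Once all the image datasets are ready, we can start working on our four main categories. At the moment, the images are separated by color and category, as shown in the following screenshot: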

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/fa51d73a-9148-4428-bfea-bc12de51692d.png

We will group them into damaged cars, cars with dents, cars, and non-cars:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/9fdc8aec-4e65-47a3-a5ae-82556b39d479.png
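Merging the per-color download folders into one folder per class can be scripted. The sketch below is one possible way to do it with the standard library; the helper name merge_into_class and the folder names in the usage comment are hypothetical, so adjust them to match what the downloader actually created on your machine.

```python
import os
import shutil


def merge_into_class(source_dirs, class_dir):
    """Copy every file from the source folders into a single class folder."""
    os.makedirs(class_dir, exist_ok=True)
    copied = 0
    for src in source_dirs:
        # Prefix each file with its source folder name to avoid name clashes
        prefix = os.path.basename(os.path.normpath(src)).replace(" ", "_")
        for name in sorted(os.listdir(src)):
            src_path = os.path.join(src, name)
            if os.path.isfile(src_path):
                shutil.copy(src_path, os.path.join(class_dir, prefix + "_" + name))
                copied += 1
    return copied


# Hypothetical usage, assuming per-color folders under downloads/:
# merge_into_class(["downloads/car dent red", "downloads/car dent blue"], "images/low")
```

The files are copied rather than moved, so the original downloads stay intact while you remove noisy images from the merged class folders.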

Running the training script

Now that we have covered all the parameter details, we can start training with the downloaded script:

python retrain.py \
--bottleneck_dir=./ \
--how_many_training_steps=4000 \
--model_dir=./ \
--output_graph=./retrained_graph.pb \
--output_labels=retrained_labels.txt \
--architecture="mobilenet_v2_1.4_224" \
--image_dir=/Users/karthikeyan/Documents/ /book/Chapter5/images

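Depending on processor capability and the number of images, training can take a long time. For me, it took more than 10 hours with 50 different categories of cars and 10,000 images in each. Once the script completes, we get the TensorFlow model in the output.

Setting up a web application

We will use the Flask framework to build a simple application that detects damage to a car.

To learn more about Flask, see www.fullstackpython.com/flask.html.

We will not go into the basics of Flask here. Instead, we simply combine our model with an existing file upload example for Flask.

The file structure is shown in the following screenshot: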

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/b4e1b059-5fba-4642-a47a-815e114ef66e.png

Here is the content of app.py:

import os
import glob
from classify import prediction
import tensorflow as tf
import time
from flask import Flask, render_template, request, redirect, url_for, send_from_directory, flash
from werkzeug.utils import secure_filename

app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = 'uploads/'
app.config['ALLOWED_EXTENSIONS'] = set(['jpg', 'jpeg'])
app.config['SECRET_KEY'] = '7d441f27d441f27567d441f2b6176a'

def allowed_file(filename):
    return '.' in filename and \
        filename.rsplit('.', 1)[1] in app.config['ALLOWED_EXTENSIONS']

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/upload', methods=['POST'])
def upload():
    file = request.files['file']
    if file and allowed_file(file.filename):
        filename = secure_filename(file.filename)
        # Uploaded files are renamed to a running number
        filename = str(len(os.listdir(app.config['UPLOAD_FOLDER']))+1)+'.jpg'
        file_name_full_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
        file.save(file_name_full_path)
        return render_template('upload_success.html')
    # Anything that is not an allowed image goes back to the upload page
    flash('Please upload a .jpg or .jpeg file')
    return redirect(url_for('index'))

@app.route('/uploads/<filename>')
def uploaded_file(filename):
    return send_from_directory(app.config['UPLOAD_FOLDER'],
                               filename)

@app.route('/claim', methods=['POST'])
def predict():
    # Change this path to the uploads folder on your own machine
    list_of_files = glob.glob('/Users/karthikeyan/Documents/code/play/acko/cardamage/Car-Damage-Detector/uploads/*.jpg') # * means all if need specific format then *.csv
    latest_file = max(list_of_files, key=os.path.getctime)
    print(latest_file)
    image_path = latest_file

The next piece of code runs the prediction and prints the output:

    # print(max(glob.glob(r'uploads\*.jpg'), key=os.path.getmtime))
    with tf.Graph().as_default():
        human_string, score = prediction(image_path)
    print('model one value' + str(human_string))
    print('model one value' + str(score))
    if human_string == 'car':
        label_text = 'This is not a damaged car with confidence ' + str(score) + '%. Please upload a damaged car image'
        print(image_path)
        return render_template('front.html', text = label_text, filename="http://localhost:5000/uploads/"+os.path.basename(image_path))
    elif human_string == 'low':
        label_text = 'This is a low damaged car with '+ str(score) + '% confidence.'
        print(image_path)

After printing the image path, continue with the following code:


        return render_template('front.html', text = label_text, filename="http://localhost:5000/uploads/"+os.path.basename(image_path))
    elif human_string == 'high':
        label_text = 'This is a high damaged car with '+ str(score) + '% confidence.'
        print(image_path)
        return render_template('front.html', text = label_text, filename="http://localhost:5000/uploads/"+os.path.basename(image_path))
    elif human_string == 'not':
        label_text = 'This is not the image of a car with confidence ' + str(score) + '%. Please upload the car image.'
        print(image_path)
        return render_template('front.html', text = label_text, filename="http://localhost:5000/uploads/"+os.path.basename(image_path))

def cleanDirectory(threadName, delay):

The while loop starts here:

    while True:
        # Wait, then delete every file in the upload folder
        time.sleep(delay)
        print("Cleaning Up Directory")
        filelist = os.listdir(app.config['UPLOAD_FOLDER'])
        for f in filelist:
            os.remove(os.path.join(app.config['UPLOAD_FOLDER'], f))

if __name__ == '__main__':
    try:
        _thread.start_new_thread(cleanDirectory, ("Cleaning Thread", 99999999))
    except Exception:
        print("Error: unable to start thread")
    app.run()

classify.py performs the model classification using TensorFlow:

import sys
import os
import urllib

Disable TensorFlow compilation warnings (the environment variable has to be set before TensorFlow is imported):

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf

def prediction(image_path):
    # Read the uploaded image as raw JPEG bytes
    with tf.gfile.FastGFile(image_path, 'rb') as f:
        image_data = f.read()
    print(image_path)
    # Load the class labels written by retrain.py, one per line
    label_lines = [line.rstrip() for line
                   in tf.gfile.GFile(r"./models/tf_files/retrained_labels.txt")]
    # Load the retrained graph into the default graph
    with tf.gfile.FastGFile(r"./models/tf_files/retrained_graph.pb", 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        _ = tf.import_graph_def(graph_def, name='')
    with tf.Session() as sess:

Once image_data is fed into the graph as input, we get the first prediction:

        softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
        predictions = sess.run(softmax_tensor,
                               {'DecodeJpeg/contents:0': image_data})
        # Sort the class indices from the highest to the lowest score
        top_k = predictions[0].argsort()[::-1]
        for rank, node_id in enumerate(top_k, start=1):
            print(rank)
            print('%s (score = %.5f)' % (label_lines[node_id], predictions[0][node_id]))
        # Return the highest-scoring label and its score as a percentage
        best = top_k[0]
        human_string = label_lines[best]
        score = round(predictions[0][best] * 100, 2)
        return human_string, score
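
The ranking step in prediction() can be illustrated without TensorFlow. The following is a minimal sketch in plain Python, assuming the class scores are already available as a list of probabilities. The rank_labels helper and the sample scores are hypothetical and are not part of the book's code:

```python
def rank_labels(labels, scores):
    """Return (label, percentage) pairs sorted from highest to lowest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(labels[i], round(scores[i] * 100, 2)) for i in order]

# Hypothetical labels and softmax scores for one uploaded image
labels = ['car', 'low', 'high', 'not']
scores = [0.10, 0.65, 0.20, 0.05]

ranked = rank_labels(labels, scores)
best_label, best_score = ranked[0]   # the first entry is the top prediction
print(best_label, best_score)
```

Only the first entry of the ranking is needed by the web application; the remaining entries are useful mainly for logging.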

The controller Python files sit alongside the frontend HTML file:

 <!DOCTYPE html>
 <html lang="en">
 <head>
 <meta charset="utf-8">
 <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
 <meta name="description" content="">
 <meta name="author" content="Karthikeyan NG">
 <title>Damage Estimator</title>
 <!-- Bootstrap core CSS -->
 <link href="{{ url_for('static', filename='vendor/bootstrap/css/bootstrap.min.css') }}" rel="stylesheet"/>
 <!-- Custom fonts for this template -->
 <link href="{{ url_for('static', filename='vendor/font-awesome/css/font-awesome.min.css') }}" rel="stylesheet" type="text/css"/>
 <link href='https://fonts.googleapis.com/css?family=Open+Sans:300italic,400italic,600italic,700italic,800italic,400,300,600,700,800' rel='stylesheet' type='text/css'>
 <link href='https://fonts.googleapis.com/css?family=Merriweather:400,300,300italic,400italic,700,700italic,900,900italic' rel='stylesheet' type='text/css'>
 <!-- Plugin CSS -->
 <link href="{{ url_for('static', filename='vendor/magnific-popup/magnific-popup.css') }}" rel="stylesheet" />
 <!-- Custom styles for this template -->
 <link href="{{ url_for('static', filename='css/creative.min.css') }}" rel="stylesheet" />
 </head>
 <body id="page-top">
 <!-- Navigation -->
 <nav class="navbar navbar-expand-lg navbar-light fixed-top" id="mainNav">
 <a class="navbar-brand" href="#page-top">Damage Estimator</a>
 <button class="navbar-toggler navbar-toggler-right" type="button" data-toggle="collapse" data-target="#navbarResponsive" aria-controls="navbarResponsive" aria-expanded="false" aria-label="Toggle navigation">
 <span class="navbar-toggler-icon"></span>
 </button>
 <div class="collapse navbar-collapse" id="navbarResponsive">
 </div>
 </nav>
 <section class="bg-primary" id="about">
 <div class="container">
 <div class="row">
 <div class="col-lg-8 mx-auto text-center">
 <h2 class="section-heading text-white">Do you have a damaged vehicle?</h2>
 <hr class="light">
 <p class="text-faded">Machine Learning allows for a classification process that is automated and makes lesser error. Besides risk group classification, Deep Learning algorithms can be applied to images of vehicle damage, allowing for automated claim classification.</p>
 <br/>
 <div class="contr"><h4 class="section-heading text-white">Select the file (image) and Upload</h4></div>
 <br/>
 <form action="upload" method="post" enctype="multipart/form-data">
 <div class="form-group">
 <input type="file" name="file" class="file">
 <div class="input-group col-xs-12">
 <span class="input-group-addon"><i class="glyphicon glyphicon-picture"></i></span>
 <input type="text" class="form-control input-lg" disabled placeholder="Upload Image">
 <span class="input-group-btn">
 <button class="browse btn btn-primary input-lg" type="button"><i class="glyphicon glyphicon-search"></i> Browse</button>
 </span>
 </div>
 </div>
 <input type="submit" class="btn btn-primary" value="Upload"><br /><br />
 </form>
 </div>
 </div>
 </section>

Continuing from the preceding markup, let's include Bootstrap's core JavaScript:

 <!-- Bootstrap core JavaScript -->
 <script src="img/jquery.min.js') }}"></script>
 <script src="img/popper.min.js') }}"></script>
 <script src="img/bootstrap.min.js') }}"></script>
 <!-- Plugin JavaScript -->
 <script src="img/jquery.easing.min.js') }}"></script>
 <script src="img/scrollreveal.min.js') }}"></script>
 <script src="img/jquery.magnific-popup.min.js') }}"></script>
 <!-- Custom scripts for this template -->
 <script src="img/creative.min.js') }}"></script>
 <script>
 $(document).on('click', '.browse', function(){
 var file = $(this).parent().parent().parent().find('.file');
 file.trigger('click');
 });
 $(document).on('change', '.file', function(){
 $(this).parent().find('.form-control').val($(this).val().replace(/C:\\fakepath\\/i, ''));
 });
 </script>
 </body>
 </html>

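You can pull the rest of the files directly from the GitHub repository. Once the file structure is ready, you can run the application from the command line, as follows: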

$ python app.py

Now, open your browser and go to http://localhost:5000/:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/70376108-e081-4f77-985a-fdc6d7e28fe1.png

The following are some screenshots from the application.

This is the home page after running the application:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/900aed2c-449d-4a99-8f6e-f12f7d17c09c.png

This is the screen after uploading an image:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/fa01449b-871f-418b-b211-b2e022243fc1.png

This is a screenshot showing a car with minor damage:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/6df28f3f-8d2a-40d8-b8a9-8d0dc8300130.png

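Because our dataset is very small, the result in the preceding screenshot may not be accurate.

The following screenshot shows the model's prediction for an image that does not contain a car: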

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/16421a34-bccf-4996-821e-a67347a7d33d.png

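Summary

In this chapter, we learned how to build a model from scratch and train it with TensorFlow.

With this knowledge, we can move on to building more Android- and iOS-based applications in the upcoming chapters.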

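Chapter 6: PyTorch Experiments on NLP and RNN

In this chapter, we will dig into using the PyTorch library for natural language processing (NLP) and other experiments. We will then convert the models we develop into a format that can be used in Android or iOS applications, using TensorFlow and CoreML.

In this chapter, we will cover the following topics:

An introduction to PyTorch features and installation
Using variables in PyTorch
Building our own model network
Classifying with recurrent neural networks (RNNs)
Natural language processing

PyTorch

PyTorch is a Python-based library for scientific computing that can take advantage of GPUs. It speeds up the path from research experiments to production through its ecosystem of tools and its distributed training libraries. It also provides two high-level features: tensor computation and neural networks built on a tape-based automatic differentiation (autograd) system.

Features of PyTorch

PyTorch provides an end-to-end deep learning system. Its features are as follows:

Python-first: PyTorch is not merely a Python binding to a C++ framework. It is deeply integrated with Python, so it can be used alongside other popular libraries and frameworks.
Tools and libraries: It has an active community of researchers and developers in areas such as computer vision and reinforcement learning.
Flexible frontend: It offers ease of use in eager mode, a seamless transition to graph mode for speed, and functionality and optimization in the C++ runtime.
Cloud support: It is supported on all of the major cloud platforms, with prebuilt images that allow frictionless development and scaling, so that models can run as production-grade applications.
Distributed training: It optimizes performance through native support for asynchronous execution of operations and peer-to-peer (p2p) communication, accessible from both C++ and Python.
Native ONNX support: We can export models in the standard Open Neural Network Exchange (ONNX) format for access from other platforms, runtimes, and visualizers.

Installing PyTorch

At the time of writing this book, the stable version of PyTorch is 1.0. If you want hands-on access to the latest code base, you can also use the nightly preview builds. You need to install the dependencies that match your package manager. Anaconda is the recommended package manager, and it installs all of the dependencies automatically. LibTorch is available only for C++. The following is the grid of installation options available for PyTorch: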

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/17ee8f45-5596-405d-9255-31d5f2fc0fd4.png

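The preceding screenshot shows the package grid that was used while writing this book. You can pick any combination from the grid, depending on the hardware you have available.

To install PyTorch and start Jupyter Notebook, run the following commands: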

python --version
brew install python3
pip3 install --upgrade pip
pip3 install torch torchvision
pip3 install jupyter
jupyter notebook

The PyTorch installation process is shown in the following screenshot:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/c6d1e025-b6a4-4a84-b77c-fc04da37914e.png

When you start Jupyter Notebook, a new browser session opens with an empty notebook, as follows:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/63b602b7-9b98-4f82-b22f-ed7fe087acb3.png

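Let's start with the basics of PyTorch.

PyTorch basics

Now that PyTorch is installed, we can start experimenting. We will begin with torch and numpy.

Create a new notebook from the top menu and add the following code: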

# First basic understanding of PyTorch
# Book: AI for Mobile application projects

import torch
import numpy as np

# Convert a NumPy array to a tensor and vice versa
numpy_data = np.arange(8).reshape((2, 4))
torch_data = torch.from_numpy(numpy_data)
# Convert the tensor back to an array
tensor2array = torch_data.numpy()

# Print the results
print(
    '\nnumpy array:', numpy_data,        # [[0 1 2 3], [4 5 6 7]]
    '\ntorch tensor:', torch_data,       # tensor([[0, 1, 2, 3], [4, 5, 6, 7]]), size 2x4
    '\ntensor to array:', tensor2array,  # [[0 1 2 3], [4 5 6 7]]
)

Now, let's perform some mathematical operations:

# abs method on numpy and torch
numpy_data = [-1, -2, 1, 2]
tensor = torch.FloatTensor(numpy_data) # 32-bit floating point

# Print the results
print(
    '\nabs',
    '\nnumpy: ', np.abs(numpy_data), # [1 2 1 2]
    '\ntorch: ', torch.abs(tensor)   # tensor([1., 2., 1., 2.])
)

# sin method on numpy and torch
# Print the results
print(
    '\nsin',
    '\nnumpy: ', np.sin(numpy_data), # [-0.84147098 -0.90929743 0.84147098 0.90929743]
    '\ntorch: ', torch.sin(tensor)   # tensor([-0.8415, -0.9093, 0.8415, 0.9093])
)

Let's compute the mean and print the results:


# Print the results
print(
    '\nmean',
    '\nnumpy: ', np.mean(numpy_data), # 0.0
    '\ntorch: ', torch.mean(tensor)   # tensor(0.)
)

# matrix multiplication with numpy
numpy_data = [[1,2], [3,4]]
tensor = torch.FloatTensor(numpy_data) # 32-bit floating point
# correct method and print the results
print(
 '\nmatrix multiplication (matmul)',
 '\nnumpy: ', np.matmul(numpy_data, numpy_data), # [[7, 10], [15, 22]]
 '\ntorch: ', torch.mm(tensor, tensor) # [[7, 10], [15, 22]]
)

The following is the output of these operations:

numpy array: [[0 1 2 3]
 [4 5 6 7]] 
torch tensor: tensor([[0, 1, 2, 3],
        [4, 5, 6, 7]]) 
tensor to array: [[0 1 2 3]
 [4 5 6 7]]

abs 
numpy:  [1 2 1 2] 
torch:  tensor([1., 2., 1., 2.])

sin 
numpy:  [-0.84147098 -0.90929743  0.84147098  0.90929743] 
torch:  tensor([-0.8415, -0.9093,  0.8415,  0.9093])

mean 
numpy:  0.0 
torch:  tensor(0.)

matrix multiplication (matmul) 
numpy:  [[ 7 10]
 [15 22]] 
torch:  tensor([[ 7., 10.],
        [15., 22.]])
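
To make the matmul result easy to check by hand, here is a small sketch in plain Python that multiplies the same 2 x 2 matrix without NumPy or torch. The matmul helper is written for illustration only and is not a library function. Matrix multiplication is compared with the element-wise product, which is a common source of confusion:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

m = [[1, 2], [3, 4]]
product = matmul(m, m)                              # row-by-column products
elementwise = [[x * x for x in row] for row in m]   # what m * m would give element by element

print(product)      # [[7, 10], [15, 22]]
print(elementwise)  # [[1, 4], [9, 16]]
```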

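Now, let's look at how to use variables in PyTorch.

Using variables in PyTorch

Variables in torch are used to build a computational graph. Whenever a variable is computed, it builds a graph that connects all of the computation steps (nodes), and when the error is finally backpropagated, the gradients of all of the variables are computed at once. A plain tensor that was not created with requires_grad=True does not track this. Since PyTorch 0.4, Variable simply returns a tensor, so the same example also works with tensors that have requires_grad=True. We will explore the difference with a simple example: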

import torch
from torch.autograd import Variable

# Variable in torch is to build a computational graph,
# So torch does not have placeholder, torch can just pass variable to the computational graph.

tensor = torch.FloatTensor([[1,2,3],[4,5,6]]) # build a tensor
variable = Variable(tensor, requires_grad=True) # build a variable, usually for compute gradients

print(tensor) # [torch.FloatTensor of size 2x3]
print(variable) # [torch.FloatTensor of size 2x3]

# till now the tensor and variable looks similar.
# However, the variable is a part of the graph, it's a part of the auto-gradient.

#Now we will calculate the mean value on tensor(X²)
t_out = torch.mean(tensor*tensor)

#Now we will calculate the mean value on variable(X²)
v_out = torch.mean(variable*variable)

Now, we will print the results and compute the gradients:


# Print the results
print(t_out)
print(v_out)
# Both results are 15.1667, that is, (1 + 4 + 9 + 16 + 25 + 36) / 6

v_out.backward() # backpropagation from v_out
# v_out = 1/6 * sum(variable*variable)
# so the gradient with respect to the variable is 1/6 * 2 * variable = variable/3

# Let's print the variable gradient

print(variable.grad)
'''
 0.3333 0.6667 1.0000
 1.3333 1.6667 2.0000
'''

print("Resultant data in the variable: "+str(variable)) # this is data in variable

"""
Variable containing:
 1 2 3
 4 5 6
We will consider the variable as a FloatTensor
[torch.FloatTensor of size 2x3]
"""

print(variable.data) # this is data in tensor format
"""
 1 2 3
 4 5 6
We will consider the variable as FloatTensor
[torch.FloatTensor of size 2x3]
"""

# We will print the result in the numpy format
print(variable.data.numpy())
"""
[[ 1. 2. 3.]
 [ 4. 5. 6.]]
"""

The following is the output of the preceding code block:

tensor([[1., 2., 3.],
        [4., 5., 6.]])
tensor([[1., 2., 3.],
        [4., 5., 6.]], requires_grad=True)
tensor(15.1667)
tensor(15.1667, grad_fn=<MeanBackward1>)
tensor([[0.3333, 0.6667, 1.0000],
        [1.3333, 1.6667, 2.0000]])
Resultant data in the variable: tensor([[1., 2., 3.],
        [4., 5., 6.]], requires_grad=True)
tensor([[1., 2., 3.],
        [4., 5., 6.]])
[[1. 2. 3.]
 [4. 5. 6.]]
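
The gradient printed above can be verified by hand. For v_out = mean(x²) over n elements, the derivative with respect to each element is 2x/n. The following sketch, in plain Python and under the assumption that the tensor is simply flattened into a list, reproduces both the mean and the gradient; the variable names are chosen for illustration only:

```python
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # the 2x3 tensor, flattened row by row
n = len(values)

# Forward pass: mean of the squared elements
v_out = sum(x * x for x in values) / n

# Backward pass by hand: d(v_out)/dx_i = 2 * x_i / n
grads = [2 * x / n for x in values]

print(round(v_out, 4))
print([round(g, 4) for g in grads])
```

With n = 6, the gradient is simply each element divided by 3, which matches the values that autograd reports.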

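Now, let's try plotting data on a graph with matplotlib.

Plotting values on a graph

Let's write a simple program that plots values on a graph. To do this, use the following code: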

#This line is necessary to print the output inside jupyter notebook
%matplotlib inline

import torch
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torch.autograd import Variable

# dummy data for the example
#lets declare linspace
x = torch.linspace(-5, 5, 200) # x data (tensor), shape=(200,)
x = Variable(x)
#call numpy array to plot the results 
x_np = x.data.numpy()

The following code block computes several activation functions:


#RelU function
y_relu = torch.relu(x).data.numpy()
#sigmoid method
y_sigmoid = torch.sigmoid(x).data.numpy()
#tanh method
y_tanh = torch.tanh(x).data.numpy()
#softplus method
y_softplus = F.softplus(x).data.numpy() # softplus is provided by torch.nn.functional
# y_softmax = torch.softmax(x, dim=0).data.numpy() # softmax turns the values into probabilities, so it is not plotted here

Plot the activation functions with matplotlib:


#we will plot the activation function with matplotlib
plt.figure(1, figsize=(8, 6))
plt.subplot(221)
plt.plot(x_np, y_relu, c='red', label='relu')
plt.ylim((-1, 5))
plt.legend(loc='best')

#sigmoid activation function
plt.subplot(222)
plt.plot(x_np, y_sigmoid, c='red', label='sigmoid')
plt.ylim((-0.2, 1.2))
plt.legend(loc='best')

#tanh activation function
plt.subplot(223)
plt.plot(x_np, y_tanh, c='red', label='tanh')
plt.ylim((-1.2, 1.2))
plt.legend(loc='best')

#softplus activation function
plt.subplot(224)
plt.plot(x_np, y_softplus, c='red', label='softplus')
plt.ylim((-0.2, 6))
plt.legend(loc='best')

#call the show method to draw the graph on screen
plt.show()

The values are plotted on the graph as follows:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/8b0e9e83-cf7f-4146-b058-5e9f304ce5e4.png

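Note that the first line of the preceding code is required to draw the graph inside Jupyter Notebook. If you run the Python file directly from a terminal, you can omit that line.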
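
For reference, the four activation functions plotted above have simple closed forms. The following sketch implements them for single numbers with the standard math module; it is meant as an illustration of the formulas rather than a replacement for the torch functions, and the function names are chosen here for readability:

```python
import math

def relu(x):
    """max(0, x): passes positive values, zeroes out negative ones."""
    return max(0.0, x)

def sigmoid(x):
    """1 / (1 + e^-x): squashes any value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent: squashes any value into (-1, 1)."""
    return math.tanh(x)

def softplus(x):
    """log(1 + e^x): a smooth approximation of ReLU."""
    return math.log1p(math.exp(x))

for value in (-2.0, 0.0, 2.0):
    print(value, relu(value), sigmoid(value), tanh(value), softplus(value))
```

The y-axis limits used in the plots follow from these ranges: sigmoid stays between 0 and 1, tanh between -1 and 1, while ReLU and softplus grow without bound for positive inputs.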

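Building our own model network

In this section, we will build our own network with PyTorch through a step-by-step example.

Let's begin with linear regression as a starting point.

Linear regression

Linear regression is probably the first method that anyone meets when learning machine learning. Its goal is to find the relationship between one or more features (independent variables) and a continuous target variable (the dependent variable). The following code fits noisy quadratic data (y = x² plus noise) with a small network made of linear layers and a ReLU activation.

Import all of the necessary libraries and declare all of the necessary variables: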

%matplotlib inline

#Import all the necessary libraries
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

#we will define data points for both x-axis and y-axis
# x data (tensor), shape=(100, 1)
x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1) 
# noisy y data (tensor), shape=(100, 1)
y = x.pow(2) + 0.2*torch.rand(x.size()) 

# Before PyTorch 0.4, tensors had to be wrapped in Variable for training; this is no longer needed
# x, y = Variable(x), Variable(y)

# plt.scatter(x.data.numpy(), y.data.numpy())
# plt.show()

We will define the regression class and run a simple neural network to illustrate regression:


class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden) # hidden layer
        self.predict = torch.nn.Linear(n_hidden, n_output) # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x)) # activation function for hidden layer
        x = self.predict(x) # linear output
        return x

net = Net(n_feature=1, n_hidden=10, n_output=1) # define the network
print(net) # net architecture

optimizer = torch.optim.SGD(net.parameters(), lr=0.2)
loss_func = torch.nn.MSELoss() # mean squared error loss for regression

plt.ion() # turn on interactive plotting so the figure updates during training

for t in range(200):
    prediction = net(x) # input x and predict based on x
    loss = loss_func(prediction, y) # must be (1. nn output, 2. target)
    optimizer.zero_grad() # clear gradients for next train
    loss.backward() # backpropagation, compute gradients
    optimizer.step() # apply gradients
    if t % 50 == 0:

Now we will see how to plot the graph and show the learning process:

        plt.cla()
        plt.scatter(x.data.numpy(), y.data.numpy())
        plt.plot(x.data.numpy(), prediction.data.numpy(), 'r-', lw=5)
        plt.text(0.5, 0, 'Loss=%.4f' % loss.item(), fontdict={'size': 20, 'color': 'black'})
        plt.pause(0.1)

plt.ioff()
plt.show()

The output of this code is plotted on the graph as follows:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/61351240-2e18-4920-986a-c70d07c5c39a.png

The final graph looks as follows, where the loss (the deviation between the predicted output and the actual output) is 0.01:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/2f453783-0ea5-4706-b70b-d9108fe28426.png
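
The three calls inside the training loop (zero_grad, backward, and step) can be spelled out by hand for the simplest case. The sketch below fits a straight line y = w·x + b with plain gradient descent on the mean squared error. It assumes noiseless, hypothetical target data (y = 2x + 1) so that the result is easy to check, and it is not the network used above:

```python
# Hypothetical training data on a straight line: y = 2x + 1
xs = [i / 10 for i in range(-10, 11)]
ys = [2.0 * x + 1.0 for x in xs]

w, b, lr = 0.0, 0.0, 0.2   # parameters and learning rate
losses = []

for step in range(200):
    # Forward pass: predictions and mean squared error
    preds = [w * x + b for x in xs]
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / len(xs)
    losses.append(loss)

    # "backward": gradients of the loss with respect to w and b
    grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = sum(2 * e for e in errors) / len(xs)

    # "step": move each parameter against its gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b, losses[0], losses[-1])
```

In PyTorch, the gradients are recomputed from scratch here on every iteration, which is the role that optimizer.zero_grad() plays before loss.backward().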

Now, we will move on to more in-depth use cases with PyTorch.

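Classification

A classification problem runs a neural network model to assign inputs to categories. For example, it classifies images of clothing as trousers, tops, or shirts. Given a new input, the classification model predicts the class that it belongs to.

A simple example is filtering emails into spam and not spam. Classification predicts categorical class labels for new data, based on a training set and the values of a classifying attribute. There are many classification models, such as Naive Bayes, random forests, decision trees, and logistic regression.

Here, we will work on a simple classification problem. To do this, use the following code: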

%matplotlib inline

import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

# torch.manual_seed(1) # reproducible

# make fake data
n_data = torch.ones(100, 2)
x0 = torch.normal(2*n_data, 1) # class0 x data (tensor), shape=(100, 2)
y0 = torch.zeros(100) # class0 y data (tensor), shape=(100,)
x1 = torch.normal(-2*n_data, 1) # class1 x data (tensor), shape=(100, 2)
y1 = torch.ones(100) # class1 y data (tensor), shape=(100,)
x = torch.cat((x0, x1), 0).type(torch.FloatTensor) # shape (200, 2) FloatTensor = 32-bit floating
y = torch.cat((y0, y1), 0).type(torch.LongTensor) # shape (200,) LongTensor = 64-bit integer

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden) # hidden layer
        self.out = torch.nn.Linear(n_hidden, n_output) # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x)) # activation function for hidden layer
        x = self.out(x)
        return x

net = Net(n_feature=2, n_hidden=10, n_output=2) # define the network
print(net) # net architecture

optimizer = torch.optim.SGD(net.parameters(), lr=0.02)
loss_func = torch.nn.CrossEntropyLoss() # the target label is NOT one-hot encoded

plt.ion() # turn on interactive plotting so the figure updates during training

for t in range(100):
    out = net(x) # input x and predict based on x
    loss = loss_func(out, y) # must be (1. nn output, 2. target), the target label is NOT one-hot encoded

    optimizer.zero_grad() # clear gradients for next train
    loss.backward() # backpropagation, compute gradients
    optimizer.step() # apply gradients

    if t % 10 == 0:
        # Plot the data and show the learning process
        plt.cla()
        prediction = torch.max(out, 1)[1] # index of the highest score = predicted class
        pred_y = prediction.data.numpy()
        target_y = y.data.numpy()
        plt.scatter(x.data.numpy()[:, 0], x.data.numpy()[:, 1], c=pred_y, s=100, lw=0, cmap='RdYlGn')
        accuracy = float((pred_y == target_y).astype(int).sum()) / float(target_y.size)
        plt.text(1.5, -4, 'Accuracy=%.2f' % accuracy, fontdict={'size': 20, 'color': 'red'})
        plt.pause(0.1)

plt.ioff()
plt.show()

The output of the preceding code is as follows:

Net(
  (hidden): Linear(in_features=2, out_features=10, bias=True)
  (out): Linear(in_features=10, out_features=2, bias=True)
)

We will pick just a few of the plots from the output, as shown in the following screenshot:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/521a242d-fe44-4f20-ab09-3e8314c22e47.png

You can see that the accuracy improves as the number of iteration steps increases:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/6356bef0-a6b8-4200-9586-b482aedd91a3.png

We can reach an accuracy of 1.00 in the final step of the run:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/eab78e53-d218-4ff3-b0bf-898dc7358898.png
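
The loss and accuracy used in this example can also be written out in a few lines of plain Python. The sketch below assumes that the network outputs raw scores (logits) and that targets are integer class indices rather than one-hot vectors, which is the convention the code above relies on. The function names and the sample scores are hypothetical:

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample with an integer class index."""
    m = max(logits)   # subtract the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def predict(logits):
    """Index of the highest score, like torch.max(out, 1)[1] for one row."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical network outputs for three samples and their true classes
batch_logits = [[2.0, -1.0], [0.5, 1.5], [0.0, 0.0]]
targets = [0, 1, 1]

loss = sum(cross_entropy(z, t) for z, t in zip(batch_logits, targets)) / len(targets)
preds = [predict(z) for z in batch_logits]
accuracy = sum(p == t for p, t in zip(preds, targets)) / len(targets)

print(loss, preds, accuracy)
```

The third sample has equal scores for both classes, so its loss is log(2) and the prediction falls back to the first class, which counts as a miss here.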

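Building a simple neural network with torch

Neural networks are useful when a problem calls for a heuristic approach. Let's explore a basic neural network through the following example: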

import torch
import torch.nn.functional as F

# The following class can be replaced with an easy sequential network
class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden) # hidden layer
        self.predict = torch.nn.Linear(n_hidden, n_output) # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x)) # activation function for hidden layer
        x = self.predict(x) # linear output
        return x

net1 = Net(1, 10, 1)

The following is the simplest and fastest way to build the same network:

net2 = torch.nn.Sequential(
    torch.nn.Linear(1, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 1)
)

print(net1) # net1 architecture
"""
Net (
 (hidden): Linear (1 -> 10)
 (predict): Linear (10 -> 1)
)
"""

print(net2) # net2 architecture
"""
Sequential (
 (0): Linear (1 -> 10)
 (1): ReLU ()
 (2): Linear (10 -> 1)
)
"""

The output of the preceding code is as follows:

Net(
  (hidden): Linear(in_features=1, out_features=10, bias=True)
  (predict): Linear(in_features=10, out_features=1, bias=True)
)
Sequential(
  (0): Linear(in_features=1, out_features=10, bias=True)
  (1): ReLU()
  (2): Linear(in_features=10, out_features=1, bias=True)
)

Out[1]:
'\nSequential (\n  (0): Linear (1 -> 10)\n  (1): ReLU ()\n  (2): Linear (10 -> 1)\n)\n'

Saving and reloading a network

Let's look at an example that saves a network and then restores it:

%matplotlib inline

import torch
import matplotlib.pyplot as plt

# torch.manual_seed(1) # reproducible

# fake data
x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1) # x data (tensor), shape=(100, 1)
y = x.pow(2) + 0.2*torch.rand(x.size()) # noisy y data (tensor), shape=(100, 1)

# The code below is deprecated in PyTorch 0.4. Now, autograd directly supports tensors
# x, y = Variable(x, requires_grad=False), Variable(y, requires_grad=False)

def save():
    # Build and train net1
    net1 = torch.nn.Sequential(
        torch.nn.Linear(1, 10),
        torch.nn.ReLU(),
        torch.nn.Linear(10, 1)
    )
    optimizer = torch.optim.SGD(net1.parameters(), lr=0.5)
    loss_func = torch.nn.MSELoss()

    for t in range(100):
        prediction = net1(x)
        loss = loss_func(prediction, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Plot the result
    plt.figure(1, figsize=(10, 3))
    plt.subplot(131)
    plt.title('Net1')
    plt.scatter(x.data.numpy(), y.data.numpy())
    plt.plot(x.data.numpy(), prediction.data.numpy(), 'r-', lw=5)

    # Two ways to save the network
    torch.save(net1, 'net.pkl') # save the entire net
    torch.save(net1.state_dict(), 'net_params.pkl') # save only the parameters

def restore_net():
    # Restore the entire net1 to net2
    net2 = torch.load('net.pkl')
    prediction = net2(x)

    # Plot the result
    plt.subplot(132)
    plt.title('Net2')
    plt.scatter(x.data.numpy(), y.data.numpy())
    plt.plot(x.data.numpy(), prediction.data.numpy(), 'r-', lw=5)

def restore_params():
    # Restore only the parameters of net1 to net3, so net3 must have the same architecture
    net3 = torch.nn.Sequential(
        torch.nn.Linear(1, 10),
        torch.nn.ReLU(),
        torch.nn.Linear(10, 1)
    )

    # Copy net1's parameters into net3
    net3.load_state_dict(torch.load('net_params.pkl'))
    prediction = net3(x)

    # Plot the result
    plt.subplot(133)
    plt.title('Net3')
    plt.scatter(x.data.numpy(), y.data.numpy())
    plt.plot(x.data.numpy(), prediction.data.numpy(), 'r-', lw=5)
    plt.show()

# Save net1
save()

# Restore the entire net (may be slow)
restore_net()

# Restore only the net parameters
restore_params()

The output of the code will look similar to the following graphs:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/221a7acc-e365-426a-ab76-8f37e4ffff1f.png
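
The idea behind saving only the parameters can be shown with a toy model. The following sketch is a plain Python illustration, not PyTorch code: TinyLinear is a hypothetical class whose state_dict and load_state_dict methods merely mimic the names of the PyTorch methods, and JSON stands in for the file format:

```python
import json

class TinyLinear:
    """Hypothetical one-input linear model: y = weight * x + bias."""
    def __init__(self, weight=0.0, bias=0.0):
        self.weight = weight
        self.bias = bias

    def __call__(self, x):
        return self.weight * x + self.bias

    def state_dict(self):
        # Only the learned numbers, with no code or architecture attached
        return {'weight': self.weight, 'bias': self.bias}

    def load_state_dict(self, state):
        self.weight = state['weight']
        self.bias = state['bias']

net1 = TinyLinear(weight=1.5, bias=-0.5)      # pretend this model was trained
saved = json.dumps(net1.state_dict())         # "save only the parameters"

net3 = TinyLinear()                           # the architecture must be rebuilt first
net3.load_state_dict(json.loads(saved))       # then the parameters are copied in

print(net1(2.0), net3(2.0))
```

This is why restore_params() has to construct the same Sequential network before loading the parameters, whereas restoring the entire network does not.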

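Running in batches

Torch helps you to organize your data through DataLoader. We can use it to pack data for mini-batch training. We can load data in our own format (for example, NumPy arrays) into tensors and wrap them in a dataset.

The following example wraps two small tensors in a dataset and iterates over it in shuffled batches: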

import torch
import torch.utils.data as Data

torch.manual_seed(1) # reproducible

BATCH_SIZE = 5

x = torch.linspace(1, 10, 10) # this is x data (torch tensor)
y = torch.linspace(10, 1, 10) # this is y data (torch tensor)

torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(
 dataset=torch_dataset, # torch TensorDataset format
 batch_size=BATCH_SIZE, # mini batch size
 shuffle=True, # random shuffle for training
 num_workers=2, # subprocesses for loading data
)

def show_batch():
    for epoch in range(3): # train entire dataset 3 times
        for step, (batch_x, batch_y) in enumerate(loader): # for each training step
            # train your data...
            print('Epoch: ', epoch, '| Step: ', step, '| batch x: ',
                  batch_x.numpy(), '| batch y: ', batch_y.numpy())

if __name__ == '__main__':
    show_batch()

The output of the code is as follows:

Epoch:  0 | Step:  0 | batch x:  [ 5.  7. 10.  3.  4.] | batch y:  [6. 4. 1. 8. 7.]
Epoch:  0 | Step:  1 | batch x:  [2. 1. 8. 9. 6.] | batch y:  [ 9. 10.  3.  2.  5.]
Epoch:  1 | Step:  0 | batch x:  [ 4.  6.  7. 10.  8.] | batch y:  [7. 5. 4. 1. 3.]
Epoch:  1 | Step:  1 | batch x:  [5. 3. 2. 1. 9.] | batch y:  [ 6.  8.  9. 10.  2.]
Epoch:  2 | Step:  0 | batch x:  [ 4.  2.  5.  6. 10.] | batch y:  [7. 9. 6. 5. 1.]
Epoch:  2 | Step:  1 | batch x:  [3. 9. 1. 8. 7.] | batch y:  [ 8.  2. 10.  3.  4.]
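
What DataLoader does with shuffle=True can be approximated in a few lines of plain Python: shuffle the sample indices once per epoch, then cut them into consecutive slices. The sketch below is a simplified illustration that ignores worker processes and tensors; iter_batches is a hypothetical helper, not a PyTorch function:

```python
import random

def iter_batches(xs, ys, batch_size, rng):
    """Yield (batch_x, batch_y) pairs in a freshly shuffled order."""
    order = list(range(len(xs)))
    rng.shuffle(order)                       # new random order for this epoch
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [xs[i] for i in idx], [ys[i] for i in idx]

xs = [float(i) for i in range(1, 11)]        # 1.0 ... 10.0, like torch.linspace(1, 10, 10)
ys = [float(i) for i in range(10, 0, -1)]    # 10.0 ... 1.0
rng = random.Random(1)                       # fixed seed for reproducibility

batches = list(iter_batches(xs, ys, batch_size=5, rng=rng))
for step, (batch_x, batch_y) in enumerate(batches):
    print('Step:', step, '| batch x:', batch_x, '| batch y:', batch_y)
```

Because x and y are shuffled with the same indices, each x value stays paired with its y value, which is visible in the output above as well: every pair sums to 11.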

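Optimization algorithms

When we implement a neural network, there is always the question of which optimization algorithm to use to get better results. An optimizer improves the model by updating its key parameters, such as the weight and bias values.

These algorithms minimize (or maximize) an error function, E(x), that depends on the model's internal parameters. Those parameters are what the model uses to compute the target values (Y) from the set of predictors (x).

Now, let's look at the different types of algorithms through the following example: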

%matplotlib inline

import torch
import torch.utils.data as Data
import torch.nn.functional as F
import matplotlib.pyplot as plt

# torch.manual_seed(1) # reproducible

LR = 0.01
BATCH_SIZE = 32
EPOCH = 12

# dummy dataset
x = torch.unsqueeze(torch.linspace(-1, 1, 1000), dim=1)
y = x.pow(2) + 0.1*torch.normal(torch.zeros(*x.size()))

# plot dataset
plt.scatter(x.numpy(), y.numpy())
plt.show()

Put the data into a torch dataset:

torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(dataset=torch_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2,)

# default network
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(1, 20) # hidden layer
        self.predict = torch.nn.Linear(20, 1) # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x)) # activation function for hidden layer
        x = self.predict(x) # linear output
        return x

if __name__ == '__main__':
    # One identical network per optimizer
    net_SGD = Net()
    net_Momentum = Net()
    net_RMSprop = Net()
    net_Adam = Net()
    nets = [net_SGD, net_Momentum, net_RMSprop, net_Adam]

    # Different optimizers
    opt_SGD = torch.optim.SGD(net_SGD.parameters(), lr=LR)
    opt_Momentum = torch.optim.SGD(net_Momentum.parameters(), lr=LR, momentum=0.8)
    opt_RMSprop = torch.optim.RMSprop(net_RMSprop.parameters(), lr=LR, alpha=0.9)
    opt_Adam = torch.optim.Adam(net_Adam.parameters(), lr=LR, betas=(0.9, 0.99))
    optimizers = [opt_SGD, opt_Momentum, opt_RMSprop, opt_Adam]

    loss_func = torch.nn.MSELoss()
    losses_his = [[], [], [], []] # record loss

    # Train every network for several epochs
    for epoch in range(EPOCH):
        print('Epoch: ', epoch)
        for step, (b_x, b_y) in enumerate(loader): # for each training step
            for net, opt, l_his in zip(nets, optimizers, losses_his):
                output = net(b_x) # get output for every net
                loss = loss_func(output, b_y) # compute loss for every net
                opt.zero_grad() # clear gradients for next train
                loss.backward() # backpropagation, compute gradients
                opt.step() # apply gradients
                l_his.append(loss.data.numpy()) # loss recorder

    # Plot the loss history of every optimizer
    labels = ['SGD', 'Momentum', 'RMSprop', 'Adam']
    for i, l_his in enumerate(losses_his):
        plt.plot(l_his, label=labels[i])
    plt.legend(loc='best')
    plt.xlabel('Steps')
    plt.ylabel('Loss')
    plt.ylim((0, 0.2))
    plt.show()

The output of the preceding code block is shown in the following graph:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/94766868-218f-469f-bf52-4d762dae6b47.png

The output of the epoch count looks as follows:

Epoch:  0
Epoch:  1
Epoch:  2
Epoch:  3
Epoch:  4
Epoch:  5
Epoch:  6
Epoch:  7
Epoch:  8
Epoch:  9
Epoch:  10
Epoch:  11

We will plot all of the optimizers and represent them in a graph, as follows:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/747f9f2a-2f1a-4180-bc46-f6f3d32b9554.png
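
To see why the curves differ, it helps to write out two of the update rules by hand. The following sketch compares plain SGD with SGD plus momentum on a toy loss, f(w) = w², whose gradient is 2w. The velocity form used here (velocity = momentum * velocity + gradient) is one common formulation of momentum, and the toy loss and step counts are assumptions made for illustration:

```python
def grad(w):
    """Gradient of the toy loss f(w) = w**2."""
    return 2.0 * w

def run_sgd(w, lr, steps):
    for _ in range(steps):
        w -= lr * grad(w)                 # step straight down the gradient
    return w

def run_momentum(w, lr, momentum, steps):
    velocity = 0.0
    for _ in range(steps):
        velocity = momentum * velocity + grad(w)   # accumulate past gradients
        w -= lr * velocity
    return w

w_sgd = run_sgd(1.0, lr=0.1, steps=3)
w_momentum = run_momentum(1.0, lr=0.1, momentum=0.8, steps=3)
print(w_sgd, w_momentum)
```

After three steps, the momentum variant is closer to the minimum at w = 0 because it keeps building speed in a consistent direction. The same accumulated speed can also make it overshoot, which is one reason the loss curves of the optimizers look different.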

In the next section, we will discuss RNNs.

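Recurrent neural networks

Unlike feedforward neural networks, RNNs can use their internal memory to process inputs in sequence. In an RNN, the connections between nodes form a directed graph along a temporal sequence. This makes RNNs suited to tasks such as recognizing unsegmented, connected handwriting or speech.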
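
The internal memory mentioned above comes from feeding the previous hidden state back into the network at every time step. The following sketch shows the recurrence for the simplest possible case, a single hidden unit with scalar weights: h_t = tanh(w_x · x_t + w_h · h_{t-1} + b). The weights and inputs are hypothetical values, and a real RNN or LSTM layer uses weight matrices and, for the LSTM, additional gates:

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a one-unit RNN over a sequence and return every hidden state."""
    h = h0
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states are still affected by it
states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
print(states)
```

A feedforward layer given the same three inputs one at a time would output zero for the second and third input; the recurrent unit does not, because the first input is carried forward in the hidden state.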

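The MNIST database

The MNIST database contains a training set of 60,000 handwritten digits, plus a test set of 10,000 digits. It is a subset of the NIST dataset, and all of its digits have been size-normalized and centered in a 28 x 28 pixel image. Each pixel has a value from 0 to 255 that represents its grayscale level.

The MNIST dataset can be found at yann.lecun.com/exdb/mnist/.

The NIST dataset can be found at www.nist.gov/srd/nist-special-database-19.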
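
Before such an image is fed to an RNN, two things usually happen: the 0-255 pixel values are scaled to the range 0.0-1.0, and the image is read as a sequence of rows. The following sketch illustrates both steps in plain Python. The image here is a hypothetical stand-in filled with a repeating gradient, not a real MNIST digit:

```python
HEIGHT, WIDTH = 28, 28

# Hypothetical stand-in for one MNIST image: 28 rows of 28 gray values in 0..255
image = [[(row * WIDTH + col) % 256 for col in range(WIDTH)] for row in range(HEIGHT)]

# Scale every pixel from 0..255 to 0.0..1.0
normalized = [[pixel / 255.0 for pixel in row] for row in image]

# An RNN reads the image row by row: each row is one time step,
# and the 28 pixels of that row are the input features for the step
sequence = normalized
time_steps, input_size = len(sequence), len(sequence[0])

print(time_steps, input_size, min(map(min, normalized)), max(map(max, normalized)))
```

Reading the image this way gives a sequence of 28 time steps with 28 input values each.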

RNN classification

Here, we will look at an example that shows how to build an RNN to recognize the handwritten digits in the MNIST database:

import torch
from torch import nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

# torch.manual_seed(1) # reproducible

# Hyper Parameters
EPOCH = 1 # train the training data n times, to save time, we just train 1 epoch
BATCH_SIZE = 64
TIME_STEP = 28 # rnn time step / image height
INPUT_SIZE = 28 # rnn input size / image width
LR = 0.01 # learning rate
DOWNLOAD_MNIST = True # set to True if haven't download the data

# Mnist digital dataset
train_data = dsets.MNIST(
 root='./mnist/',
 train=True, # this is training data
 transform=transforms.ToTensor(), # Converts a PIL.Image or numpy.ndarray to
 # torch.FloatTensor of shape (C x H x W) and normalize in the range [0.0, 1.0]
 download=DOWNLOAD_MNIST, # download it if you don't have it
)

Plot one example:


print(train_data.train_data.size()) # (60000, 28, 28)
print(train_data.train_labels.size()) # (60000)
plt.imshow(train_data.train_data[0].numpy(), cmap='gray')
plt.title('%i' % train_data.train_labels[0])
plt.show()

# Data Loader for easy mini-batch return in training
train_loader = torch.utils.data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)

Prepare the test data, picking 2,000 samples to speed up testing:


test_data = dsets.MNIST(root='./mnist/', train=False, transform=transforms.ToTensor())
test_x = test_data.test_data.type(torch.FloatTensor)[:2000]/255. # shape (2000, 28, 28) value in range(0,1)
test_y = test_data.test_labels.numpy()[:2000] # convert to numpy array

class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()

        self.rnn = nn.LSTM( # if we use nn.RNN() instead, it hardly learns
            input_size=INPUT_SIZE,
            hidden_size=64, # rnn hidden units
            num_layers=1, # number of rnn layers
            batch_first=True, # input and output have the batch size as the first dimension, e.g. (batch, time_step, input_size)
        )

        self.out = nn.Linear(64, 10)

    def forward(self, x):
        # x shape (batch, time_step, input_size)
        # r_out shape (batch, time_step, output_size)
        # h_n shape (n_layers, batch, hidden_size)
        # h_c shape (n_layers, batch, hidden_size)
        r_out, (h_n, h_c) = self.rnn(x, None) # None represents zero initial hidden state

        # choose r_out at the last time step
        out = self.out(r_out[:, -1, :])
        return out

rnn = RNN()
print(rnn)

optimizer = torch.optim.Adam(rnn.parameters(), lr=LR) # optimize all rnn parameters
loss_func = nn.CrossEntropyLoss() # the target label is not one-hot encoded

Train and test over the epochs:

for epoch in range(EPOCH):
    for step, (b_x, b_y) in enumerate(train_loader): # gives batch data
        b_x = b_x.view(-1, 28, 28) # reshape x to (batch, time_step, input_size)

        output = rnn(b_x) # rnn output
        loss = loss_func(output, b_y) # cross entropy loss
        optimizer.zero_grad() # clear gradients for this training step
        loss.backward() # backpropagation, compute gradients
        optimizer.step() # apply gradients

        if step % 50 == 0:
            test_output = rnn(test_x) # (samples, time_step, input_size)
            pred_y = torch.max(test_output, 1)[1].data.numpy()
            accuracy = float((pred_y == test_y).astype(int).sum()) / float(test_y.size)
            print('Epoch: ', epoch, '| train loss: %.4f' % loss.data.numpy(), '| test accuracy: %.2f' % accuracy)

# print 10 predictions from test data
test_output = rnn(test_x[:10].view(-1, 28, 28))
pred_y = torch.max(test_output, 1)[1].data.numpy()
print(pred_y, 'prediction number')
print(test_y[:10], 'real number')

Running the code downloads and extracts the following files for training:

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./mnist/MNIST/raw/train-images-idx3-ubyte.gz
100.1%
Extracting ./mnist/MNIST/raw/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./mnist/MNIST/raw/train-labels-idx1-ubyte.gz
113.5%
Extracting ./mnist/MNIST/raw/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./mnist/MNIST/raw/t10k-images-idx3-ubyte.gz
100.4%
Extracting ./mnist/MNIST/raw/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz
180.4%
Extracting ./mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz
Processing...
Done!
torch.Size([60000, 28, 28])
torch.Size([60000])
/usr/local/lib/python3.7/site-packages/torchvision/datasets/mnist.py:53: UserWarning: train_data has been renamed data
  warnings.warn("train_data has been renamed data")
/usr/local/lib/python3.7/site-packages/torchvision/datasets/mnist.py:43: UserWarning: train_labels has been renamed targets
 warnings.warn("train_labels has been renamed targets")

The output of the preceding code is as follows:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/b9e4d3d4-eb13-4231-9257-661e0a37dcf2.png

As the run continues, it prints the following warnings and the model summary:

/usr/local/lib/python3.7/site-packages/torchvision/datasets/mnist.py:58: UserWarning: test_data has been renamed data
  warnings.warn("test_data has been renamed data")
/usr/local/lib/python3.7/site-packages/torchvision/datasets/mnist.py:48: UserWarning: test_labels has been renamed targets
  warnings.warn("test_labels has been renamed targets")

RNN(
  (rnn): LSTM(28, 64, batch_first=True)
  (out): Linear(in_features=64, out_features=10, bias=True)
)

The per-epoch output is as follows:

Epoch:  0 | train loss: 2.3156 | test accuracy: 0.12
Epoch:  0 | train loss: 1.1875 | test accuracy: 0.57
Epoch:  0 | train loss: 0.7739 | test accuracy: 0.68
Epoch:  0 | train loss: 0.8689 | test accuracy: 0.73
Epoch:  0 | train loss: 0.5322 | test accuracy: 0.83
Epoch:  0 | train loss: 0.3657 | test accuracy: 0.83
Epoch:  0 | train loss: 0.2960 | test accuracy: 0.88
Epoch:  0 | train loss: 0.3869 | test accuracy: 0.90
Epoch:  0 | train loss: 0.1694 | test accuracy: 0.92
Epoch:  0 | train loss: 0.0869 | test accuracy: 0.93
Epoch:  0 | train loss: 0.2825 | test accuracy: 0.91
Epoch:  0 | train loss: 0.2392 | test accuracy: 0.94
Epoch:  0 | train loss: 0.0994 | test accuracy: 0.91
Epoch:  0 | train loss: 0.3731 | test accuracy: 0.94
Epoch:  0 | train loss: 0.0959 | test accuracy: 0.94
Epoch:  0 | train loss: 0.1991 | test accuracy: 0.95
Epoch:  0 | train loss: 0.0711 | test accuracy: 0.94
Epoch:  0 | train loss: 0.2882 | test accuracy: 0.96
Epoch:  0 | train loss: 0.4420 | test accuracy: 0.95
[7 2 1 0 4 1 4 9 5 9] prediction number
[7 2 1 0 4 1 4 9 5 9] real number
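The test accuracy printed during training is simply the fraction of test samples whose highest-scoring class matches the label. A minimal plain-Python sketch of that computation follows. The score rows and labels are made-up values for illustration, not output from the model above, and the helper names `argmax` and `accuracy` are hypothetical.

```python
def argmax(row):
    """Return the index of the largest value in a list of scores."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(score_rows, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = [argmax(row) for row in score_rows]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical scores for three samples over three classes.
scores = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9]]
labels = [1, 0, 1]

# The third sample is predicted as class 2 but labelled 1, so two of three are correct.
print(accuracy(scores, labels))
```

In the training loop, torch.max(test_output, 1)[1] plays the role of `argmax` here, applied to every row at once.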

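RNN recurrent neural network – regression

Now we will tackle a regression problem with an RNN. Recurrent neural networks give a neural network memory, which lets them perform better on sequential data. In this example, we will use an RNN to predict time series data.

To learn more about recurrent neural networks, visit iopscience.iop.org/article/10.1209/0295-5075/18/3/003/meta.

The following code sets up the regression example: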
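The "memory" of a recurrent network is its hidden state, which is carried from one time step to the next. With its default tanh non-linearity, PyTorch's nn.RNN updates the hidden state as h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh). The following is a minimal plain-Python sketch of that recurrence with a single hidden unit. The weights are arbitrary illustrative values (not trained), the two bias terms are merged into one, and `rnn_forward` is a hypothetical helper name.

```python
import math


def rnn_forward(xs, h0, w_ih, w_hh, b, w_out, b_out):
    """Run a one-unit tanh RNN over xs and return (outputs, final hidden state)."""
    h = h0
    outputs = []
    for x in xs:
        # The new hidden state mixes the current input with the previous state.
        h = math.tanh(w_ih * x + w_hh * h + b)
        # The same linear read-out is applied at every time step.
        outputs.append(w_out * h + b_out)
    return outputs, h


# Ten samples of a sine wave as the input sequence.
xs = [math.sin(0.1 * t) for t in range(10)]

# Arbitrary illustrative weights (assumptions, not trained values).
outs, h_last = rnn_forward(xs, h0=0.0, w_ih=1.0, w_hh=0.5, b=0.0, w_out=2.0, b_out=0.0)
print(len(outs), h_last)
```

Passing h_last back in as h0 for the next chunk of the sequence is what lets the network carry information across chunks.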

%matplotlib inline

import torch
from torch import nn
import numpy as np
import matplotlib.pyplot as plt

# torch.manual_seed(1) # reproducible

# Hyper Parameters
TIME_STEP = 10 # rnn time step
INPUT_SIZE = 1 # rnn input size
LR = 0.02 # learning rate

# show data
steps = np.linspace(0, np.pi*2, 100, dtype=np.float32) # float32 for converting torch FloatTensor
x_np = np.sin(steps)
y_np = np.cos(steps)
plt.plot(steps, y_np, 'r-', label='target (cos)')
plt.plot(steps, x_np, 'b-', label='input (sin)')
plt.legend(loc='best')
plt.show()

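The RNN class is defined in the following code. We pass r_out through a linear layer to compute the predicted output, using a for loop over the time steps and torch.stack to collect the predictions: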


class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()

        self.rnn = nn.RNN(
            input_size=INPUT_SIZE,
            hidden_size=32,  # rnn hidden unit
            num_layers=1,  # number of rnn layers
            batch_first=True,  # input & output have batch size as the 1st dimension, e.g. (batch, time_step, input_size)
        )
        self.out = nn.Linear(32, 1)

    def forward(self, x, h_state):
        # x (batch, time_step, input_size)
        # h_state (n_layers, batch, hidden_size)
        # r_out (batch, time_step, hidden_size)
        r_out, h_state = self.rnn(x, h_state)

        outs = []  # save all predictions
        for time_step in range(r_out.size(1)):
            outs.append(self.out(r_out[:, time_step, :]))
        return torch.stack(outs, dim=1), h_state

# instantiate RNN
rnn = RNN()
print(rnn)

The output is as follows:

"""
RNN (
 (rnn): RNN(1, 32, batch_first=True)
 (out): Linear (32 -> 1)
)
"""

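Before running the for loop that trains the network and makes predictions, we set up the optimizer for the RNN parameters, the loss function, and the initial hidden state, as shown in the following code: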

optimizer = torch.optim.Adam(rnn.parameters(), lr=LR) 
loss_func = nn.MSELoss()
h_state = None
plt.figure(1, figsize=(12, 5))
plt.ion()

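The following code block produces an animated plot when run, which cannot be shown in this book, so we have added some screenshots to help you picture it. We use x, the sin values, as the input and y, the cos values, as the target to fit. Because the two curves are related, we will use sin to predict cos: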

for step in range(100):
    start, end = step * np.pi, (step + 1) * np.pi  # time range
    # use sin to predict cos
    steps = np.linspace(start, end, TIME_STEP, dtype=np.float32, endpoint=False)  # float32 for converting to torch FloatTensor
    x_np = np.sin(steps)
    y_np = np.cos(steps)

    x = torch.from_numpy(x_np[np.newaxis, :, np.newaxis])  # shape (batch, time_step, input_size)
    y = torch.from_numpy(y_np[np.newaxis, :, np.newaxis])

    prediction, h_state = rnn(x, h_state)  # rnn output

    h_state = h_state.data  # repack the hidden state, break the connection from last iteration

    loss = loss_func(prediction, y)  # calculate loss
    optimizer.zero_grad()  # clear gradients for this training step
    loss.backward()  # backpropagation, compute gradients
    optimizer.step()  # apply gradients

Plot the results (these lines are still inside the for loop):


    plt.plot(steps, y_np.flatten(), 'r-')
    plt.plot(steps, prediction.data.numpy().flatten(), 'b-')
    plt.draw(); plt.pause(0.05)

plt.ioff()
plt.show()

The output of the preceding code is as follows:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/30e1a235-cfa4-4e32-8513-e9ac707ebcf2.png

The following figure is produced after the 10th iteration:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/1e9db5f3-5b02-4c09-8269-9eb57193c681.png

The following figure is produced after the 25th iteration:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/8aa151a8-b47a-41ba-9b09-faa0635f0a87.png

Rather than showing the output of all 100 iterations here, we skip straight to the final output, the 100th iteration, shown in the following screenshot:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/93c8dff1-5c2e-4c1d-8efb-e6f16c330095.png

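In the next section, we will explore natural language processing (NLP).

Natural language processing

Now it is time to try some natural language processing techniques with PyTorch. This is particularly useful for readers who have not written code in any deep learning framework before, especially those who already have a good grasp of core NLP problems and algorithms.

In this chapter, we will use simple, low-dimensional examples so that we can watch how the layer weights change as the network trains. Once you understand how the network works, you can try your own models.

Before working on any NLP problem, we need to understand the basic building blocks of deep learning, including affine maps, non-linearities, and objective functions.

Affine maps

The affine map is one of the basic building blocks of deep learning, and is shown here: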

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/9ac3ea96-07ab-425b-9c2b-2487fec88f51.png

Here, A is a matrix and x and b are vectors. A and b are the parameters to be learned, and b is the bias.

A simple example that illustrates this is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)
lin = nn.Linear(6, 3) # maps from R⁶ to R³, parameters A, b
# data is 2x6. A maps from 6 to 3... can we map "data" under A?
data = torch.randn(2, 6)
print(lin(data))

After that, run the program with the following command:

$ python3 torch_nlp.py

The output will look as follows:

tensor([[ 1.1105, -0.1102, -0.3235],
        [ 0.4800,  0.1633, -0.2515]], grad_fn=<AddmmBackward>)

Non-linearities

First, we need to be clear about why we need non-linearities. Suppose we have two affine maps, f(x)=Ax+b and g(x)=Cx+d. Then f(g(x)) is as follows:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/3c3d54a3-b248-4007-8b84-6f120f64839a.png

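Here we can see that composing affine maps gives another affine map, where Ad+b is a vector and AC is a matrix.

If a neural network were just a long chain of affine compositions, it would therefore be no more powerful than a single affine map. Introducing non-linearities between the affine layers removes this limitation and lets us build much more powerful models.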
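The claim that two stacked affine maps collapse into one can be checked numerically. The following plain-Python sketch uses small made-up matrices and vectors (all values are assumptions chosen for illustration) and compares f(g(x)) = A(Cx + d) + b with the single map (AC)x + (Ad + b).

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(M, N):
    """Multiply matrix M by matrix N."""
    cols = list(zip(*N))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in M]


def add(u, v):
    """Add two vectors element by element."""
    return [a + b for a, b in zip(u, v)]


A = [[1.0, 2.0], [0.0, -1.0]]
b = [0.5, 1.0]
C = [[2.0, 0.0], [1.0, 3.0]]
d = [-1.0, 2.0]
x = [3.0, -2.0]

# Apply g first, then f: f(g(x)) = A(Cx + d) + b.
composed = add(matvec(A, add(matvec(C, x), d)), b)

# Collapse the two maps into one affine map: (AC)x + (Ad + b).
single = add(matvec(matmul(A, C), x), add(matvec(A, d), b))

print(composed, single)  # both are [3.5, 2.0]
```

Placing a non-linearity such as ReLU between the two maps breaks this equivalence, which is exactly why it adds modelling power.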

A few core non-linearities are used most often, namely tanh(x), σ(x), and ReLU(x). The following code block shows ReLU in use:

# Let's look more closely at non-linearities.
# Most non-linearities in PyTorch live in torch.nn.functional (imported here as F).
# Note that, unlike affine maps, non-linearities mostly have no parameters.
# That is, they have no weights that are updated during training.
data = torch.randn(2, 2)
print(data)
print(F.relu(data))

The output of the preceding code is as follows:

tensor([[ 0.5848, 0.2149],
 [-0.4090, -0.1663]])
tensor([[0.5848, 0.2149],
 [0.0000, 0.0000]])
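For comparison, here is a plain-Python sketch of the three non-linearities named earlier, applied to the four values printed above. It does not use PyTorch; `relu` and `sigmoid` are hypothetical helper names written out from their standard definitions.

```python
import math


def relu(x):
    """ReLU keeps positive values and clamps negative values to zero."""
    return max(0.0, x)


def sigmoid(x):
    """The logistic sigmoid squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# The entries of the random tensor printed above.
values = [0.5848, 0.2149, -0.4090, -0.1663]

print([relu(v) for v in values])                 # negatives become 0.0
print([round(math.tanh(v), 4) for v in values])  # squashed into (-1, 1)
print([round(sigmoid(v), 4) for v in values])    # squashed into (0, 1)
```

None of these functions has a weight to learn, which is why they add no parameters to the model.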

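Objective function

The objective function (also called the loss function or cost function) is what your network is trained to minimize. It works by picking a training instance, passing it through the neural network, and then computing the loss of the output.

The parameters of the model are then updated using the derivative of the loss function. For example, if the model confidently predicts an answer and that answer turns out to be wrong, the computed loss will be high. If the predicted answer is correct, the loss will be low.

How does the network minimize the loss?

First, the function picks a training instance
Then, it passes the instance through our neural network to get an output
Finally, the loss of the output is computed

In our training example, we minimize the loss function in order to reduce the probability of wrong results when the model is used on a real dataset.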
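To make "confidently wrong gives a high loss" concrete, the following plain-Python sketch evaluates the negative log likelihood, -log p(correct class), for three cases. The probability vectors are made-up values for illustration, and `nll` is a hypothetical helper name.

```python
import math


def nll(probs, target):
    """Negative log likelihood of the target class under a probability vector."""
    return -math.log(probs[target])


# The model puts 99% of its probability on class 0.
confident_right = nll([0.99, 0.01], target=0)  # class 0 is the true label
confident_wrong = nll([0.99, 0.01], target=1)  # class 1 is the true label

# The model cannot decide between the two classes.
unsure = nll([0.5, 0.5], target=0)

print(confident_right, unsure, confident_wrong)
```

The three losses increase in that order: a confident correct prediction costs almost nothing, an undecided one costs log 2, and a confident wrong one is penalized heavily.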

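Building network components in PyTorch

Before turning our attention to NLP, in this section we will build a network in PyTorch using non-linear activation functions and affine maps. In this example, we will learn how to compute the loss using PyTorch's built-in negative log likelihood and how to update the parameters using backpropagation.

Note that all network components should inherit from nn.Module and override the forward() method. As far as boilerplate goes, these are the details to remember. Inheriting from nn.Module is what gives a component its functionality.

Now, as mentioned earlier, we will look at an example in which the network takes a sparse bag-of-words (BoW) representation and outputs a probability distribution over two labels, English and Spanish. This model is also an example of logistic regression.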

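BoW classifier using logistic regression

The model we build will map a sparse BoW representation to log probabilities over our two labels, English and Spanish. We assign each word in the vocabulary an index. Suppose our vocabulary contains two words, hello and world, with indices zero and one respectively. For example, the BoW vector for the sentence hello hello hello hello hello is [5,0]. Similarly, the BoW vector for hello world world hello world is [2,3], and so on.

In general, it is [Count(hello), Count(world)].

Let's denote this BoW vector as x.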
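The counting rule above can be sketched in a few lines of plain Python. This version uses a list instead of a tensor, and `make_bow` is a hypothetical helper name used only for this illustration.

```python
def make_bow(sentence, word_to_ix):
    """Count how often each vocabulary word occurs in a whitespace-separated sentence."""
    vec = [0] * len(word_to_ix)
    for word in sentence.split():
        vec[word_to_ix[word]] += 1
    return vec


# The two-word vocabulary from the text: hello has index 0, world has index 1.
vocab = {"hello": 0, "world": 1}

print(make_bow("hello hello hello hello hello", vocab))  # [5, 0]
print(make_bow("hello world world hello world", vocab))  # [2, 3]
```

Word order is discarded entirely: any sentence with two hello and three world maps to the same vector.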

The output of the network is as follows:

https://github.com/OpenDocCN/freelearn-dl-pt6-zh/raw/master/docs/mobi-ai-proj/img/8905bb53-f8c9-4a9a-9d76-58c0c236c43e.png

Next, we need to pass the input through an affine map and then apply log softmax:

data = [("El que lee mucho y anda mucho, ve mucho y sabe mucho".split(), "SPANISH"),
 ("The one who reads a lot and walks a lot, sees a lot and knows a lot.".split(), "ENGLISH"),
 ("Nunca es tarde si la dicha es buena".split(), "SPANISH"),
 ("It is never late if the joy is good".split(), "ENGLISH")]

test_data = [("Que cada palo aguante su vela".split(), "SPANISH"),
 ("May every mast hold its own sail".split(), "ENGLISH")]

# word_to_ix maps each word in the vocabulary to a unique integer, which is that word's index in the BoW vector

word_to_ix = {}
for sent, _ in data + test_data:
    for word in sent:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)
print(word_to_ix)

VOCAB_SIZE = len(word_to_ix)
NUM_LABELS = 2

class BoWClassifier(nn.Module):  # inheriting from nn.Module!

    def __init__(self, num_labels, vocab_size):
        # Call the init function of nn.Module. The syntax may look confusing,
        # but always do this in an nn.Module subclass.
        super(BoWClassifier, self).__init__()

Next, we define the parameters we need. Here, these are A and b, and the following code block shows the rest of the implementation:

        # Define the parameters needed for the affine map.
        # nn.Linear() is PyTorch's affine map.
        # Make sure you understand why the input dimension is vocab_size
        # and the output dimension is num_labels.
        self.linear = nn.Linear(vocab_size, num_labels)

        # Note: the log softmax non-linearity has no parameters, so nothing needs to be defined for it here.

    def forward(self, bow_vec):
        # First, the input is passed through the linear layer,
        # then through log_softmax.
        # torch.nn.functional contains many other non-linearities and functions.
        return F.log_softmax(self.linear(bow_vec), dim=1)


def make_bow_vector(sentence, word_to_ix):
    vec = torch.zeros(len(word_to_ix))
    for word in sentence:
        vec[word_to_ix[word]] += 1
    return vec.view(1, -1)


def make_target(label, label_to_ix):
    return torch.LongTensor([label_to_ix[label]])

model = BoWClassifier(NUM_LABELS, VOCAB_SIZE)

The model now knows its parameters. In the output, the first parameter printed is A and the second is b, as follows:

# Whenever a component is assigned to a class variable in the __init__ function
# of a module, which was done with the line
# self.linear = nn.Linear(...)
# the module (here, BoWClassifier) stores knowledge of the nn.Linear's parameters.

for param in model.parameters():
    print(param)

# To run the model, pass in a BoW vector.
# The code is wrapped in torch.no_grad() since we don't need to train here.
with torch.no_grad():
    sample = data[0]
    bow_vector = make_bow_vector(sample[0], word_to_ix)
    log_probs = model(bow_vector)
    print(log_probs)

The output of the preceding code is as follows:


{'El': 0, 'que': 1, 'lee': 2, 'mucho': 3, 'y': 4, 'anda': 5, 'mucho,': 6, 've': 7, 'sabe': 8, 'The': 9, 'one': 10, 'who': 11, 'reads': 12, 'a': 13, 'lot': 14, 'and': 15, 'walks': 16, 'lot,': 17, 'sees': 18, 'knows': 19, 'lot.': 20, 'Nunca': 21, 'es': 22, 'tarde': 23, 'si': 24, 'la': 25, 'dicha': 26, 'buena': 27, 'It': 28, 'is': 29, 'never': 30, 'late': 31, 'if': 32, 'the': 33, 'joy': 34, 'good': 35, 'Que': 36, 'cada': 37, 'palo': 38, 'aguante': 39, 'su': 40, 'vela': 41, 'May': 42, 'every': 43, 'mast': 44, 'hold': 45, 'its': 46, 'own': 47, 'sail': 48}
Parameter containing:
tensor([[-0.0347, 0.1423, 0.1145, -0.0067, -0.0954, 0.0870, 0.0443, -0.0923,
 0.0928, 0.0867, 0.1267, -0.0801, -0.0235, -0.0028, 0.0209, -0.1084,
 -0.1014, 0.0777, -0.0335, 0.0698, 0.0081, 0.0469, 0.0314, 0.0519,
 0.0708, -0.1323, 0.0719, -0.1004, -0.1078, 0.0087, -0.0243, 0.0839,
 -0.0827, -0.1270, 0.1040, -0.0212, 0.0804, 0.0459, -0.1071, 0.0287,
 0.0343, -0.0957, -0.0678, 0.0487, 0.0256, -0.0608, -0.0432, 0.1308,
 -0.0264],
 [ 0.0805, 0.0619, -0.0923, -0.1215, 0.1371, 0.0075, 0.0979, 0.0296,
 0.0459, 0.1067, 0.1355, -0.0948, 0.0179, 0.1066, 0.1035, 0.0887,
 -0.1034, -0.1029, -0.0864, 0.0179, 0.1424, -0.0902, 0.0761, -0.0791,
 -0.1343, -0.0304, 0.0823, 0.1326, -0.0887, 0.0310, 0.1233, 0.0947,
 0.0890, 0.1015, 0.0904, 0.0369, -0.0977, -0.1200, -0.0655, -0.0166,
 -0.0876, 0.0523, 0.0442, -0.0323, 0.0549, 0.0462, 0.0872, 0.0962,
 -0.0484]], requires_grad=True)
Parameter containing:
tensor([ 0.1396, -0.0165], requires_grad=True)
tensor([[-0.6171, -0.7755]])
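The last tensor printed above holds log probabilities, so exponentiating its two entries should give values that sum to roughly 1 (up to the four-decimal rounding of the printout). The following plain-Python sketch checks this and also spells out a log softmax from its standard definition. The function `log_softmax` here is a hand-written stand-in for illustration, not PyTorch's implementation, and the raw scores in the second example are arbitrary.

```python
import math


def log_softmax(logits):
    """Numerically stable log softmax over a list of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


# Values copied from the model output above.
printed = [-0.6171, -0.7755]
total = sum(math.exp(lp) for lp in printed)
print(total)  # close to 1, up to rounding of the printed values

# Arbitrary raw scores (an assumption for illustration).
example = log_softmax([1.0, 2.0, 3.0])
print(example)
```

Every entry of a log softmax output is negative, because each underlying probability is below 1.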

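We now have tensor output values. However, as the preceding code shows, we have not yet defined which of these values is the log probability for ENGLISH and which is the one for SPANISH. We need to define this mapping before we can train the model: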

label_to_ix = {"SPANISH": 0, "ENGLISH": 1}

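So, let's train our model. We first pass instances through the model to get the log probabilities. We then compute the loss function, and once that is done, we compute the gradient of the loss. Finally, we update the parameters with a gradient step. The nn package in PyTorch provides the loss functions; we need nn.NLLLoss() as the negative log likelihood loss. The optimization functions are defined in torch.optim.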
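The four steps just described (forward pass, loss, gradient, parameter update) can be sketched without PyTorch for a tiny softmax classifier. This is a hand-written illustration under stated assumptions, not what nn.NLLLoss or torch.optim do internally: the toy data, the learning rate, and the helper names `forward` and `sgd_step` are all made up for the example. It relies on the standard result that the gradient of the negative log likelihood with respect to logit k is softmax_k minus 1 for the target class and softmax_k otherwise.

```python
import math


def log_softmax(logits):
    """Numerically stable log softmax over a list of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def forward(W, b, x):
    """Affine map followed by log softmax: log_softmax(Wx + b)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return log_softmax(logits)


def sgd_step(W, b, x, target, lr):
    """One stochastic gradient descent update; returns the loss before the update."""
    log_probs = forward(W, b, x)
    loss = -log_probs[target]
    for k in range(len(W)):
        # Gradient of the loss with respect to logit k.
        grad_logit = math.exp(log_probs[k]) - (1.0 if k == target else 0.0)
        for j in range(len(x)):
            W[k][j] -= lr * grad_logit * x[j]
        b[k] -= lr * grad_logit
    return loss


# Toy data (hypothetical): BoW counts over a two-word vocabulary, with labels 0 and 1.
data = [([3.0, 0.0], 0), ([0.0, 2.0], 1), ([2.0, 1.0], 0), ([1.0, 3.0], 1)]
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]

losses = []
for epoch in range(20):
    epoch_loss = 0.0
    for x, y in data:
        epoch_loss += sgd_step(W, b, x, y, lr=0.1)
    losses.append(epoch_loss / len(data))

print(losses[0], losses[-1])  # the average loss falls as training proceeds
```

In PyTorch, loss.backward() computes these gradients automatically and optimizer.step() applies the update, so none of the arithmetic has to be written by hand.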

Here, we will use stochastic gradient descent (SGD):

# Run the model on the test data before training, just to see the before-and-after difference
with torch.no_grad():
    for instance, label in test_data:
        bow_vec = make_bow_vector(instance, word_to_ix)
        log_probs = model(bow_vec)
        print(log_probs)

# Print the matrix column corresponding to "mucho"
print(next(model.parameters())[:, word_to_ix["mucho"]])

loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

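We usually want to pass over the training data several times. The 100 epochs used here is far more than a real dataset would need, but real datasets have many more instances than this toy example. Somewhere between 5 and 30 epochs is usually reasonable.

The following code shows the training loop for our example: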

for epoch in range(100):
    for instance, label in data:
        # First, remember that PyTorch accumulates gradients.
        # We need to clear them out before each instance.
        model.zero_grad()

        # Next, make the BoW vector, and wrap the target in a tensor as an integer.
        # For example, if the target is SPANISH, we wrap the integer 0.
        # The loss function then knows that the 0th element of the log probabilities
        # is the log probability corresponding to SPANISH.
        bow_vec = make_bow_vector(instance, word_to_ix)
        target = make_target(label, label_to_ix)

        # Next, run the forward pass.
        log_probs = model(bow_vec)

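Here, we compute the loss, call loss.backward() to compute the gradients, and call optimizer.step() to update the parameters (these lines are still inside the training loop):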


        loss = loss_function(log_probs, target)
        loss.backward()
        optimizer.step()

with torch.no_grad():
    for instance, label in test_data:
        bow_vec = make_bow_vector(instance, word_to_ix)
        log_probs = model(bow_vec)
        print(log_probs)

# After training, the weight for "mucho" on the Spanish label has gone up, and the one for English has gone down!
print(next(model.parameters())[:, word_to_ix["mucho"]])

The output is as follows:


tensor([[-0.7653, -0.6258]])
tensor([[-1.0456, -0.4331]])
tensor([-0.0071, -0.0462], grad_fn=<SelectBackward>)
tensor([[-0.1546, -1.9433]])
tensor([[-0.9623, -0.4813]])
tensor([ 0.4421, -0.4954], grad_fn=<SelectBackward>)
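To read these numbers as predictions, take the index of the largest log probability in each row and look it up in the label mapping. The following plain-Python sketch does this for the two post-training rows copied from the output above; `predict` is a hypothetical helper name, and the mapping matches label_to_ix as defined earlier.

```python
import math

label_to_ix = {"SPANISH": 0, "ENGLISH": 1}
ix_to_label = {ix: label for label, ix in label_to_ix.items()}

# Log probabilities for the two test sentences after training, copied from the output above.
after_training = [[-0.1546, -1.9433], [-0.9623, -0.4813]]


def predict(log_probs):
    """Return the label with the highest log probability and its probability."""
    best = max(range(len(log_probs)), key=lambda k: log_probs[k])
    return ix_to_label[best], math.exp(log_probs[best])


for row in after_training:
    label, prob = predict(row)
    print(label, round(prob, 2))
```

The first test sentence is classified as SPANISH and the second as ENGLISH, which matches their true labels in test_data.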