TensorFlow 机器学习秘籍（四）-优快云博客

原文：annas-archive.org/md5/fdd2c78f646d889f873e10b2a8485875

译者：飞龙

协议：CC BY-NC-SA 4.0

第十章：Transformers

Transformers 是 Google 在 2017 年提出的深度学习架构，旨在处理序列数据，用于下游任务，如翻译、问答或文本摘要。这样，它们旨在解决与第九章《循环神经网络》中讨论的 RNNs 类似的问题，但 Transformers 具有显著优势，因为它们不需要按顺序处理数据。除此之外，这使得更高程度的并行化成为可能，从而加速了训练过程。

由于其灵活性，Transformers 可以在大量未标记的数据上进行预训练，然后再针对其他任务进行微调。这些预训练模型的两大主要类型是双向编码器表示的 Transformers（BERT）和生成预训练 Transformers（GPT）。

在本章中，我们将涵盖以下主题：

文本生成
情感分析
文本分类：讽刺检测
问答

我们将首先展示 GPT-2 的文本生成能力——这是最受广大用户使用的 Transformers 架构之一。虽然情感分析也可以由 RNNs 处理（如前一章所示），但正是生成能力最能清晰地展示 Transformers 在自然语言处理中的影响。

文本生成

第一个 GPT 模型是在 2018 年由 OpenAI 的 Radford 等人发布的论文中介绍的——它展示了生成性语言模型如何通过在大量多样的连续文本语料库上进行预训练，获得知识并处理长程依赖关系。随后几年发布了两个继任模型（在更大语料库上训练）：2019 年的 GPT-2（15 亿参数）和 2020 年的 GPT-3（1750 亿参数）。为了在演示能力和计算需求之间取得平衡，我们将使用 GPT-2——截至本文编写时，GPT-3 的 API 访问受到限制。

我们将通过展示如何基于给定提示生成自己的文本，来开始使用 GPT-2 模型，而不进行任何微调。

我们该如何进行？

我们将利用由 Hugging Face 创建的优秀 Transformers 库（huggingface.co/）。它抽象了构建过程中的多个组件，使我们能够专注于模型性能和预期表现。

像往常一样，我们首先加载所需的包：

#get deep learning basics
import tensorflow as tf

Transformers 库的一个优势——也是其流行的原因之一——是我们可以轻松下载特定模型（并且还可以定义合适的分词器）：

from transformers import TFGPT2LMHeadModel, GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
GPT2 = TFGPT2LMHeadModel.from_pretrained("gpt2-large", pad_token_id=tokenizer.eos_token_id)

通常，固定随机种子是一个好主意，以确保结果的可重复性：

# settings
#for reproducability
SEED = 34
tf.random.set_seed(SEED)
#maximum number of words in output text
MAX_LEN = 70

有关 Transformer 中解码器架构的详细描述，请参阅本节末尾的另见部分——现在，我们将重点关注解码方式，它是使用 GPT-2 模型时最重要的决策之一。下面，我们将回顾一些可用的方法。

使用贪心搜索，预测具有最高概率的单词作为序列中的下一个单词：

input_sequence = "There are times when I am really tired of people, but I feel lonely too."

一旦我们有了输入序列，就将其编码，然后调用decode方法：

# encode context the generation is conditioned on
input_ids = tokenizer.encode(input_sequence, return_tensors='tf')
# generate text until the output length (which includes the context length) reaches 70
greedy_output = GPT2.generate(input_ids, max_length = MAX_LEN)
print("Output:\n" + 100 * '-')
print(tokenizer.decode(greedy_output[0], skip_special_tokens = True))
Output:
----------------------------------------------------------------------------------------------------
There are times when I am really tired of people, but I feel lonely too. I feel like I'm alone in the world. I feel like I'm alone in my own body. I feel like I'm alone in my own mind. I feel like I'm alone in my own heart. I feel like I'm alone in my own mind

如你所见，结果仍有改进空间：模型开始自我重复，因为高概率词汇掩盖了低概率词汇，使其无法探索更多样化的组合。

一种简单的解决方法是束搜索：我们跟踪备选变体，从而使得更多的比较成为可能：

# set return_num_sequences > 1
beam_outputs = GPT2.generate(
    input_ids, 
    max_length = MAX_LEN, 
    num_beams = 5, 
    no_repeat_ngram_size = 2, 
    num_return_sequences = 5, 
    early_stopping = True
)
print('')
print("Output:\n" + 100 * '-')
# now we have 5 output sequences
for i, beam_output in enumerate(beam_outputs):
      print("{}: {}".format(i, tokenizer.decode(beam_output, skip_special_      tokens=True)))
Output:
----------------------------------------------------------------------------------------------------
0: There are times when I am really tired of people, but I feel lonely too. I don't know what to do with myself."
"I feel like I can't do anything right now," she said. "I'm so tired."
1: There are times when I am really tired of people, but I feel lonely too. I don't know what to do with myself."
"I feel like I can't do anything right now," she says. "I'm so tired."
2: There are times when I am really tired of people, but I feel lonely too. I don't know what to do with myself."
"I feel like I can't do anything right now," she says. "I'm not sure what I'm supposed to be doing with my life."
3: There are times when I am really tired of people, but I feel lonely too. I don''t know what to do with myself.""
"I feel like I can't do anything right now," she says. "I'm not sure what I'm supposed to be doing."
4: There are times when I am really tired of people, but I feel lonely too. I don't know what to do with myself."
"I feel like I can't do anything right now," she says. "I'm not sure what I should do."

这确实更加多样化——信息相同，但至少从风格上来看，表达方式有所不同。

接下来，我们可以探索采样——不确定性解码。我们并不严格按照路径来找到具有最高概率的最终文本，而是根据条件概率分布随机选择下一个单词。这个方法有可能生成不连贯的胡言乱语，因此我们使用temperature参数，它会影响概率质量分布：

# use temperature to decrease the sensitivity to low probability candidates
sample_output = GPT2.generate(
                             input_ids, 
                             do_sample = True, 
                             max_length = MAX_LEN, 
                             top_k = 0, 
                             temperature = 0.2
)
print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens = True))
Output:
----------------------------------------------------------------------------------------------------
There are times when I am really tired of people, but I feel lonely too. I feel like I'm alone in my own world. I feel like I'm alone in my own life. I feel like I'm alone in my own mind. I feel like I'm alone in my own heart. I feel like I'm alone in my own

稍微有点诗意地说，若我们提高温度，会发生什么呢？

sample_output = GPT2.generate(
                             input_ids, 
                             do_sample = True, 
                             max_length = MAX_LEN, 
                             top_k = 0, 
                             temperature = 0.8
)
print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens = True))
Output:
----------------------------------------------------------------------------------------------------
There are times when I am really tired of people, but I feel lonely too. I find it strange how the people around me seem to be always so nice. The only time I feel lonely is when I'm on the road. I can't be alone with my thoughts.
What are some of your favourite things to do in the area

这变得更加有趣，尽管它仍然有点像思路流——这或许是可以预见的，考虑到我们提示词的内容。让我们探索更多调优输出的方法。

在Top-K 采样中，选择最有可能的前k个单词，并将整个概率质量转移到这k个单词上。因此，我们并不增加高概率词汇出现的机会或减少低概率词汇的机会，而是直接将低概率词汇完全移除：

#sample from only top_k most likely words
sample_output = GPT2.generate(
                             input_ids, 
                             do_sample = True, 
                             max_length = MAX_LEN, 
                             top_k = 50
)
print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens = True), '...')
Output:
----------------------------------------------------------------------------------------------------
There are times when I am really tired of people, but I feel lonely too. I go to a place where you can feel comfortable. It's a place where you can relax. But if you're so tired of going along with the rules, maybe I won't go. You know what? Maybe if I don't go, you won''t ...

这似乎是朝着正确方向迈出的步伐。我们能做得更好吗？

Top-P 采样（也叫做核采样）类似于 Top-K 采样，但不是选择最有可能的前k个单词，而是选择概率总和大于p的最小单词集合，然后将整个概率质量转移到该集合中的单词上。这里的主要区别是，Top-K 采样中，单词集合的大小是静态的（显然），而在 Top-P 采样中，集合的大小可以变化。要使用这种采样方法，我们只需设置top_k = 0并选择一个top_p值：

#sample only from 80% most likely words
sample_output = GPT2.generate(
                             input_ids, 
                             do_sample = True, 
                             max_length = MAX_LEN, 
                             top_p = 0.8, 
                             top_k = 0
)
print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens = True), '...')
Output:
----------------------------------------------------------------------------------------------------
There are times when I am really tired of people, but I feel lonely too. I feel like I should just be standing there, just sitting there. I know I'm not a danger to anybody. I just feel alone." ...

我们可以结合这两种方法：

#combine both sampling techniques
sample_outputs = GPT2.generate(
                              input_ids,
                              do_sample = True, 
                              max_length = 2*MAX_LEN,                              #to test how long we can generate and it be coherent
                              #temperature = .7,
                              top_k = 50, 
                              top_p = 0.85, 
                              num_return_sequences = 5
)
print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    print("{}: {}...".format(i, tokenizer.decode(sample_output, skip_    special_tokens = True)))
    print('')
Output:
----------------------------------------------------------------------------------------------------
0: There are times when I am really tired of people, but I feel lonely too. I don't feel like I am being respected by my own country, which is why I am trying to change the government."
In a recent video posted to YouTube, Mr. Jaleel, dressed in a suit and tie, talks about his life in Pakistan and his frustration at his treatment by the country's law enforcement agencies. He also describes how he met a young woman from California who helped him organize the protest in Washington.
"She was a journalist who worked with a television channel in Pakistan," Mr. Jaleel says in the video. "She came to my home one day,...
1: There are times when I am really tired of people, but I feel lonely too. It's not that I don't like to be around other people, but it's just something I have to face sometimes.
What is your favorite thing to eat?
The most favorite thing I have eaten is chicken and waffles. But I love rice, soups, and even noodles. I also like to eat bread, but I like it a little bit less.
What is your ideal day of eating?
It varies every day. Sometimes I want to eat at home, because I'm in a house with my family. But then sometimes I just have to have some sort...
2: There are times when I am really tired of people, but I feel lonely too. I think that there is something in my heart that is trying to be a better person, but I don't know what that is."
So what can be done?
"I want people to take the time to think about this," says Jorja, who lives in a small town outside of Boston.
She has been thinking a lot about her depression. She wants to make a documentary about it, and she wants to start a blog about it.
"I want to make a video to be a support system for people who are going through the same thing I was going through...
3: There are times when I am really tired of people, but I feel lonely too.
I want to be able to take good care of myself. I am going to be a very good person, even if I am lonely.
So, if it's lonely, then I will be happy. I will be a person who will be able to have good care of myself.
I have made this wish.
What is my hope? What is my goal? I want to do my best to be able to meet it, but…
"Yuu, what are you saying, Yuu?"
"Uwa, what is it?"
I...
4: There are times when I am really tired of people, but I feel lonely too. The only person I really love is my family. It's just that I'm not alone."
-Juan, 24, a student
A study from the European Economic Area, a free trade area between the EU and Iceland, showed that there are 2.3 million EU citizens living in Iceland. Another survey in 2014 showed that 1.3 million people in Iceland were employed.
The government is committed to making Iceland a country where everyone can live and work.
"We are here to help, not to steal," said one of the people who drove up in a Volkswagen.
...

显然，更复杂的方法设置可以给我们带来相当令人印象深刻的结果。让我们进一步探索这个方向——我们将使用从 OpenAI 的 GPT-2 网站上获取的提示词，并将其输入完整的 GPT-2 模型。这种对比将让我们了解本地（较小）模型与用于原始演示的大型模型之间的表现差异：

MAX_LEN = 500
prompt1 = 'In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.'
input_ids = tokenizer.encode(prompt1, return_tensors='tf')
sample_outputs = GPT2.generate(
                              input_ids,
                              do_sample = True, 
                              max_length = MAX_LEN,                              #to test how long we can generate and it be coherent
                              #temperature = .8,
                              top_k = 50, 
                              top_p = 0.85 
                              #num_return_sequences = 5
)
print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    print("{}: {}...".format(i, tokenizer.decode(sample_output, skip_    special_tokens = True)))
    print('')
Output:
----------------------------------------------------------------------------------------------------
0: In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.
This is the first time a herd of unicorns have been discovered in the Andes Mountains, a vast region stretching from the Himalayas to the Andes River in Bolivia.
According to the BBC, the unicorns were spotted by a small group of researchers on a private expedition, but they were the only ones that came across the bizarre creatures.
It was later learned that these were not the wild unicorns that were spotted in the wild in recent years, but rather a domesticated variety of the species.
Although they do not speak English, they do carry their own unique language, according to the researchers, who have named it "Ungla."
The herd of unicorns, which was discovered by a small group of researchers, is the first of its kind discovered in the Andes Mountains. It is thought that the herd of wild unicorns were introduced to the area hundreds of years ago by a local rancher who was attempting to make a profit from the animals.
Although they do not speak English, they do carry their own unique language, according to the researchers, who have named it "Ungla."
The researchers claim that the unicorns have only been sighted in the Andes Mountains, where they can be seen throughout the mountains of South America.
While the unicorns do not speak English, they do carry their own unique language, according to the researchers, who have named it "Ungla."
Ungla is a highly intelligent, cooperative species with a high level of social and cognitive complexity, and is capable of displaying sophisticated behaviors.
They are a particularly unique species, because they are capable of surviving in extreme conditions for long periods of time and without being fed or watered.
The team believes that the species was probably domesticated in the Andes Mountains, where it could not survive in its natural habitat.
"We can see from the genetics that the animals were probably domesticated in the Andes Mountains where they could not survive in their natural habitat and with water and food sources," said Professor David Catt, from the University of Cambridge, who led the study.
"So these were animals that would have been...

作为对比，这是完整模型的输出：

输出：

0*: 在一项* 震惊的发现中，科学家们在安第斯山脉的一个偏远、以前未被探索的山谷中发现了一群独角兽。更令研究人员惊讶的是，这些独角兽能说一口流利的英语。

“这不仅是一个科学发现，它也是一个非常重要的发现，因为它将使我们能够进一步研究这个 现象，”哥伦比亚国家人类学与历史研究所（INAH）的乔治·利亚马斯博士在一份声明中说道。

“我们之前发现人类曾用人类的声音与动物交流。在这种情况下，动物正在与我们交流。换句话说，这是动物交流领域的一个突破，”利亚马斯补充道……

在另一个例子中，似乎模型作者的担忧是有道理的：GPT-2 确实能够生成假新闻故事。

prompt2 = 'Miley Cyrus was caught shoplifting from Abercrombie and Fitch on Hollywood Boulevard today.'
input_ids = tokenizer.encode(prompt2, return_tensors='tf')
sample_outputs = GPT2.generate(
                              input_ids,
                              do_sample = True, 
                              max_length = MAX_LEN,                              #to test how long we can generate and it be coherent
                              #temperature = .8,
                              top_k = 50, 
                              top_p = 0.85
                              #num_return_sequences = 5
)
print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    print("{}: {}...".format(i, tokenizer.decode(sample_output, skip_    special_tokens = True)))
    print('')
Output:
----------------------------------------------------------------------------------------------------
0: Miley Cyrus was caught shoplifting from Abercrombie and Fitch on Hollywood Boulevard today. In a video captured by one of her friends, the singer is seen grabbing her bag, but then quickly realizing the merchandise she has to leave is too expensive to be worth a $1.99 purchase.
The video has already gone viral, and while the celebrity is certainly guilty of breaking the law (even if she can't be accused of stealing for a second time), there's one aspect of the situation that should make the most sense. It's just like the shopping situation in the movie The Fast and the Furious, where Michael Corleone is caught in possession of counterfeit designer clothing.
This time around, though, the situation involves Cyrus. It's not a copy, per se. It's actually a replica, a pair of a black and white Nike Air Force 1s, a colorway she wore in her music video.
It seems that the actress is caught by a friend who had gotten her a pair of those sneakers when she was in school, so this is no surprise to her. After all, there was a video of her stealing from her own store back in 2012, when she was a freshman at New York University.
It's not that there's anything wrong with the product. If the merchandise is in good shape, that's all that matters. But there are a few things that should come to mind when it comes to these shoes.
For one, the fabric is incredibly thin. The fabric is so thin that the upper actually makes the shoes look like they're made of leather. There's even a thin layer of plastic between the upper and the shoe.
Secondly, the material isn't even a shoe. It's just a piece of leather. It's not actually a leather shoe at all, even though it's made of the same material as the other items on the show. It's just a piece of leather. And it's not the kind of leather that would actually hold up in a fight.
This is something that should be familiar to anyone who's ever shopped at the store. If you go into the store looking for a pair of new Nike Air Force 1s, and the salesperson is just selling you a piece of leather, you're going to get disappointed. That's the nature of these shoes.
In addition to the aforementioned "stolen" footwear, Miley Cyrus...

输出：

0：麦莉·赛勒斯今天在好莱坞大道上被抓到从阿贝克朗比和费奇商店偷东西。有人看到她试穿了三件裙子，然后试图走出商店。

阿贝克朗比是这位明星常去的几家店之一。

这位歌手在今天中午过后被看到走进位于西好莱坞的阿贝克朗比与费奇商店，随后离开了商店。

这位明星目前正在澳大利亚和新西兰进行巡演，参加 2 月 28 日的《X Factor》演出……

那么，像托尔金这样的文学经典，怎么样？

prompt3 = 'Legolas and Gimli advanced on the orcs, raising their weapons with a harrowing war cry'
input_ids = tokenizer.encode(prompt3, return_tensors='tf')
sample_outputs = GPT2.generate(
                              input_ids,
                              do_sample = True, 
                              max_length = MAX_LEN,                              #to test how long we can generate and it be coherent
                              #temperature = .8,
                              top_k = 50, 
                              top_p = 0.85 
                              #num_return_sequences = 5
)
print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    print("{}: {}...".format(i, tokenizer.decode(sample_output, skip_    special_tokens = True)))
    print('')
Output:
----------------------------------------------------------------------------------------------------
0: Legolas and Gimli advanced on the orcs, raising their weapons with a harrowing war cry, and they roared their battle cries as they charged the orcs with their spears and arrows. They reached the front of the line, where the enemy were gathered, and they fell upon them with a hail of fire and arrows, slaying many orcs and wounding others. The battle raged on for a long time, and eventually the two sides met and fought for a long time more. The orcs fell and the two armies were victorious. The orcs were killed and the two armies were victorious.
The two armies fought one last time in battle. Gimli slew many of the orcs and led his men to safety. They went to the city and took it. When they returned, Sauron's servants were waiting to kill them. The two armies fought again, and the battle raged on for a long time more. Gimli slew many of the orcs and led his men to safety. They went to the city and took it. When they returned, Sauron's servants were waiting to kill them. The two armies fought again, and the battle raged on for a long time more. Gimli slew many of the orcs and led his men to safety. They went to the city and took it. When they returned, Sauron's servants were waiting to kill them. The two armies fought again, and the battle raged on for a long time more. Gimli slew many of the orcs and led his men to safety. They went to the city and took it. When they returned, Sauron's servants were waiting to kill them. The two armies fought again, and the battle raged on for a long time more. Gimli slew many of the orcs and led his men to safety. They went to the city and took it. When they returned, Sauron's servants were waiting to kill them. The two armies fought again, and the battle raged on for a long time more. Gimli slew many of the orcs and led his men to safety. They went to the city and took it. When they returned, Sauron's servants were waiting to kill them. The two armies fought again, and the battle raged on for a long time more. Gimli slew many of the orcs and led his men to safety. They went to the city and took it. When they returned, Sauron's servants were waiting to kill them. The two armies fought again, and the battle raged on for a...

输出：

0：莱戈拉斯和金雳带着一声刺耳的战吼向兽人进军，举起了他们的武器。

然后，兽人 发动了进攻。

伟大的兽人战争团在战斗声中前进。他们骄傲地佩戴着武器，俯视着他们的敌人。

远处，可以听见兽人低声喊着命令。

但战斗还没有结束。兽人的斧头和战锤猛烈地击打敌军，就像他们是一支万人的军队一样，斧头砍中了兽人，使他们流血。

在这场大屠杀中，精灵领袖阿拉贡大喊道：“来吧，勇士们，让我们与兽人战斗！”

如上面的例子所示，一个未经微调的 GPT-2 模型（即开箱即用）已经能够生成看似合理的长篇文本。评估这种技术未来对传播领域的影响仍然是一个开放且极具争议的问题：一方面，对于假新闻泛滥的恐惧是完全有理由的（见上面的麦莉·赛勒斯故事）。这尤其令人担忧，因为大规模自动检测生成文本是一个极具挑战性的话题。另一方面，GPT-2 的文本生成能力对于创意工作者可能非常有帮助：无论是风格实验还是讽刺，一个由 AI 驱动的写作助手可以提供巨大的帮助。

另见

网络上有多个关于 GPT-2 文本生成的优秀资源：

介绍该模型的原始 OpenAI 文章：

openai.com/blog/better-language-models/
顶级 GPT-2 开源项目：

awesomeopensource.com/projects/gpt-2
Hugging Face 文档：

huggingface.co/blog/how-to-generate

huggingface.co/transformers/model_doc/gpt2.html

情感分析

在本节中，我们将演示如何使用 DistilBERT（BERT 的轻量版本）来处理情感分析中的常见问题。我们将使用 Kaggle 比赛中的数据（www.kaggle.com/c/tweet-sentiment-extraction）：给定一条推文及其情感（积极、中立或消极），参与者需要识别出定义该情感的推文部分。情感分析通常在商业中应用，作为帮助数据分析师评估公众舆论、进行详细市场调研和跟踪客户体验的系统的一部分。一个重要的应用领域是医疗：可以根据患者的交流模式评估不同治疗对其情绪的影响。

我们该如何进行？

和往常一样，我们首先加载必要的包。

import pandas as pd
import re
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
%matplotlib inline
import keras
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Input, Dense, LSTM, GRU, Embedding
from keras.layers import Activation, Bidirectional, GlobalMaxPool1D, GlobalMaxPool2D, Dropout
from keras.models import Model
from keras import initializers, regularizers, constraints, optimizers, layers
from keras.preprocessing import text, sequence
from keras.callbacks import ModelCheckpoint
from keras.callbacks import EarlyStopping
from keras.optimizers import RMSprop, adam
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer,PorterStemmer
import seaborn as sns
import transformers
from transformers import AutoTokenizer
from tokenizers import BertWordPieceTokenizer
from keras.initializers import Constant
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score
from collections import Counter
stop=set(stopwords.words('english'))
import os

为了简化代码，我们定义了一些帮助函数来清理文本：去除网站链接、星号遮蔽的 NSFW 词汇和表情符号。

def basic_cleaning(text):
    text=re.sub(r'https?://www\.\S+\.com','',text)
    text=re.sub(r'[^A-Za-z|\s]','',text)
    text=re.sub(r'\*+','swear',text) #capture swear words that are **** out
    return text
def remove_html(text):
    html=re.compile(r'<.*?>')
    return html.sub(r'',text)
# Reference : https://gist.github.com/slowkow/7a7f61f495e3dbb7e3d767f97bd7304b
def remove_emoji(text):
    emoji_pattern = re.compile("["
                           u"\U0001F600-\U0001F64F"  # emoticons
                           u"\U0001F300-\U0001F5FF"  # symbols & pictographs
                           u"\U0001F680-\U0001F6FF"  # transport & map symbols
                           u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                           u"\U00002702-\U000027B0"
                           u"\U000024C2-\U0001F251"
                           "]+", flags=re.UNICODE)
    return emoji_pattern.sub(r'', text)
def remove_multiplechars(text):
    text = re.sub(r'(.)\1{3,}',r'\1', text)
    return text
def clean(df):
    for col in ['text']:#,'selected_text']:
        df[col]=df[col].astype(str).apply(lambda x:basic_cleaning(x))
        df[col]=df[col].astype(str).apply(lambda x:remove_emoji(x))
        df[col]=df[col].astype(str).apply(lambda x:remove_html(x))
        df[col]=df[col].astype(str).apply(lambda x:remove_multiplechars(x))
    return df
def fast_encode(texts, tokenizer, chunk_size=256, maxlen=128):    
    tokenizer.enable_truncation(max_length=maxlen)
    tokenizer.enable_padding(max_length=maxlen)
    all_ids = []

    for i in range(0, len(texts), chunk_size):
        text_chunk = texts[i:i+chunk_size].tolist()
        encs = tokenizer.encode_batch(text_chunk)
        all_ids.extend([enc.ids for enc in encs])

    return np.array(all_ids)
def preprocess_news(df,stop=stop,n=1,col='text'):
    '''Function to preprocess and create corpus'''
    new_corpus=[]
    stem=PorterStemmer()
    lem=WordNetLemmatizer()
    for text in df[col]:
        words=[w for w in word_tokenize(text) if (w not in stop)]

        words=[lem.lemmatize(w) for w in words if(len(w)>n)]

        new_corpus.append(words)

    new_corpus=[word for l in new_corpus for word in l]
    return new_corpus

加载数据。

df = pd.read_csv('/kaggle/input/tweet-sentiment-extraction/train.csv')
df.head()

https://github.com/OpenDocCN/freelearn-dl-pt3-zh/raw/master/docs/ml-tf-cb/img/B16254_10_1.png

图 10.1：推文情感分析数据示例

上面的快照展示了我们将关注分析的一个数据样本：完整文本、关键短语及其相关的情感（积极、消极或中立）。

我们继续进行相对标准的数据预处理：

basic_cleaning – 用于去除网站 URL 和非字符内容，并将*脏话替换为 swear 一词。
remove_html。
remove_emojis。
remove_multiplechars – 该函数用于处理当一个单词中有超过 3 个连续字符时，例如，wayyyyy。该函数会去除其中的多余字母，保留一个。

df.dropna(inplace=True)
df_clean = clean(df)

对于标签，我们进行独热编码，将它们分词，并转换为序列。

df_clean_selection = df_clean.sample(frac=1)
X = df_clean_selection.text.values
y = pd.get_dummies(df_clean_selection.sentiment)
tokenizer = text.Tokenizer(num_words=20000)
tokenizer.fit_on_texts(list(X))
list_tokenized_train = tokenizer.texts_to_sequences(X)
X_t = sequence.pad_sequences(list_tokenized_train, maxlen=128)

DistilBERT 是 BERT 的轻量版本：它的参数比 BERT 少 40%，但性能达到 BERT 的 97%。对于本例，我们主要使用它的分词器和嵌入矩阵。虽然该矩阵是可训练的，但为了减少训练时间，我们不使用这个选项。

tokenizer = transformers.AutoTokenizer.from_pretrained("distilbert-base-uncased")  ## change it to commit
# Save the loaded tokenizer locally
save_path = '/kaggle/working/distilbert_base_uncased/'
if not os.path.exists(save_path):
    os.makedirs(save_path)
tokenizer.save_pretrained(save_path)
# Reload it with the huggingface tokenizers library
fast_tokenizer = BertWordPieceTokenizer('distilbert_base_uncased/vocab.txt', lowercase=True)
fast_tokenizer
X = fast_encode(df_clean_selection.text.astype(str), fast_tokenizer, maxlen=128)
transformer_layer = transformers.TFDistilBertModel.from_pretrained('distilbert-base-uncased')
embedding_size = 128 input_ = Input(shape=(100,)) 
inp = Input(shape=(128, )) 
embedding_matrix=transformer_layer.weights[0].numpy() 
x = Embedding(embedding_matrix.shape[0], embedding_matrix.shape[1],embeddings_initializer=Constant(embedding_matrix),trainable=False)(inp)

我们按常规步骤定义模型。

x = Bidirectional(LSTM(50, return_sequences=True))(x) 
x = Bidirectional(LSTM(25, return_sequences=True))(x) 
x = GlobalMaxPool1D()(x) x = Dropout(0.5)(x) 
x = Dense(50, activation='relu', kernel_regularizer='L1L2')(x) 
x = Dropout(0.5)(x) 
x = Dense(3, activation='softmax')(x) 
model_DistilBert = Model(inputs=[inp], outputs=x)
model_DistilBert.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
model_DistilBert.summary()
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         (None, 128)               0         
_________________________________________________________________
embedding_1 (Embedding)      (None, 128, 768)          23440896  
_________________________________________________________________
bidirectional_1 (Bidirection (None, 128, 100)          327600    
_________________________________________________________________
bidirectional_2 (Bidirection (None, 128, 50)           25200     
_________________________________________________________________
global_max_pooling1d_1 (Glob (None, 50)                0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 50)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 50)                2550      
_________________________________________________________________
dropout_2 (Dropout)          (None, 50)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 3)                 153       
=================================================================
Total params: 23,796,399
Trainable params: 355,503
Non-trainable params: 23,440,896
_________________________________________________________________

现在我们可以拟合模型了：

model_DistilBert.fit(X,y,batch_size=32,epochs=10,validation_split=0.1)
Train on 24732 samples, validate on 2748 samples
Epoch 1/10
24732/24732 [==============================] - 357s 14ms/step - loss: 1.0516 - accuracy: 0.4328 - val_loss: 0.8719 - val_accuracy: 0.5466
Epoch 2/10
24732/24732 [==============================] - 355s 14ms/step - loss: 0.7733 - accuracy: 0.6604 - val_loss: 0.7032 - val_accuracy: 0.6776
Epoch 3/10
24732/24732 [==============================] - 355s 14ms/step - loss: 0.6668 - accuracy: 0.7299 - val_loss: 0.6407 - val_accuracy: 0.7354
Epoch 4/10
24732/24732 [==============================] - 355s 14ms/step - loss: 0.6310 - accuracy: 0.7461 - val_loss: 0.5925 - val_accuracy: 0.7478
Epoch 5/10
24732/24732 [==============================] - 347s 14ms/step - loss: 0.6070 - accuracy: 0.7565 - val_loss: 0.5817 - val_accuracy: 0.7529
Epoch 6/10
24732/24732 [==============================] - 343s 14ms/step - loss: 0.5922 - accuracy: 0.7635 - val_loss: 0.5817 - val_accuracy: 0.7584
Epoch 7/10
24732/24732 [==============================] - 343s 14ms/step - loss: 0.5733 - accuracy: 0.7707 - val_loss: 0.5922 - val_accuracy: 0.7638
Epoch 8/10
24732/24732 [==============================] - 343s 14ms/step - loss: 0.5547 - accuracy: 0.7832 - val_loss: 0.5767 - val_accuracy: 0.7627
Epoch 9/10
24732/24732 [==============================] - 346s 14ms/step - loss: 0.5350 - accuracy: 0.7870 - val_loss: 0.5767 - val_accuracy: 0.7584
Epoch 10/10
24732/24732 [==============================] - 346s 14ms/step - loss: 0.5219 - accuracy: 0.7955 - val_loss: 0.5994 - val_accuracy: 0.7580

从上述输出可以看出，模型收敛得相当快，并且在验证集上的准确率在 10 次迭代后已经达到了 76%。进一步微调超参数和更长时间的训练可以提高性能，但即使在这个水平，经过训练的模型——例如，通过 TensorFlow Serving——也能为商业应用中的情感分析逻辑提供有价值的补充。

另见

最好的起点是 Hugging Face 的文档：curiousily.com/posts/sentiment-analysis-with-bert-and-hugging-face-using-pytorch-and-python/。

开放领域问题回答

给定一段文本及与该文本相关的问题，问题回答（QA）的概念是确定回答该问题的文本子集。这是应用 Transformer 架构成功的许多任务之一。Transformers 库有多个预训练的 QA 模型，可以在没有数据集进行微调的情况下应用（这是一种零样本学习形式）。

然而，不同的模型在不同的示例上可能会失败，了解原因可能是有益的。在本节中，我们将展示 TensorFlow 2.0 的 GradientTape 功能：它允许我们记录对一组变量进行自动微分的操作。为了解释模型在给定输入上的输出，我们可以：

对输入进行一热编码——与整数令牌（通常在此上下文中使用）不同，一热编码表示是可微分的
实例化 GradientTape 并监视我们的输入变量
计算通过模型的前向传播
获取我们感兴趣的输出的梯度（例如，特定类别的 logit），相对于监视的输入
使用标准化梯度作为解释

本节中的代码改编自 Fast Forward Labs 发布的结果：experiments.fastforwardlabs.com/。

我们如何开始？

import os
import zipfile
import shutil
import urllib.request
import logging
import lzma
import json
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import time
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering, TFBertForMaskedLM, TFBertForQuestionAnswering

和往常一样，我们需要一些样板代码：首先编写一个用于获取预训练 QA 模型的函数。

def get_pretrained_squad_model(model_name):

    model, tokenizer = None, None

    if model_name == "distilbertsquad1":        
        tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad",use_fast=True)
        model = TFBertForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad", from_pt=True)

    elif model_name == "distilbertsquad2": 
        tokenizer = AutoTokenizer.from_pretrained("twmkn9/distilbert-base-uncased-squad2",use_fast=True)
        model = TFAutoModelForQuestionAnswering.from_pretrained("twmkn9/distilbert-base-uncased-squad2", from_pt=True)

    elif model_name == "bertsquad2": 
        tokenizer = AutoTokenizer.from_pretrained("deepset/bert-base-cased-squad2",use_fast=True)
        model = TFBertForQuestionAnswering.from_pretrained("deepset/bert-base-cased-squad2", from_pt=True)

    elif model_name == "bertlargesquad2": 
        tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased",use_fast=True)
        model = TFBertForQuestionAnswering.from_pretrained("deepset/bert-large-uncased-whole-word-masking-squad2", from_pt=True)

    elif model_name == "albertbasesquad2": 
        tokenizer = AutoTokenizer.from_pretrained("twmkn9/albert-base-v2-squad2",use_fast=True)
        model = TFBertForQuestionAnswering.from_pretrained("twmkn9/albert-base-v2-squad2", from_pt=True)

    elif model_name == "distilrobertasquad2": 
        tokenizer = AutoTokenizer.from_pretrained("twmkn9/distilroberta-base-squad2",use_fast=True)
        model = TFBertForQuestionAnswering.from_pretrained("twmkn9/
distilroberta-base-squad2", from_pt=True)

    elif model_name == "robertasquad2": 
        tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2",use_fast=True)
        model = TFAutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2", from_pt=True)

    elif model_name == "bertlm":

        tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased",                                                  use_fast=True)
        model = TFBertForMaskedLM.from_pretrained("bert-base-uncased",                                                   from_pt=True)
    return model, tokenizer

确定答案的范围。

def get_answer_span(question, context, model, tokenizer): 
    inputs = tokenizer.encode_plus(question, context, return_tensors="tf", add_special_tokens=True, max_length=512) 
    answer_start_scores, answer_end_scores = model(inputs)  
    answer_start = tf.argmax(answer_start_scores, axis=1).numpy()[0] 
    answer_end = (tf.argmax(answer_end_scores, axis=1) + 1).numpy()[0]  
    print(tokenizer.convert_tokens_to_string(inputs["input_ids"][0][answer_start:answer_end]))
    return answer_start, answer_end

我们需要一些用于数据准备的函数。

def clean_tokens(gradients, tokens, token_types):

    """
      Clean the tokens and gradients gradients
      Remove "[CLS]","[CLR]", "[SEP]" tokens
      Reduce (mean) gradients values for tokens that are split ##
    """

    token_holder = []
    token_type_holder = []    
    gradient_holder = []     
    i = 0

    while i < len(tokens):
        if (tokens[i] not in ["[CLS]","[CLR]", "[SEP]"]):
            token = tokens[i]              
            conn = gradients[i]               
            token_type = token_types[i]

            if i < len(tokens)-1 :
                if tokens[i+1][0:2] == "##":
                    token = tokens[i]
                    conn = gradients[i]  
                    j = 1
                    while i < len(tokens)-1 and tokens[i+1][0:2] == "##":                        
                        i +=1 
                        token += tokens[i][2:]
                        conn += gradients[i]   
                        j+=1
                    conn = conn /j 
            token_holder.append(token)
            token_type_holder.append(token_type)
            gradient_holder.append(conn)

        i +=1

    return  gradient_holder,token_holder, token_type_holder
def get_best_start_end_position(start_scores, end_scores):

    answer_start = tf.argmax(start_scores, axis=1).numpy()[0] 
    answer_end = (tf.argmax(end_scores, axis=1) + 1).numpy()[0] 
    return answer_start, answer_end
def get_correct_span_mask(correct_index, token_size):

    span_mask = np.zeros((1, token_size))
    span_mask[0, correct_index] = 1
    span_mask = tf.constant(span_mask, dtype='float32')

    return span_mask 

def get_embedding_matrix(model):

    if "DistilBert" in type(model).__name__:
        return model.distilbert.embeddings.word_embeddings
    else:
        return model.bert.embeddings.word_embeddings
def get_gradient(question, context, model, tokenizer): 

    """Return gradient of input (question) wrt to model output span prediction 
      Args:
          question (str): text of input question
          context (str): text of question context/passage
          model (QA model): Hugging Face BERT model for QA transformers.modeling_tf_distilbert.TFDistilBertForQuestionAnswering, transformers.modeling_tf_bert.TFBertForQuestionAnswering
          tokenizer (tokenizer): transformers.tokenization_bert.BertTokenizerFast 
      Returns:
            (tuple): (gradients, token_words, token_types, answer_text)
    """
    embedding_matrix = get_embedding_matrix(model)  
    encoded_tokens =  tokenizer.encode_plus(question, context, add_special_tokens=True, return_token_type_ids=True, return_tensors="tf")
    token_ids = list(encoded_tokens["input_ids"].numpy()[0])
    vocab_size = embedding_matrix.get_shape()[0]
    # convert token ids to one hot. We can't differentiate wrt to int token ids hence the need for one hot representation
    token_ids_tensor = tf.constant([token_ids], dtype='int32')
    token_ids_tensor_one_hot = tf.one_hot(token_ids_tensor, vocab_size) 

    with tf.GradientTape(watch_accessed_variables=False) as tape:

        # (i) watch input variable
        tape.watch(token_ids_tensor_one_hot)

        # multiply input model embedding matrix; allows us do backprop wrt one hot input 
        inputs_embeds = tf.matmul(token_ids_tensor_one_hot,embedding_matrix)  
        # (ii) get prediction
        start_scores,end_scores = model({"inputs_embeds": inputs_embeds, "token_type_ids": encoded_tokens["token_type_ids"], "attention_mask": encoded_tokens["attention_mask"] })
        answer_start, answer_end = get_best_start_end_position(start_scores, end_scores)
        start_output_mask = get_correct_span_mask(answer_start, len(token_ids))
        end_output_mask = get_correct_span_mask(answer_end, len(token_ids))
        # zero out all predictions outside of the correct span positions; we want to get gradients wrt to just these positions
        predict_correct_start_token = tf.reduce_sum(start_scores *                                                     start_output_mask)
        predict_correct_end_token = tf.reduce_sum(end_scores *                                                   end_output_mask) 
        # (iii) get gradient of input with respect to both start and end output
        gradient_non_normalized = tf.norm(
            tape.gradient([predict_correct_start_token, predict_correct_end_token], token_ids_tensor_one_hot),axis=2)
        # (iv) normalize gradient scores and return them as "explanations"
        gradient_tensor = (
            gradient_non_normalized /
            tf.reduce_max(gradient_non_normalized)
        )
        gradients = gradient_tensor[0].numpy().tolist()
        token_words = tokenizer.convert_ids_to_tokens(token_ids) 
        token_types = list(encoded_tokens["token_type_ids"].numpy()[0])
        answer_text = tokenizer.decode(token_ids[answer_start:answer_end])
        return  gradients,  token_words, token_types,answer_text
def explain_model(question, context, model, tokenizer, explain_method = "gradient"):    
    if explain_method == "gradient":        
        return get_gradient(question, context, model, tokenizer)

最后进行绘图：

def plot_gradients(tokens, token_types, gradients, title): 

    """ Plot  explanations
    """
    plt.figure(figsize=(21,3)) 
    xvals = [ x + str(i) for i,x in enumerate(tokens)]
    colors =  [ (0,0,1, c) for c,t in zip(gradients, token_types) ]
    edgecolors = [ "black" if t==0 else (0,0,1, c)  for c,t in zip(gradients, token_types) ]
    # colors =  [  ("r" if t==0 else "b")  for c,t in zip(gradients, token_types) ]    
    plt.tick_params(axis='both', which='minor', labelsize=29)    
    p = plt.bar(xvals, gradients, color=colors, linewidth=1, edgecolor=edgecolors)    
    plt.title(title)     
    p=plt.xticks(ticks=[i for i in range(len(tokens))], labels=tokens, fontsize=12,rotation=90)

我们将比较一小部分模型在不同问题上的表现。

questions = [
    { "question": "what is the goal of the fourth amendment?  ", "context": "The Fourth Amendment of the U.S. Constitution provides that '[t]he right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.'The ultimate goal of this provision is to protect people's right to privacy and freedom from unreasonable intrusions by the government. However, the Fourth Amendment does not guarantee protection from all searches and seizures, but only those done by the government and deemed unreasonable under the law." },
    { "question": ""what is the taj mahal made of?", "context": "The Taj Mahal is an ivory-white marble mausoleum on the southern bank of the river Yamuna in the Indian city of Agra. It was commissioned in 1632 by the Mughal emperor Shah Jahan (reigned from 1628 to 1658) to house the tomb of his favourite wife, Mumtaz Mahal; it also houses the tomb of Shah Jahan himself. The tomb is the centrepiece of a 17-hectare (42-acre) complex, which includes a mosque and a guest house, and is set in formal gardens bounded on three sides by a crenellated wall. Construction of the mausoleum was essentially completed in 1643, but work continued on other phases of the project for another 10 years. The Taj Mahal complex is believed to have been completed in its entirety in 1653 at a cost estimated at the time to be around 32 million rupees, which in 2020 would be approximately 70 billion rupees (about U.S. $916 million). The construction project employed some 20,000 artisans under the guidance of a board of architects led by the court architect to the emperor. The Taj Mahal was designated as a UNESCO World Heritage Site in 1983 for being the jewel of Muslim art in India and one of the universally admired masterpieces of the world's heritage. It is regarded by many as the best example of Mughal architecture and a symbol of India's rich history. The Taj Mahal attracts 7–8 million visitors a year and in 2007, it was declared a winner of the New 7 Wonders of the World (2000–2007) initiative." },
    { "question": "Who ruled macedonia ", "context": "Macedonia was an ancient kingdom on the periphery of Archaic and Classical Greece, and later the dominant state of Hellenistic Greece. The kingdom was founded and initially ruled by the Argead dynasty, followed by the Antipatrid and Antigonid dynasties. Home to the ancient Macedonians, it originated on the northeastern part of the Greek peninsula. Before the 4th century BC, it was a small kingdom outside of the area dominated by the city-states of Athens, Sparta and Thebes, and briefly subordinate to Achaemenid Persia" },
    { "question": "what are the symptoms of COVID-19", "context": "COVID-19 is the infectious disease caused by the most recently discovered coronavirus. This new virus and disease were unknown before the outbreak began in Wuhan, China, in December 2019\. The most common symptoms of COVID-19 are fever, tiredness, and dry cough. Some patients may have aches and pains, nasal congestion, runny nose, sore throat or diarrhea. These symptoms are usually mild and begin gradually. Some people become infected but don't develop any symptoms and don't feel unwell. Most people (about 80%) recover from the disease without needing special treatment. Around 1 out of every 6 people who gets COVID-19 becomes seriously ill and develops difficulty breathing. Older people, and those with underlying medical problems like high blood pressure, heart problems or diabetes, are more likely to develop serious illness. People with fever, cough and difficulty breathing should seek medical attention." },
]
model_names = ["distilbertsquad1","distilbertsquad2","bertsquad2","bertlargesquad2"]
result_holder = []
for model_name in model_names:
    bqa_model, bqa_tokenizer = get_pretrained_squad_model(model_name)

    for row in questions:

        start_time = time.time() 
        question, context = row["question"], row["context"] 
        gradients, tokens, token_types, answer  = explain_model(question, context, bqa_model, bqa_tokenizer) 
        elapsed_time = time.time() - start_time
        result_holder.append({"question": question,  "context":context, "answer": answer, "model": model_name, "runtime": elapsed_time})
result_df = pd.DataFrame(result_holder)

格式化结果以便于检查。

question_df = result_df[result_df["model"] == "bertsquad2"].reset_index()[["question"]]
df_list = [question_df]
for model_name in model_names:

    sub_df = result_df[result_df["model"] == model_name].reset_index()[["answer", "runtime"]]
    sub_df.columns = [ (col_name + "_" + model_name)  for col_name in                                                             sub_df.columns]
    df_list.append(sub_df)

jdf = pd.concat(df_list, axis=1)
answer_cols = ["question"] + [col for col in jdf.columns if 'answer' in col]
jdf[answer_cols]

https://github.com/OpenDocCN/freelearn-dl-pt3-zh/raw/master/docs/ml-tf-cb/img/B16254_10_2.png

图 10.2：展示不同模型生成的答案的样本记录

从结果数据中我们可以观察到，即使在这个样本数据集上，模型之间也存在明显的差异：

DistilBERT（SQUAD1）能回答 5/8 个问题，正确 2 个
DistilBERT（SQUAD2）能回答 7/8 个问题，正确 7 个
BERT base 能回答 5/8 个问题，正确 5 个
BERT large 能回答 7/8 个问题，正确 7 个

runtime_cols = [col for col in jdf.columns if 'runtime' in col] 
mean_runtime = jdf[runtime_cols].mean()
print("Mean runtime per model across 4 question/context pairs")
print(mean_runtime)
Mean runtime per model across 4 question/context pairs
runtime_distilbertsquad1    0.202405
runtime_distilbertsquad2    0.100577
runtime_bertsquad2          0.266057
runtime_bertlargesquad2     0.386156
dtype: float64

基于上述结果，我们可以对基于 BERT 的问答模型的工作原理获得一些洞见：

在 BERT 模型无法生成答案的情况下（例如，它只给出 CLS 标记），几乎没有输入的标记具有高的归一化梯度分数。这表明在使用的指标方面还有改进的空间——可以超越解释分数，并可能将其与模型置信度分数结合，以获得更完整的情境概览。
分析基础版和大版本 BERT 模型的性能差异表明，应该进一步研究性能提升与推理时间延长之间的权衡（更好的性能对比更长的推理时间）。
考虑到我们在选择评估数据集时可能存在的问题，一个可能的结论是，DistilBERT（在 SQuAD2 上训练）比基础 BERT 表现更好——这凸显了使用 SQuAD1 作为基准存在的问题。

第十一章：使用 TensorFlow 和 TF-Agents 进行强化学习

TF-Agents 是一个用于 强化学习（RL）的 TensorFlow（TF）库。通过提供多个模块化组件，TF-Agents 使得各种算法的设计和实现变得更加简单，这些组件对应 RL 问题的核心部分：

一个智能体在环境中操作，并通过处理每次选择动作时收到的信号来进行学习。在 TF-Agents 中，环境通常用 Python 实现，并用 TF 包装器封装，以便高效并行化。
策略将环境中的观测映射为动作的分布。
驱动器 在环境中执行策略，经过指定的步数（也叫回合）。
回放缓冲区 用于存储在环境中执行策略的经验（智能体在动作空间中的轨迹，以及相关的奖励）；训练时，会查询缓冲区中的一部分轨迹。

基本思路是将我们讨论的每个问题转化为一个 RL 问题，然后将各个组件映射到 TF-Agents 对应的部分。在本章中，我们将展示如何使用 TF-Agents 来解决一些简单的 RL 问题：

GridWorld 问题
OpenAI Gym 环境
多臂强盗问题用于内容个性化

展示 TF-Agents 中强化学习能力的最佳方式是通过一个玩具问题：GridWorld 是一个很好的选择，因为它具有直观的几何结构和易于理解的动作，尽管如此，它仍然是一个合适的目标，我们可以研究智能体为达成目标而采取的最佳路径。

GridWorld

本节中的代码改编自 github.com/sachag678。

我们首先展示在 GridWorld 环境中基本的 TF-Agents 功能。RL 问题最好在游戏（我们有一套明确的规则和完全可观察的环境）或像 GridWorld 这样的玩具问题中研究。一旦基本概念在一个简化但不简单的环境中得到了清晰的定义，我们就可以转向逐步更具挑战性的情境。

第一步是定义一个 GridWorld 环境：这是一个 6x6 的方形棋盘，智能体从 (0,0) 开始，终点在 (5,5)，智能体的目标是找到从起点到终点的路径。可能的动作包括上/下/左/右移动。如果智能体到达终点，它将获得 100 的奖励，并且如果智能体在 100 步内没有到达终点，游戏将结束。这里提供了一个 GridWorld “地图”的示例：

https://github.com/OpenDocCN/freelearn-dl-pt3-zh/raw/master/docs/ml-tf-cb/img/B16254_11_01.png

图 11.1：GridWorld “地图”

现在我们理解了要处理的内容，让我们构建一个模型，从 (0,0) 找到通向 (5,5) 的路径。

我们该如何进行？

和往常一样，我们首先加载必要的库：

import tensorflow as tf
import numpy as np
from tf_agents.environments import py_environment, tf_environment, tf_py_environment, utils, wrappers, suite_gym
from tf_agents.specs import array_spec
from tf_agents.trajectories import trajectory,time_step as ts
from tf_agents.agents.dqn import dqn_agent
from tf_agents.networks import q_network
from tf_agents.drivers import dynamic_step_driver
from tf_agents.metrics import tf_metrics, py_metrics
from tf_agents.policies import random_tf_policy
from tf_agents.replay_buffers import tf_uniform_replay_buffer
from tf_agents.utils import common
from tf_agents.drivers import py_driver, dynamic_episode_driver
from tf_agents.utils import common
import matplotlib.pyplot as plt

TF-Agents 是一个积极开发中的库，因此尽管我们尽力保持代码更新，但在你运行这段代码时，某些导入可能需要修改。

一个关键步骤是定义智能体将要操作的环境。通过继承PyEnvironment类，我们指定init方法（动作和观察定义）、重置/终止状态的条件以及移动机制：

class GridWorldEnv(py_environment.PyEnvironment):
# the _init_ contains the specifications for action and observation
    def __init__(self):
        self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=3, name='action')
        self._observation_spec = array_spec.BoundedArraySpec(
            shape=(4,), dtype=np.int32, minimum=[0,0,0,0],                            maximum=[5,5,5,5], name='observation')
        self._state=[0,0,5,5] #represent the (row, col, frow, fcol) of the player and the finish
        self._episode_ended = False
    def action_spec(self):
        return self._action_spec
    def observation_spec(self):
        return self._observation_spec
# once the same is over, we reset the state
    def _reset(self):
        self._state=[0,0,5,5]
        self._episode_ended = False
        return ts.restart(np.array(self._state, dtype=np.int32))
# the _step function handles the state transition by applying an action to the current state to obtain a new one
    def _step(self, action):
        if self._episode_ended:
            return self.reset()
        self.move(action)
        if self.game_over():
            self._episode_ended = True
        if self._episode_ended:
            if self.game_over():
                reward = 100
            else:
                reward = 0
            return ts.termination(np.array(self._state, dtype=np.int32),             reward)
        else:
            return ts.transition(
                np.array(self._state, dtype=np.int32), reward=0,                 discount=0.9)
    def move(self, action):
        row, col, frow, fcol = self._state[0],self._state[1],self._        state[2],self._state[3]
        if action == 0: #down
            if row - 1 >= 0:
                self._state[0] -= 1
        if action == 1: #up
            if row + 1 < 6:
                self._state[0] += 1
        if action == 2: #left
            if col - 1 >= 0:
                self._state[1] -= 1
        if action == 3: #right
            if col + 1 < 6:
                self._state[1] += 1
    def game_over(self):
        row, col, frow, fcol = self._state[0],self._state[1],self._        state[2],self._state[3]
        return row==frow and col==fcol
def compute_avg_return(environment, policy, num_episodes=10):
    total_return = 0.0
    for _ in range(num_episodes):
        time_step = environment.reset()
        episode_return = 0.0
        while not time_step.is_last():
            action_step = policy.action(time_step)
            time_step = environment.step(action_step.action)
            episode_return += time_step.reward
            total_return += episode_return
    avg_return = total_return / num_episodes
    return avg_return.numpy()[0]
def collect_step(environment, policy):
    time_step = environment.current_time_step()
    action_step = policy.action(time_step)
    next_time_step = environment.step(action_step.action)
    traj = trajectory.from_transition(time_step, action_step, next_time_step)
    # Add trajectory to the replay buffer
    replay_buffer.add_batch(traj)

我们有以下初步设置：

# parameter settings
num_iterations = 10000  
initial_collect_steps = 1000  
collect_steps_per_iteration = 1  
replay_buffer_capacity = 100000  
fc_layer_params = (100,)
batch_size = 128 # 
learning_rate = 1e-5  
log_interval = 200  
num_eval_episodes = 2  
eval_interval = 1000

我们首先创建环境并将其封装，以确保它们在 100 步后终止：

train_py_env = wrappers.TimeLimit(GridWorldEnv(), duration=100)
eval_py_env = wrappers.TimeLimit(GridWorldEnv(), duration=100)
train_env = tf_py_environment.TFPyEnvironment(train_py_env)
eval_env = tf_py_environment.TFPyEnvironment(eval_py_env)

对于这个任务，我们将使用深度 Q 网络（DQN）智能体。这意味着我们需要首先定义网络及其关联的优化器：

q_net = q_network.QNetwork(
        train_env.observation_spec(),
        train_env.action_spec(),
        fc_layer_params=fc_layer_params)
optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate)

如上所示，TF-Agents 库正在积极开发中。目前版本适用于 TF > 2.3，但它最初是为 TensorFlow 1.x 编写的。此改编中使用的代码是基于先前版本开发的，因此为了兼容性，我们需要使用一些不太优雅的解决方法，例如以下代码：

train_step_counter = tf.compat.v2.Variable(0)

定义智能体：

tf_agent = dqn_agent.DqnAgent(
        train_env.time_step_spec(),
        train_env.action_spec(),
        q_network=q_net,
        optimizer=optimizer,
        td_errors_loss_fn = common.element_wise_squared_loss,
        train_step_counter=train_step_counter)
tf_agent.initialize()
eval_policy = tf_agent.policy
collect_policy = tf_agent.collect_policy

接下来的步骤是创建回放缓冲区和回放观察者。前者用于存储训练用的（动作，观察）对：

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
        data_spec = tf_agent.collect_data_spec,
        batch_size = train_env.batch_size,
        max_length = replay_buffer_capacity)
print("Batch Size: {}".format(train_env.batch_size))
replay_observer = [replay_buffer.add_batch]
train_metrics = [
            tf_metrics.NumberOfEpisodes(),
            tf_metrics.EnvironmentSteps(),
            tf_metrics.AverageReturnMetric(),
            tf_metrics.AverageEpisodeLengthMetric(),
]

然后我们从回放缓冲区创建数据集，以便可以进行迭代：

dataset = replay_buffer.as_dataset(
            num_parallel_calls=3,
            sample_batch_size=batch_size,
    num_steps=2).prefetch(3)

最后的准备工作是创建一个驱动程序，模拟游戏中的智能体，并将（状态，动作，奖励）元组存储在回放缓冲区中，同时还需要存储若干个度量：

driver = dynamic_step_driver.DynamicStepDriver(
            train_env,
            collect_policy,
            observers=replay_observer + train_metrics,
    num_steps=1)
iterator = iter(dataset)
print(compute_avg_return(eval_env, tf_agent.policy, num_eval_episodes))
tf_agent.train = common.function(tf_agent.train)
tf_agent.train_step_counter.assign(0)
final_time_step, policy_state = driver.run()

完成准备工作后，我们可以运行驱动程序，从数据集中获取经验，并用它来训练智能体。为了监控/记录，我们会在特定间隔打印损失和平均回报：

episode_len = []
step_len = []
for i in range(num_iterations):
    final_time_step, _ = driver.run(final_time_step, policy_state)
    experience, _ = next(iterator)
    train_loss = tf_agent.train(experience=experience)
    step = tf_agent.train_step_counter.numpy()
    if step % log_interval == 0:
        print('step = {0}: loss = {1}'.format(step, train_loss.loss))
        episode_len.append(train_metrics[3].result().numpy())
        step_len.append(step)
        print('Average episode length: {}'.format(train_metrics[3].                                                  result().numpy()))
    if step % eval_interval == 0:
        avg_return = compute_avg_return(eval_env, tf_agent.policy,                                        num_eval_episodes)
        print('step = {0}: Average Return = {1}'.format(step, avg_return))

一旦代码成功执行，你应该看到类似以下的输出：

step = 200: loss = 0.27092617750167847 Average episode length: 96.5999984741211 step = 400: loss = 0.08925052732229233 Average episode length: 96.5999984741211 step = 600: loss = 0.04888586699962616 Average episode length: 96.5999984741211 step = 800: loss = 0.04527277499437332 Average episode length: 96.5999984741211 step = 1000: loss = 0.04451741278171539 Average episode length: 97.5999984741211 step = 1000: Average Return = 0.0 step = 1200: loss = 0.02019939199090004 Average episode length: 97.5999984741211 step = 1400: loss = 0.02462056837975979 Average episode length: 97.5999984741211 step = 1600: loss = 0.013112186454236507 Average episode length: 97.5999984741211 step = 1800: loss = 0.004257255233824253 Average episode length: 97.5999984741211 step = 2000: loss = 78.85380554199219 Average episode length: 100.0 step = 2000:
Average Return = 0.0 step = 2200: loss = 0.010012316517531872 Average episode length: 100.0 step = 2400: loss = 0.009675763547420502 Average episode length: 100.0 step = 2600: loss = 0.00445540901273489 Average episode length: 100.0 step = 2800: loss = 0.0006154756410978734

尽管训练过程的输出很详细，但并不适合人类阅读。不过，我们可以通过可视化来观察智能体的进展：

plt.plot(step_len, episode_len)
plt.xlabel('Episodes')
plt.ylabel('Average Episode Length (Steps)')
plt.show()

这将为我们提供以下图表：

https://github.com/OpenDocCN/freelearn-dl-pt3-zh/raw/master/docs/ml-tf-cb/img/B16254_11_02.png

图 11.2：每集平均长度与集数的关系

图表展示了我们模型的进展：在前 4000 集后，平均每集时长出现大幅下降，表明我们的智能体在达到最终目标时所需时间越来越少。

另见

有关自定义环境的文档可以参考www.tensorflow.org/agents/tutorials/2_environments_tutorial。

强化学习（RL）是一个庞大的领域，甚至是一个简单的介绍也超出了本书的范围。但对于那些有兴趣了解更多的人，最好的推荐是经典的 Sutton 和 Barto 书籍：incompleteideas.net/book/the-book.html

倾斜摆杆（CartPole）

在本节中，我们将使用 Open AI Gym，这是一个包含可以通过强化学习方法解决的非平凡基础问题的环境集。我们将使用 CartPole 环境。智能体的目标是学习如何在移动的小车上保持一根杆子平衡，可能的动作包括向左或向右移动：

https://github.com/OpenDocCN/freelearn-dl-pt3-zh/raw/master/docs/ml-tf-cb/img/B16254_11_03.png

图 11.3：CartPole 环境，黑色小车平衡着一根长杆

现在我们了解了环境，接下来让我们构建一个模型来平衡一个杆子。

我们该如何进行呢？

我们首先安装一些前提条件并导入必要的库。安装部分主要是为了确保我们能够生成训练智能体表现的可视化效果：

!sudo apt-get install -y xvfb ffmpeg
!pip install gym
!pip install 'imageio==2.4.0'
!pip install PILLOW
!pip install pyglet
!pip install pyvirtualdisplay
!pip install tf-agents
from __future__ import absolute_import, division, print_function
import base64
import imageio
import IPython
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import PIL.Image
import pyvirtualdisplay
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.drivers import dynamic_step_driver
from tf_agents.environments import suite_gym
from tf_agents.environments import tf_py_environment
from tf_agents.eval import metric_utils
from tf_agents.metrics import tf_metrics
from tf_agents.networks import q_network
from tf_agents.policies import random_tf_policy
from tf_agents.replay_buffers import tf_uniform_replay_buffer
from tf_agents.trajectories import trajectory
from tf_agents.utils import common
tf.compat.v1.enable_v2_behavior()
# Set up a virtual display for rendering OpenAI gym environments.
display = pyvirtualdisplay.Display(visible=0, size=(1400, 900)).start()

和之前一样，我们定义了一些我们玩具问题的超参数：

num_iterations = 20000 
initial_collect_steps = 100  
collect_steps_per_iteration = 1  
replay_buffer_max_length = 100000  
# parameters of the neural network underlying at the core of an agent
batch_size = 64  
learning_rate = 1e-3  
log_interval = 200  
num_eval_episodes = 10  
eval_interval = 1000

接下来，我们继续定义我们问题的函数。首先计算一个策略在环境中固定时间段内的平均回报（以回合数为衡量标准）：

def compute_avg_return(environment, policy, num_episodes=10):
  total_return = 0.0
  for _ in range(num_episodes):
    time_step = environment.reset()
    episode_return = 0.0
    while not time_step.is_last():
      action_step = policy.action(time_step)
      time_step = environment.step(action_step.action)
      episode_return += time_step.reward
    total_return += episode_return
  avg_return = total_return / num_episodes
  return avg_return.numpy()[0]

收集单个步骤及相关数据聚合的样板代码如下：

def collect_step(environment, policy, buffer):
  time_step = environment.current_time_step()
  action_step = policy.action(time_step)
  next_time_step = environment.step(action_step.action)
  traj = trajectory.from_transition(time_step, action_step, next_time_step)
  # Add trajectory to the replay buffer
  buffer.add_batch(traj)
def collect_data(env, policy, buffer, steps):
  for _ in range(steps):
    collect_step(env, policy, buffer)

如果一张图片值千言万语，那么视频一定更好。为了可视化我们智能体的表现，我们需要一个渲染实际动画的函数：

def embed_mp4(filename):
  """Embeds an mp4 file in the notebook."""
  video = open(filename,'rb').read()
  b64 = base64.b64encode(video)
  tag = '''
  <video width="640" height="480" controls>
    <source src="img/mp4;base64,{0}" type="video/mp4">
  Your browser does not support the video tag.
  </video>'''.format(b64.decode())
  return IPython.display.HTML(tag)
def create_policy_eval_video(policy, filename, num_episodes=5, fps=30):
  filename = filename + ".mp4"
  with imageio.get_writer(filename, fps=fps) as video:
    for _ in range(num_episodes):
      time_step = eval_env.reset()
      video.append_data(eval_py_env.render())
      while not time_step.is_last():
        action_step = policy.action(time_step)
        time_step = eval_env.step(action_step.action)
        video.append_data(eval_py_env.render())
  return embed_mp4(filename)

在初步工作完成后，我们可以开始真正地设置我们的环境：

env_name = 'CartPole-v0'
env = suite_gym.load(env_name)
env.reset()

在 CartPole 环境中，适用以下内容：

一个观察是一个包含四个浮动数值的数组：
- 小车的位置和速度
- 杆子的角位置和速度
奖励是一个标量浮动值
一个动作是一个标量整数，只有两个可能的值：
- 0 — “向左移动”
- 1 — “向右移动”

和之前一样，分开训练和评估环境，并应用包装器：

train_py_env = suite_gym.load(env_name)
eval_py_env = suite_gym.load(env_name)
train_env = tf_py_environment.TFPyEnvironment(train_py_env)
eval_env = tf_py_environment.TFPyEnvironment(eval_py_env)

定义构成我们智能体学习算法基础的网络：一个神经网络，根据环境的观察作为输入，预测所有动作的预期回报（通常在强化学习文献中称为 Q 值）：

fc_layer_params = (100,)
q_net = q_network.QNetwork(
    train_env.observation_spec(),
    train_env.action_spec(),
    fc_layer_params=fc_layer_params)
optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate)
train_step_counter = tf.Variable(0)

有了这个，我们可以实例化一个 DQN 智能体：

agent = dqn_agent.DqnAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    q_network=q_net,
    optimizer=optimizer,
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=train_step_counter)
agent.initialize()

设置策略——主要用于评估和部署的策略，以及用于数据收集的次要策略：

eval_policy = agent.policy
collect_policy = agent.collect_policy

为了进行一个不算很复杂的比较，我们还将创建一个随机策略（顾名思义，它是随机执行的）。然而，这展示了一个重要的观点：策略可以独立于智能体创建：

random_policy = random_tf_policy.RandomTFPolicy(train_env.time_step_spec(), train_env.action_spec())

为了从策略中获得一个动作，我们调用 policy.action(time_step) 方法。time_step 包含来自环境的观察。这个方法返回一个策略步骤，这是一个包含三个组件的命名元组：

动作：要执行的动作（向左移动或向右移动）
状态：用于有状态（基于 RNN）的策略
信息：辅助数据，例如动作的对数概率：

example_environment = tf_py_environment.TFPyEnvironment(
    suite_gym.load('CartPole-v0'))
time_step = example_environment.reset()

重放缓冲区跟踪从环境中收集的数据，这些数据用于训练：

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.collect_data_spec,
    batch_size=train_env.batch_size,
    max_length=replay_buffer_max_length)

对于大多数代理，collect_data_spec 是一个命名元组，称为轨迹（Trajectory），它包含关于观察、动作、奖励以及其他项的规格。

我们现在利用随机策略来探索环境：

collect_data(train_env, random_policy, replay_buffer, initial_collect_steps)

现在，代理可以通过管道访问重放缓冲区。由于我们的 DQN 代理需要当前观察和下一次观察来计算损失，因此管道一次会采样两行相邻数据（num_steps = 2）：

dataset = replay_buffer.as_dataset(
    num_parallel_calls=3, 
    sample_batch_size=batch_size, 
    num_steps=2).prefetch(3)
iterator = iter(dataset)

在训练部分，我们在两个步骤之间切换：从环境中收集数据并用它来训练 DQN：

agent.train = common.function(agent.train)
# Reset the train step
agent.train_step_counter.assign(0)
# Evaluate the agent's policy once before training.
avg_return = compute_avg_return(eval_env, agent.policy, num_eval_episodes)
returns = [avg_return]
for _ in range(num_iterations):
  # Collect a few steps using collect_policy and save to the replay buffer.
  collect_data(train_env, agent.collect_policy, replay_buffer, collect_steps_per_iteration)
  # Sample a batch of data from the buffer and update the agent's network.
  experience, unused_info = next(iterator)
  train_loss = agent.train(experience).loss
  step = agent.train_step_counter.numpy()
  if step % log_interval == 0:
    print('step = {0}: loss = {1}'.format(step, train_loss))
  if step % eval_interval == 0:
    avg_return = compute_avg_return(eval_env, agent.policy, num_eval_episodes)
    print('step = {0}: Average Return = {1}'.format(step, avg_return))
    returns.append(avg_return)

代码块的（部分）输出如下。快速回顾一下，step 是训练过程中的迭代次数，loss 是深度网络中驱动代理逻辑的损失函数值，Average Return 是当前运行结束时的奖励：

step = 200: loss = 4.396056175231934
step = 400: loss = 7.12950325012207
step = 600: loss = 19.0213623046875
step = 800: loss = 45.954856872558594
step = 1000: loss = 35.900394439697266
step = 1000: Average Return = 21.399999618530273
step = 1200: loss = 60.97482681274414
step = 1400: loss = 8.678962707519531
step = 1600: loss = 13.465248107910156
step = 1800: loss = 42.33995056152344
step = 2000: loss = 42.936370849609375
step = 2000: Average Return = 21.799999237060547

每次迭代包含 200 个时间步，保持杆子竖立会得到 1 分奖励，因此每一轮的最大奖励为 200：

https://github.com/OpenDocCN/freelearn-dl-pt3-zh/raw/master/docs/ml-tf-cb/img/B16254_11_04.png

图 11.4：每次迭代的平均回报

从前面的图表中可以看出，代理大约需要 1 万次迭代才能发现一个成功的策略（其中有一些波动，正如奖励曲线中的 U 形模式所示）。之后，奖励趋于稳定，算法能够每次成功完成任务。

我们还可以通过视频观察代理的表现。关于随机策略，您可以尝试以下操作：

create_policy_eval_video(random_policy, "random-agent")

而关于已经训练的策略，您可以尝试以下操作：

create_policy_eval_video(agent.policy, "trained-agent")

另请参阅

Open AI Gym 环境文档可以在gym.openai.com/找到。

MAB

在概率论中，多臂赌博机（MAB）问题指的是一种情境，其中有限的资源必须在多个竞争的选择之间分配，以最大化某种形式的长期目标。这个名称来源于用于制定模型第一版的类比。假设我们有一个赌徒，面前是一排老虎机，他必须决定选择哪些老虎机进行游戏，玩多少次，以及以什么顺序进行。在强化学习（RL）中，我们将其表述为一个代理（agent），该代理希望平衡探索（获得新知识）和开发（基于已经获得的经验优化决策）。这种平衡的目标是在一段时间内最大化总奖励。

MAB 是一个简化的强化学习问题：代理采取的动作不会影响环境的后续状态。这意味着不需要建模状态转移，不需要为过去的动作分配奖励，也不需要提前规划以到达奖励状态。MAB 代理的目标是确定一个策略，使得随时间推移能够最大化累积奖励。

主要挑战是有效应对探索与利用的难题：如果我们总是尝试利用期望奖励最高的动作，就有可能错过那些通过更多探索可以发现的更好的动作。

本示例中使用的设置来源于 Vowpal Wabbit 教程，网址为vowpalwabbit.org/tutorials/cb_simulation.html。

在本节中，我们将模拟个性化在线内容的问题：Tom 和 Anna 在一天中的不同时间访问网站并查看一篇文章。Tom 早上喜欢政治，下午喜欢音乐，而 Anna 早上喜欢体育或政治，下午喜欢政治。将这个问题用多臂赌博机（MAB）术语表述，这意味着：

上下文是一个包含{用户，时间段}的对。
可能的动作是新闻主题{政治、体育、音乐、食物}。
如果用户在此时看到他们感兴趣的内容，则奖励为 1，否则为 0。

目标是最大化通过用户的点击率（CTR）衡量的奖励。

我们该怎么做呢？

像往常一样，我们首先加载必要的包：

!pip install tf-agents
import abc
import numpy as np
import tensorflow as tf
from tf_agents.agents import tf_agent
from tf_agents.drivers import driver
from tf_agents.environments import py_environment
from tf_agents.environments import tf_environment
from tf_agents.environments import tf_py_environment
from tf_agents.policies import tf_policy
from tf_agents.specs import array_spec
from tf_agents.specs import tensor_spec
from tf_agents.trajectories import time_step as ts
from tf_agents.trajectories import trajectory
from tf_agents.trajectories import policy_step
tf.compat.v1.reset_default_graph()
tf.compat.v1.enable_resource_variables()
tf.compat.v1.enable_v2_behavior()
nest = tf.compat.v2.nest
from tf_agents.bandits.agents import lin_ucb_agent
from tf_agents.bandits.environments import stationary_stochastic_py_environment as sspe
from tf_agents.bandits.metrics import tf_metrics
from tf_agents.drivers import dynamic_step_driver
from tf_agents.replay_buffers import tf_uniform_replay_buffer
import matplotlib.pyplot as plt

我们接下来定义一些超参数，这些参数将在后续使用：

batch_size = 2
num_iterations = 100 
steps_per_loop = 1

我们需要的第一个函数是一个上下文采样器，用于生成来自环境的观察值。由于我们有两个用户和一天中的两个时间段，实际上是生成两元素二进制向量：

def context_sampling_fn(batch_size):
  def _context_sampling_fn():
    return np.random.randint(0, 2, [batch_size, 2]).astype(np.float32)
  return _context_sampling_fn

接下来，我们定义一个通用函数来计算每个臂的奖励：

class CalculateReward(object):

    """A class that acts as linear reward function when called."""
    def __init__(self, theta, sigma):
        self.theta = theta
        self.sigma = sigma
    def __call__(self, x):
        mu = np.dot(x, self.theta)
        #return np.random.normal(mu, self.sigma)
        return (mu > 0) + 0

我们可以使用该函数来定义每个臂的奖励。这些奖励反映了在本食谱开头描述的偏好集：

arm0_param = [2, -1]
arm1_param = [1, -1] 
arm2_param = [-1, 1] 
arm3_param = [ 0, 0] 
arm0_reward_fn = CalculateReward(arm0_param, 1)
arm1_reward_fn = CalculateReward(arm1_param, 1)
arm2_reward_fn = CalculateReward(arm2_param, 1)
arm3_reward_fn = CalculateReward(arm3_param, 1)

我们函数设置的最后一部分涉及计算给定上下文的最优奖励：

def compute_optimal_reward(observation):
    expected_reward_for_arms = [
      tf.linalg.matvec(observation, tf.cast(arm0_param, dtype=tf.float32)),
      tf.linalg.matvec(observation, tf.cast(arm1_param, dtype=tf.float32)),
      tf.linalg.matvec(observation, tf.cast(arm2_param, dtype=tf.float32)),
      tf.linalg.matvec(observation, tf.cast(arm3_param, dtype=tf.float32))
    ]
    optimal_action_reward = tf.reduce_max(expected_reward_for_arms, axis=0)

    return optimal_action_reward

为了本示例的目的，我们假设环境是静态的；换句话说，偏好在时间上没有变化（这在实际场景中不一定成立，取决于你关注的时间范围）：

environment = tf_py_environment.TFPyEnvironment(
    sspe.StationaryStochasticPyEnvironment(
        context_sampling_fn(batch_size),
        [arm0_reward_fn, arm1_reward_fn, arm2_reward_fn, arm3_reward_fn],
        batch_size=batch_size))

我们现在准备实例化一个实现赌博机算法的代理。我们使用预定义的LinUCB类；像往常一样，我们定义观察值（两个元素，表示用户和时间），时间步长和动作规范（四种可能内容类型之一）：

observation_spec = tensor_spec.TensorSpec([2], tf.float32)
time_step_spec = ts.time_step_spec(observation_spec)
action_spec = tensor_spec.BoundedTensorSpec(
    dtype=tf.int32, shape=(), minimum=0, maximum=2)
agent = lin_ucb_agent.LinearUCBAgent(time_step_spec=time_step_spec,
                                     action_spec=action_spec)

多臂赌博机（MAB）设置的一个关键组成部分是遗憾，它被定义为代理实际获得的奖励与oracle 策略的期望奖励之间的差异：

regret_metric = tf_metrics.RegretMetric(compute_optimal_reward)

我们现在可以开始训练我们的代理。我们运行训练循环num_iterations次，每次执行steps_per_loop步。找到这些参数的合适值通常是要在更新的时效性和训练效率之间找到平衡：

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.policy.trajectory_spec,
    batch_size=batch_size,
    max_length=steps_per_loop)
observers = [replay_buffer.add_batch, regret_metric]
driver = dynamic_step_driver.DynamicStepDriver(
    env=environment,
    policy=agent.collect_policy,
    num_steps=steps_per_loop * batch_size,
    observers=observers)
regret_values = []
for _ in range(num_iterations):
    driver.run()
    loss_info = agent.train(replay_buffer.gather_all())
    replay_buffer.clear()
    regret_values.append(regret_metric.result())

我们可以通过绘制后续迭代中的遗憾（负奖励）来可视化实验结果：

plt.plot(regret_values)
plt.ylabel('Average Regret')
plt.xlabel('Number of Iterations')

这将为我们绘制出以下图表：

https://github.com/OpenDocCN/freelearn-dl-pt3-zh/raw/master/docs/ml-tf-cb/img/B16254_11_05.png

图 11.5：训练过的 UCB 代理随时间的表现

如前图所示，在初始学习阶段（在第 30 次迭代附近的遗憾出现峰值），代理会不断改进，提供期望的内容。过程中有很多变化，表明即便在一个简化的环境下——两个用户——高效个性化仍然是一个挑战。改进的可能方向包括更长时间的训练，或者调整 DQN 代理，使其能够采用更复杂的逻辑来进行预测。

另见

相关的强盗算法及其环境的广泛集合可以在TF-Agents 文档库中找到：github.com/tensorflow/agents/tree/master/tf_agents/bandits/agents/examples/v2。

对上下文多臂强盗问题感兴趣的读者可以参考Sutton 和 Barto的书中的相关章节：web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf。

第十二章：将 TensorFlow 应用到生产环境

在本书中，我们已经看到 TensorFlow 能够实现许多模型，但 TensorFlow 能做的远不止这些。本章将向你展示其中的一些内容。在本章中，我们将涵盖以下主题：

在 TensorBoard 中可视化图表
使用 TensorBoard 的 HParams 进行超参数调优
使用 tf.test 实现单元测试
使用多个执行器
使用 tf.distribute.strategy 进行 TensorFlow 并行化
保存和恢复 TensorFlow 模型
使用 TensorFlow Serving

我们将首先展示如何使用 TensorBoard 的各个方面，TensorBoard 是 TensorFlow 自带的一个功能。这个工具允许我们即使在模型训练过程中，也能可视化总结指标、图表和图像。接下来，我们将展示如何编写适用于生产环境的代码，重点是单元测试、多处理单元的训练分配，以及高效的模型保存和加载。最后，我们将通过将模型托管为 REST 端点，解决机器学习服务方案。

在 TensorBoard 中可视化图表

监控和排查机器学习算法可能是一项艰巨的任务，尤其是在你必须等待训练完成后才能知道结果的情况下。为了应对这种情况，TensorFlow 提供了一个计算图可视化工具，称为TensorBoard。借助 TensorBoard，我们可以在训练过程中可视化图表和重要数值（如损失、准确率、批次训练时间等）。

准备工作

为了展示我们如何使用 TensorBoard 的各种方式，我们将重新实现第八章中卷积神经网络章节里的入门 CNN 模型配方中的 MNIST 模型。然后，我们将添加 TensorBoard 回调并拟合模型。我们将展示如何监控数值、值集的直方图，如何在 TensorBoard 中创建图像，以及如何可视化 TensorFlow 模型。

如何实现…

首先，我们将加载脚本所需的库：

import tensorflow as tf
import numpy as np
import datetime

现在我们将重新实现 MNIST 模型：

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)
# Padding the images by 2 pixels since in the paper input images were 32x32
x_train = np.pad(x_train, ((0,0),(2,2),(2,2),(0,0)), 'constant')
x_test = np.pad(x_test, ((0,0),(2,2),(2,2),(0,0)), 'constant')
# Normalize
x_train = x_train / 255
x_test = x_test/ 255
# Set model parameters
image_width = x_train[0].shape[0]
image_height = x_train[0].shape[1]
num_channels = 1 # grayscale = 1 channel
# Training and Test data variables
batch_size = 100
evaluation_size = 500
generations = 300
eval_every = 5
# Set for reproducible results
seed = 98
np.random.seed(seed)
tf.random.set_seed(seed)
# Declare the model
input_data = tf.keras.Input(dtype=tf.float32, shape=(image_width,image_height, num_channels), name="INPUT")
# First Conv-ReLU-MaxPool Layer
conv1 = tf.keras.layers.Conv2D(filters=6,
                               kernel_size=5,
                               padding='VALID',
                               activation="relu",
                               name="C1")(input_data)
max_pool1 = tf.keras.layers.MaxPool2D(pool_size=2,
                                      strides=2, 
                                      padding='SAME',
                                      name="S1")(conv1)
# Second Conv-ReLU-MaxPool Layer
conv2 = tf.keras.layers.Conv2D(filters=16,
                               kernel_size=5,
                               padding='VALID',
                               strides=1,
                               activation="relu",
                               name="C3")(max_pool1)
max_pool2 = tf.keras.layers.MaxPool2D(pool_size=2,
                                      strides=2, 
                                      padding='SAME',
                                      name="S4")(conv2)
# Flatten Layer
flatten = tf.keras.layers.Flatten(name="FLATTEN")(max_pool2)
# First Fully Connected Layer
fully_connected1 = tf.keras.layers.Dense(units=120,
                                         activation="relu",
                                         name="F5")(flatten)
# Second Fully Connected Layer
fully_connected2 = tf.keras.layers.Dense(units=84,
                                         activation="relu",
                                         name="F6")(fully_connected1)
# Final Fully Connected Layer
final_model_output = tf.keras.layers.Dense(units=10,
                                           activation="softmax",
                                           name="OUTPUT"
                                           )(fully_connected2)

model = tf.keras.Model(inputs= input_data, outputs=final_model_output)

接下来，我们将使用稀疏类别交叉熵损失和 Adam 优化器编译模型。然后，我们将展示总结：

model.compile(
    optimizer="adam", 
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)
model.summary()

我们将为每次运行创建一个带时间戳的子目录。总结写入器将把TensorBoard日志写入这个文件夹：
```
log_dir="logs/experiment-" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") 
```

接下来，我们将实例化一个TensorBoard回调并将其传递给fit方法。训练阶段的所有日志将存储在此目录中，并可以立即在TensorBoard中查看：

tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, 
                                                      write_images=True,
                                                      histogram_freq=1 )
model.fit(x=x_train, 
          y=y_train, 
          epochs=5,
          validation_data=(x_test, y_test), 
          callbacks=[tensorboard_callback])

然后我们通过运行以下命令启动TensorBoard应用程序：
```
$ tensorboard --logdir="logs" 
```
然后我们在浏览器中导航到以下链接：http://127.0.0.0:6006。如果需要，我们可以通过传递--port 6007命令（例如，在 6007 端口运行）来指定不同的端口。我们还可以通过在笔记本中运行%tensorboard --logdir="logs"命令来启动 TensorBoard。请记住，TensorBoard 将在你的程序运行时可见。
我们可以通过 TensorBoard 的标量视图快速且轻松地可视化和比较多个实验的度量。在默认情况下，TensorBoard 会在每个训练周期记录度量和损失。我们可以使用以下参数通过每个批次更新该频率：update_freq='batch'。我们还可以使用参数 write_images=True 将模型权重可视化为图像，或者使用 histogram_freq=1 以直方图的形式（每个周期计算）显示偏差和权重。
这是标量视图的截图：https://github.com/OpenDocCN/freelearn-dl-pt3-zh/raw/master/docs/ml-tf-cb/img/B16254_12_01.png

图 12.1：训练和测试损失随着时间推移而减少，而训练和测试准确度则增加
这里，我们展示如何通过直方图摘要可视化权重和偏差。通过此仪表板，我们可以绘制非标量张量（如权重和偏差）在不同时间点的多个直方图可视化。这样，我们就能看到这些值是如何随时间变化的：https://github.com/OpenDocCN/freelearn-dl-pt3-zh/raw/master/docs/ml-tf-cb/img/B16254_12_02.png

图 12.2：在 TensorBoard 中通过直方图视图可视化权重和偏差
现在，我们将通过 TensorFlow 的图形仪表板可视化 TensorFlow 模型，仪表板通过不同的视图展示模型。该仪表板不仅可以可视化操作级别的图，还可以显示概念级别的图。操作级图展示了 Keras 模型以及指向其他计算节点的额外边缘，而概念级图仅展示 Keras 模型。这些视图可以帮助我们快速检查并比较我们的设计，并理解 TensorFlow 模型结构。
这里，我们展示如何可视化操作级别的图：https://github.com/OpenDocCN/freelearn-dl-pt3-zh/raw/master/docs/ml-tf-cb/img/B16254_12_03.png

图 12.3：TensorBoard 中的操作级图
通过添加 TensorBoard 回调，我们可以可视化损失、度量、模型权重作为图像等内容。但我们也可以使用 tf.summary 模块来写入可以在 TensorFlow 中可视化的摘要数据。首先，我们需要创建一个 FileWriter，然后就可以写入直方图、标量、文本、音频或图像摘要。在这里，我们将使用图像摘要 API 来写入图像，并在 TensorBoard 中进行可视化：
```
# Create a FileWriter for the timestamped log directory.
file_writer = tf.summary.create_file_writer(log_dir)
with file_writer.as_default():
    # Reshape the images and write image summary.
    images = np.reshape(x_train[0:10], (-1, 32, 32, 1))
    tf.summary.image("10 training data examples", images, max_outputs=10, step=0) 
```

https://github.com/OpenDocCN/freelearn-dl-pt3-zh/raw/master/docs/ml-tf-cb/img/B16254_12_04.png

图 12.4：在 TensorBoard 中可视化图像

请注意，不要过于频繁地将图像摘要写入 TensorBoard。例如，如果我们在 10,000 次训练中每次都写入一次图像摘要，那么将生成 10,000 张图像的摘要数据。这会非常迅速地占用磁盘空间。

工作原理…

在本节中，我们在 MNIST 数据集上实现了一个 CNN 模型。我们添加了 TensorBoard 回调并训练了模型。然后，我们使用 TensorFlow 的可视化工具来监控数值和数值集合的直方图，进而可视化模型图等。

请记住，我们可以通过命令行启动 TensorBoard，如食谱中所示，但也可以通过使用 %tensorboard 魔法命令在笔记本中启动它。

另见

关于 TensorBoard API 的一些参考资料，请访问以下网站：

官方的 TensorBoard 指南：www.tensorflow.org/tensorboard/get_started
TensorFlow 摘要 API：www.tensorflow.org/api_docs/python/tf/summary

还有更多…

TensorBoard.dev 是 Google 提供的免费托管服务。其目的是轻松托管、跟踪和分享机器学习实验给任何人。在我们启动实验后，只需将 TensorBoard 日志上传到 TensorBoard 服务器。然后，分享链接，任何拥有该链接的人都可以查看我们的实验。请注意不要上传敏感数据，因为上传的 TensorBoard 数据集是公开的，所有人都可以看到。

使用 TensorBoard 的 HParams 管理超参数调优

在机器学习项目中调优超参数可能是一项真正的挑战。这个过程是迭代的，并且可能需要很长时间来测试所有的超参数组合。但幸运的是，HParams——一个 TensorBoard 插件，来拯救我们。它允许我们通过测试找到最佳的超参数组合。

准备工作

为了说明 HParams 插件如何工作，我们将使用一个在 MNIST 数据集上的顺序模型实现。我们将配置 HParams，并比较几个超参数组合，以找到最佳的超参数优化。

如何操作…

首先，我们将加载脚本所需的库：

import tensorflow as tf
from tensorboard.plugins.hparams import api as hp
import numpy as np
import datetime

接下来，我们将加载并准备 MNIST 数据集：

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Normalize
x_train = x_train / 255
x_test = x_test/ 255
## Set model parameters
image_width = x_train[0].shape[0]
image_height = x_train[0].shape[1]
num_channels = 1 # grayscale = 1 channel

然后，对于每个超参数，我们将定义要测试的值列表或区间。在这一部分，我们将介绍三个超参数：每层单元数、Dropout 率和优化器：

HP_ARCHITECTURE_NN = hp.HParam('archi_nn', 
hp.Discrete(['128,64','256,128']))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.0, 0.1))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd']))

该模型将是一个顺序模型，包含五层：一个展平层，接着是一个全连接层，一个 Dropout 层，再是另一个全连接层，最后是一个具有 10 个单元的输出层。训练函数将接收一个包含超参数组合的 HParams 字典作为参数。由于我们使用的是 Keras 模型，我们在 fit 方法中添加了 HParams Keras 回调来监控每次实验。对于每次实验，插件将记录超参数组合、损失值和指标。如果需要监控其他信息，我们还可以添加一个 Summary File Writer：

def train_model(hparams, experiment_run_log_dir):

    nb_units = list(map(int, hparams[HP_ARCHITECTURE_NN].split(",")))

    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Flatten(name="FLATTEN"))
    model.add(tf.keras.layers.Dense(units=nb_units[0], activation="relu", name="D1"))
    model.add(tf.keras.layers.Dropout(hparams[HP_DROPOUT], name="DROP_OUT"))
    model.add(tf.keras.layers.Dense(units=nb_units[1], activation="relu", name="D2"))
    model.add(tf.keras.layers.Dense(units=10, activation="softmax", name="OUTPUT"))

    model.compile(
        optimizer=hparams[HP_OPTIMIZER], 
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"]
    )

    tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=experiment_run_log_dir)
    hparams_callback = hp.KerasCallback(experiment_run_log_dir, hparams)

    model.fit(x=x_train, 
              y=y_train, 
              epochs=5,
              validation_data=(x_test, y_test),
              callbacks=[tensorboard_callback, hparams_callback]
             )
model = tf.keras.Model(inputs= input_data, outputs=final_model_output)

接下来，我们将对所有超参数进行迭代：

for archi_nn in HP_ARCHITECTURE_NN.domain.values:
    for optimizer in HP_OPTIMIZER.domain.values:
        for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
            hparams = {
                HP_ARCHITECTURE_NN : archi_nn, 
                HP_OPTIMIZER: optimizer,
                HP_DROPOUT : dropout_rate
            }

            experiment_run_log_dir="logs/experiment-" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

            train_model(hparams, experiment_run_log_dir)

然后，我们通过运行此命令启动 TensorBoard 应用程序：
```
$ tensorboard --logdir="logs" 
```
然后，我们可以快速而轻松地在 HParams 表格视图中可视化结果（超参数和指标）。如果需要，左侧面板可以应用过滤器和排序：https://github.com/OpenDocCN/freelearn-dl-pt3-zh/raw/master/docs/ml-tf-cb/img/B16254_12_05.png

图 12.5：在 TensorBoard 中可视化的 HParams 表格视图
在平行坐标视图中，每个轴表示一个超参数或指标，每次运行由一条线表示。这个可视化方法可以快速识别出最佳的超参数组合：https://github.com/OpenDocCN/freelearn-dl-pt3-zh/raw/master/docs/ml-tf-cb/img/B16254_12_06.png

图 12.6：在 TensorBoard 中可视化的 HParams 平行坐标视图

使用 TensorBoard HParams 是一种简单且富有洞察力的方式，可以识别最佳超参数，并帮助管理你在 TensorFlow 中的实验。

另见

有关 HParams TensorBoard 插件的参考，请访问以下网站：

官方 TensorBoard 指南：www.tensorflow.org/tensorboard/hyperparameter_tuning_with_hparams

实现单元测试

测试代码可以加速原型开发，提高调试效率，快速变更，并且使得代码共享变得更容易。TensorFlow 2.0 提供了tf.test模块，我们将在本节中介绍它。

准备工作

在编写 TensorFlow 模型时，单元测试可以帮助检查程序功能。这对我们很有帮助，因为当我们想对程序单元进行修改时，测试可以确保这些修改不会以未知的方式破坏模型。在 Python 中，主要的测试框架是unittest，但 TensorFlow 提供了自己的测试框架。在本节中，我们将创建一个自定义层类，并实现一个单元测试，演示如何在 TensorFlow 中编写单元测试。

如何操作…

首先，我们需要加载必要的库，如下所示：
```
import tensorflow as tf
import numpy as np 
```

然后，我们需要声明我们的自定义门控函数，应用f(x) = a1 * x + b1：

class MyCustomGate(tf.keras.layers.Layer):

    def __init__(self, units, a1, b1):
        super(MyCustomGate, self).__init__()
        self.units = units
        self.a1 = a1
        self.b1 = b1
    # Compute f(x) = a1 * x + b1
    def call(self, inputs):
        return inputs * self.a1 + self.b1

接下来，我们创建一个继承自tf.test.TestCase类的单元测试类。setup方法是一个hook方法，在每个test方法之前被调用。assertAllEqual方法检查预期输出和计算输出是否相等：

class MyCustomGateTest(tf.test.TestCase):
    def setUp(self):
        super(MyCustomGateTest, self).setUp()
        # Configure the layer with 1 unit, a1 = 2 et b1=1
        self.my_custom_gate = MyCustomGate(1,2,1)
    def testMyCustomGateOutput(self):
        input_x = np.array([[1,0,0,1],
                           [1,0,0,1]])
        output = self.my_custom_gate(input_x)
        expected_output = np.array([[3,1,1,3], [3,1,1,3]])
        self.assertAllEqual(output, expected_output)

现在我们需要在脚本中加入一个main()函数，用于运行所有单元测试：
```
tf.test.main() 
```

从终端运行以下命令。我们应该会得到如下输出：

$ python3 01_implementing_unit_tests.py
...
[       OK ] MyCustomGateTest.testMyCustomGateOutput
[ RUN      ] MyCustomGateTest.test_session
[  SKIPPED ] MyCustomGateTest.test_session
----------------------------------------------------------------------
Ran 2 tests in 0.016s
OK (skipped=1)

我们实现了一个测试并且通过了。不要担心那两个test_session测试——它们是虚拟测试。

请注意，许多专门为 TensorFlow 量身定制的断言可以在tf.test API 中找到。

它是如何工作的…

在这一部分，我们使用tf.test API 实现了一个 TensorFlow 单元测试，它与 Python 的单元测试非常相似。记住，单元测试有助于确保代码按预期功能运行，增加共享代码的信心，并使得可重复性更容易实现。

另见

有关tf.test模块的参考，请访问以下网站：

官方 TensorFlow 测试 API：www.tensorflow.org/api_docs/python/tf/test

使用多个执行器

你可能知道，TensorFlow 有许多特性，包括计算图，它们天生适合并行计算。计算图可以在不同的处理器之间拆分，也可以在不同的批次之间进行处理。我们将在本节中讨论如何在同一台机器上访问不同的处理器。

准备工作

在本教程中，我们将向您展示如何访问同一系统上的多个设备并在其上进行训练。设备是 CPU 或加速单元（GPU、TPU），TensorFlow 可以在其上运行操作。这是非常常见的情况：除了 CPU，机器可能还有一个或多个 GPU 可以共享计算负载。如果 TensorFlow 能够访问这些设备，它将通过贪婪过程自动将计算分配到多个设备。然而，TensorFlow 也允许程序通过命名范围位置指定哪些操作将在哪个设备上执行。

在本教程中，我们将向您展示不同的命令，这些命令将允许您访问系统上的各种设备；我们还将演示如何查找 TensorFlow 正在使用的设备。请记住，一些功能仍然是实验性的，可能会发生变化。

如何实现…

为了查找 TensorFlow 为哪些操作使用了哪些设备，我们将通过设置tf.debugging.set_log_device_placement为True来激活设备分配日志。如果 TensorFlow 操作同时支持 CPU 和 GPU 设备，该操作将在默认情况下执行在 GPU 设备上（如果 GPU 可用）：

tf.debugging.set_log_device_placement(True)
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0

我们还可以使用张量设备属性来返回该张量将分配到的设备名称：

a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
print(a.device)
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
print(b.device)
Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0

默认情况下，TensorFlow 会自动决定如何在计算设备（CPU 和 GPU）之间分配计算，有时我们需要通过创建tf.device函数的设备上下文来选择要使用的设备。在此上下文中执行的每个操作都会使用所选设备：

tf.debugging.set_log_device_placement(True)
with tf.device('/device:CPU:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:CPU:0

如果我们将matmul操作移出上下文，如果有可用的 GPU 设备，则该操作将在 GPU 设备上执行：

tf.debugging.set_log_device_placement(True)
with tf.device('/device:CPU:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0

在使用 GPU 时，TensorFlow 会自动占用 GPU 内存的大部分。虽然这通常是期望的行为，但我们可以采取措施更加谨慎地分配 GPU 内存。虽然 TensorFlow 永远不会释放 GPU 内存，但我们可以通过设置 GPU 内存增长选项，逐步增加其分配，直到达到最大限制（仅在需要时）。注意，物理设备初始化后不能修改：
```
gpu_devices = tf.config.list_physical_devices('GPU')
if gpu_devices:
    try:
        tf.config.experimental.set_memory_growth(gpu_devices[0], True)
    except RuntimeError as e:
        # Memory growth cannot be modified after GPU has been initialized
        print(e) 
```

如果我们想要对 TensorFlow 使用的 GPU 内存设置硬性限制，我们还可以创建一个虚拟 GPU 设备并设置分配到该虚拟 GPU 的最大内存限制（单位：MB）。注意，虚拟设备初始化后不能修改：

gpu_devices = tf.config.list_physical_devices('GPU')
if gpu_devices:
    try:
tf.config.experimental.set_virtual_device_configuration(gpu_devices[0],
                                                   [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    except RuntimeError as e:
        # Memory growth cannot be modified after GPU has been initialized
        print(e)

我们还可以通过单个物理 GPU 来模拟虚拟 GPU 设备。通过以下代码可以实现：

gpu_devices = tf.config.list_physical_devices('GPU')
if gpu_devices:
    try:
        tf.config.experimental.set_virtual_device_configuration(gpu_devices[0],
                                                   [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024),
                                                    tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024) ])
    except RuntimeError as e:
        # Memory growth cannot be modified after GPU has been initialized
        print(e)

有时我们可能需要编写稳健的代码来判断是否有可用的 GPU。TensorFlow 有一个内置函数，可以测试 GPU 是否可用。当我们希望在 GPU 可用时利用它并将特定操作分配给它时，这非常有帮助。通过以下代码可以实现：
```
if tf.test.is_built_with_cuda(): 
    <Run GPU specific code here> 
```

如果我们需要将特定操作分配给 GPU，可以输入以下代码。这将执行简单的计算，并将操作分配给主 CPU 和两个辅助 GPU：

if tf.test.is_built_with_cuda():
    with tf.device('/cpu:0'):
        a = tf.constant([1.0, 3.0, 5.0], shape=[1, 3])
        b = tf.constant([2.0, 4.0, 6.0], shape=[3, 1])

        with tf.device('/gpu:0'):
            c = tf.matmul(a,b)
            c = tf.reshape(c, [-1])

        with tf.device('/gpu:1'):
            d = tf.matmul(b,a)
            flat_d = tf.reshape(d, [-1])

        combined = tf.multiply(c, flat_d)
    print(combined)
Num GPUs Available:  2
Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op Reshape in device /job:localhost/replica:0/task:0/device:GPU:1
Executing op Mul in device /job:localhost/replica:0/task:0/device:CPU:0
tf.Tensor([  88\.  264\.  440\.  176\.  528\.  880\.  264\.  792\. 1320.], shape=(9,), dtype=float32)

我们可以看到，前两个操作已在主 CPU 上执行，接下来的两个操作在第一个辅助 GPU 上执行，最后两个操作在第二个辅助 GPU 上执行。

工作原理…

当我们希望为 TensorFlow 操作设置特定设备时，我们需要了解 TensorFlow 如何引用这些设备。TensorFlow 中的设备名称遵循以下约定：

设备	设备名称
主 CPU	`/device:CPU:0`
主 GPU	`/GPU:0`
第二 GPU	`/job:localhost/replica:0/task:0/device:GPU:1`
第三 GPU	`/job:localhost/replica:0/task:0/device:GPU:2`

请记住，TensorFlow 将 CPU 视为一个独立的处理器，即使该处理器是一个多核处理器。所有核心都被包装在/device:CPU:0中，也就是说，TensorFlow 默认确实使用多个 CPU 核心。

还有更多…

幸运的是，现在在云端运行 TensorFlow 比以往任何时候都更容易。许多云计算服务提供商提供 GPU 实例，这些实例拥有主 CPU 和强大的 GPU。请注意，获得 GPU 的简单方法是通过 Google Colab 运行代码，并在笔记本设置中将 GPU 设置为硬件加速器。

并行化 TensorFlow

训练模型可能非常耗时。幸运的是，TensorFlow 提供了几种分布式策略来加速训练，无论是针对非常大的模型还是非常大的数据集。本食谱将向我们展示如何使用 TensorFlow 分布式 API。

准备中

TensorFlow 分布式 API 允许我们通过将模型复制到不同节点并在不同数据子集上进行训练来分布式训练。每个策略支持一个硬件平台（多个 GPU、多个机器或 TPU），并使用同步或异步训练策略。在同步训练中，每个工作节点在不同的数据批次上训练，并在每一步汇聚它们的梯度。而在异步模式下，每个工作节点独立训练数据，变量异步更新。请注意，目前 TensorFlow 仅支持上面描述的数据并行性，根据路线图，它很快将支持模型并行性。当模型太大，无法放入单个设备时，就需要将模型分布到多个设备上进行训练。本例将介绍该 API 提供的镜像策略。

如何实现…

首先，我们将加载该食谱所需的库，如下所示：

import tensorflow as tf
import tensorflow_datasets as tfds

我们将创建两个虚拟 GPU：

# Create two virtual GPUs
gpu_devices = tf.config.list_physical_devices('GPU')
if gpu_devices:
    try:
        tf.config.experimental.set_virtual_device_configuration(gpu_devices[0],
                                                   [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024),
                                                    tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024) ])
    except RuntimeError as e:
        # Memory growth cannot be modified after GPU has been initialized
        print(e)

接下来，我们将通过tensorflow_datasets API 加载 MNIST 数据集，如下所示：

datasets, info = tfds.load('mnist', with_info=True, as_supervised=True)
mnist_train, mnist_test = datasets['train'], datasets['test']

然后，我们将准备数据：

def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label
mnist_train = mnist_train.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
mnist_train = mnist_train.cache()
mnist_train = mnist_train.shuffle(info.splits['train'].num_examples)
mnist_train = mnist_train.prefetch(tf.data.experimental.AUTOTUNE)
mnist_test = mnist_test.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
mnist_test = mnist_test.cache()
mnist_test = mnist_test.prefetch(tf.data.experimental.AUTOTUNE)

我们现在准备应用镜像策略。这个策略的目标是在同一台机器上的所有 GPU 上复制模型。每个模型在不同的批次数据上进行训练，并应用同步训练策略：
```
mirrored_strategy = tf.distribute.MirroredStrategy() 
```
接下来，我们检查是否有两个设备对应于在本示例开始时创建的两个虚拟 GPU，如下所示：
```
print('Number of devices: {}'.format(mirrored_strategy.num_replicas_in_sync)) 
```
然后，我们将定义批量大小的值。给数据集的批量大小就是全局批量大小。全局批量大小是每个副本的所有批量大小的总和。所以，我们需要使用副本的数量来计算全局批量大小：
```
BATCH_SIZE_PER_REPLICA = 128
BATCH_SIZE = BATCH_SIZE_PER_REPLICA * mirrored_strategy.num_replicas_in_sync
mnist_train = mnist_train.batch(BATCH_SIZE)
mnist_test = mnist_test.batch(BATCH_SIZE) 
```

接下来，我们将使用镜像策略作用域定义并编译我们的模型。请注意，所有在作用域内创建的变量都会在所有副本之间镜像：

with mirrored_strategy.scope():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(name="FLATTEN"))
    model.add(tf.keras.layers.Dense(units=128 , activation="relu", name="D1"))
    model.add(tf.keras.layers.Dense(units=64 , activation="relu", name="D2"))
    model.add(tf.keras.layers.Dense(units=10, activation="softmax", name="OUTPUT"))

    model.compile(
        optimizer="sgd", 
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"]
    )

一旦编译完成，我们就可以像平常一样拟合之前的模型：

model.fit(mnist_train, 
          epochs=10,
          validation_data= mnist_test
          )

使用策略作用域是分布式训练所需要做的唯一事情。

它是如何工作的…

使用分布式 TensorFlow API 非常简单。你需要做的就是分配作用域。然后，可以手动或自动将操作分配给工作节点。请注意，我们可以轻松切换不同的策略。

这是一些分布式策略的简要概述：

TPU 策略类似于镜像策略，但它运行在 TPU 上。
多工作节点镜像策略与镜像策略非常相似，但模型在多台机器上进行训练，可能配有多个 GPU。我们需要指定跨设备通信。
中央存储策略在一台机器上使用同步模式并配备多个 GPU。变量不进行镜像，而是放置在 CPU 上，操作会复制到所有本地 GPU 上。
参数服务器策略是在一组机器上实现的。一些机器担任工作节点角色，另一些则担任参数服务器角色。工作节点进行计算，参数服务器存储模型的变量。

另见

有关 tf.distribute.Strategy 模块的一些参考，请访问以下网站：

TensorFlow 分布式训练: www.tensorflow.org/guide/distributed_training
tf.distribute API: www.tensorflow.org/api_docs/python/tf/distribute

还有更多…

在这个示例中，我们刚刚使用了镜像策略，并通过 Keras API 执行了程序。请注意，当在图模式下使用时，TensorFlow 分布式 API 的效果比在急切模式下要好。

这个 API 更新速度很快，因此请随时查阅官方文档，了解在什么场景下支持哪些分布式策略（Keras API、自定义训练循环或 Estimator API）。

保存和恢复 TensorFlow 模型

如果我们想在生产环境中使用机器学习模型，或者将我们训练好的模型用于迁移学习任务，我们必须保存我们的模型。在本节中，我们将概述一些保存和恢复权重或整个模型的方法。

准备工作

在本篇中，我们将总结几种保存 TensorFlow 模型的方法。我们将涵盖保存和恢复整个模型、仅保存权重以及模型检查点的最佳方式。

如何操作…

我们首先加载必要的库：
```
import tensorflow as tf 
```

接下来，我们将使用 Keras Sequential API 构建一个 MNIST 模型：

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Normalize
x_train = x_train / 255
x_test = x_test/ 255
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(name="FLATTEN"))
model.add(tf.keras.layers.Dense(units=128 , activation="relu", name="D1"))
model.add(tf.keras.layers.Dense(units=64 , activation="relu", name="D2"))
model.add(tf.keras.layers.Dense(units=10, activation="softmax", name="OUTPUT"))

model.compile(optimizer="sgd", 
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"]
             )
model.fit(x=x_train, 
          y=y_train, 
          epochs=5,
          validation_data=(x_test, y_test)
         )

然后，我们将使用推荐的格式保存整个模型为磁盘上的 SavedModel 格式。此格式保存模型图和变量：
```
model.save("SavedModel") 
```
在磁盘上创建一个名为 SavedModel 的目录。它包含一个 TensorFlow 程序，saved_model.pb 文件；variables 目录，包含所有参数的确切值；以及 assets 目录，包含 TensorFlow 图使用的文件：
```
SavedModel
|_ assets
|_ variables
|_ saved_model.pb 
```
请注意，save() 操作也接受其他参数。可以根据模型的复杂性以及传递给 save 方法的签名和选项创建额外的目录。

接下来，我们将恢复我们保存的模型：

model2 = tf.keras.models.load_model("SavedModel")

如果我们更倾向于将模型保存为 H5 格式，我们可以传递一个以 .h5 结尾的文件名，或者添加 save_format="h5" 参数：
```
model.save("SavedModel.h5")
model.save("model_save", save_format="h5") 
```

我们还可以使用 ModelCheckpoint 回调来将整个模型或仅仅是权重保存到检查点结构中，间隔一定的训练周期。这个回调会被添加到 fit 方法中的 callback 参数中。在下面的配置中，模型的权重会在每个 epoch 后被保存：

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath="./checkpoint",save_weights_only=True, save_freq='epoch')
model.fit(x=x_train, 
          y=y_train, 
          epochs=5,
          validation_data=(x_test, y_test),
          callbacks=[checkpoint_callback]
         )

我们稍后可以加载整个模型或仅加载权重，以便继续训练。在这里，我们将重新加载权重：
```
model.load_weights("./checkpoint") 
```

现在，您已经准备好保存和恢复整个模型、仅保存权重或模型检查点。

工作原理…

在本节中，我们提供了几种保存和恢复整个模型或仅保存权重的方法。这使得您能够将模型投入生产，或避免从头开始重新训练一个完整的模型。我们还介绍了如何在训练过程中以及训练后保存模型。

另见

关于这个主题的一些参考资料，请访问以下网站：

官方训练检查点指南: www.tensorflow.org/guide/checkpoint
官方 SavedModel 格式指南: www.tensorflow.org/guide/saved_model
tf.saved_model API: www.tensorflow.org/api_docs/python/tf/saved_model/save
Keras 模型检查点 API: www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint

使用 TensorFlow Serving

在本节中，我们将向您展示如何在生产环境中提供机器学习模型。我们将使用TensorFlow 扩展（TFX）平台的 TensorFlow Serving 组件。TFX 是一个 MLOps 工具，旨在为可扩展和高性能的模型任务构建完整的端到端机器学习管道。TFX 管道由一系列组件组成，涵盖数据验证、数据转换、模型分析和模型服务等内容。在这个教程中，我们将重点介绍最后一个组件，它支持模型版本控制、多个模型等功能。

准备开始

本节开始时，建议您阅读官方文档和 TFX 网站上的简短教程，网址为www.tensorflow.org/tfx。

在本例中，我们将构建一个 MNIST 模型，保存它，下载 TensorFlow Serving 的 Docker 镜像，运行它，并向 REST 服务器发送 POST 请求以获取一些图像预测。

如何操作…

在这里，我们将像之前一样开始，先加载必要的库：

import tensorflow as tf
import numpy as np
import requests
import matplotlib.pyplot as plt
import json

我们将使用 Keras 的 Sequential API 构建一个 MNIST 模型：

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Normalize
x_train = x_train / 255
x_test = x_test/ 255
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(name="FLATTEN"))
model.add(tf.keras.layers.Dense(units=128 , activation="relu", name="D1"))
model.add(tf.keras.layers.Dense(units=64 , activation="relu", name="D2"))
model.add(tf.keras.layers.Dense(units=10, activation="softmax", name="OUTPUT"))

model.compile(optimizer="sgd", 
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"]
             )
model.fit(x=x_train, 
          y=y_train, 
          epochs=5,
          validation_data=(x_test, y_test)
         )

然后，我们将把模型保存为 SavedModel 格式，并为每个版本的模型创建一个目录。TensorFlow Serving 需要特定的目录结构，并且模型必须以 SavedModel 格式保存。每个模型版本应导出到给定路径下的不同子目录中。这样，当我们调用服务器进行预测时，就可以轻松指定我们想要使用的模型版本：https://github.com/OpenDocCN/freelearn-dl-pt3-zh/raw/master/docs/ml-tf-cb/img/B16254_12_07.png

图 12.7：TensorFlow Serving 期望的目录结构截图

上面的截图展示了所需的目录结构。在该结构中，我们有定义好的数据目录my_mnist_model，后跟模型版本号1。在版本号目录下，我们保存 protobuf 模型和一个variables文件夹，其中包含需要保存的变量。

我们需要注意，在数据目录中，TensorFlow Serving 会查找整数编号的文件夹。TensorFlow Serving 会自动启动并获取最大整数编号下的模型。这意味着，要部署一个新模型，我们需要将其标记为版本 2，并放入一个新的文件夹，该文件夹也标记为 2。TensorFlow Serving 随后会自动识别该模型。
然后，我们将通过 Docker 安装 TensorFlow Serving。如果需要，我们建议读者访问官方 Docker 文档，以获取 Docker 安装说明。

第一步是拉取最新的 TensorFlow Serving Docker 镜像：
```
$ docker pull tensorflow/serving 
```
现在，我们将启动一个 Docker 容器：将 REST API 端口 8501 映射到主机的 8501 端口，使用之前创建的模型my_mnist_model，将其绑定到模型的基本路径/models/my_mnist_model，并将环境变量MODEL_NAME设置为my_mnist_model：
```
$ docker run -p 8501:8501 \
  --mount type=bind,source="$(pwd)/my_mnist_model/",target=/models/my_mnist_model \
  -e MODEL_NAME=my_mnist_model -t tensorflow/serving 
```

然后，我们将显示图像进行预测：

num_rows = 4
num_cols = 3
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for row in range(num_rows):
    for col in range(num_cols):
        index = num_cols * row + col
        image = x_test[index]
        true_label = y_test[index]
        plt.subplot(num_rows, 2*num_cols, 2*index+1)
        plt.imshow(image.reshape(28,28), cmap="binary")
        plt.axis('off')
        plt.title('\n\n It is a {}'.format(y_test[index]), fontdict={'size': 16})
plt.tight_layout()
plt.show()

https://github.com/OpenDocCN/freelearn-dl-pt3-zh/raw/master/docs/ml-tf-cb/img/B16254_12_08.png

现在，我们可以将二进制数据提交到<host>:8501并获取 JSON 响应，显示结果。我们可以通过任何机器和任何编程语言来完成此操作。这样做非常有用，因为不必依赖客户端拥有 TensorFlow 的本地副本。

在这里，我们将向我们的服务器发送 POST 预测请求并传递图像。服务器将返回每个图像对应的 10 个概率，表示每个数字（从0到9）的概率：
```
json_request = '{{ "instances" : {} }}'.format(x_test[0:12].tolist())
resp = requests.post('http://localhost:8501/v1/models/my_mnist_model:predict', data=json_request, headers = {"content-type": "application/json"})
print('response.status_code: {}'.format(resp.status_code))     
print('response.content: {}'.format(resp.content))
predictions = json.loads(resp.text)['predictions'] 
```

接下来，我们将展示我们图像的预测结果：

num_rows = 4
num_cols = 3
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for row in range(num_rows):
    for col in range(num_cols):
        index = num_cols * row + col
        image = x_test[index]
        predicted_label = np.argmax(predictions[index])
        true_label = y_test[index]
        plt.subplot(num_rows, 2*num_cols, 2*index+1)
        plt.imshow(image.reshape(28,28), cmap="binary")
        plt.axis('off')
        if predicted_label == true_label:
            color = 'blue'
        else:
            color = 'red'
        plt.title('\n\n The model predicts a {} \n and it is a {}'.format(predicted_label, true_label), fontdict={'size': 16}, color=color)
plt.tight_layout()
plt.show()