TensorFlow【master】 _8.11

Latest recommended article published 2022-03-29 16:36:03

License: CC 4.0 BY-SA

Article tags:

#Machine learning

Study log

7.10 Installation, official site, Stanford TF course
7.11 - Stanford TF course
8.6 - Study of Zhihu material

1. TensorFlow official site

https://www.tensorflow.org/get_started/get_started_for_beginners

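Installing TensorFlow

Installed directly from within PyCharm.

Graphs and sessions

https://www.tensorflow.org/programmers_guide/graphs
- tf.Graph: contains the graph structure and the graph collections
- A tf.Graph object defines a namespace for the tf.Operation objects it contains. TensorFlow automatically chooses a unique name for each operation in your graph, but giving operations descriptive names can make your program easier to read and debug. The TensorFlow API provides two ways to override the name of an operation:
  - pass the optional name argument to the API function that creates the operation
  - use tf.name_scope to add a name prefix to every operation created inside that context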

2. Stanford TF 2017:

CS 20SI: Tensorflow for Deep Learning Research
- https://www.bilibili.com/video/av9156347/?from=search&seid=6905181275544516403

Lecture1

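The main feature of TensorFlow is that it separates the definition of computation from the execution of computation.
In the graph, the nodes are operations, variables, and constants, and the edges between them are tensors.
Creating a tensor only defines part of a computation graph; a session is needed to fetch the value of a graph.
Steps for building a graph:
- create a new graph
- set it as the default graph (no more than one graph)
- add nodes
If there are two graphs (default_graph and user_graph)
Benefits of using a graph:
- only the subgraph that is actually needed gets computed
- the design lends itself to distributed computation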
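The split between defining and executing a computation can be seen in a few lines. This is a minimal sketch, assuming TensorFlow is installed and imported through tensorflow.compat.v1 so that the TF1-style calls used in these notes also run on newer versions. The node names (a, b, total, unused) are made up for illustration.

```python
import tensorflow.compat.v1 as tf  # assumption: TF1-style API via the compat module

# Definition phase: build the graph. Nothing is computed here.
graph = tf.Graph()
with graph.as_default():
    a = tf.constant(3, name="a")
    b = tf.constant(5, name="b")
    total = tf.add(a, b, name="total")         # a node in the graph, not the number 8
    unused = tf.multiply(a, b, name="unused")  # a second subgraph that is never fetched

# `total` is still a symbolic tensor at this point.
print(total)

# Execution phase: a session fetches the value of the requested node.
with tf.Session(graph=graph) as sess:
    # Only the subgraph that `total` depends on has to run.
    result = sess.run(total)

print(result)
```

Fetching total only needs the nodes a, b, and total, which is the "compute only the subgraph you need" benefit listed above.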
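This warning showed up while debugging; look into it tomorrow:
Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Building TensorFlow from source gives access to these low-level CPU acceleration instructions.
How to make TensorFlow use all CPU cores, raise its CPU utilization, and use Intel MKL acceleration:
- http://nooverfit.com/wp/tensorflow如何充分使用所有cpu核数，提高tensorflow的cpu使用率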

Lecture2 OPs

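NumPy broadcasting
- https://www.youtube.com/watch?v=9kC836XhICU
- numpy.broadcast
- Broadcasting describes how NumPy treats arrays with different shapes during arithmetic. The smaller array is "broadcast" across the larger one so that the two have compatible shapes. Broadcasting provides a way to vectorize array operations so that the looping happens in C instead of Python. It does this without making needless copies of data and usually leads to efficient code. There are, however, cases where broadcasting is a bad idea because it uses memory inefficiently and slows the computation down.
- The vector is stretched to a compatible shape
- https://www.cnblogs.com/yangmang/p/7125458.html
- Broadcasting rule: two arrays are broadcast-compatible if their trailing dimensions (counted from the last axis backwards) either have equal lengths or one of them has length 1. Broadcasting then happens along the missing axes and/or the axes of length 1.
- Make sure to understand what "trailing axis length" means
- Personal impression: better to avoid it, since it hurts code readability.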
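The trailing-dimension rule above can be written out as a small shape check. This is a plain-Python sketch of the rule only, not NumPy's implementation; broadcast_shape is a hypothetical helper name.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the shape two arrays broadcast to, or raise ValueError."""
    result = []
    # Compare axes from the last one backwards (the "trailing" dimensions).
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1  # a missing axis acts like length 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(
                "shapes %r and %r are not broadcast-compatible" % (shape_a, shape_b)
            )
    return tuple(reversed(result))


print(broadcast_shape((4, 3), (3,)))    # trailing axes match: 3 and 3
print(broadcast_shape((4, 1), (1, 5)))  # every mismatched axis has length 1 on one side
```

A pair such as (4, 3) and (4,) fails, because the trailing axes are 3 and 4 and neither is 1.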

Constants

tf.constant(value, dtype=None, shape=None, name='Const', verify_shape=False)
- verify_shape checks that the shape of value matches shape
- To look at the value of a tensor directly, open a tf.InteractiveSession() and call a.eval() to see the structure and contents of a, instead of running it through an explicit session. This is convenient because it avoids having to pass an explicit Session object to run ops.
  - https://www.tensorflow.org/api_docs/python/tf/InteractiveSession?hl=zh-cn
Tensors filled with specific value
- tf.zeros(shape, dtype=tf.float32, name=None)
- tf.zeros_like(input_tensor, dtype=None, name=None, optimize=True)
- tf.ones / tf.ones_like
- tf.fill(dims, value, name=None)
  - tf.fill([2,3],8) => [[8,8,8],[8,8,8]]
Constants as sequences
- tf.linspace()
- tf.range(start, limit, delta, dtype, name)
Randomly generated constants
- tf.random_normal()
- tf.truncated_normal()
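  - The truncated normal distribution is one kind of truncated distribution, that is, a distribution in which the range of the variable x is restricted.
  - Values are drawn from a normal distribution, and any sample more than 2 standard deviations from the mean is discarded and drawn again
- tf.random_uniform()
- tf.random_shuffle()
  - randomly shuffles a tensor along its first dimension (it does not sort it)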
- tf.random_crop()
- tf.multinomial()
- tf.random_gamma()
- tf.set_random_seed(seed)
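The resampling rule described for tf.truncated_normal() can be sketched with the standard library. This illustrates the idea only and says nothing about how TensorFlow implements it; the function name and the seed are arbitrary choices.

```python
import random


def truncated_normal(mean=0.0, stddev=1.0):
    """Draw from a normal distribution, redrawing anything beyond 2 stddev."""
    while True:
        x = random.gauss(mean, stddev)
        if abs(x - mean) <= 2.0 * stddev:
            return x


random.seed(0)  # arbitrary seed so the run is repeatable
samples = [truncated_normal() for _ in range(1000)]
print(min(samples), max(samples))  # both stay within [-2, 2]
```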
Operations
- tf.add_n([a,b,b]) => a+b+b
- tf.matmul(a,b): matrix multiplication; if the dimensions do not line up, reshape first with tf.reshape(a,[m,n])
Tensorflow Data type
- takes Python native types: boolean, numeric (int, float), strings
  - ones_like returns a tensor of the same type as the input tensor; if the input is a string tensor it raises an error, because 1 is not a string.
- Do not use Python native types for tensors because TF has to infer the Python type.
- Constants are stored in the graph definition, which makes loading graphs expensive when constants are big.

Variables

Specify the type and the initial value when defining a variable
tf.Variable is a class, but tf.constant is an op
Initialization
- Variables must be initialized before they are used:
- https://blog.youkuaiyun.com/yjk13703623757/article/details/77075711
- Initialize all variables at once:

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)

- Initialize only a subset of variables:

init_ab = tf.variables_initializer([a,b],name="init_ab")
with tf.Session() as sess:
    sess.run(init_ab)

- Initialize a single variable:

w = tf.Variable(xxx)
with tf.Session() as sess:
    sess.run(w.initializer)

Eval() a variable to get its value

with tf.Session() as sess:
    sess.run(w.initializer)
    print(w.eval())

tf.Variable.assign()
- The assignment statement
- .assign() is an op, so it only takes effect once it has been run in a session

with tf.Session() as sess:
    sess.run(w.initializer)
    sess.run(w.assign(100))
    print(w.eval())

Use a variable to initialize another variable
- If a variable depends on the value of another variable, that other variable has to be initialized first. This can be written directly as:
  - u = tf.Variable(2 * w.initialized_value())
Session vs InteractiveSession
- The only difference is that an InteractiveSession makes itself the default session.
Control Dependencies
- defines which ops should be run first
- tf.Graph.control_dependencies(control_inputs)

Placeholders

Purpose:
- Can assemble the graph first without knowing the values needed for computation.
- A placeholder is TensorFlow's placeholder node, created with tf.placeholder. It acts like a constant whose value is supplied by the user when run is called, so it can be thought of as a formal parameter: unlike a constant it cannot be used as-is, and the user has to pass in a value.
- feed the values to placeholders using a dictionary.
tf.placeholder(dtype, shape=None, name=None)

# When the op you run in a session depends on a placeholder, the placeholder must be fed a value
a=tf.placeholder(tf.float32, shape=[3])
b=tf.constant([5,5,5], tf.float32)
c=a+b
with tf.Session() as sess:
    print(sess.run(c,feed_dict={a:[1,2,3]}))

- test if a tensor is feedable
    - tf.Graph.is_feedable(tensor)

Lazy loading
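- The op is created only at the moment it is executed, instead of when the graph is assembled
- Downside: the tf.add node is never defined explicitly, which makes the graph harder to read in TensorBoard; if the op is created inside a loop, a new node is also added to the graph on every iteration.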
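The cost of lazy loading shows up when the nodes in the graph are counted. The sketch below assumes TensorFlow is available through tensorflow.compat.v1. The helper count_nodes and the two graph names are made up for this comparison.

```python
import tensorflow.compat.v1 as tf  # assumption: TF1-style API via the compat module


def count_nodes(graph):
    """Number of nodes currently stored in the graph definition."""
    return len(graph.as_graph_def().node)


# Normal loading: the add op is created once, before the loop.
normal_graph = tf.Graph()
with normal_graph.as_default():
    x = tf.constant(10, name="x")
    y = tf.constant(20, name="y")
    z = tf.add(x, y, name="z")
    with tf.Session() as sess:
        for _ in range(10):
            sess.run(z)
normal_nodes = count_nodes(normal_graph)

# Lazy loading: tf.add is called inside the loop, so every iteration
# creates a fresh add node in the graph.
lazy_graph = tf.Graph()
with lazy_graph.as_default():
    x = tf.constant(10, name="x")
    y = tf.constant(20, name="y")
    with tf.Session() as sess:
        for _ in range(10):
            sess.run(tf.add(x, y))
lazy_nodes = count_nodes(lazy_graph)

print(normal_nodes, lazy_nodes)
```

The lazy version ends up with more nodes than the normal one even though both compute the same sum ten times.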
A case
- Define weight, bias, and incoming as variables so that the optimizer can update them as parameters

Lecture3 Basic models

linear model case problems
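- optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)
  - To review: how the gradient descent optimizer works; the loss function is passed in directly, so check what form its definition must take
  - API docs: https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer
  - How it works: looks at all trainable variables that the loss depends on and updates them in a loop
    - the variables being trained are all those created by tf.Variable() with trainable=True
- _, l = sess.run([optimizer, loss], feed_dict={X: x, Y: y})
  - Note the form of the fetches passed to sess.run; the first result is assigned to _ because the value returned for the optimizer op is not needed
- Conditionals in TF: there is no plain "if a-b>d then"; the condition has to be built as an op in the graph (for example with tf.cond)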
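What "update all trainable variables in a loop" means can be spelled out by hand for a linear model. This is a plain-Python sketch of gradient descent on an MSE loss, not what GradientDescentOptimizer does internally. The toy data, learning rate, and step count are assumptions chosen for illustration.

```python
# Toy data generated from y = 2x + 1 (hypothetical values).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = 0.0, 0.0       # the two "trainable variables"
learning_rate = 0.05  # assumed value, small enough for this data to converge


def mse(w, b):
    """Mean squared error of the linear model w * x + b on the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


initial_loss = mse(w, b)
for step in range(2000):
    # Gradients of the MSE loss with respect to w and b.
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # One optimizer step: move every trainable variable against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
final_loss = mse(w, b)

print(w, b, final_loss)  # w approaches 2 and b approaches 1
```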
logistic regression (MNIST)
- target:

Lecture4 Structure your models

word2vec
- https://blog.youkuaiyun.com/mylove0414/article/details/61616617
- https://zhuanlan.zhihu.com/p/37176454
- http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
  - If two different words have very similar “contexts” (that is, what words are likely to appear around them), then our model needs to output very similar results for these two words.
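Structure a TensorFlow model
Word embedding
- Meaning: capture the semantic relationships between words.
  - count & predict
  - CBOW and Skip-gram
    - continuous bag-of-words / skip-gram
    - https://zhuanlan.zhihu.com/p/35074402
NCE (noise contrastive estimation)
- PDF: https://arxiv.org/pdf/1410.8251.pdf
  - Computing the softmax directly is very expensive (it has to iterate over all V words in the vocabulary)
- https://blog.youkuaiyun.com/littlely_ll/article/details/79252064
  - Noise contrastive estimation
  - Suppose X = (x_1, x_2, ..., x_Td) is a sample drawn from the real data (or corpus). We do not know what distribution the sample follows, so we first assume each x_i follows an unknown probability density function p_d. To estimate p_d we need a reference distribution to compare against. This reference, called the noise distribution, should be one we know, such as a Gaussian or a uniform distribution. Let its probability density function be p_n, and let Y = (y_1, y_2, ..., y_Tn) be a sample drawn from it, called the noise sample. The goal is to train a classifier that tells the two kinds of samples apart and, in doing so, learns properties of the data from the model. The idea of noise contrastive estimation is "learning by comparison".
name scope: group nodes together
- http://web.stanford.edu/class/cs20si/lectures/notes_04.pdf
- Mainly to make the visualization in TensorBoard easier to understand
word2vec code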
- https://github.com/moonfansLTH/stanford-tensorflow-tutorials/blob/master/examples/04_word2vec.py

Lecture5 Manage your experiment

more word2vec
- tf.gradients: computes derivatives (symbolic gradients)
A framework like this can be used when building a TF graph
- structure model
  - def load_data(self)
    - feel free to add instance variables to model object that store loaded data
  - def add_placeholders(self)
    - add placeholders variables to tensorflow computational graph
  - def create_feed_dict(self, input_batch, label_batch)
    - creates the feed_dict for training the given step
  - def add_model(self, input_data)
    - implements core of model that transform input_data into predictions
  - def add_loss_op(self, pred)
    - adds ops for loss to the computational graph
  - def run_epoch(self, sess, input_data, input_labels)
    - trains the model for one-epoch
  - def fit(self, sess, input_data, input_labels)
    - fit model on the provided data
  - def predict(self, sess, input_data, input_labels=None)
    - make predictions from the provided model
- language model(model): inherits from model
  - def add_embedding(self)
    - add embedding layer, that maps from vocabulary to vectors
- Code conventions
  - https://github.com/moonfansLTH/cs224d/blob/master/assignment2/q2_NER.py
manage experiments
- tf.train.Saver
  - saves graph’s variables in binary files
- tf.train.Saver.save(sess, save_path, global_step=None…)
- self.global_step
  - tf.Variable(10, trainable=False, name='global_step')
  - https://www.tensorflow.org/versions/r1.8/api_docs/python/tf/train/global_step?hl=zh-cn
  - https://blog.youkuaiyun.com/leviopku/article/details/78508951
    - save your model
tf.summary: visualize our summary statistics during our training
- tf.summary.scalar: the value of the loss function
- tf.summary.histogram: weights
- tf.summary.image
- Step 1: create summaries
- Step 2: run them
- Step 3: write summary to file
- View the results in TensorBoard

L6 Intuition Behind Backpropagation as a Computational Graph

BP：https://www.youtube.com/watch?v=u2OeYrlAx_A
http://cs231n.github.io/optimization-2/#grad
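The gradient is computed with respect to the final loss function
If a node receives gradients from several neurons in the layer above, how is that handled during backpropagation?
- add them up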
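The "add up" rule can be checked on a tiny computational graph in which x feeds two branches. This is a hand-written sketch with made-up values; the numerical derivative is only there to confirm the manual backward pass.

```python
def f(x, y, z):
    p = x * y   # first branch that uses x
    q = x * z   # second branch that uses x
    return p + q


x, y, z = 3.0, -4.0, 2.0

# Backward pass by hand, starting from d(out)/d(out) = 1.
grad_out = 1.0
grad_p = grad_out                 # the sum node passes the gradient through unchanged
grad_q = grad_out
grad_x = grad_p * y + grad_q * z  # contributions from both branches are added

# Central-difference check of the same derivative.
eps = 1e-6
numeric_grad_x = (f(x + eps, y, z) - f(x - eps, y, z)) / (2.0 * eps)

print(grad_x, numeric_grad_x)
```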
Matrix derivatives
- https://blog.youkuaiyun.com/dinkwad/article/details/72819832
- derivative of a scalar with respect to a matrix
- derivative of a matrix with respect to a matrix

Study CS231n: Convolutional Neural Networks for Visual Recognition.

Backpropagation, Intuitions

3. Top Zhihu posts

TensorFlow in Plain Language + Hands-on series (白话TensorFlow+实战系列)

https://zhuanlan.zhihu.com/p/26454768
Common loss functions
- Cross-entropy for classification:
  - pred = tf.matmul(x,w)+b
  - loss = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=pred)
- MSE for regression:
  - loss = tf.reduce_mean(tf.square(pred-y))
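What these two losses compute can be written out for a single example in plain Python. This sketch shows the formulas only and is not TensorFlow's implementation; the function names and sample numbers are chosen for illustration.

```python
import math


def softmax_cross_entropy(logits, labels):
    """Cross-entropy between one-hot (or soft) labels and softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    log_probs = [v - log_sum_exp for v in logits]
    return -sum(t * lp for t, lp in zip(labels, log_probs))


def mse(pred, target):
    """Mean of the squared differences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


ce = softmax_cross_entropy([0.0, 0.0], [1.0, 0.0])  # uniform prediction over 2 classes
reg = mse([1.0, 2.0], [1.0, 4.0])
print(ce, reg)
```

With two equal logits the predicted probability of the true class is 0.5, so the cross-entropy equals log 2.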
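Learning rate
- Exponential decay:
  - learning_rate = tf.train.exponential_decay(0.1, global_step, 100, 0.98, staircase = True)
Overfitting
- L1: tf.contrib.layers.l1_regularizer(scale)(w), where scale is the regularization weight (lambda)
- L2: tf.contrib.layers.l2_regularizer(scale)(w), where scale is the regularization weight (lambda)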
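For the exponential decay schedule listed under the learning rate, the formula behind the call is decayed = learning_rate * decay_rate ^ (global_step / decay_steps), with the exponent rounded down when staircase is set. A plain-Python sketch of that formula, reusing the arguments from the notes (0.1, 100, 0.98):

```python
import math


def exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    """learning_rate * decay_rate ** (global_step / decay_steps)."""
    exponent = global_step / decay_steps
    if staircase:
        exponent = math.floor(exponent)  # decay in discrete jumps every decay_steps
    return learning_rate * decay_rate ** exponent


for step in (0, 99, 100, 250):
    print(step, exponential_decay(0.1, step, 100, 0.98, staircase=True))
```

With staircase enabled the rate stays at 0.1 for the first 100 steps and is multiplied by 0.98 once per completed block of 100 steps.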
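example
- Builds a fully connected network with layer sizes [20,10,10,8]
- The get_weight function creates the weight w of each layer, applies L2 regularization to it, and puts the result into a collection for bookkeeping. (Open question: what exactly is a collection? In TF1 it is a named list of objects stored on the graph, accessed with tf.add_to_collection and tf.get_collection.)
- cur_net is the current network layer.
- in_net and out_net are the sizes of two adjacent layers and are used to build the weight. Pay attention to how they are passed along and updated from one layer to the next
- The for loop is the process of building the network.
- mse_loss is the mean squared error loss.
- Finally it is added to the collection as well, and summing everything in the collection gives the final loss.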
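The bookkeeping described above, where each layer's L2 penalty and the MSE loss go into one collection that is then summed, can be mimicked with an ordinary list. This is a plain-Python sketch with made-up weights and predictions. The penalty uses scale * sum(w^2) / 2, the form tf.nn.l2_loss is based on, and the list named losses only stands in for a TensorFlow collection.

```python
def l2_penalty(weights, scale):
    """scale * sum(w ** 2) / 2 for one layer's weights."""
    return scale * sum(w * w for w in weights) / 2.0


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


losses = []  # stands in for the "losses" collection

# Hypothetical weights of two layers, flattened into lists.
layer_weights = [[0.5, -1.0], [2.0]]
for w in layer_weights:
    losses.append(l2_penalty(w, scale=0.01))  # one regularization term per layer

losses.append(mse([1.0, 2.0], [1.0, 4.0]))    # the data loss goes into the same collection

total_loss = sum(losses)                      # summing everything gives the final loss
print(total_loss)
```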
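Moving average model
- tf.train.ExponentialMovingAverage
- It creates a shadow variable (shadow_variable) for every variable. The shadow variable starts at the variable's initial value and is then updated by this equation:
- shadow_variable = decay * shadow_variable + (1 - decay) * variable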
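The update equation can be traced by hand for a couple of steps. This is a plain-Python sketch of the formula only; the decay value and the variable values are arbitrary.

```python
def ema_update(shadow, variable, decay):
    """One step of shadow = decay * shadow + (1 - decay) * variable."""
    return decay * shadow + (1.0 - decay) * variable


decay = 0.99
variable = 0.0
shadow = variable  # the shadow variable starts at the variable's initial value

variable = 5.0     # the variable changes, e.g. after a training step
shadow = ema_update(shadow, variable, decay)

variable = 10.0
shadow = ema_update(shadow, variable, decay)

print(shadow)  # moves only slowly toward the variable because decay is close to 1
```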
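Variable management
- Creating variables with a dictionary
  - Use a dict whose keys are the names you give the network layers and whose values are the variables of each layer.
  - To use a variable, just look the object up in the dict
- Variable sharing
  - This is done mainly with tf.get_variable() and tf.variable_scope().
  - tf.get_variable() gets or creates a variable
  - tf.variable_scope() creates a namespace for variables. With reuse = True, tf.get_variable() inside that scope fetches variables that already exist; with reuse = False, tf.get_variable() inside that scope creates variables.
    - use reuse=False when creating, and set reuse=True when fetching the parameters later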
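- Comparing the ways to create variables
  - tf.Variable() vs tf.get_variable()
    - In common: both are ways to get or create a variable under a name_scope
    - Differences:
      - The former always creates a new variable. Variables with the same name can be created under one name_scope; the implementation renames them automatically, so two calls produce two different variables.
      - The latter fetches a variable and is not affected by name_scope: the variable's name contains only the variable_scope. Whether it fetches or creates depends on the scope's reuse setting; with reuse=tf.AUTO_REUSE it fetches the variable if it already exists and creates it otherwise.
      - As shown in the original figure, a variable created by get_variable does not carry the name_scope in its name
  - tf.variable_scope() vs tf.name_scope()
    - name_scope manages the ops in a graph. It returns a context manager named after scope_name; ops or nested name_scopes can be defined inside a name_scope, which avoids name clashes between ops.
    - variable_scope is normally used together with get_variable() to manage the names of variables in a graph and avoid clashes between them; it allows variables to be shared within a variable_scope.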
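- Variable sharing:
  - In TensorFlow, defining a variable adds a node to the Graph. This differs from an ordinary Python function, where we process the input, return a result, and stop caring about the local variables defined inside. In TensorFlow, a variable created inside a function is a node added to the Graph, and that node still exists in the Graph after the function returns.
  - https://blog.youkuaiyun.com/selous/article/details/77095155?locationNum=9&fps=1
  - To share within the same program (using a variable right after it is defined), add scope.reuse_variables()
- Difference between tf.Variable and tf.placeholder
  - https://blog.youkuaiyun.com/shenxiaoming77/article/details/79141078
  - tf.placeholder(): a placeholder. * trainable==False *
  - tf.Variable(): the usual way to define a variable. * trainable can be chosen *
  - tf.get_variable(): normally used together with tf.variable_scope() to share variables. * trainable can be chosen *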
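The naming and sharing behaviour described in this section can be observed directly. The sketch below assumes TensorFlow is available through tensorflow.compat.v1. The scope and variable names (layer, block, w, v, g) are made up for the demonstration.

```python
import tensorflow.compat.v1 as tf  # assumption: TF1-style API via the compat module

graph = tf.Graph()
with graph.as_default():
    # Create a variable under a variable_scope, then fetch it again with reuse=True.
    with tf.variable_scope("layer"):
        w1 = tf.get_variable("w", shape=[2, 2], initializer=tf.zeros_initializer())
    with tf.variable_scope("layer", reuse=True):
        w2 = tf.get_variable("w")

    with tf.name_scope("block"):
        # tf.Variable always creates a new variable; the second name is made unique.
        v1 = tf.Variable(1.0, name="v")
        v2 = tf.Variable(2.0, name="v")
        # get_variable ignores the surrounding name_scope when naming the variable.
        g = tf.get_variable("g", shape=[], initializer=tf.zeros_initializer())

print(w1.name, w2.name, w1 is w2)  # the same shared variable both times
print(v1.name, v2.name)            # two distinct variables under the name_scope
print(g.name)                      # no name_scope prefix
```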