Eager Execution

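This article introduces Eager Execution in TensorFlow 2.0 and its main features: an intuitive interface, fast iteration, easy debugging, and dynamic control flow. It shows how to use TensorFlow directly from Python, compute gradients, and train models, demonstrates variables and optimizers, and explains how to use TensorBoard to visualize training.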

Reference: Tensorflow学习——Eager Execution ("Learning TensorFlow: Eager Execution"), 云+社区 - 腾讯云 (Tencent Cloud+ Community)

TensorFlow's eager execution is an imperative programming environment that evaluates operations immediately, without building graphs: operations return concrete values instead of constructing a computational graph to run later. This makes it easy to get started with TensorFlow and debug models, and it reduces boilerplate as well. To follow along with this guide, run the code samples below in an interactive Python interpreter.

Eager execution is a flexible machine learning platform for research and experimentation, providing:

  • An intuitive interface—Structure your code naturally and use Python data structures. Quickly iterate on small models and small data.
  • Easier debugging—Call ops directly to inspect running models and test changes. Use standard Python debugging tools for immediate error reporting.
  • Natural control flow—Use Python control flow instead of graph control flow, simplifying the specification of dynamic models.

Eager execution supports most TensorFlow operations and GPU acceleration.

Note: Some models may experience increased overhead with eager execution enabled. Performance improvements are ongoing, but please file a bug if you find a problem and share your benchmarks.

Setup and basic usage

from __future__ import absolute_import, division, print_function, unicode_literals
import os

import tensorflow as tf

import cProfile

In TensorFlow 2.0, eager execution is enabled by default.

tf.executing_eagerly()

Output:
------
True
------

Now you can run TensorFlow operations and the results will return immediately:

x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m))

Output:
--------------
hello, [[4.]]
--------------

Enabling eager execution changes how TensorFlow operations behave—now they immediately evaluate and return their values to Python. tf.Tensor objects reference concrete values instead of symbolic handles to nodes in a computational graph. Since there isn't a computational graph to build and run later in a session, it's easy to inspect results using print() or a debugger. Evaluating, printing, and checking tensor values does not break the flow for computing gradients.

Eager execution works nicely with NumPy. NumPy operations accept tf.Tensor arguments. TensorFlow math operations convert Python objects and NumPy arrays to tf.Tensor objects. The tf.Tensor.numpy method returns the object's value as a NumPy ndarray.

a = tf.constant([[1, 2],
                 [3, 4]])
print(a)


Output:
-------------------------------------
tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32)
-------------------------------------

# Broadcasting support
b = tf.add(a, 1)
print(b)

Output:
-------------------------------------
tf.Tensor(
[[2 3]
 [4 5]], shape=(2, 2), dtype=int32)
-------------------------------------

# Operator overloading is supported
print(a * b)

Output:
--------------------------------------
tf.Tensor(
[[ 2  6]
 [12 20]], shape=(2, 2), dtype=int32)
--------------------------------------

# Use NumPy values
import numpy as np

c = np.multiply(a, b)
print(c)

Output:
----------
[[ 2  6]
 [12 20]]
----------

# Obtain numpy value from a tensor:
print(a.numpy())
# => [[1 2]
#     [3 4]]

Output:
--------
[[1 2]
 [3 4]]
--------
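A further minimal sketch of the NumPy interoperability described above, assuming a TensorFlow 2.x install: it checks that a NumPy dtype survives conversion to tf.Tensor and that a NumPy function applied to a tensor returns an ndarray. The variable names are illustrative only.

```python
import numpy as np
import tensorflow as tf

arr = np.ones((2, 2))   # NumPy arrays default to float64
t = tf.add(arr, arr)    # TensorFlow math ops convert ndarray arguments to tf.Tensor
print(t.dtype)          # the float64 dtype is kept, not cast to float32

n = np.square(t)        # NumPy functions accept tf.Tensor arguments...
print(type(n), n)       # ...and return a plain NumPy ndarray
```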

Dynamic control flow

A major benefit of eager execution is that all the functionality of the host language is available while your model is executing. So, for example, it is easy to write fizzbuzz:

def fizzbuzz(max_num):
  counter = tf.constant(0)
  max_num = tf.convert_to_tensor(max_num)
  for num in range(1, max_num.numpy()+1):
    num = tf.constant(num)
    if int(num % 3) == 0 and int(num % 5) == 0:
      print('FizzBuzz')
    elif int(num % 3) == 0:
      print('Fizz')
    elif int(num % 5) == 0:
      print('Buzz')
    else:
      print(num.numpy())
    counter += 1

fizzbuzz(15)


Output:
-------
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
---------

This function has conditionals that depend on tensor values, and it prints these values at runtime.

Eager training

Computing gradients

Automatic differentiation is useful for implementing machine learning algorithms such as backpropagation for training neural networks. During eager execution, use tf.GradientTape to trace operations for computing gradients later.

You can use tf.GradientTape to train and/or compute gradients in eager mode. It is especially useful for complicated training loops.

Since different operations can occur during each call, all forward-pass operations get recorded to a "tape". To compute the gradient, play the tape backwards and then discard it. By default, a tf.GradientTape can compute only one gradient: its resources are released after the first call to gradient(), and subsequent calls raise a RuntimeError. Pass persistent=True when creating the tape if you need several gradients from the same recording.

w = tf.Variable([[1.0]])
with tf.GradientTape() as tape:
  loss = w * w

grad = tape.gradient(loss, w)
print(grad)  # => tf.Tensor([[ 2.]], shape=(1, 1), dtype=float32)

Output:
-----------------------------------------------
tf.Tensor([[2.]], shape=(1, 1), dtype=float32)
-----------------------------------------------
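A minimal sketch of the persistent-tape option, assuming TensorFlow 2.x: with persistent=True, one recording serves two gradient() calls. Because x is a tf.constant rather than a tf.Variable, it has to be watched explicitly. The names x, y and z are illustrative.

```python
import tensorflow as tf

x = tf.constant(3.0)
# persistent=True keeps the tape's resources alive so gradient() can be called more than once
with tf.GradientTape(persistent=True) as tape:
  tape.watch(x)   # constants are not tracked automatically
  y = x * x       # y = x^2
  z = y * y       # z = x^4
dz_dx = tape.gradient(z, x)  # 4 * x^3 = 108
dy_dx = tape.gradient(y, x)  # 2 * x = 6
del tape  # release the tape's resources once done
print(dz_dx.numpy(), dy_dx.numpy())
```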
 

Train a model

The following example creates a multi-layer model that classifies the standard MNIST handwritten digits. It demonstrates the optimizer and layer APIs to build trainable graphs in an eager execution environment.

# Fetch and format the mnist data
(mnist_images, mnist_labels), _ = tf.keras.datasets.mnist.load_data()

dataset = tf.data.Dataset.from_tensor_slices(
  (tf.cast(mnist_images[...,tf.newaxis]/255, tf.float32),
   tf.cast(mnist_labels,tf.int64)))
dataset = dataset.shuffle(1000).batch(32)

Output:
--------------------------------------------------------------------------------------------
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step
--------------------------------------------------------------------------------------------

# Build the model
mnist_model = tf.keras.Sequential([
  tf.keras.layers.Conv2D(16,[3,3], activation='relu',
                         input_shape=(None, None, 1)),
  tf.keras.layers.Conv2D(16,[3,3], activation='relu'),
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(10)
])

Even without training, call the model and inspect the output in eager execution:

for images,labels in dataset.take(1):
  print("Logits: ", mnist_model(images[0:1]).numpy())

Output:
-----------------------------------------------------------------------------------
Logits:  [[ 0.08425338  0.05135306 -0.06030881 -0.01655817 -0.01808648  0.03281952
  -0.00409645  0.04448885 -0.05569661  0.00947583]]
-----------------------------------------------------------------------------------

While Keras models have a built-in training loop (the fit method), sometimes you need more customization. Here's an example of a training loop implemented with eager execution:

optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

loss_history = []

Note: Use the assert functions in tf.debugging to check whether a condition holds. They work in both eager and graph execution.

def train_step(images, labels):
  with tf.GradientTape() as tape:
    logits = mnist_model(images, training=True)
    
    # Add asserts to check the shape of the output.
    tf.debugging.assert_equal(logits.shape, (32, 10))
    
    loss_value = loss_object(labels, logits)

  loss_history.append(loss_value.numpy().mean())
  grads = tape.gradient(loss_value, mnist_model.trainable_variables)
  optimizer.apply_gradients(zip(grads, mnist_model.trainable_variables))

def train(epochs):
  for epoch in range(epochs):
    for (batch, (images, labels)) in enumerate(dataset):
      train_step(images, labels)
    print('Epoch {} finished'.format(epoch))

train(epochs=3)

Output:
-----------------
Epoch 0 finished
Epoch 1 finished
Epoch 2 finished
-----------------

import matplotlib.pyplot as plt

plt.plot(loss_history)
plt.xlabel('Batch #')
plt.ylabel('Loss [entropy]')

Output:
-------------------------------
Text(0, 0.5, 'Loss [entropy]')
-------------------------------
 

Variables and optimizers

tf.Variable objects store mutable tf.Tensor-like values accessed during training to make automatic differentiation easier.

The collections of variables can be encapsulated into layers or models, along with methods that operate on them. See Custom Keras layers and models for details. The main difference between layers and models is that models add methods like Model.fit, Model.evaluate, and Model.save.

For example, the automatic differentiation example above can be rewritten:

class Linear(tf.keras.Model):
  def __init__(self):
    super(Linear, self).__init__()
    self.W = tf.Variable(5., name='weight')
    self.B = tf.Variable(10., name='bias')
  def call(self, inputs):
    return inputs * self.W + self.B
# A toy dataset of points around 3 * x + 2
NUM_EXAMPLES = 2000
training_inputs = tf.random.normal([NUM_EXAMPLES])
noise = tf.random.normal([NUM_EXAMPLES])
training_outputs = training_inputs * 3 + 2 + noise

# The loss function to be optimized
def loss(model, inputs, targets):
  error = model(inputs) - targets
  return tf.reduce_mean(tf.square(error))

def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = loss(model, inputs, targets)
  return tape.gradient(loss_value, [model.W, model.B])

Next:

  1. Create the model.
  2. Compute the derivatives of a loss function with respect to the model parameters.
  3. Choose a strategy for updating the variables based on the derivatives.

model = Linear()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

print("Initial loss: {:.3f}".format(loss(model, training_inputs, training_outputs)))

steps = 300
for i in range(steps):
  grads = grad(model, training_inputs, training_outputs)
  optimizer.apply_gradients(zip(grads, [model.W, model.B]))
  if i % 20 == 0:
    print("Loss at step {:03d}: {:.3f}".format(i, loss(model, training_inputs, training_outputs)))

Output:
---------------------------
Initial loss: 68.503
Loss at step 000: 65.829
Loss at step 020: 29.887
Loss at step 040: 13.870
Loss at step 060: 6.732
Loss at step 080: 3.551
Loss at step 100: 2.133
Loss at step 120: 1.502
Loss at step 140: 1.220
Loss at step 160: 1.095
Loss at step 180: 1.039
Loss at step 200: 1.014
Loss at step 220: 1.003
Loss at step 240: 0.998
Loss at step 260: 0.996
Loss at step 280: 0.995
---------------------------

print("Final loss: {:.3f}".format(loss(model, training_inputs, training_outputs)))

Output:
------------------
Final loss: 0.994
------------------
 
print("W = {}, B = {}".format(model.W.numpy(), model.B.numpy()))

Output:
---------------------------------------------
W = 3.002486228942871, B = 2.050537347793579
---------------------------------------------
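The update strategy in step 3 is what the optimizer supplies. As a minimal sketch, assuming plain SGD without momentum, a single step is just w ← w − lr · g, which can be written by hand with tf.Variable.assign_sub. The toy loss (w − 3)², the variable w and the learning rate lr are hypothetical.

```python
import tensorflow as tf

w = tf.Variable(5.0)
lr = 0.1  # hypothetical learning rate

with tf.GradientTape() as tape:
  loss = tf.square(w - 3.0)   # toy loss with its minimum at w = 3
g = tape.gradient(loss, w)    # d/dw (w - 3)^2 = 2 * (5 - 3) = 4

# One hand-written gradient-descent step: w <- w - lr * g
w.assign_sub(lr * g)
print(w.numpy())              # approximately 4.6
```

Repeating this step in a loop is, in effect, what optimizer.apply_gradients does for tf.keras.optimizers.SGD with its default momentum of 0.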
 

Note: Variables persist until the last reference to the Python object is removed, at which point the variable is deleted.

Object-based saving

A tf.keras.Model includes a convenient save_weights method that lets you easily create a checkpoint:

model.save_weights('weights')
status = model.load_weights('weights')

Using tf.train.Checkpoint you can take full control over this process.

This section is an abbreviated version of the guide to training checkpoints.

x = tf.Variable(10.)
checkpoint = tf.train.Checkpoint(x=x)
x.assign(2.)   # Assign a new value to the variables and save.
checkpoint_path = './ckpt/'
checkpoint.save(checkpoint_path)

Output:
------------
'./ckpt/-1'
------------
 
x.assign(11.)  # Change the variable after saving.

# Restore values from the checkpoint
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_path))

print(x)  # => 2.0

Output:
-------------------------------------------------------------
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.0>
-------------------------------------------------------------
 

To save and load models, tf.train.Checkpoint stores the internal state of objects, without requiring hidden variables. To record the state of a model, an optimizer, and (optionally) a global step, pass them to a tf.train.Checkpoint:

model = tf.keras.Sequential([
  tf.keras.layers.Conv2D(16,[3,3], activation='relu'),
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(10)
])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
checkpoint_dir = 'path/to/model_dir'
if not os.path.exists(checkpoint_dir):
  os.makedirs(checkpoint_dir)
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
root = tf.train.Checkpoint(optimizer=optimizer,
                           model=model)

root.save(checkpoint_prefix)
root.restore(tf.train.latest_checkpoint(checkpoint_dir))

Output:
----------------------------------------------------------------------------------
<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7fe1dc0e59b0>
----------------------------------------------------------------------------------
 

Note: In many training loops, variables are created after tf.train.Checkpoint.restore is called. These variables will be restored as soon as they are created, and assertions are available to ensure that a checkpoint has been fully loaded. See the guide to training checkpoints for details.
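When a training run saves checkpoints repeatedly, tf.train.CheckpointManager can keep only the most recent ones. A minimal sketch, assuming a temporary directory from tempfile stands in for a real model directory and a single hypothetical variable v stands in for the model state:

```python
import tempfile
import tensorflow as tf

ckpt_dir = tempfile.mkdtemp()  # throwaway directory for this sketch
v = tf.Variable(0.0)
ckpt = tf.train.Checkpoint(v=v)
# Keep only the 2 most recent checkpoints on disk; older ones are deleted on save
manager = tf.train.CheckpointManager(ckpt, directory=ckpt_dir, max_to_keep=2)

for value in [1.0, 2.0, 3.0]:
  v.assign(value)
  manager.save()

v.assign(-1.0)                           # overwrite the value after saving
ckpt.restore(manager.latest_checkpoint)  # restores the last saved value, 3.0
print(v.numpy(), len(manager.checkpoints))
```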

Object-oriented metrics

tf.keras.metrics are stored as objects. Update a metric by passing the new data to the callable, and retrieve the result using the metric's result() method (tf.keras.metrics.Metric.result), for example:

m = tf.keras.metrics.Mean("loss")
m(0)
m(5)
m.result()  # => 2.5
m([8, 9])
m.result()  # => 5.5

Output:
-----------------------------------------------------------
<tf.Tensor: id=669732, shape=(), dtype=float32, numpy=5.5>
-----------------------------------------------------------
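Metrics keep running state across calls, which is what makes them convenient in custom training loops. A minimal sketch with tf.keras.metrics.SparseCategoricalAccuracy, using update_state to feed two hypothetical batches of integer labels and per-class scores, and result to read the accumulated value:

```python
import tensorflow as tf

acc = tf.keras.metrics.SparseCategoricalAccuracy("accuracy")
# Each call adds one batch of (integer labels, per-class scores) to the running state
acc.update_state([0, 1], [[0.9, 0.1], [0.8, 0.2]])  # predictions 0, 0: 1 of 2 correct
acc.update_state([1, 1], [[0.3, 0.7], [0.4, 0.6]])  # predictions 1, 1: 2 of 2 correct
print(float(acc.result()))                          # 3 of 4 correct overall
```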
 

Summaries and TensorBoard

TensorBoard is a visualization tool for understanding, debugging and optimizing the model training process. It uses summary events that are written while executing the program.

You can use tf.summary to record summaries of variables in eager execution. For example, to record a loss summary once every 100 training steps:

logdir = "./tb/"
writer = tf.summary.create_file_writer(logdir)

steps = 1000
with writer.as_default():  # or call writer.set_as_default() before the loop.
  for i in range(steps):
    step = i + 1
    # Calculate loss with your real train function.
    loss = 1 - 0.001 * step
    if step % 100 == 0:
      tf.summary.scalar('loss', loss, step=step)


!ls tb/


Output:
---------------------------------------------------------------------------------
events.out.tfevents.1573608300.kokoro-gcp-ubuntu-prod-1330328282.25967.669737.v2
---------------------------------------------------------------------------------
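A minimal sketch of logging a variable's value (rather than a loss) in the same way, assuming a temporary log directory and a hypothetical variable w that is updated by a stand-in step instead of real training. writer.flush() forces pending events to disk so the event file can be listed immediately:

```python
import os
import tempfile
import tensorflow as tf

logdir = tempfile.mkdtemp()
writer = tf.summary.create_file_writer(logdir)
w = tf.Variable(0.0)

with writer.as_default():
  for step in range(1, 6):
    w.assign_add(0.5)                          # stand-in for a real training update
    tf.summary.scalar("weight", w, step=step)  # log the variable's current value
writer.flush()                                 # make sure events are written to disk

print(os.listdir(logdir))
```

To inspect the result, point TensorBoard at the directory, for example with tensorboard --logdir followed by the directory path.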

Advanced automatic differentiation topics

Dynamic models

tf.GradientTape can also be used in dynamic models. This example of a backtracking line search algorithm looks like normal NumPy code, except that it computes gradients and is differentiable, despite the complex control flow:

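def line_search_step(fn, init_x, rate=1.0, c=1e-4):
  with tf.GradientTape() as tape:
    # Variables are automatically tracked.
    # But to calculate a gradient from a tensor, you must `watch` it.
    tape.watch(init_x)
    init_value = fn(init_x)
  grad = tape.gradient(init_value, init_x)
  grad_norm = tf.reduce_sum(grad * grad)  # squared norm of the gradient
  # Try the full step first, then halve it until the sufficient-decrease
  # (Armijo) condition fn(x) <= fn(init_x) - c * rate * grad_norm holds.
  x = init_x - rate * grad
  value = fn(x)
  while value > init_value - c * rate * grad_norm:
    rate /= 2.0
    x = init_x - rate * grad
    value = fn(x)
  return x, value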
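A smaller sketch of the same idea, assuming only tf.GradientTape and tf.Variable: the number of loop iterations is decided at runtime by comparing a tensor to a threshold, and the tape still differentiates through whichever multiplications actually ran. Starting from x = 2, the loop stops at y = 128 = x^7, so the gradient is 7 x^6 = 448. The threshold 100.0 is arbitrary.

```python
import tensorflow as tf

x = tf.Variable(2.0)
with tf.GradientTape() as tape:
  y = x
  # Plain Python while loop whose trip count depends on a tensor value
  while y < 100.0:
    y = y * x  # the tape records each multiplication that actually runs
grad = tape.gradient(y, x)
print(y.numpy(), grad.numpy())
```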

Custom gradients

Custom gradients are an easy way to override gradients. Within the forward function, define the gradient with respect to the inputs, outputs, or intermediate results. For example, here's an easy way to clip the norm of the gradients in the backward pass:

@tf.custom_gradient
def clip_gradient_by_norm(x, norm):
  y = tf.identity(x)
  def grad_fn(dresult):
    return [tf.clip_by_norm(dresult, norm), None]
  return y, grad_fn
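A usage sketch for the clipping function above, assuming an upstream gradient large enough to be clipped: the forward pass is the identity, while the gradient reaching x is rescaled to norm 1. The function is repeated so the block runs on its own; the input values and the scale factor 10.0 are arbitrary.

```python
import tensorflow as tf

@tf.custom_gradient
def clip_gradient_by_norm(x, norm):
  y = tf.identity(x)  # forward pass: unchanged values
  def grad_fn(dresult):
    # backward pass: clip the incoming gradient; no gradient for `norm`
    return [tf.clip_by_norm(dresult, norm), None]
  return y, grad_fn

x = tf.constant([3.0, 4.0])
with tf.GradientTape() as tape:
  tape.watch(x)
  y = clip_gradient_by_norm(x, tf.constant(1.0))
  loss = tf.reduce_sum(10.0 * y)  # upstream gradient [10, 10] has norm of about 14.1
grad = tape.gradient(loss, x)     # rescaled to norm 1.0
print(y.numpy(), grad.numpy())
```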

Custom gradients are commonly used to provide a numerically stable gradient for a sequence of operations:

def log1pexp(x):
  return tf.math.log(1 + tf.exp(x))

def grad_log1pexp(x):
  with tf.GradientTape() as tape:
    tape.watch(x)
    value = log1pexp(x)
  return tape.gradient(value, x)



# The gradient computation works fine at x = 0.
grad_log1pexp(tf.constant(0.)).numpy()

Output:
----
0.5
----


# However, x = 100 fails because of numerical instability.
grad_log1pexp(tf.constant(100.)).numpy()

Output:
----
nan
----
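Why the naive version returns nan: the derivative of $\log(1+e^x)$ is

$$\frac{d}{dx}\log(1+e^{x}) = \frac{e^{x}}{1+e^{x}} = 1 - \frac{1}{1+e^{x}}$$

In float32, $e^{100} \approx 2.7 \times 10^{43}$ exceeds the largest representable value (about $3.4 \times 10^{38}$), so tf.exp(100.) evaluates to $\infty$. The tape applies the chain rule one operation at a time: the gradient of the log contributes $1/(1+e^{x}) = 1/\infty = 0$, the gradient of the exp contributes $e^{x} = \infty$, and their product $0 \cdot \infty$ is nan. The rewritten form $1 - 1/(1+e^{x})$ never multiplies these two factors, so at $x = 100$ it evaluates to $1 - 0 = 1$.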

Here, the log1pexp function can be analytically simplified with a custom gradient. The implementation below reuses the value for tf.exp(x) that is computed during the forward pass—making it more efficient by eliminating redundant calculations:

@tf.custom_gradient
def log1pexp(x):
  e = tf.exp(x)
  def grad(dy):
    return dy * (1 - 1 / (1 + e))
  return tf.math.log(1 + e), grad

def grad_log1pexp(x):
  with tf.GradientTape() as tape:
    tape.watch(x)
    value = log1pexp(x)
  return tape.gradient(value, x)


# As before, the gradient computation works fine at x = 0.
grad_log1pexp(tf.constant(0.)).numpy()

Output:
----
0.5
----



# And the gradient computation also works at x = 100.
grad_log1pexp(tf.constant(100.)).numpy()

Output:
----
1.0
----

Performance

Computation is automatically offloaded to GPUs during eager execution. If you want control over where a computation runs you can enclose it in a tf.device('/gpu:0') block (or the CPU equivalent):

import time

def measure(x, steps):
  # TensorFlow initializes a GPU the first time it's used, exclude from timing.
  tf.matmul(x, x)
  start = time.time()
  for i in range(steps):
    x = tf.matmul(x, x)
  # tf.matmul can return before completing the matrix multiplication
  # (e.g., can return after enqueuing the operation on a CUDA stream).
  # The x.numpy() call below will ensure that all enqueued operations
  # have completed (and will also copy the result to host memory,
  # so we're including a little more than just the matmul operation
  # time).
  _ = x.numpy()
  end = time.time()
  return end - start

shape = (1000, 1000)
steps = 200
print("Time to multiply a {} matrix by itself {} times:".format(shape, steps))

# Run on CPU:
with tf.device("/cpu:0"):
  print("CPU: {} secs".format(measure(tf.random.normal(shape), steps)))

# Run on GPU, if available:
if tf.config.experimental.list_physical_devices("GPU"):
  with tf.device("/gpu:0"):
    print("GPU: {} secs".format(measure(tf.random.normal(shape), steps)))
else:
  print("GPU: not found")


Output:
------------------------------------------------------------
Time to multiply a (1000, 1000) matrix by itself 200 times:
CPU: 1.1374788284301758 secs
GPU: 0.03955197334289551 secs
------------------------------------------------------------

A tf.Tensor object can be copied to a different device to execute its operations. The older Tensor.gpu() and Tensor.cpu() methods are deprecated (TensorFlow's deprecation warning recommends tf.identity instead), so copy the tensor with tf.identity inside a tf.device block:

if tf.config.experimental.list_physical_devices("GPU"):
  x = tf.random.normal([10, 10])

  # Copy x to each device with tf.identity (replaces the deprecated x.cpu() / x.gpu()).
  with tf.device("/cpu:0"):
    x_cpu = tf.identity(x)
    _ = tf.matmul(x_cpu, x_cpu)    # Runs on CPU

  with tf.device("/gpu:0"):
    x_gpu0 = tf.identity(x)
    _ = tf.matmul(x_gpu0, x_gpu0)  # Runs on GPU:0

Benchmarks

For compute-heavy models, such as ResNet50 training on a GPU, eager execution performance is comparable to tf.function execution. The gap grows larger for models with less computation, and there is work to be done to optimize hot code paths for models with lots of small operations.

Work with functions

While eager execution makes development and debugging more interactive, TensorFlow 1.x style graph execution has advantages for distributed training, performance optimizations, and production deployment. To bridge this gap, TensorFlow 2.0 introduces functions via the tf.function API. For more information, see the tf.function guide.
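A minimal sketch of the tf.function API mentioned above, assuming TensorFlow 2.x: the same Python function is called eagerly and as a traced graph, and both give the same result. The function dense_relu and its inputs are hypothetical.

```python
import tensorflow as tf

def dense_relu(x, w, b):
  # Plain Python function: runs op by op when called eagerly
  return tf.nn.relu(tf.matmul(x, w) + b)

# tf.function traces the Python function into a graph on the first call
graph_dense_relu = tf.function(dense_relu)

x = tf.constant([[1.0, -2.0]])
w = tf.constant([[1.0, 0.0], [0.0, 1.0]])
b = tf.constant([0.5, 0.5])

eager_out = dense_relu(x, w, b)        # eager execution
graph_out = graph_dense_relu(x, w, b)  # graph execution via tf.function
print(eager_out.numpy(), graph_out.numpy())
```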

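Backtracking line search is a step-size selection technique used inside optimization algorithms, particularly for unconstrained nonlinear optimization. Suppose we are optimizing at a point $x$ along a search direction $d$. On each iteration, backtracking tries steps along $d$ until it finds one that improves the objective enough:

  1. Choose an initial step size $\alpha_0 > 0$, for example $\alpha_0 = 1$ or $\alpha_0 = 0.1$.
  2. At trial $k$, evaluate $f(x + \alpha_k d)$, where $f$ is the objective function being minimized.
  3. If $f(x + \alpha_k d) \leq f(x) + c_1 \alpha_k \nabla f(x)^T d$, where $c_1 \in (0, 1)$ is a preset constant (the sufficient-decrease, or Armijo, condition), accept $\alpha_k$ and take the step.
  4. Otherwise, shrink the step by a fixed ratio, usually halving it ($\alpha_{k+1} = \alpha_k / 2$), and return to step 2.

The advantage of backtracking is that each trial is cheap and, for a descent direction ($\nabla f(x)^T d < 0$), a small enough step always satisfies the condition, so the search terminates. It is a local method, however: it only decides how far to move along the given direction and does not explore the whole search space for a global optimum. It may also need several extra function evaluations per iteration, so in practice it is used as a component of other optimization algorithms, such as gradient descent or Newton's method, rather than on its own.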
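A minimal sketch of the procedure above in Python with NumPy, assuming a differentiable objective f with a known gradient grad_f. The function name and the parameters c1 (the constant $c_1$), shrink (the step-reduction factor, 0.5 for halving) and max_iter (a safety cap on retries) are illustrative choices, not a library API.

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, d, alpha0=1.0, c1=1e-4, shrink=0.5, max_iter=50):
  """Return a step size alpha satisfying the sufficient-decrease (Armijo) condition
  f(x + alpha * d) <= f(x) + c1 * alpha * grad_f(x)^T d."""
  fx = f(x)
  slope = float(np.dot(grad_f(x), d))  # directional derivative; negative for a descent direction
  alpha = alpha0
  for _ in range(max_iter):
    if f(x + alpha * d) <= fx + c1 * alpha * slope:
      return alpha    # condition met: accept this step
    alpha *= shrink   # otherwise shrink the step and retry
  return alpha        # no accepted step within max_iter tries; return the final shrunken step unchecked

# Example: f(x) = ||x||^2 at x0 = (3, 4), stepping along steepest descent d = -grad f(x0)
f = lambda x: float(np.dot(x, x))
grad_f = lambda x: 2.0 * x
x0 = np.array([3.0, 4.0])
d = -grad_f(x0)
alpha = backtracking_line_search(f, grad_f, x0, d)
x1 = x0 + alpha * d
print(alpha, x1, f(x1))  # alpha = 1 overshoots to (-3, -4); alpha = 0.5 lands on the minimum
```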