In TensorFlow, every call to sess.run(self.optimizer) computes the gradients and updates the variables in a single step. In PyTorch, the same thing is split into three steps:
- Clear the gradients: optimizer.zero_grad()
- Backpropagate to compute the gradient of every parameter: loss.backward()
- Take a gradient-descent step and update the parameters: optimizer.step()
If you want to accumulate gradients over several passes and only then apply a single update (for example, simulating a large batch size with several minibatches, or, in long-sequence frame interpolation, iterating over the whole sequence before propagating once), you need to separate gradient computation from the parameter update. This is easy in PyTorch (see the sketch just after this paragraph), but in TensorFlow you have to write the gradient-accumulation and update logic yourself.
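For comparison, here is a minimal PyTorch sketch of this pattern. It is not from the original post; the model, data and accum_steps below are made-up placeholders.

import torch

# Made-up model, optimizer and data, only to illustrate the accumulation pattern.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
accum_steps = 2   # number of minibatches to accumulate before one parameter update

data = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(6)]

optimizer.zero_grad()                        # clear any existing gradients
for i, (inputs, targets) in enumerate(data):
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    (loss / accum_steps).backward()          # .grad buffers accumulate across backward() calls
    if (i + 1) % accum_steps == 0:
        optimizer.step()                     # apply the accumulated gradient once
        optimizer.zero_grad()                # reset before the next accumulation window

Dividing the loss by accum_steps makes the accumulated gradient an average over the minibatches; drop the division if you want a plain sum, which is what the TensorFlow code below accumulates.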
import tensorflow as tf
import numpy as np
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

x_data = np.array(range(1, 20))
num_dataset = len(x_data)
batch_size = 4
minibatch_size = 2

with tf.Graph().as_default():
    x = tf.placeholder(dtype='float32', shape=None)
    w = tf.Variable(initial_value=4., dtype='float32')
    # Element-wise loss; compute_gradients sums the gradient over all elements of x
    loss = w * w * x

    # Optimizer definition - nothing different from any classical example
    opt = tf.train.GradientDescentOptimizer(0.1)

    # Retrieve all trainable variables you defined in your graph
    tvs = tf.trainable_variables()
    # Create a list of variables with the same shape as the trainable ones,
    # initialized with zeros, to accumulate the gradients
    accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
    zero_ops = [av.assign(tf.zeros_like(av)) for av in accum_vars]

    # Call the optimizer's compute_gradients to obtain the list of (gradient, variable) pairs
    gvs = opt.compute_gradients(loss, tvs)
    # Add each gradient to the corresponding accumulator
    # (works because accum_vars and gvs are in the same order)
    accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]
    # Define the training step that applies the accumulated gradients to the variables
    train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)])
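    # At this point the graph defines three groups of ops:
    #   zero_ops   - reset the accumulators to zero
    #   accum_ops  - add the current minibatch's gradients into the accumulators
    #   train_step - apply the accumulated gradients to the trainable variables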
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        num_batches = num_dataset // batch_size
        minibatches_per_batch = batch_size // minibatch_size
        for batch_count in range(num_batches):
            # Before running each batch, clear the gradients accumulated over the previous batch
            sess.run(zero_ops)
            batch_data = x_data[batch_count * batch_size: (batch_count + 1) * batch_size]
            # Accumulate the gradients 'minibatches_per_batch' times in accum_vars using accum_ops
            for minibatch_count in range(minibatches_per_batch):
                minibatch_data = batch_data[minibatch_count * minibatch_size:
                                            (minibatch_count + 1) * minibatch_size]
                accum_array = sess.run(accum_ops, feed_dict={x: minibatch_data})
                print("[%d][%d]" % (batch_count, minibatch_count), accum_array)
                print(sess.run(tvs))
            # Run the train_step op to update the weights based on your accumulated gradients
            sess.run(train_step)
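Since loss = w * w * x and compute_gradients sums the per-element gradients, every accum_ops run adds 2·w·Σx (over the fed minibatch) to the accumulator. With the initial w = 4, the first minibatch [1, 2] should contribute about 24 and the second [3, 4] a further 56, while the printed value of w stays at 4.0 the whole time; only train_step consumes the accumulated 80 and updates w in a single step. Later batches then start from the updated w.

The post targets TensorFlow 1.x. Purely as an aside (not part of the original), the same zero/accumulate/apply structure could be sketched in TensorFlow 2.x with tf.GradientTape roughly as follows:

import tensorflow as tf

w = tf.Variable(4.0)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)
accum = tf.Variable(tf.zeros_like(w), trainable=False)    # gradient accumulator

x_data = [1., 2., 3., 4.]
minibatch_size = 2

accum.assign(tf.zeros_like(accum))                        # same role as zero_ops
for i in range(0, len(x_data), minibatch_size):
    xb = tf.constant(x_data[i:i + minibatch_size])
    with tf.GradientTape() as tape:
        loss = tf.reduce_sum(w * w * xb)
    accum.assign_add(tape.gradient(loss, w))              # same role as accum_ops
opt.apply_gradients([(accum, w)])                         # same role as train_step
print(w.numpy())                                          # 4 - 0.1 * 80 = -4.0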