This article is an introduction to variable scope in TensorFlow, based mainly on the official documentation on sharing variables in TensorFlow.
When building complex models in TensorFlow, you often need to share many variables, and you may want to initialize all of them in one place. This is what tf.variable_scope() and tf.get_variable() are for.
Let's start with a motivating problem.
Suppose you are defining the image filters for a convolutional network with only two convolutional layers. Using only tf.Variable, the code would look roughly like this:
def my_image_filter(input_images):
    conv1_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
                                name="conv1_weights")
    conv1_biases = tf.Variable(tf.zeros([32]), name="conv1_biases")
    conv1 = tf.nn.conv2d(input_images, conv1_weights,
                         strides=[1, 1, 1, 1], padding='SAME')
    relu1 = tf.nn.relu(conv1 + conv1_biases)

    conv2_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
                                name="conv2_weights")
    conv2_biases = tf.Variable(tf.zeros([32]), name="conv2_biases")
    conv2 = tf.nn.conv2d(relu1, conv2_weights,
                         strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(conv2 + conv2_biases)
It is easy to imagine what a much larger model would look like. Even this fairly small model already has four different variables: conv1_weights, conv1_biases, conv2_weights, and conv2_biases.
The problem arises when you want to reuse this model. Suppose you want to apply the filter to two different images, image1 and image2, and you want both images processed by a filter with the same parameters. You could call my_image_filter twice, but this creates two sets of variables, four in each set, eight in total, as the following code shows:
# First call creates one set of 4 variables.
result1 = my_image_filter(image1)
# Another set of 4 variables is created in the second call.
result2 = my_image_filter(image2)
One way to avoid creating the variables twice is to create them in a single place and pass them to the functions that use them:
variables_dict = {
    "conv1_weights": tf.Variable(tf.random_normal([5, 5, 32, 32]),
                                 name="conv1_weights"),
    "conv1_biases": tf.Variable(tf.zeros([32]), name="conv1_biases"),
    ... etc. ...
}
def my_image_filter(input_images, variables_dict):
    conv1 = tf.nn.conv2d(input_images, variables_dict["conv1_weights"],
                         strides=[1, 1, 1, 1], padding='SAME')
    relu1 = tf.nn.relu(conv1 + variables_dict["conv1_biases"])

    conv2 = tf.nn.conv2d(relu1, variables_dict["conv2_weights"],
                         strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(conv2 + variables_dict["conv2_biases"])
# Both calls to my_image_filter() now use the same variables
result1 = my_image_filter(image1, variables_dict)
result2 = my_image_filter(image2, variables_dict)
While this looks convenient, creating the variables outside of the code that uses them breaks encapsulation.
One way to address this problem is to build the model with a class that takes care of managing the variables it needs. For a lighter solution that does not require a class, TensorFlow provides the Variable Scope mechanism, which allows named variables to be shared while constructing the graph.
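To make the class-based approach concrete, here is a minimal sketch (the class name and the placeholder shapes are assumptions made for this example, not part of the original text). The variables are created once, in the constructor, so every call reuses the same parameters:

import tensorflow as tf

class ImageFilter(object):
    def __init__(self):
        # Variables are created once, when the object is constructed.
        self.conv1_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
                                         name="conv1_weights")
        self.conv1_biases = tf.Variable(tf.zeros([32]), name="conv1_biases")
        self.conv2_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
                                         name="conv2_weights")
        self.conv2_biases = tf.Variable(tf.zeros([32]), name="conv2_biases")

    def apply(self, input_images):
        conv1 = tf.nn.conv2d(input_images, self.conv1_weights,
                             strides=[1, 1, 1, 1], padding='SAME')
        relu1 = tf.nn.relu(conv1 + self.conv1_biases)
        conv2 = tf.nn.conv2d(relu1, self.conv2_weights,
                             strides=[1, 1, 1, 1], padding='SAME')
        return tf.nn.relu(conv2 + self.conv2_biases)

image1 = tf.placeholder(tf.float32, [1, 28, 28, 32])  # assumed shapes
image2 = tf.placeholder(tf.float32, [1, 28, 28, 32])
image_filter = ImageFilter()
result1 = image_filter.apply(image1)  # both calls reuse the same variables
result2 = image_filter.apply(image2)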
Variable Scope Example
The variable scope mechanism in TensorFlow consists mainly of two functions:
tf.get_variable(name, shape, initializer): creates or returns a variable with the given name.
tf.variable_scope(scope_name): manages the namespace for names passed to tf.get_variable().
tf.get_variable creates variables using an initializer instead of passing a value directly, as tf.Variable does. An initializer is a function that takes a shape and provides a tensor with that shape. Here are a few examples:
tf.constant_initializer(value)  # initializes everything to the provided value
tf.random_uniform_initializer(a, b)  # initializes uniformly from [a, b]
tf.random_normal_initializer(mean, stddev)  # initializes from the normal distribution with the given mean and standard deviation
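To make the "function that takes a shape" idea concrete, here is a minimal sketch (assuming TF 1.x graph mode): an initializer object can be called with a shape to produce a tensor of that shape:

import tensorflow as tf

# A minimal sketch (TF 1.x): an initializer is a callable that maps a
# shape to a tensor of that shape.
init = tf.constant_initializer(0.5)
t = init([2, 2])  # a 2x2 tensor where every entry is 0.5

with tf.Session() as sess:
    print(sess.run(t))  # [[0.5 0.5]
                        #  [0.5 0.5]]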
Let's rewrite the earlier code with tf.get_variable() to see how it solves the problem:
def conv_relu(input, kernel_shape, bias_shape):
    # Create variable named "weights".
    weights = tf.get_variable("weights", kernel_shape,
                              initializer=tf.random_normal_initializer())
    # Create variable named "biases".
    biases = tf.get_variable("biases", bias_shape,
                             initializer=tf.constant_initializer(0.0))
    conv = tf.nn.conv2d(input, weights,
                        strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(conv + biases)
This function uses the short names weights and biases. If we want to use it for both conv1 and conv2, the variables need different names. This is where tf.variable_scope() comes in: it provides a namespace for the variables, as the following code shows:
def my_image_filter(input_images):
    with tf.variable_scope("conv1"):
        # Variables created here will be named "conv1/weights", "conv1/biases".
        relu1 = conv_relu(input_images, [5, 5, 32, 32], [32])
    with tf.variable_scope("conv2"):
        # Variables created here will be named "conv2/weights", "conv2/biases".
        return conv_relu(relu1, [5, 5, 32, 32], [32])
Now let's see what happens if we call my_image_filter twice:
result1 = my_image_filter(image1)
result2 = my_image_filter(image2)
# Raises ValueError(... conv1/weights already exists ...)
Here tf.get_variable checks that an existing variable is not being created again by accident. If you do want to share the variables, you need to call reuse_variables() as follows:
with tf.variable_scope("image_filters") as scope:
result1 = my_image_filter(image1)
scope.reuse_variables()
result2 = my_image_filter(image2)
This is a lightweight and safe way to share variables.
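As a quick sanity check, here is a sketch that verifies only one set of variables was created (the placeholder shapes are assumed; the kernels above expect 32 input channels):

import tensorflow as tf

# A sketch: confirm the two calls share one set of four variables.
image1 = tf.placeholder(tf.float32, [1, 28, 28, 32])
image2 = tf.placeholder(tf.float32, [1, 28, 28, 32])

with tf.variable_scope("image_filters") as scope:
    result1 = my_image_filter(image1)
    scope.reuse_variables()
    result2 = my_image_filter(image2)

# Prints 4 names under "image_filters/" (conv1/weights, conv1/biases,
# conv2/weights, conv2/biases), not 8.
print([v.name for v in tf.global_variables()])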
How Does Variable Scope Work?
tf.get_variable is typically called as follows:
v = tf.get_variable(name, shape, dtype, initializer)
Depending on the scope it is called in, this call does one of two things:
Case 1: tf.get_variable_scope().reuse == False. Here v is a newly created tf.Variable with the provided shape and data type. The full name of the created variable is the current variable scope name plus the provided name, and a check is performed to ensure that no existing variable has this full name. If a variable with this full name already exists, the function raises a ValueError. If the variable is created, it is initialized to the value of initializer(shape). For example:
with tf.variable_scope("foo"):
v = tf.get_variable("v", [1])
assert v.name == "foo/v:0"
Case 2: the scope is set for reusing variables, i.e. tf.get_variable_scope().reuse == True. Here the call searches for an existing variable whose full name equals the current scope name plus the provided name. If no such variable exists, a ValueError is raised; if it is found, that variable is returned. For example:
with tf.variable_scope("foo"):
v = tf.get_variable("v", [1])
with tf.variable_scope("foo", reuse=True):
v1 = tf.get_variable("v", [1])
assert v1 is v
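The failure mode of the reuse case can be illustrated with a minimal sketch: requesting a variable that was never created raises a ValueError instead of silently creating one:

import tensorflow as tf

# A sketch: with reuse=True, asking for a variable that was never
# created fails rather than creating a fresh one.
with tf.variable_scope("foo", reuse=True):
    try:
        u = tf.get_variable("u", [1])  # "foo/u" was never created
    except ValueError as e:
        print(e)  # ... Variable foo/u does not exist ...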
Basics of tf.variable_scope()
The most basic functionality of variable scope is to carry a name that serves as a prefix for variable names, together with a reuse flag to distinguish the two cases above. Nested variable scopes append their names the way directories do:
with tf.variable_scope("foo"):
with tf.variable_scope("bar"):
v = tf.get_variable("v", [1])
assert v.name == "foo/bar/v:0"
The current variable scope can be retrieved with tf.get_variable_scope(), and its reuse flag can be set to True by calling tf.get_variable_scope().reuse_variables():
with tf.variable_scope("foo"):
v = tf.get_variable("v", [1])
tf.get_variable_scope().reuse_variables()
v1 = tf.get_variable("v", [1])
assert v1 is v
Note that you cannot set the reuse flag back to False, as that would cause unnecessary confusion.
Although you cannot set reuse to False explicitly, you can enter a reusing variable scope and then exit it, returning to a non-reusing one. This is done by passing reuse=True when opening a scope. The reuse parameter is inherited: every sub-scope inherits the reuse flag of its parent scope. For example:
with tf.variable_scope("root"):
# At start, the scope is not reusing.
assert tf.get_variable_scope().reuse == False
with tf.variable_scope("foo"):
# Opened a sub-scope, still not reusing.
assert tf.get_variable_scope().reuse == False
with tf.variable_scope("foo", reuse=True):
# Explicitly opened a reusing scope.
assert tf.get_variable_scope().reuse == True
with tf.variable_scope("bar"):
# Now sub-scope inherits the reuse flag.
assert tf.get_variable_scope().reuse == True
# Exited the reusing scope, back to a non-reusing one.
assert tf.get_variable_scope().reuse == False
Capturing a Variable Scope
In more complex cases, it can be useful to pass a VariableScope object instead of a name:
with tf.variable_scope("foo") as foo_scope:
v = tf.get_variable("v", [1])
with tf.variable_scope(foo_scope):
w = tf.get_variable("w", [1])
with tf.variable_scope(foo_scope, reuse=True):
v1 = tf.get_variable("v", [1])
w1 = tf.get_variable("w", [1])
assert v1 is v
assert w1 is w
When a scope is opened with a captured scope object inside another nested scope, its name is independent of where it is opened and does not change:
with tf.variable_scope("foo") as foo_scope:
assert foo_scope.name == "foo"
with tf.variable_scope("bar"):
with tf.variable_scope("baz") as other_scope:
assert other_scope.name == "bar/baz"
with tf.variable_scope(foo_scope) as foo_scope2:
assert foo_scope2.name == "foo" # Not changed.
Initializers in Variable Scope
A variable scope can carry a default initializer. It is inherited by every tf.get_variable() call in the scope and its sub-scopes, but an initializer passed explicitly to get_variable() overrides the default:
with tf.variable_scope("foo", initializer=tf.constant_initializer(0.4)):
v = tf.get_variable("v", [1])
assert v.eval() == 0.4 # Default initializer as set above.
w = tf.get_variable("w", [1], initializer=tf.constant_initializer(0.3)):
assert w.eval() == 0.3 # Specific initializer overrides the default.
with tf.variable_scope("bar"):
v = tf.get_variable("v", [1])
assert v.eval() == 0.4 # Inherited default initializer.
with tf.variable_scope("baz", initializer=tf.constant_initializer(0.2)):
v = tf.get_variable("v", [1])
assert v.eval() == 0.2 # Changed default initializer.
Names of ops in tf.variable_scope()
Executing with tf.variable_scope("name") implicitly opens tf.name_scope("name") as well, so the variable scope changes the names of ops along with the names of variables:
with tf.variable_scope("foo"):
x = 1.0 + tf.get_variable("v", [1])
assert x.op.name == "foo/add"
A name scope opened in addition to a variable scope, however, only affects the names of ops, not of variables:
with tf.variable_scope("foo"):
with tf.name_scope("bar"):
v = tf.get_variable("v", [1])
x = 1.0 + v
assert v.name == "foo/v:0"
assert x.op.name == "foo/bar/add"
When a variable scope is opened using a captured scope object instead of a string, the current name scope for ops is not altered.
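A minimal sketch of this last point follows (assuming TF 1.x; the exact op names produced when reopening a captured scope have varied across releases, so the op name is printed rather than asserted):

import tensorflow as tf

# A sketch: reopening a captured VariableScope affects variable names,
# while the name scope for ops is left as described above.
with tf.variable_scope("foo") as foo_scope:
    pass

with tf.variable_scope(foo_scope):
    v = tf.get_variable("v", [1])
    x = 1.0 + v

assert v.name == "foo/v:0"  # the variable name follows the captured scope
print(x.op.name)            # observe the op name under the current name scope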