Understanding TensorFlow variable scopes

Latest recommended article published on 2025-06-02 09:05:41

fengkuangshine


Licensed under CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/qq_40317897/article/details/81116950

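While programming in TensorFlow, I ran into errors about a `Variable` that already exists and about reusing an RNNCell. This article looks at the difference between `get_variable()` and `variable_scope()`, and at the roles `name_scope` and `variable_scope` play in variable naming and sharing. `variable_scope` is mainly for variable sharing: used together with `tf.get_variable()`, it avoids creating the same variable twice. `name_scope` mainly manages the namespace and keeps the graph structure clear. Using `variable_scope` correctly is the key to fixing the variable-reuse problems in an LSTM.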


The problem came up while I was writing an LSTM program. I kept hitting two errors that I could not get rid of:

Variable rnn/multi_rnn_cell/cell_0/basic_lstm_cell/weights already exists, disallowed.

Attempt to reuse RNNCell <tensorflow.contrib.rnn.python.ops.core_rnn_cell_impl.BasicLSTMCell object at 0x000002206E714240> with a different variable scope than its first use.

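I knew both errors had to do with variable scopes, but I did not really understand name_scope and variable_scope, so I spent the last two days catching up on them.

Along the way I read answers from Baidu, Google and CSDN bloggers. Many of the articles miss the point: if you do not understand the underlying mechanism, the ERROR comes back as soon as the problem changes slightly. They did at least confirm that the trouble lies with get_variable() and variable_scope().

get_variable() and variable_scope()

To understand name_scope and variable_scope, first be clear about what each one is for. Compared with ordinary models, a neural network has a great many nodes and a great many connections (weight matrices) between them. So we painstakingly build a network and end up with the one in fig1. WTF! There are so many variables that, once the network is built, we look at it and cannot tell which layer a given variable belongs to.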


fig1. Before introducing name scopes	fig2. After introducing name scopes

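To solve this problem, TensorFlow provides name_scope and variable_scope, each with its own responsibility:

* name_scope: introduced to manage the namespace of variables. In TensorBoard, for example, the Graph looks orderly because of name_scope.
* variable_scope: in the vast majority of cases it is used together with tf.get_variable() to implement variable sharing.

What is the difference between tf.Variable and tf.get_variable? And between name_scope and variable_scope? Let's go through them in detail.

First, both tf.Variable and tf.get_variable are ways of obtaining a variable; see the TensorFlow API documentation of each for the details. In short: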

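The name argument of tf.Variable is optional and can be given as name='v', whereas the name argument of tf.get_variable is required. tf.get_variable tries to create a variable with that name; if a variable with the same name already exists in the same scope (and reuse is not enabled), the program raises an error. The program below shows this (put import tensorflow as tf before everything else; the examples use the TensorFlow 1.x graph API):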

import tensorflow as tf

# Create a variable named "v" inside the scope "foo".
with tf.variable_scope("foo"):
    v = tf.get_variable("v", [1], initializer=tf.constant_initializer(1.0))
    print(v.name)  # foo/v:0, i.e. the variable "v" inside the scope "foo"

# "foo/v" already exists, so asking for it again without reuse raises a ValueError.
try:
    with tf.variable_scope("foo"):
        v_dup = tf.get_variable("v", [1])
except ValueError as err:
    print(err)

# To fetch the existing variable instead, open the scope with reuse=True.
# tf.get_variable then returns the variable named "v" that was created earlier.
with tf.variable_scope("foo", reuse=True):
    v1 = tf.get_variable("v", [1])

# Check the result: this prints True, so v and v1 refer to the same variable.
print(v is v1)
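
The create-or-fetch rule above can be made concrete with a small pure-Python model. This is only a sketch of the rule as described here, not TensorFlow's implementation: ToyVariableStore and its get_variable method are hypothetical names invented for illustration, and a plain object() stands in for a real variable.

```python
class ToyVariableStore:
    """Toy stand-in for a variable store (hypothetical, not TensorFlow code)."""

    def __init__(self):
        self._vars = {}  # full name -> variable object

    def get_variable(self, scope, name, reuse=False):
        full_name = f"{scope}/{name}" if scope else name
        if reuse:
            # Reuse mode: the variable must already exist.
            if full_name not in self._vars:
                raise ValueError(f"Variable {full_name} does not exist")
            return self._vars[full_name]
        # Creation mode: the name must still be free.
        if full_name in self._vars:
            raise ValueError(f"Variable {full_name} already exists, disallowed.")
        self._vars[full_name] = object()  # placeholder for a real variable
        return self._vars[full_name]


store = ToyVariableStore()
v = store.get_variable("foo", "v")               # creates foo/v
v1 = store.get_variable("foo", "v", reuse=True)  # fetches the same object
print(v is v1)  # True

try:
    store.get_variable("foo", "v")  # second creation attempt without reuse
except ValueError as err:
    print(err)
```

The model also makes the opposite failure visible: asking for a variable with reuse=True before it has been created is an error too.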

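tf.variable_scope can also be nested. If the inner scope does not specify reuse, it takes the same value as the outer scope; if the inner scope sets reuse=True, reuse is enabled for it and for everything nested inside it. Note that reuse mode is inherited by sub-scopes, so in TensorFlow 1.x an inner reuse=False does not switch it back off. tf.get_variable_scope().reuse returns the current value of reuse: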

import tensorflow as tf
with tf.variable_scope("root"):
    print(tf.get_variable_scope().reuse)  # prints False
    with tf.variable_scope("foo", reuse=True):
        print(tf.get_variable_scope().reuse)  # prints True
        with tf.variable_scope("leaf", reuse=True):
            print(tf.get_variable_scope().reuse)  # prints True
        print(tf.get_variable_scope().reuse)  # prints True
    print(tf.get_variable_scope().reuse)  # prints False
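
How the reuse flag propagates through nested scopes can be sketched without TensorFlow. The model below assumes the TensorFlow 1.x behaviour that a scope's effective flag is its own setting or-ed with its parent's; toy_reuse_scope and current_reuse are hypothetical helpers written only for this illustration.

```python
import contextlib

_reuse_stack = [False]  # the root scope starts with reuse switched off


@contextlib.contextmanager
def toy_reuse_scope(reuse=None):
    # Effective flag: own setting OR the parent's flag (reuse is inherited).
    effective = bool(reuse) or _reuse_stack[-1]
    _reuse_stack.append(effective)
    try:
        yield
    finally:
        _reuse_stack.pop()  # leaving the scope restores the parent's flag


def current_reuse():
    return _reuse_stack[-1]


seen = []
with toy_reuse_scope():                    # plays the role of "root"
    seen.append(current_reuse())
    with toy_reuse_scope(reuse=True):      # plays the role of "foo"
        seen.append(current_reuse())
        with toy_reuse_scope():            # nested scope, reuse left unset
            seen.append(current_reuse())
        seen.append(current_reuse())
    seen.append(current_reuse())

print(seen)  # [False, True, True, True, False]
```

The third entry is True even though the innermost scope never asked for reuse, which is the inheritance described above.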

tf.Variable handles a repeated name differently: instead of raising an error, it makes the name unique automatically.

import tensorflow as tf

v2 = tf.Variable([1,2], dtype=tf.float32)
print (v2.name)
v2 = tf.Variable([1,2], dtype=tf.float32, name='V')
print (v2.name)
v2 = tf.Variable([1,2], dtype=tf.float32, name='V')
print (v2.name)
print (type(v2))
print (v2)

Output:

Variable:0  # the unnamed variable gets the default name
V:0  # the first variable created with name='V'
V_1:0  # the second variable created with name='V'
<class 'tensorflow.python.ops.variables.Variable'>
Tensor("V_1/read:0", shape=(2,), dtype=float32)
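
The suffixing pattern in this output (V, then V_1) can be imitated with a few lines of plain Python. This is a sketch of the naming pattern only, under the assumption that a counter is appended until the name is free; make_unique_name is a hypothetical helper, not a TensorFlow function.

```python
def make_unique_name(name, used):
    """Return `name`, or `name_1`, `name_2`, ... if `name` is already taken."""
    candidate, i = name, 0
    while candidate in used:
        i += 1
        candidate = f"{name}_{i}"
    used.add(candidate)  # remember the name so later calls avoid it
    return candidate


used_names = set()
names = [make_unique_name(n, used_names) for n in ["Variable", "V", "V"]]
print(names)  # ['Variable', 'V', 'V_1']
```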

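These results show that both approaches define variables of the same type, and that only variables created with tf.get_variable() can run into naming conflicts. In practice the two ways of creating variables have clearly separate uses:

tf.Variable(): the usual way to define an ordinary variable. (The trainable flag can be set.)
tf.get_variable(): generally used together with tf.variable_scope() to implement variable sharing. (The trainable flag can be set.)

I won't go into tf.trainable_variables() in detail: it returns, as a list, all the variables we defined with trainable=True.

So what is the difference between name_scope and variable_scope?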

import tensorflow as tf
with tf.name_scope('nsc1'):
    v1 = tf.Variable([1], name='v1')
    with tf.variable_scope('vsc1'):
        v2 = tf.Variable([1], name='v2')
        v3 = tf.get_variable(name='v3', shape=[])
print ('v1.name: ', v1.name)
print ('v2.name: ', v2.name)
print ('v3.name: ', v3.name)

Result:

v1.name:  nsc1/v1:0
v2.name:  nsc1/vsc1/v2:0
v3.name:  vsc1/v3:0

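The example above shows that:

1. name_scope has no effect on variables created by tf.get_variable (v3 does not get the nsc1 prefix).

2. name_scope does prefix the names of variables created by tf.Variable, and so does variable_scope (v2 is named nsc1/vsc1/v2).

3. tf.name_scope() is mainly for managing the namespace, which keeps the whole model well organized. tf.variable_scope() exists to implement variable sharing, which it does together with tf.get_variable().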
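
The three names printed in the result can be reproduced with a small pure-Python model of the two prefix stacks. It is a sketch under the assumption, taken from that output, that a variable scope contributes to both stacks while a name scope contributes to only one; every toy_ function here is hypothetical and exists only for this illustration.

```python
import contextlib

name_stack = []       # prefixes seen by tf.Variable-style creation
var_scope_stack = []  # prefixes seen by tf.get_variable-style creation


@contextlib.contextmanager
def toy_name_scope(name):
    name_stack.append(name)
    try:
        yield
    finally:
        name_stack.pop()


@contextlib.contextmanager
def toy_variable_scope(name):
    # A variable scope extends both stacks.
    name_stack.append(name)
    var_scope_stack.append(name)
    try:
        yield
    finally:
        var_scope_stack.pop()
        name_stack.pop()


def toy_variable_name(name):
    return "/".join(name_stack + [name]) + ":0"


def toy_get_variable_name(name):
    return "/".join(var_scope_stack + [name]) + ":0"


with toy_name_scope("nsc1"):
    n1 = toy_variable_name("v1")
    with toy_variable_scope("vsc1"):
        n2 = toy_variable_name("v2")
        n3 = toy_get_variable_name("v3")

print(n1)  # nsc1/v1:0
print(n2)  # nsc1/vsc1/v2:0
print(n3)  # vsc1/v3:0
```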

So when building a network, there are two ways to create variables:

Method 1: using tf.Variable

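import tensorflow as tf
# Let GPU memory grow on demand
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

# Adapted from the official example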
def my_image_filter():
    conv1_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
        name="conv1_weights")
    conv1_biases = tf.Variable(tf.zeros([32]), name="conv1_biases")
    conv2_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
        name="conv2_weights")
    conv2_biases = tf.Variable(tf.zeros([32]), name="conv2_biases")
    return None

# First call creates one set of 4 variables.
result1 = my_image_filter()
# Another set of 4 variables is created in the second call.
result2 = my_image_filter()
# Collect all trainable variables
vs = tf.trainable_variables()
print ('There are %d train_able_variables in the Graph: ' % len(vs))
for v in vs:
    print (v)

Output:

There are 8 train_able_variables in the Graph: 
<tf.Variable 'conv1_weights:0' shape=(5, 5, 32, 32) dtype=float32_ref>
<tf.Variable 'conv1_biases:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'conv2_weights:0' shape=(5, 5, 32, 32) dtype=float32_ref>
<tf.Variable 'conv2_biases:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'conv1_weights_1:0' shape=(5, 5, 32, 32) dtype=float32_ref>
<tf.Variable 'conv1_biases_1:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'conv2_weights_1:0' shape=(5, 5, 32, 32) dtype=float32_ref>
<tf.Variable 'conv2_biases_1:0' shape=(32,) dtype=float32_ref>

Method 2: using tf.get_variable()

import tensorflow as tf
# Let GPU memory grow on demand
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

# A generic way to define a convolutional layer
def conv_relu(kernel_shape, bias_shape):
    # Create variable named "weights".
    weights = tf.get_variable("weights", kernel_shape, initializer=tf.random_normal_initializer())
    # Create variable named "biases".
    biases = tf.get_variable("biases", bias_shape, initializer=tf.constant_initializer(0.0))
    return None


def my_image_filter():
    # Defining the conv layers this way is intuitive and clearly layered
    with tf.variable_scope("conv1"):
        # Variables created here will be named "conv1/weights", "conv1/biases".
        relu1 = conv_relu([5, 5, 32, 32], [32])
    with tf.variable_scope("conv2"):
        # Variables created here will be named "conv2/weights", "conv2/biases".
        return conv_relu( [5, 5, 32, 32], [32])


with tf.variable_scope("image_filters") as scope:
    # my_image_filter is called twice below, but thanks to variable sharing
    # the network's variables are created only once.
    result1 = my_image_filter()
    scope.reuse_variables()
    result2 = my_image_filter()


# As the output below shows, the variables are shared.
vs = tf.trainable_variables()
print ('There are %d train_able_variables in the Graph: ' % len(vs))
for v in vs:
    print (v)

Result:

There are 4 train_able_variables in the Graph: 
<tf.Variable 'image_filters/conv1/weights:0' shape=(5, 5, 32, 32) dtype=float32_ref>
<tf.Variable 'image_filters/conv1/biases:0' shape=(32,) dtype=float32_ref>
<tf.Variable 'image_filters/conv2/weights:0' shape=(5, 5, 32, 32) dtype=float32_ref>
<tf.Variable 'image_filters/conv2/biases:0' shape=(32,) dtype=float32_ref>

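First, we need to think in terms of a Graph. In TensorFlow, defining a variable adds a node to the Graph. This differs from an ordinary Python function, where we process the input, return a result, and forget about any local variables defined inside. In TensorFlow, creating a variable inside a function adds a node to the Graph, and that node still exists in the Graph after the function returns.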
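
A minimal pure-Python sketch of this idea, assuming nothing about TensorFlow's internals: ToyGraph and build_layer are hypothetical names, and the point is only that the local Python name disappears while the node it registered stays in the graph.

```python
class ToyGraph:
    """Toy graph that only records the names of the nodes added to it."""

    def __init__(self):
        self.nodes = []

    def add_node(self, name):
        self.nodes.append(name)
        return name


graph = ToyGraph()


def build_layer(graph, layer_name):
    w = graph.add_node(layer_name + "/weights")  # w is only a local name
    return None                                  # w goes out of scope here


build_layer(graph, "layer1")
build_layer(graph, "layer2")
print(graph.nodes)  # ['layer1/weights', 'layer2/weights']
```

Each call leaves one more node behind, which is why calling a model-building function twice doubles the variables unless they are shared.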

Back to the original problem

Variable rnn/multi_rnn_cell/cell_0/basic_lstm_cell/weights already exists, disallowed.

Attempt to reuse RNNCell <tensorflow.contrib.rnn.python.ops.core_rnn_cell_impl.BasicLSTMCell object at 0x000002206E714240> with a different variable scope than its first use.

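These two problems are still unresolved.

With tf.contrib.rnn.MultiRNNCell, the following does not raise an error. Each layer gets its own BasicLSTMCell object from lstm(), and variable_scope(None, default_name="Rnn") picks a scope name that is not yet in use each time the code runs: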

# n_hidden, NUM_LAYERS and the input tensor x are assumed to be defined earlier.
def lstm():
    # Build a fresh cell on every call so that no cell object is shared between layers.
    lstm_fw_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0,
                                                state_is_tuple=True,
                                                reuse=tf.get_variable_scope().reuse)
    return lstm_fw_cell

with tf.variable_scope(None, default_name="Rnn"):
    cell = tf.contrib.rnn.MultiRNNCell([lstm() for _ in range(NUM_LAYERS)])
    output, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)

With tf.contrib.rnn.static_bidirectional_rnn, writing it as follows does not raise an error:

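# n_hidden is assumed to be defined earlier; x is assumed to be a list of
# [batch_size, input_size] tensors, one per time step.
with tf.variable_scope(None, default_name="bidirectional-rnn"):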
    lstm_fw_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias = 1.0,
                                            state_is_tuple=True,
                                            reuse=tf.get_variable_scope().reuse )
    lstm_bw_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias = 1.0,
                                            state_is_tuple=True,
                                            reuse=tf.get_variable_scope().reuse )


    outputs, _, _ = tf.contrib.rnn.static_bidirectional_rnn(lstm_fw_cell,
                                                        lstm_bw_cell, x,
                                                        dtype = tf.float32)