Setting the execution device for layers in TensorFlow

武小胖儿

License: CC 4.0 BY-SA

Categories: tensorflow, model, parallel

Article link: https://blog.youkuaiyun.com/cleanarea/article/details/90692135

1. TensorFlow's default execution device

https://tensorflow.google.cn/guide/using_gpu

1.1 Device selection priority

If a TensorFlow operation has both CPU and GPU implementations, the GPU devices will be given priority when the operation is assigned to a device. For example, matmul has both CPU and GPU kernels. On a system with devices cpu:0 and gpu:0, gpu:0 will be selected to run matmul. In other words, the GPU is preferred by default. Can the execution device then be overridden?

1.2 Checking where ops are placed

How to check:

import tensorflow as tf  # TensorFlow 1.x API

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

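Output (the op names show that this log comes from a model with conv1/conv2 layers, not from the snippet above):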

conv1/weights/Initializer/random_uniform/shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv1/weights/Initializer/random_uniform/min: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv1/weights/Initializer/random_uniform/max: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv1/biases/Initializer/random_uniform/shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv1/biases/Initializer/random_uniform/min: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv1/biases/Initializer/random_uniform/max: (Const): /job:localhost/replica:0/task:0/device:CPU:0
Shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/weights/Initializer/random_uniform/shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/weights/Initializer/random_uniform/min: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/weights/Initializer/random_uniform/max: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/biases/Initializer/random_uniform/shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/biases/Initializer/random_uniform/min: (Const): /job:localhost/replica:0/task:0/device:CPU:0
conv2/biases/Initializer/random_uniform/max: (Const): /job:localhost/replica:0/task:0/device:CPU:0

Conclusion:

The CPU-only build of TensorFlow places all parameters and ops on CPU:0 by default.
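When a graph has many ops, reading the placement log by eye gets tedious. The sketch below is a small, hypothetical helper (`parse_placement` is not part of TensorFlow) that assumes each log line has the `name: (OpType): device` or `name: device` shape seen in this article, and groups op names by device.

```python
import re
from collections import defaultdict

# Matches lines such as
#   "Shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0"
#   "MatMul: /job:localhost/replica:0/task:0/device:GPU:0"
# The op-type group "(Const)" is optional because some logs omit it.
_LINE = re.compile(r"^(?P<name>[^:\s]+): (?:\((?P<type>\w+)\): )?(?P<device>/job:\S+)$")


def parse_placement(log_text):
    """Group op names by the device string found on each placement line."""
    by_device = defaultdict(list)
    for line in log_text.splitlines():
        match = _LINE.match(line.strip())
        if match:  # skip "Device mapping:" headers, printed results, etc.
            by_device[match.group("device")].append(match.group("name"))
    return dict(by_device)


sample = """\
conv1/weights/Initializer/random_uniform/shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0
Shape: (Const): /job:localhost/replica:0/task:0/device:CPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
"""
placement = parse_placement(sample)
for device, names in placement.items():
    print(device, len(names))
```

If the resulting dictionary has a single key, every logged op landed on the same device, which is what the conclusion above states for the CPU-only build.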

1.3 Pinning variables to a device and checking where the op runs

Verification:

import tensorflow as tf  # TensorFlow 1.x API

# Creates a graph.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

Output:

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]

Analysis:

You will see that now a and b are assigned to cpu:0. Since a device was not explicitly specified for the MatMul operation, the TensorFlow runtime will choose one based on the operation and available devices (gpu:0 in this example) and automatically copy tensors between devices if required.

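When no device is specified for an op, TensorFlow places it on the default device. If the op and its inputs (variables or constants) then sit on different devices, the inputs are copied to the op's device so the computation can run there.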
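One way to avoid that cross-device copy is to create the op inside the same `tf.device` scope as its inputs. The sketch below assumes a TensorFlow version that exposes `tf.compat.v1` (1.13+ or 2.x); it inspects the device requested for each tensor at graph-construction time and does not need a GPU.

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Inputs and the op are created in the same device scope,
    # so all three request the same device.
    with tf.device('/cpu:0'):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
        c = tf.matmul(a, b, name='c')

# Tensor.device holds the device requested when the graph was built.
for tensor in (a, b, c):
    print(tensor.op.name, tensor.device)

with tf.compat.v1.Session(graph=graph) as sess:
    result = sess.run(c)
print(result)
```

Moving the `tf.matmul` call back outside the `with tf.device(...)` block leaves its requested device empty, which is the situation analyzed above.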

1.4 The device given at variable definition differs from the device used for the assign

The following exception is raised:

InvalidArgumentError (see above for traceback): Cannot assign a device for operation conv1_2/Assign_3/value: node conv1_2/Assign_3/value (defined at ./CP_Alexnet/alexnet.py:229) was explicitly assigned to /device:CPU:1 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.

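Explanation:

A variable is created on the device specified for it, and an assignment should look it up on that same device. If no device is specified at assignment time, TensorFlow looks for the variable on the default device. When the device specified at creation differs from TensorFlow's default device, the error above occurs. Note that the message itself also reports that /device:CPU:1 is not among the devices available to the session (only CPU:0 and XLA_CPU:0 are listed), so the requested device must exist in the session as well.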
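A minimal sketch of a setup that avoids this error, assuming a TensorFlow version that exposes `tf.compat.v1`: the variable and its assign op are created in one `tf.device` scope, and `device_count` asks the session for two CPU devices so that `/cpu:1` is a valid target. The names `w` and `run_assign` are illustrative only.

```python
import tensorflow as tf


def run_assign(device_name='/cpu:1', num_cpus=2):
    """Create and assign a variable on one device; return its value and the session's devices."""
    graph = tf.Graph()
    with graph.as_default():
        # Definition and assignment share the same device scope.
        with tf.device(device_name):
            w = tf.compat.v1.Variable([0.0, 0.0], name='w')
            assign_w = tf.compat.v1.assign(w, [1.0, 2.0])
        init = tf.compat.v1.global_variables_initializer()

    # Expose num_cpus logical CPU devices (CPU:0 ... CPU:num_cpus-1) to the session.
    config = tf.compat.v1.ConfigProto(device_count={'CPU': num_cpus})
    with tf.compat.v1.Session(graph=graph, config=config) as sess:
        devices = [d.name for d in sess.list_devices()]
        sess.run(init)
        sess.run(assign_w)
        return sess.run(w).tolist(), devices


value, devices = run_assign()
print(value)
print(devices)
```

Calling `run_assign(num_cpus=1)` is expected to reproduce the kind of failure shown above, because `/cpu:1` would then be missing from the session's device list.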

1.5 Can CPUs and GPUs be used together?

Yes. For example:

import tensorflow as tf  # TensorFlow 1.x API

# Creates a graph.
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
  sum = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(sum))

Result:

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus
id: 0000:02:00.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus
id: 0000:03:00.0
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus
id: 0000:83:00.0
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus
id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/device:GPU:3
Const_2: /job:localhost/replica:0/task:0/device:GPU:3
MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
Const_1: /job:localhost/replica:0/task:0/device:GPU:2
Const: /job:localhost/replica:0/task:0/device:GPU:2
MatMul: /job:localhost/replica:0/task:0/device:GPU:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]
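The same pattern can be tried on a machine without several GPUs by substituting logical CPU devices for the GPUs. The sketch below is an adaptation, not the original example: it assumes a TensorFlow version that exposes `tf.compat.v1`, and it asks the session for three CPU devices through `device_count`.

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    partial_products = []
    # Stand-ins for '/device:GPU:2' and '/device:GPU:3' in the example above.
    for d in ['/cpu:1', '/cpu:2']:
        with tf.device(d):
            a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
            b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
            partial_products.append(tf.matmul(a, b))
    # Collect the per-device results on cpu:0.
    with tf.device('/cpu:0'):
        total = tf.add_n(partial_products)

# Three logical CPU devices: CPU:0, CPU:1 and CPU:2.
config = tf.compat.v1.ConfigProto(device_count={'CPU': 3})
with tf.compat.v1.Session(graph=graph, config=config) as sess:
    result = sess.run(total)
print(result)
```

The result variable is named `total` here rather than `sum` so that Python's built-in `sum` is not shadowed.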

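1.6 What characterizes variables loaded from a SavedModel?

The variable files produced by saved_model retain each variable's device information. Because we need to adjust the device of each variable dynamically, every layer of the model must be built at run time; a previously saved SavedModel cannot simply be loaded as-is.

Summary:

1) The device specified when a variable is defined and the device specified when it is assigned must match; otherwise an error is raised.

2) Keep a variable and the ops that use it on the same device whenever possible; otherwise extra memory-copy overhead is incurred.

3) Patience is a virtue.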

2. Running a model's layers on a specific device

Core idea: define the ops and the variables on the same device, and refer to that device when assigning values to the variables.

Method:

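import tensorflow as tf  # TensorFlow 1.x API

# device_name, param_list and Data_Parall_AlexNet are defined elsewhere in the author's project.
config = tf.ConfigProto(device_count={"CPU": 3, "GPU": 2})  # how many devices of each type the session may use
with tf.Session(config=config) as sess:  # apply the device configuration to this session
  with tf.device(device_name):  # pin everything created below to the chosen device
    # 1. create the model
    model = Data_Parall_AlexNet(param_list)
    # 2. define the model's structure
    model.create()
    # 3. initialize all parameters
    sess.run(tf.global_variables_initializer())
    # 4. assign values to all parameters
    sess.run(model.load_initial_weights())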

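Analysis:

1) The device_count field of tf.ConfigProto controls which devices the Session can use. With the configuration above, the devices available in the session are cpu:0, cpu:1, cpu:2, gpu:0 and gpu:1 (the GPU entries require that many physical GPUs, since the count is an upper bound). device_count is a <string, int32> map: the string is "CPU" or "GPU", and the int32 is the number of devices of that type. Note that this is a count, not a name; device names can only be specified through tf.device.

2) The session provides the execution environment for the variables and ops. When the session's with block exits, the session is closed and the variable values it held are released; the graph definition itself remains in the default graph.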
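To make the count-versus-name distinction concrete, the sketch below expands a `device_count`-style mapping into device names that could then be passed to `tf.device`. It is plain Python and does not query TensorFlow; `expand_device_names` is a hypothetical helper, and the `/device:TYPE:index` format follows the names seen in the logs earlier in this article. For GPUs the count acts as an upper bound, so a listed GPU name is usable only if that physical GPU is present.

```python
def expand_device_names(device_count):
    """Turn {"CPU": 3, "GPU": 2} into ['/device:CPU:0', ..., '/device:GPU:1']."""
    names = []
    for device_type, count in device_count.items():
        # Indices run from 0 to count - 1 for each device type.
        names.extend('/device:{}:{}'.format(device_type, i) for i in range(count))
    return names


device_names = expand_device_names({"CPU": 3, "GPU": 2})
print(device_names)
```

Any one of these strings can play the role of `device_name` in the method above.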