Problems I Ran Into Implementing the Value Network (Q-learning) in TensorFlow


黑暗骑士V


License: CC 4.0 BY-SA

Column: Deep Learning; Tags: reinforcement learning, TensorFlow, Q-Learning

Article link: https://blog.youkuaiyun.com/JsonD/article/details/73477779


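After reading Section 8.3 of 《TensorFlow实战》, "Implementing a Value Network in TensorFlow", three times, I finally understood the general idea. My setup is Windows 10, TensorFlow 0.12 and a GTX 1070 GPU, and I ran into several problems while getting the code to run. The second one took me two hours, plus a night's sleep, to solve.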

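The example first builds a small game in which the player "eats" boxes, then builds a Q-learning network to play it. Each episode lasts 50 steps, and the goal is to get as high a score as possible in each episode. For the complete code, see this fellow student's blog: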

http://blog.youkuaiyun.com/Felaim/article/details/70880726
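A Q-learning network is typically trained toward the standard one-step target: the reward plus the discounted best Q-value of the next state, with no bootstrapping once the episode has ended. Below is a minimal pure-Python sketch of that formula only, not the reference code (which may use a variant, e.g. with a separate target network); the function name `q_targets` and the numbers are hypothetical.

```python
def q_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step Q-learning targets for a batch of transitions (illustrative sketch).

    rewards:       reward r received after each transition
    next_q_values: Q(s', a') for every action a' in the next state s'
    dones:         True if the transition ended the episode
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        # Once the episode is over there is no future reward to bootstrap from.
        future = 0.0 if done else gamma * max(q_next)
        targets.append(r + future)
    return targets


# Hypothetical batch of two transitions in a 2-action game.
print(q_targets([1.0, 0.0], [[0.5, 2.0], [3.0, 1.0]], [False, True], gamma=0.5))
# → [2.0, 0.0]
```

The first transition bootstraps from its best next-state value (2.0, discounted by 0.5); the second ends the episode, so its target is just its reward.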

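Problem 1:

(line 144 of the reference code)

    self.streamAC, self.streamVC = tf.split(self.conv4, 2, 3)

Meaning: split the output of convolutional layer conv4 into 2 equal tensors along dimension 3 (counting from 0, i.e. the channel axis of the NHWC output).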
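To see what this split does to the shapes, here is a NumPy sketch. NumPy's `np.split` takes (array, sections, axis), the same order `tf.split` adopted in TensorFlow 1.0. The batch size of 4 and the 512 channels are assumptions chosen for illustration, not values taken from this post; `streamAC`/`streamVC` only mirror the names in the reference code.

```python
import numpy as np

# Hypothetical stand-in for conv4's output in NHWC layout:
# (batch, height, width, channels). The sizes are illustrative assumptions.
batch_size, h_size = 4, 512
conv4 = np.random.rand(batch_size, 1, 1, h_size).astype(np.float32)

# Axis 3 (counting from 0) is the channel axis; splitting it into 2 equal
# sections gives two tensors with 256 channels each.
streamAC, streamVC = np.split(conv4, 2, axis=3)

print(streamAC.shape, streamVC.shape)  # (4, 1, 1, 256) (4, 1, 1, 256)
```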

Error: TypeError: Input 'split_dim' of 'Split' Op has type float32 that does not match expected type of int32.

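Fix: in TensorFlow 0.12, tf.split has the signature:

    def split(split_dim, num_split, value, name="split"):

    split_dim: the dimension to split along
    num_split: the number of equal parts
    value: the tensor to split

The reference code passes self.conv4 as the first argument, so TensorFlow 0.12 treats that float32 tensor as split_dim, which must be an int32; hence the type error. The corrected code is:

    self.streamAC, self.streamVC = tf.split(3, 2, self.conv4)

Note: TensorFlow 1.0 changed the argument order again, to tf.split(value, num_or_size_splits, axis), which is the order the reference code uses. If the call fails on your version, check the API documentation for that release.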
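Because the correct argument order depends on the installed version, one option is to choose it at runtime. The helper below is a hypothetical sketch (`tf_split_args` is not a TensorFlow function). It only reorders arguments according to the 0.x signature `tf.split(split_dim, num_split, value)` and the 1.0+ signature `tf.split(value, num_or_size_splits, axis)`, and it is plain Python so it can be checked without TensorFlow installed.

```python
def tf_split_args(tf_version, value, num_splits, axis):
    """Return positional arguments for tf.split in the order a TensorFlow version expects.

    Hypothetical helper, not part of TensorFlow.
      0.x (e.g. 0.12): tf.split(split_dim, num_split, value)
      1.0 and later:   tf.split(value, num_or_size_splits, axis)
    """
    major = int(tf_version.split(".")[0])
    if major < 1:
        return (axis, num_splits, value)
    return (value, num_splits, axis)


# Usage inside the network definition (requires TensorFlow):
#   self.streamAC, self.streamVC = tf.split(
#       *tf_split_args(tf.__version__, self.conv4, 2, 3))
print(tf_split_args("0.12.1", "conv4", 2, 3))  # (3, 2, 'conv4')
print(tf_split_args("1.2.1", "conv4", 2, 3))   # ('conv4', 2, 3)
```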
Problem 2:

(line 157 of the reference code)

    self.actions_onehot = tf.one_hot(self.actions, env.actions, dtype=tf.float32)

When TensorFlow runs this line on the GPU, it crashes with an error.
 
The error message is:
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_event_mgr.cc:198] Unexpected Event status: 1


This Stack Overflow question pointed me in the right direction:

https://stackoverflow.com/questions/41115476/tensorflow-gpu-cuda-error-launch-failed-on-tf-one-hot

"On the Windows 10 GPU, tf.matmul, tf.reduce_mean, tf.reduce_sum are run ok. But tf.one_hot is not ok."
I changed the code to:

    with tf.device('/cpu:0'):
        self.actions_onehot = tf.one_hot(self.actions, env.actions, dtype=tf.float32)

Pinning tf.one_hot to the CPU this way makes it run without errors.
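For reference, `tf.one_hot(indices, depth)` turns each action index into a row with 1.0 at that index and 0.0 elsewhere. The NumPy sketch below reproduces that for indices in [0, depth). It also shows the common DQN pattern of multiplying Q-values by the mask and summing to pick out Q(s, a) for the action taken; that this is how the mask is used here is an assumption, and the 4-action setup, the Q-values and the name `one_hot` are hypothetical.

```python
import numpy as np

def one_hot(actions, depth):
    """NumPy equivalent of tf.one_hot(actions, depth, dtype=tf.float32) for indices in [0, depth)."""
    return np.eye(depth, dtype=np.float32)[np.asarray(actions)]

# Hypothetical batch: 2 transitions in a 4-action game.
actions = [2, 0]
actions_onehot = one_hot(actions, 4)
print(actions_onehot)

# Hypothetical network output Q(s, a) for each of the 4 actions.
q_out = np.array([[1.0, 2.0, 3.0, 4.0],
                  [5.0, 6.0, 7.0, 8.0]], dtype=np.float32)

# The mask zeroes every column except the chosen action;
# summing over axis 1 leaves Q(s, a) for the action actually taken.
q_taken = np.sum(q_out * actions_onehot, axis=1)
print(q_taken)  # [3. 5.]
```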