DataLossError (see above for traceback): Unable to open table file D:\celebA_64_96_96: Unknown: NewRandomAccessFile failed to Create/Open: D:\celebA_64_96_96: \udcbe\u073e\udcf8\udcb7\udcc3\udcce\u02a1\udca3
; Input/output error
[[Node: save_1/RestoreV2_47 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_47/tensor_names, save_1/RestoreV2_47/shape_and_slices)]]
[[Node: save_1/RestoreV2_20/_3 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_102_save_1/RestoreV2_20", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
At first I assumed the file permissions were insufficient, but even after granting generous permissions the error above persisted. After some searching, I found the solution:
The tf.train.Saver API documents the second argument of restore, save_path, as follows:
The save_path argument is typically a value previously returned from a save() call, or a call to latest_checkpoint().
In other words, save_path is typically the value returned by an earlier save() call, or by a call to latest_checkpoint(). The clear part of this description is that save_path is the return value of save(), so let's check what save() actually returns. The official API describes it as follows:
A string: path at which the variables were saved. If the saver is sharded, this string ends with: '-?????-of-nnnnn' where 'nnnnn' is the number of shards created. If the saver is empty, returns None.
So save_path points all the way down to the model name: the save_path passed to restore has to be the directory plus the checkpoint name. To find that name, open the checkpoint file inside the save directory; its model_checkpoint_path field records the model name, which in my case is DCGAN.model-9495. The failing code above is therefore changed to:
with tf.Session() as sess:
    saver.restore(sess, "D:/celebA_64_96_96/DCGAN.model-9495")
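For context, here is a minimal sketch of the round trip (the variable v and the global_step value are my own assumptions for illustration): save() returns exactly the prefix string that restore() expects, and tf.train.latest_checkpoint can recover that prefix later from the directory.

import tensorflow as tf

v = tf.Variable(0, name="v")  # placeholder variable so the saver has something to save
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # save() returns the checkpoint prefix, e.g. "D:/celebA_64_96_96/DCGAN.model-9495"
    save_path = saver.save(sess, "D:/celebA_64_96_96/DCGAN.model", global_step=9495)

with tf.Session() as sess:
    # Pass the returned prefix (directory plus model name), not the directory alone,
    # otherwise restore fails with the DataLossError shown above.
    prefix = tf.train.latest_checkpoint("D:/celebA_64_96_96") or save_path
    saver.restore(sess, prefix)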
What if we want a more convenient way to restore? TensorFlow provides an API that looks up the checkpoint file under a given directory. Its usage is shown below, assuming the checkpoint directory is my_model_path:
saver = tf.train.Saver()
ckpt = tf.train.get_checkpoint_state(my_model_path)
if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)
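Wrapped up as a small helper, the same pattern can fall back to fresh initialization when no checkpoint is found. The function name restore_or_init and the fallback are my own additions, not part of the original snippet.

import tensorflow as tf

def restore_or_init(sess, saver, my_model_path):
    # Look up the CheckpointState proto recorded in the 'checkpoint' file.
    ckpt = tf.train.get_checkpoint_state(my_model_path)
    if ckpt and ckpt.model_checkpoint_path:
        # model_checkpoint_path is already the full prefix,
        # e.g. "D:/celebA_64_96_96/DCGAN.model-9495"
        saver.restore(sess, ckpt.model_checkpoint_path)
        return True
    # No usable checkpoint: initialize variables from scratch instead.
    sess.run(tf.global_variables_initializer())
    return False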
The official API definition of tf.train.get_checkpoint_state is:
get_checkpoint_state(
    checkpoint_dir,
    latest_filename=None
)
Its source lives in tensorflow/python/training/saver.py.
Returns the CheckpointState proto from the "checkpoint" file.
If the "checkpoint" file contains a valid CheckpointState proto, it is returned.
Args:
checkpoint_dir: the directory of checkpoints.
latest_filename: (optional) the name of the checkpoint file; defaults to 'checkpoint'.
Returns:
A CheckpointState if the state was available, None otherwise.
Raises:
ValueError: if the checkpoint read does not have model_checkpoint_path set.
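As a quick illustration, the returned CheckpointState proto exposes both the newest checkpoint prefix and the list of checkpoints still kept on disk (the directory below is the one from my error; the printed values are only indicative):

ckpt = tf.train.get_checkpoint_state("D:/celebA_64_96_96")
if ckpt:
    print(ckpt.model_checkpoint_path)        # e.g. "D:/celebA_64_96_96/DCGAN.model-9495"
    print(ckpt.all_model_checkpoint_paths)   # every checkpoint listed in the 'checkpoint' file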
References: