I. Beginner
1. Allocation of 31360000 exceeds 10% of system memory
Memory is being exhausted. The message itself refers to system RAM (a single allocation is taking more than 10% of it), but the same remedies apply when GPU memory overflows.
Consider the following:
- Reduce batch_size.
- Reduce the sentence (sequence) length.
- Shrink the model: fewer layers, a smaller hidden_size, and so on.
- Shrink the vocabulary.
The vocabulary deserves particular attention, because it determines the size of the embedding table; a rough estimate of that cost is sketched below.
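As a back-of-the-envelope check, the embedding table alone holds vocab_size × embedding_dim weights. A minimal sketch of the arithmetic (both sizes below are hypothetical, picked only for illustration):

# Rough memory estimate for a float32 embedding table (4 bytes per weight).
vocab_size = 800000       # hypothetical vocabulary size
embedding_dim = 512       # hypothetical embedding dimension
bytes_needed = vocab_size * embedding_dim * 4
print("embedding table: %.2f GB" % (bytes_needed / 1024.0 ** 3))  # ~1.53 GB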
2. Cannot interpret feed_dict key as Tensor: Tensor Tensor("Placeholder:0", shape=(?, 15, 50), dtype=int32) is not an element of this graph
Solution:
This means the placeholders were created outside the newly built tf.Graph(), so they are not elements of that graph; moving them inside it fixes the error.
Broken code:
import tensorflow as tf  # TF 1.x
# hp: the project's hyper-parameter module, defined elsewhere.

class Graph:
    def __init__(self, is_training=True):
        self.graph = tf.Graph()
        with self.graph.as_default():
            """
            network model here
            """

if __name__ == "__main__":
    g = Graph()
    # Bug: these placeholders are created in the process-wide default graph,
    # not in g.graph, so feeding them inside the session below fails.
    input = tf.placeholder(tf.int32, shape=(None, hp.max_turn, hp.maxlen))
    y = tf.placeholder(tf.int32, shape=(None, hp.maxlen))
    sv = tf.train.Supervisor(graph=g.graph, logdir=hp.logdir, save_model_secs=0)
    tfconfig = tf.ConfigProto()
    tfconfig.gpu_options.allow_growth = True
    with sv.managed_session(config=tfconfig) as sess:
        """
        training loop here
        """
Corrected code:
class Graph:
    def __init__(self, is_training=True):
        self.graph = tf.Graph()
        with self.graph.as_default():
            # Fix: the placeholders now live inside self.graph.
            input = tf.placeholder(tf.int32, shape=(None, hp.max_turn, hp.maxlen))
            y = tf.placeholder(tf.int32, shape=(None, hp.maxlen))
            """
            network model here
            """

if __name__ == "__main__":
    g = Graph()
    sv = tf.train.Supervisor(graph=g.graph, logdir=hp.logdir, save_model_secs=0)
    tfconfig = tf.ConfigProto()
    tfconfig.gpu_options.allow_growth = True
    with sv.managed_session(config=tfconfig) as sess:
        """
        training loop here
        """
3. UserWarning: Converting sparse IndexedSlices to a dense Tensor with 427557888 elements. This may consume a large amount of memory.
Cause:
This is a warning rather than an error, but materializing a tensor of this size can exhaust memory on its own.
Causes encountered so far:
- The embedding vocabulary is too large (see the sketch after the code below).
In this case the two vocabularies held roughly 770k and 820k entries, and the offending lookup was:
self.enc_embed = embedding(tf.reshape(self.x, [-1, hp.maxlen]),
                           vocab_size=data_hp["de2idx_len"],
                           num_units=embeddingsize,
                           scale=True,
                           scope="enc_embed")