- Keywords: word vectors, N-gram neural network
- Problem description: while training an N-gram neural network on the PTB dataset, the dictionary size was set by hand; when training starts, the run fails with the error ids[i] >= row_number.
- Error message:
<ipython-input-6-daf8837e1db3> in train(use_cuda, train_program, params_dirname)
37 num_epochs=1,
38 event_handler=event_handler,
---> 39 feed_order=['firstw', 'secondw', 'thirdw', 'fourthw', 'nextw'])
/usr/local/lib/python3.5/dist-packages/paddle/fluid/contrib/trainer.py in train(self, num_epochs, event_handler, reader, feed_order)
403 else:
404 self._train_by_executor(num_epochs, event_handler, reader,
--> 405 feed_order)
406
407 def test(self, reader, feed_order):
/usr/local/lib/python3.5/dist-packages/paddle/fluid/contrib/trainer.py in _train_by_executor(self, num_epochs, event_handler, reader, feed_order)
481 exe = executor.Executor(self.place)
482 reader = feeder.decorate_reader(reader, multi_devices=False)
--> 483 self._train_by_any_executor(event_handler, exe, num_epochs, reader)
484
485 def _train_by_any_executor(self, event_handler, exe, num_epochs, reader):
/usr/local/lib/python3.5/dist-packages/paddle/fluid/contrib/trainer.py in _train_by_any_executor(self, event_handler, exe, num_epochs, reader)
510 fetch_list=[
511 var.name
--> 512 for var in self.train_func_outputs
513 ])
514 else:
/usr/local/lib/python3.5/dist-packages/paddle/fluid/executor.py in run(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache)
468
469 self._feed_data(program, feed, feed_var_name, scope)
--> 470 self.executor.run(program.desc, scope, 0, True, True)
471 outs = self._fetch_data(fetch_list, fetch_var_name, scope)
472 if return_numpy:
EnforceNotMet: Enforce failed. Expected ids[i] < row_number, but received ids[i]:2073 >= row_number:32.
at [/paddle/paddle/fluid/operators/lookup_table_op.h:59]
PaddlePaddle Call Stacks:
- Reproducing the problem: the N-gram network is built with the fluid.layers.embedding API, and its size argument is set to [32, 32]. Running training then raises the error above. The faulty code:
dict_size = 32  # hard-coded, much smaller than the real PTB dictionary

embed_first = fluid.layers.embedding(
    input=first_word,
    size=[dict_size, EMBED_SIZE],
    dtype='float32',
    is_sparse=is_sparse,
    param_attr='shared_w')
embed_second = fluid.layers.embedding(
    input=second_word,
    size=[dict_size, EMBED_SIZE],
    dtype='float32',
    is_sparse=is_sparse,
    param_attr='shared_w')
embed_third = fluid.layers.embedding(
    input=third_word,
    size=[dict_size, EMBED_SIZE],
    dtype='float32',
    is_sparse=is_sparse,
    param_attr='shared_w')
embed_fourth = fluid.layers.embedding(
    input=fourth_word,
    size=[dict_size, EMBED_SIZE],
    dtype='float32',
    is_sparse=is_sparse,
    param_attr='shared_w')
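The cause is easiest to see with a plain-Python sketch (a hypothetical `lookup` helper, not Paddle code) of what the lookup_table operator does: the embedding parameter is a [dict_size, EMBED_SIZE] table, and each input id selects one row, so any id at or above dict_size has no row to read.

```python
# Minimal sketch of an embedding lookup: the table has dict_size rows,
# and each id indexes one row. This mirrors the bounds check that fails
# inside lookup_table_op with "Expected ids[i] < row_number".
def lookup(table, ids):
    row_number = len(table)
    out = []
    for i in ids:
        if not (0 <= i < row_number):
            raise ValueError(
                "ids[i]:%d >= row_number:%d" % (i, row_number))
        out.append(table[i])
    return out

EMBED_SIZE = 4
table = [[0.0] * EMBED_SIZE for _ in range(32)]  # dict_size = 32

lookup(table, [0, 5, 31])   # fine: all ids < 32
try:
    lookup(table, [2073])   # the id from the traceback
except ValueError as e:
    print(e)                # ids[i]:2073 >= row_number:32
```

With dict_size set to 32 but the PTB vocabulary containing thousands of words, the reader inevitably emits an id such as 2073 that falls outside the table, which is exactly the enforce failure in the traceback.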
- Solution: the size argument of fluid.layers.embedding must be [dictionary size of the dataset, embedding dimension], so its first element has to match the actual dictionary size rather than a hand-picked constant. Corrected code:
# Derive dict_size from the dictionary actually built for the dataset.
word_dict = paddle.dataset.imikolov.build_dict()
dict_size = len(word_dict)

embed_first = fluid.layers.embedding(
    input=first_word,
    size=[dict_size, EMBED_SIZE],
    dtype='float32',
    is_sparse=is_sparse,
    param_attr='shared_w')
embed_second = fluid.layers.embedding(
    input=second_word,
    size=[dict_size, EMBED_SIZE],
    dtype='float32',
    is_sparse=is_sparse,
    param_attr='shared_w')
embed_third = fluid.layers.embedding(
    input=third_word,
    size=[dict_size, EMBED_SIZE],
    dtype='float32',
    is_sparse=is_sparse,
    param_attr='shared_w')
embed_fourth = fluid.layers.embedding(
    input=fourth_word,
    size=[dict_size, EMBED_SIZE],
    dtype='float32',
    is_sparse=is_sparse,
    param_attr='shared_w')
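When the dictionary size is ever set by hand, a cheap pre-flight check can catch the mismatch before training instead of mid-epoch. The sketch below uses a hypothetical `check_ids` helper and a toy reader standing in for the real imikolov reader; neither is part of the Paddle API.

```python
# Hypothetical pre-flight check: scan the reader once and confirm every
# word id fits inside the embedding table of `dict_size` rows.
def check_ids(reader, dict_size):
    max_id = -1
    for sample in reader():
        max_id = max(max_id, *sample)
    assert max_id < dict_size, (
        "found id %d but the embedding only has %d rows" % (max_id, dict_size))
    return max_id

# Toy reader yielding (firstw, secondw, thirdw, fourthw, nextw) id tuples.
def toy_reader():
    yield (0, 5, 31, 7, 2)
    yield (1, 2, 3, 4, 5)

check_ids(toy_reader, 32)  # passes: the largest id, 31, is < 32
```

Running the same check with dict_size = 32 against the real PTB reader would fail immediately, surfacing the configuration error without a training run.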
- Problem analysis: for more on the fluid.layers.embedding() API, see the corresponding section of the API reference:
http://www.paddlepaddle.org/documentation/docs/zh/1.1/api/layers.html#embedding
In short, the 'ids[i] >= row_number' error seen while training the N-gram network on PTB comes from an embedding table whose first dimension is smaller than the dataset's dictionary; the fix is to size the table from the dictionary actually built for the data, as in the corrected code above.