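- Keywords: sequence data
- Problem description: While training a personalized recommendation model on the ml-1m dataset, training fails with a ValueError. The message says an array element is being set with a sequence.
- Error message: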
<ipython-input-8-71a7f986f7ba> in train(use_cuda, train_program, params_dirname)
39 event_handler=event_handler,
40 reader=train_reader,
---> 41 feed_order=feed_order)
/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/contrib/trainer.py in train(self, num_epochs, event_handler, reader, feed_order)
403 else:
404 self._train_by_executor(num_epochs, event_handler, reader,
--> 405 feed_order)
406
407 def test(self, reader, feed_order):
/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/contrib/trainer.py in _train_by_executor(self, num_epochs, event_handler, reader, feed_order)
481 exe = executor.Executor(self.place)
482 reader = feeder.decorate_reader(reader, multi_devices=False)
--> 483 self._train_by_any_executor(event_handler, exe, num_epochs, reader)
484
485 def _train_by_any_executor(self, event_handler, exe, num_epochs, reader):
/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/contrib/trainer.py in _train_by_any_executor(self, event_handler, exe, num_epochs, reader)
494 for epoch_id in epochs:
495 event_handler(BeginEpochEvent(epoch_id))
--> 496 for step_id, data in enumerate(reader()):
497 if self.__stop:
498 if self.checkpoint_cfg:
/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/data_feeder.py in __reader_creator__()
275 if not multi_devices:
276 for item in reader():
--> 277 yield self.feed(item)
278 else:
279 num = self._get_number_of_places_(num_places)
/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/data_feeder.py in feed(self, iterable)
196 for each_name, each_converter in six.moves.zip(self.feed_names,
197 converter):
--> 198 ret_dict[each_name] = each_converter.done()
199 return ret_dict
200
/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/data_feeder.py in done(self)
71
72 def done(self):
---> 73 arr = numpy.array(self.data, dtype=self.dtype)
74 if self.shape and len(arr.shape) != len(self.shape):
75 arr = arr.reshape(self.shape)
ValueError: setting an array element with a sequence.
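The last frame of the traceback can be reproduced without Paddle. The following is a minimal sketch, assuming each sample's movie title arrives as a variable-length list of word IDs (the ID values and the helper name `to_dense` are made up for illustration). It mimics the `numpy.array(self.data, dtype=self.dtype)` call shown above, which cannot stack rows of different lengths into one rectangular array.

```python
import numpy

# Hypothetical word-ID lists for two movie titles of different lengths.
batch = [[12, 7, 301], [45, 9]]


def to_dense(data, dtype="int64"):
    """Mimic the failing call numpy.array(self.data, dtype=self.dtype)."""
    return numpy.array(data, dtype=dtype)


try:
    to_dense(batch)
    error = None
except ValueError as exc:
    # Ragged rows cannot be converted to a dense int64 array.
    error = exc

print(type(error).__name__)
```

Rows of equal length convert without error, which is why the problem only shows up for variable-length fields such as titles and category lists.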
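- Reproducing the problem: The movie title input is defined with fluid.layers.data, leaving lod_level at its default value (0), and is then passed to the embedding layer as input. Training then fails with the error above. The failing code is: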
import paddle
import paddle.fluid.layers as layers
import paddle.fluid.nets as nets

IS_SPARSE = True  # assumed value; defined elsewhere in the original script

# Bug: lod_level is left at its default (0), so each sample is fed as a fixed-size value.
CATEGORY_DICT_SIZE = len(paddle.dataset.movielens.movie_categories())
category_id = layers.data(name='category_id', shape=[1], dtype='int64')
mov_categories_emb = layers.embedding(input=category_id, size=[CATEGORY_DICT_SIZE, 32], is_sparse=IS_SPARSE)
mov_categories_hidden = layers.sequence_pool(input=mov_categories_emb, pool_type="sum")

MOV_TITLE_DICT_SIZE = len(paddle.dataset.movielens.get_movie_title_dict())
mov_title_id = layers.data(name='movie_title', shape=[1], dtype='int64')
mov_title_emb = layers.embedding(input=mov_title_id, size=[MOV_TITLE_DICT_SIZE, 32], is_sparse=IS_SPARSE)
mov_title_conv = nets.sequence_conv_pool(
    input=mov_title_emb,
    num_filters=32,
    filter_size=3,
    act="tanh",
    pool_type="sum")
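- Solution: A movie's title and its categories are both name-like string data, so after conversion to IDs each one is a variable-length sequence rather than a single value. The lod_level parameter of fluid.layers.data should therefore be set to 1, which declares the input as sequence data. The corrected code is: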
import paddle
import paddle.fluid.layers as layers
import paddle.fluid.nets as nets

IS_SPARSE = True  # assumed value; defined elsewhere in the original script

# Fix: lod_level=1 declares each input as a variable-length sequence.
CATEGORY_DICT_SIZE = len(paddle.dataset.movielens.movie_categories())
category_id = layers.data(name='category_id', shape=[1], dtype='int64', lod_level=1)
mov_categories_emb = layers.embedding(input=category_id, size=[CATEGORY_DICT_SIZE, 32], is_sparse=IS_SPARSE)
mov_categories_hidden = layers.sequence_pool(input=mov_categories_emb, pool_type="sum")

MOV_TITLE_DICT_SIZE = len(paddle.dataset.movielens.get_movie_title_dict())
mov_title_id = layers.data(name='movie_title', shape=[1], dtype='int64', lod_level=1)
mov_title_emb = layers.embedding(input=mov_title_id, size=[MOV_TITLE_DICT_SIZE, 32], is_sparse=IS_SPARSE)
mov_title_conv = nets.sequence_conv_pool(
    input=mov_title_emb,
    num_filters=32,
    filter_size=3,
    act="tanh",
    pool_type="sum")
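Conceptually, lod_level=1 means the samples are no longer stacked into a rectangle. Instead they are concatenated, and the length of each sequence is recorded alongside. The sketch below illustrates that idea in plain NumPy. The helper name `flatten_sequences` and the sample IDs are hypothetical, and this is not Paddle's actual implementation.

```python
import numpy


def flatten_sequences(batch, dtype="int64"):
    """Concatenate variable-length sequences and record their lengths."""
    lengths = [len(seq) for seq in batch]
    flat = numpy.array([tok for seq in batch for tok in seq], dtype=dtype)
    # One ID per row, matching shape=[1] declared in layers.data.
    return flat.reshape(-1, 1), lengths


# Hypothetical word-ID lists for two movie titles of different lengths.
batch = [[12, 7, 301], [45, 9]]
flat, lengths = flatten_sequences(batch)
print(flat.shape, lengths)
```

With the lengths available, a sequence operation such as sequence_pool can reduce each run of rows back to one vector per sample.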
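- Further notes: This applies to more than the movie title. The movie categories in this dataset (paddle.dataset.movielens.movie_categories()) are also string data and must be defined as sequence data as well.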
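To see why both fields end up as sequences, consider how a raw record might be turned into IDs. This is an illustrative sketch only: the `|`-joined category string follows the ml-1m file format, while the two dictionaries and the helper `encode_movie` are made up for this example and are not taken from paddle.dataset.movielens.

```python
def encode_movie(title, categories, title_dict, category_dict):
    """Map a title and a '|'-joined category string to lists of IDs."""
    title_ids = [title_dict[word.lower()] for word in title.split()]
    category_ids = [category_dict[name] for name in categories.split("|")]
    return title_ids, category_ids


# Hypothetical vocabularies; the real ones come from the dataset.
title_dict = {"toy": 0, "story": 1, "heat": 2}
category_dict = {"Animation": 0, "Comedy": 1, "Action": 2, "Crime": 3, "Thriller": 4}

a = encode_movie("Toy Story", "Animation|Comedy", title_dict, category_dict)
b = encode_movie("Heat", "Action|Crime|Thriller", title_dict, category_dict)
print(a, b)
```

The two movies yield ID lists of different lengths for both fields, so neither field fits a fixed-size input.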