PaddlePaddle在使用ml-1m数据集训练模型是出现序列数字元素的错误

  • 关键字:序列数据

  • 问题描述:在使用ml-1m数据集训练个性化推荐模型,在执行训练的时候出现值错误,错误提示使用序列设置数组元素。

  • 报错信息:

<ipython-input-8-71a7f986f7ba> in train(use_cuda, train_program, params_dirname)
     39         event_handler=event_handler,
     40         reader=train_reader,
---> 41         feed_order=feed_order)

/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/contrib/trainer.py in train(self, num_epochs, event_handler, reader, feed_order)
    403         else:
    404             self._train_by_executor(num_epochs, event_handler, reader,
--> 405                                     feed_order)
    406 
    407     def test(self, reader, feed_order):

/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/contrib/trainer.py in _train_by_executor(self, num_epochs, event_handler, reader, feed_order)
    481             exe = executor.Executor(self.place)
    482             reader = feeder.decorate_reader(reader, multi_devices=False)
--> 483             self._train_by_any_executor(event_handler, exe, num_epochs, reader)
    484 
    485     def _train_by_any_executor(self, event_handler, exe, num_epochs, reader):

/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/contrib/trainer.py in _train_by_any_executor(self, event_handler, exe, num_epochs, reader)
    494         for epoch_id in epochs:
    495             event_handler(BeginEpochEvent(epoch_id))
--> 496             for step_id, data in enumerate(reader()):
    497                 if self.__stop:
    498                     if self.checkpoint_cfg:

/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/data_feeder.py in __reader_creator__()
    275             if not multi_devices:
    276                 for item in reader():
--> 277                     yield self.feed(item)
    278             else:
    279                 num = self._get_number_of_places_(num_places)

/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/data_feeder.py in feed(self, iterable)
    196         for each_name, each_converter in six.moves.zip(self.feed_names,
    197                                                        converter):
--> 198             ret_dict[each_name] = each_converter.done()
    199         return ret_dict
    200 

/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/data_feeder.py in done(self)
     71 
     72     def done(self):
---> 73         arr = numpy.array(self.data, dtype=self.dtype)
     74         if self.shape and len(arr.shape) != len(self.shape):
     75             arr = arr.reshape(self.shape)

ValueError: setting an array element with a sequence.
  • 问题复现:在使用fluid.layers.data接口定义电影名称数据输入,lod_level使用默认值,然后作为词向量的输入参数。最后在训练的时候,就会出现以上的错误。错误代码如下:
CATEGORY_DICT_SIZE = len(paddle.dataset.movielens.movie_categories())
category_id = layers.data(name='category_id', shape=[1], dtype='int64')
mov_categories_emb = layers.embedding(input=category_id, size=[CATEGORY_DICT_SIZE, 32], is_sparse=IS_SPARSE)
mov_categories_hidden = layers.sequence_pool(input=mov_categories_emb, pool_type="sum")

MOV_TITLE_DICT_SIZE = len(paddle.dataset.movielens.get_movie_title_dict())
mov_title_id = layers.data(name='movie_title', shape=[1], dtype='int64')
mov_title_emb = layers.embedding(input=mov_title_id, size=[MOV_TITLE_DICT_SIZE, 32], is_sparse=IS_SPARSE)
mov_title_conv = nets.sequence_conv_pool(
    input=mov_title_emb,
    num_filters=32,
    filter_size=3,
    act="tanh",
    pool_type="sum")
  • 解决问题:电影的标题和电影的类型都是名称类型的字符串数据,所以数据应该是一个序列数据,fluid.layers.data接口的lod_level参数应该是1,定义这个数据是一个序列数据。正确代码如下:
CATEGORY_DICT_SIZE = len(paddle.dataset.movielens.movie_categories())
category_id = layers.data(name='category_id', shape=[1], dtype='int64', lod_level=1)
mov_categories_emb = layers.embedding(input=category_id, size=[CATEGORY_DICT_SIZE, 32], is_sparse=IS_SPARSE)
mov_categories_hidden = layers.sequence_pool(input=mov_categories_emb, pool_type="sum")

MOV_TITLE_DICT_SIZE = len(paddle.dataset.movielens.get_movie_title_dict())
mov_title_id = layers.data(name='movie_title', shape=[1], dtype='int64', lod_level=1)
mov_title_emb = layers.embedding(input=mov_title_id, size=[MOV_TITLE_DICT_SIZE, 32], is_sparse=IS_SPARSE)
mov_title_conv = nets.sequence_conv_pool(
    input=mov_title_emb,
    num_filters=32,
    filter_size=3,
    act="tanh",
    pool_type="sum")
  • 问题拓展:不仅仅是电影的名称,这个数据集中电影的类别也是字符串数据paddle.dataset.movielens.movie_categories(),也需要使用序列数据方式定义。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值