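Our project recently needed embeddings for multi-valued discrete (categorical) features. The implementations circulating online are almost all for TensorFlow 1, and redoing it in TensorFlow 2 ran into quite a few API pitfalls, so I'm posting my TensorFlow 2 code here for anyone who finds it useful.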
Input:
0 757713 757713 757713 718096 757713 613698 7577...
1 800752 800752 800752 800752 800752 800752 8007...
2 709909 709909 709909 709909 709909 709909 1133...
3 399879 399879
4 684569 684569 488509 684569 684569 670847 670847
5 918215 918215 918215 918215 207615 836298 5043...
6 117858 594091 117858 488509 488509 488509 4885...
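This is one column of a DataFrame. Each row holds a variable-length list of items separated by spaces (the leading 0, 1, 2, ... is the row index, and the trailing "..." is pandas truncating long values in the printout).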
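As a minimal sketch of reading such a column in TensorFlow 2, assume the column has been pulled into a Python list of strings (for example with `df[col].tolist()`) and that IDs are mapped into a fixed range by hashing; the post does not say which mapping it uses, so both are assumptions. `tf.strings.split` turns each space-separated string into one variable-length row of a `tf.RaggedTensor`, and `tf.strings.to_hash_bucket_fast` maps every token to an integer bucket while keeping the row structure. `NUM_BUCKETS` and the sample rows are hypothetical.

```python
import tensorflow as tf

# Stand-in for one DataFrame column (e.g. df[col].tolist()).
# These short rows are illustrative, shortened from the sample input.
rows = [
    "757713 757713 718096",
    "399879 399879",
    "684569 488509 670847 670847",
]

NUM_BUCKETS = 1_000_000  # hypothetical hash-space size

# Split each row on whitespace -> RaggedTensor of strings, shape [batch, None].
tokens = tf.strings.split(tf.constant(rows))

# Hash every token into [0, NUM_BUCKETS); map_flat_values applies the op to the
# flattened tokens and reattaches the original row partitions.
ids = tf.ragged.map_flat_values(tf.strings.to_hash_bucket_fast, tokens, NUM_BUCKETS)

print(ids.row_lengths().numpy())  # [3 2 4]
```

Hashing keeps the embedding table a fixed size at the cost of occasional collisions. Since these IDs are already numeric, `tf.strings.to_number(..., out_type=tf.int64)` or a `tf.keras.layers.StringLookup` vocabulary would be reasonable alternatives.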
Output:
<tf.Tensor: id=6258485, shape=(260864, 16), dtype=float32, numpy=
array([[ 0.41795182, -0.39300975, 0.213124 , ..., -0.27875707,
0.01765781, 0.08868953],
[-0.73458 , -0.02326077, 0.7044893 , ..., 1.2839459 ,
-0.02931389, -0.8411617 ],
[ 0.5692324 , -0.1875325 , -1.018876 , ..., -0.27677986,
-0.14484856, -1.366552 ],
...,
[ 0.31806934, 0.04071464, -0.49344513, ..., 0.47461817,
0.22558866, -0.02738286],
[ 0.7392819 , 0.8259195 , -0.6769146 , ..., 0.38030392,
-0.15495802, 0.22499947],
[ 0.00484937, 1.4204272 , 0.5105733 , ..., 1.7746845 ,
-0.09681775, 1.7567879 ]], dtype=float32)>