Error in a Keras BiLSTM-CRF model on TF2.0
When loading data into the model for training, the following error is raised:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[8,1] = 1000001 is not in [0, 1502)
[[node sequential/embedding/embedding_lookup (defined at E:\competition\Specification\NerModel.py:60) ]] [Op:__inference_train_function_123177]
Errors may have originated from an input operation.
Input Source operations connected to node sequential/embedding/embedding_lookup:
sequential/embedding/embedding_lookup/118162 (defined at D:\Anaconda3\envs\tensorflow_2.0\lib\contextlib.py:81)
Function call stack:
train_function
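Cause:
When converting my corpus to ids, I mapped unknown characters to the number 1000001. mapping_dict has 1502 entries in total, but mapping_dict['unk'] = 1000001. Since 1000001 is larger than the dictionary length, the lookup fails. (This seemed odd to me at first: why can't a value exceed the length? The reason is that the Embedding layer is a lookup table with one row per id, 1502 rows here, so every input id has to be a valid row number in [0, 1502). Comments are still welcome, thanks!)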
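To find out which ids trigger the error before training starts, a small check like the one below may help. It is only a sketch in plain Python: the function name `find_out_of_range` and the toy batch are made up for illustration, and it mimics the range check that the embedding lookup performs rather than calling TensorFlow itself.

```python
def find_out_of_range(batch, vocab_size):
    """Return (row, col, value) for every id outside [0, vocab_size)."""
    bad = []
    for row, seq in enumerate(batch):
        for col, idx in enumerate(seq):
            if not 0 <= idx < vocab_size:
                bad.append((row, col, idx))
    return bad


# Toy batch (hypothetical data): the second sequence contains the oversized 'unk' id.
batch = [[3, 17, 42], [5, 1000001, 8]]

for row, col, idx in find_out_of_range(batch, vocab_size=1502):
    # Same shape as the message in the traceback above.
    print(f"indices[{row},{col}] = {idx} is not in [0, 1502)")
```

Running this over the real training sequences with vocab_size set to len(mapping_dict) should point at the same positions that the traceback reports.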
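Solution:
Before saving the corpus dictionary, renumber the converted ids so that they are consecutive and start from 0. In other words, run the following on the final dictionary: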
    # Reassign ids so they run from 0 to len(mapping_dict) - 1.
    new_id = 0
    for w in list(mapping_dict):
        mapping_dict[w] = new_id
        new_id += 1
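Another option is to build the dictionary with consecutive ids from the start, so that no renumbering step is needed. The following is a minimal sketch under my own assumptions, not the code from NerModel.py: the helpers `build_mapping` and `encode` are hypothetical, and putting 'unk' at id 0 is just one possible choice.

```python
def build_mapping(chars, unk_token="unk"):
    """Assign consecutive ids; the unknown token gets id 0 (an assumption)."""
    mapping = {unk_token: 0}
    for ch in chars:
        if ch not in mapping:
            # The next free id is always the current dictionary size.
            mapping[ch] = len(mapping)
    return mapping


def encode(text, mapping, unk_token="unk"):
    """Convert characters to ids, falling back to the unknown id."""
    unk_id = mapping[unk_token]
    return [mapping.get(ch, unk_id) for ch in text]


# Hypothetical toy corpus; the real one would come from the training data.
mapping_dict = build_mapping("今天天气很好")
ids = encode("今天很冷", mapping_dict)

# Every id is now smaller than len(mapping_dict), which is the bound
# the embedding lookup enforces.
print(ids, len(mapping_dict))
```

If id 0 is already used for padding (for example with mask_zero in the Embedding layer), the unknown token would need a different slot, such as 1.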