The error
When writing data to Elasticsearch with Spark, a few records kept getting lost. Inspecting the logs in detail turned up the following errors:
{"index":"*****","type":"_doc","id":"1267565827","cause":{"type":"exception","reason":"Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field=\"medical_record_info\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[91, 123, 112, 97, 116, 105, 101, 110, 116, 95, 117, 105, 110, 58, 49, 50, 54, 55, 53, 54, 53, 56, 50, 55, 44, 116, 114, 101, 97, 116]...', original message: bytes can be at most 32766 in length; got 37920]","caused_by":{"type":"exception","reason":"Elasticsearch exception [type=max_bytes_length_exceeded_exception, reason=max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 37920]"}},"status":400}
{"index":"*****","type":"_doc","id":"1396085925","cause":{"type":"exception","reason":"Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field=\"medical_record_info\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[91, 123, 112, 97, 116, 105, 101, 110, 116, 95, 117, 105, 110, 58, 49, 51, 57, 54, 48, 56, 53, 57, 50, 53, 44, 116, 114, 101, 97, 116]...', original message: bytes can be at most 32766 in length; got 56516]","caused_by":{"type":"exception","reason":"Elasticsearch exception [type=max_bytes_length_exceeded_exception, reason=max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 56516]"}},"status":400}
{"index":"*****","type":"_doc","id":"1455758148","cause":{"type":"exception","reason":"Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field=\"medical_record_info\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[91, 123, 112, 97, 116, 105, 101, 110, 116, 95, 117, 105, 110, 58, 49, 52, 53, 53, 55, 53, 56, 49, 52, 56, 44, 116, 114, 101, 97, 116]...', original message: bytes can be at most 32766 in length; got 41352]","caused_by":{"type":"exception","reason":"Elasticsearch exception [type=max_bytes_length_exceeded_exception, reason=max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 41352]"}},"status":400}
{"index":"*****","type":"_doc","id":"20000063963","cause":{"type":"exception","reason":"Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field=\"medical_record_info\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[91, 123, 112, 97, 116, 105, 101, 110, 116, 95, 117, 105, 110, 58, 50, 48, 48, 48, 48, 48, 54, 51, 57, 54, 51, 44, 116, 114, 101, 97]...', original message: bytes can be at most 32766 in length; got 3178084]","caused_by":{"type":"exception","reason":"Elasticsearch exception [type=max_bytes_length_exceeded_exception, reason=max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 3178084]"}},"status":400}
whose UTF8 encoding is longer than the max length 32766
The key line is: whose UTF8 encoding is longer than the max length 32766. The cause is that the value of the medical_record_info field is indexed as a single keyword term, and its UTF-8 encoding exceeds Lucene's 32766-byte limit for a single term (the logs show 37920, 56516, 41352, and even 3178084 bytes). Changing the field's mapping type from keyword to text fixes it, because a text field is run through an analyzer and split into many small tokens, none of which hits the limit.
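A minimal sketch of the corrected mapping, assuming a placeholder index name (the real index name is masked as ***** in the logs above). Mapping the field as text avoids the single-term limit; alternatively, if keyword semantics are needed, ignore_above caps how many characters are indexed as a keyword term:

```
PUT /my_index
{
  "mappings": {
    "properties": {
      "medical_record_info": {
        "type": "text"
      }
    }
  }
}
```

If exact-match filtering on the field is still required, a common variant keeps a keyword sub-field with "ignore_above": 256 so oversized values are simply not indexed as keywords instead of failing the whole document. Note that the mapping of an existing field cannot be changed in place; this requires creating a new index with the corrected mapping and reindexing.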
Summary: this post covers an oversized-field problem encountered when writing data to Elasticsearch with Spark. Analyzing the error shows that the UTF-8 encoding of the medical_record_info field exceeds the per-term limit; the fix is to map the field as text so the analyzer breaks it into tokens.