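While processing the user table, I found that the number of user names was smaller than the total number of users, even though there were no null values.
A query showed that the user table contains documents whose user name is an empty string: a query for the empty string ("") actually matches documents.
Going through the data-cleaning code gave no clue as to where the empty strings come from.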
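Elasticsearch does not index a JSON null (unless the mapping sets a null_value), so a field sent as null is treated as having no value. An empty string, by contrast, is indexed as an ordinary term in the keyword sub-field, which is why a term query on user_name.keyword for "" can match. One possible safeguard, sketched here under the assumption that the cleaning step is written in Python (the post does not say which language it uses), is to turn blank names into None before documents are indexed. The function name clean_user_name is hypothetical.

```python
from typing import Optional


def clean_user_name(raw: Optional[str]) -> Optional[str]:
    """Hypothetical cleaning helper: map blank user names to None.

    "" and whitespace-only strings become None, so the document is sent
    with user_name: null and Elasticsearch indexes no value for it,
    instead of indexing "" as a keyword term. Non-blank names are
    returned unchanged (no trimming), which is an assumption about the
    desired cleaning rule.
    """
    if raw is None:
        return None
    return raw if raw.strip() else None
```

For example, clean_user_name("") and clean_user_name("   ") both return None, while clean_user_name("alice") returns "alice" unchanged.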
Count query:
GET user_info_index/_doc/_count
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "user_name.keyword": {
              "value": ""
            }
          }
        }
      ]
    }
  }
}
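The count above only catches the literal "". To count every document without a usable name in one request, the term clause can be combined with a must_not exists clause inside a bool should. The sketch below only builds the request bodies as Python dicts (the helper names are hypothetical), so they can be pasted into the console or sent as the body of a _count request with any client. It runs the exists check on user_name rather than user_name.keyword because, assuming the default dynamic mapping, the keyword sub-field has ignore_above: 256, and longer names would then look "missing" there.

```python
import json


def empty_name_query(field: str = "user_name.keyword") -> dict:
    """Body of the count query above: exact match on the empty string."""
    return {"query": {"bool": {"must": [{"term": {field: {"value": ""}}}]}}}


def blank_or_missing_name_query(
    keyword_field: str = "user_name.keyword",
    source_field: str = "user_name",
) -> dict:
    """Hypothetical helper: match docs whose name is "" OR has no indexed value."""
    return {
        "query": {
            "bool": {
                "should": [
                    # the literal empty string, indexed as a keyword term
                    {"term": {keyword_field: {"value": ""}}},
                    # null, [] or absent field: nothing was indexed for it
                    {"bool": {"must_not": [{"exists": {"field": source_field}}]}},
                ],
                # at least one of the two clauses must match
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(blank_or_missing_name_query(), indent=2))
```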
Count result:
{
  "count": 63215,
  "_shards": {
    "total": 6,
    "successful": 6,
    "skipped": 0,
    "failed": 0
  }
}
Search query (fetching the names):
GET user_info_index/_doc/_search
{
  "size": 2,
  "_source": ["user_name", "user_id"],
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "user_name.keyword": {
              "value": ""
            }
          }
        }
      ]
    }
  }
}
Search result:
{
  "took": 23,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 63215,
    "max_score": 5.054031,
    "hits": [
      {
        "_index": "user_info_index",
        "_type": "_doc",
        "_id": "07837b0b4bdd48369165432dde747f7a",
        "_score": 5.054031,
        "_source": {
          "user_name": "",
          "user_id": "07837b0b4bdd48369165432dde747f7a"
        }
      },
      {
        "_index": "user_info_index",
        "_type": "_doc",
        "_id": "6a76b5c1eac64152d324d81769ef65ea",
        "_score": 5.054031,
        "_source": {
          "user_name": "",
          "user_id": "6a76b5c1eac64152d324d81769ef65ea"
        }
      }
    ]
  }
}
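To trace the empty names back to their source, the user_id values in the hits can be collected and looked up in the upstream data. A minimal sketch, assuming the response has been loaded into a Python dict with the same shape as the result above; the helper names are hypothetical. With "size": 2 only two hits come back, so collecting all 63215 IDs would need paging (for example scroll or search_after). The total_hits helper also copes with Elasticsearch 7+, where hits.total is an object such as {"value": ..., "relation": ...} rather than a bare number.

```python
from typing import Any, Dict, List


def empty_name_ids(response: Dict[str, Any]) -> List[str]:
    """Collect user_id of hits whose _source.user_name is an empty string."""
    ids = []
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        if source.get("user_name") == "":
            # fall back to the document _id if user_id was not returned
            ids.append(source.get("user_id", hit.get("_id")))
    return ids


def total_hits(response: Dict[str, Any]) -> int:
    """hits.total is a number in 6.x responses and an object in 7.x+."""
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total
```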
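In short: while processing the user table, the number of user names turned out to be smaller than the total number of users. Querying showed that some documents have an empty-string user name, and a query for the empty string matches these anomalous documents, but the cleaning code gives no indication of where they come from. The count and search requests above were run as part of the analysis.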