Background: the goal is to collect the value of every pid in the string below into a list:
getMerchandiseIds({"code":1,"msg":"success","tid":"-6007374682839992451","data":{"keepTime":180,"total":2042,"totalTxt":"2042","batchSize":120,"pageSize":30,"isLast":0,"showBsFilter":1,"showRank":0,"headInfo":{"isShowLabel":1,"listStyle":"1","spellCheck":{"type":"1","originWord":"口红"}},"products":[{"pid":"6918464699139588940","isReco":"0"},{"pid":"6918383816403572620","isReco":"0"},{"pid":"6918976367090246728","isReco":"0"},{"pid":"6918063477853790749","isReco":"0"},{"pid":"6919173835742442889","isReco":"0"}],"sortTips":"根据大部分用户喜爱的商品,为您挑选:"}})
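The string above is the result returned by the crawler.
There are two ways to parse it:
- Take the response.text string getMerchandiseIds(...), use a regular expression to extract the data inside the parentheses, then convert that string to JSON data: import the json module and call json.loads(string).
- Match the pid values directly with a regular expression: "pid":"(\d+)"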
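The two approaches listed above can be sketched side by side as follows. This is a minimal illustration, not the author's code: the sample string is a shortened, hypothetical response in the same shape as the one above, and the variable names are assumptions chosen for the example.

```python
import json
import re

# Shortened, hypothetical sample in the same JSONP shape as the crawler response.
text = (
    'getMerchandiseIds({"code":1,"data":{"products":['
    '{"pid":"6918464699139588940","isReco":"0"},'
    '{"pid":"6918383816403572620","isReco":"0"}]}})'
)

# Approach 1: strip the callback wrapper, then parse the payload as JSON.
# The greedy match runs from the first "(" to the last ")".
payload = re.search(r'\((.*)\)', text, re.S).group(1)
data = json.loads(payload)
pids_from_json = [item["pid"] for item in data["data"]["products"]]

# Approach 2: match the pid values directly in the raw text.
pids_from_regex = re.findall(r'"pid":"(\d+)"', text)

print(pids_from_json)
print(pids_from_regex)
```

Parsing the JSON is usually the sturdier choice if the field layout or whitespace may change, while the direct regular expression is shorter when only the pid values are needed.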
1. Use a regular expression to extract the JSON string inside getMerchandiseIds()
import re
# Raw response text: a JSONP-style call wrapping a JSON payload.
str1 = 'getMerchandiseIds({"code":1,"msg":"success","tid":"-6007374682839992451","data":{"keepTime":180,"total":2042,"totalTxt":"2042","batchSize":120,"pageSize":30,"isLast":0,"showBsFilter":1,"showRank":0,"headInfo":{"isShowLabel":1,"listStyle":"1","spellCheck":{"type":"1","originWord":"口红"}},"products":[{"pid":"6918464699139588940","isReco":"0"},{"pid":"6918383816403572620","isReco":"0"},{"pid":"6918976367090246728","isReco":"0"},{"pid":"6918063477853790749","isReco":"0"},{"pid":"6919173835742442889","isReco":"0"}],"sortTips":"根据大部分用户喜爱的商品,为您挑选:"}})'
# Non-greedy capture between "(" and the first ")"; this works here because the payload contains no ")".
p1 = re.compile(r'[(](.*?)[)]', re.S)
result = re.findall(p1, str1)[0]
print(result)
Output:
{"code":1,"msg":"success","tid":"-6007374682839992451","data":{"keepTime":180,"total":2042,"totalTxt":"2042","batchSize":120,"