Python Scrapy: re regular expressions

License: CC 4.0 BY-SA

Tags: #python #regex #re #xpath


A handy online tool for converting text to and from Unicode escapes: http://tool.chinaz.com/Tools/Unicode.aspx
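The same conversion can be done locally in Python. The sketch below assumes Python 3; the helper name `to_escapes` is made up for illustration and is not part of any library.

```python
# Each \uXXXX escape is the code point of a single character,
# so the escaped and the literal spelling are the same string.
repost = '\u8f6c\u53d1'
assert repost == '转发'


def to_escapes(text):
    # Hypothetical helper: write non-ASCII characters as backslash escapes,
    # which is the form used in the patterns in this post.
    return text.encode('unicode_escape').decode('ascii')


print(to_escapes('转发'))  # prints \u8f6c\u53d1
```

This makes it easy to check which label a pattern such as `\u8bc4\u8bba` refers to without leaving the interpreter.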

re regular expressions

re.findall(u'\u8f6c\u53d1\[(\d+)\]', selector.xpath('//div[not(@class)]/span[1]/a/text()').extract_first())
# Extracts the number 11 from "转发[11]" (reposts)

re.findall(u'\u8bc4\u8bba\[(\d+)\]', selector.xpath('//div[not(@class)]/span[2]/a/text()').extract_first())
# Extracts the number 11 from "评论[11]" (comments)

re.findall(u'\u8d5e\[(\d+)\]', selector.xpath('//div[not(@class)]/span[3]/a/text()').extract_first())
# Extracts the number 11 from "赞[11]" (likes)

re.findall(u'\s(\d+)/', selector.xpath('//input[@type="submit"]/text()').extract_first())
# Extracts the number 11 from " 11/150" ("\s" matches a whitespace character; "/" does not need escaping)
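If the total after the slash is needed as well, a second capture group can pick it up. This is a minimal sketch assuming Python 3 and input shaped like " 11/150"; `parse_fraction` is a hypothetical name.

```python
import re


def parse_fraction(text):
    """Return (current, total) from text like ' 11/150', or None if no match."""
    if text is None:
        return None
    # \s requires one whitespace character before the first number;
    # the slash is matched literally and needs no backslash.
    match = re.search(r'\s(\d+)/(\d+)', text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_fraction(' 11/150'))  # (11, 150)
```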

re.findall(u'\u5173\u6ce8\[(\d+)\]', selector.xpath('//div[@class="tip2"]/a[1]/text()').extract_first())
# Extracts the number 11 from "关注[11]" (following)

re.findall(u'\u7c89\u4e1d\[(\d+)\]', selector.xpath('//div[@class="tip2"]/a[2]/text()').extract_first())
# Extracts the number 11 from "粉丝[11]" (followers)

re.findall(u'\u5fae\u535a\[(\d+)\]', selector.xpath('//div[@class="tip2"]/span[@class="tc"]/text()').extract_first())
# Extracts the number 11 from "微博[11]" (posts)
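All of the label patterns above share the shape `label[N]`, so they can be folded into one helper. Two caveats motivate this sketch: `extract_first()` returns None when the XPath matches nothing, which makes `re.findall` raise a TypeError, and `re.findall` returns a list of strings rather than a number. The code assumes Python 3, and `extract_count` is a hypothetical name. It uses a raw string for the regex part, which also avoids the invalid-escape warnings that newer Python versions emit for `\[` and `\d` in ordinary string literals.

```python
import re


def extract_count(label, text):
    """Return the integer N from 'label[N]' inside text, or None if absent."""
    if text is None:
        # Nothing was selected, so there is nothing to search.
        return None
    # re.escape keeps the label literal; the raw string holds the regex syntax.
    match = re.search(re.escape(label) + r'\[(\d+)\]', text)
    return int(match.group(1)) if match else None


print(extract_count('\u8f6c\u53d1', '转发[11]'))  # 11
print(extract_count('\u8d5e', None))              # None
```

Inside a spider it would be called as `extract_count('\u8f6c\u53d1', selector.xpath('//div[not(@class)]/span[1]/a/text()').extract_first())`, with the XPath taken from the corresponding line above.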