WeChat Crawler in Practice


勇气9601


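Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.youkuaiyun.com/mafang9601/article/details/81274609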


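A WeChat crawler is a crawler that automatically retrieves information about WeChat articles. WeChat places many restrictions on crawlers, so we need countermeasures to get around them, mainly disguising the crawler as a browser (by sending a browser User-Agent header) and routing requests through proxy IPs.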
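The two countermeasures can be sketched in isolation. This is a minimal sketch, not the author's code: instead of installing an opener globally, it attaches the User-Agent to the opener itself through `opener.addheaders`, so the proxy and the header apply only to requests made through that opener. The helper names `make_opener` and `fetch`, the example User-Agent string, and the proxy address `127.0.0.1:3128` are all assumed placeholders.

```python
import urllib.request

# Assumed example values: substitute your own browser's full User-Agent
# string and a proxy server that is currently working.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0"
PROXY = "127.0.0.1:3128"  # hypothetical placeholder address


def make_opener(proxy, user_agent):
    """Build an opener that routes http/https through `proxy` and sends `user_agent`."""
    proxy_handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(proxy_handler)
    # Replaces the default ("User-agent", "Python-urllib/x.y") header
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def fetch(opener, url, timeout=10):
    """Fetch `url` through `opener` and decode the body as UTF-8, ignoring bad bytes."""
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", "ignore")
```

A call would look like `fetch(make_opener(PROXY, USER_AGENT), url)`. The network request itself is not exercised here because it depends on a live proxy.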

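import re
import time
import html
import urllib.request
import urllib.parse
import urllib.error

# Custom function: fetch a URL through a proxy server and return the page text (None on failure)
def use_proxy(IP, url):
    try:
        req = urllib.request.Request(url)
        # Disguise the crawler as a browser. The "…" is elided in the original post; replace it
        # with your browser's full User-Agent string (header values must be Latin-1 encodable)
        req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; …) Gecko/20100101 Firefox/61.0")
        proxy = urllib.request.ProxyHandler({"http": IP})
        opener = urllib.request.build_opener(proxy, urllib.request.HTTPHandler)
        # Install the opener globally
        urllib.request.install_opener(opener)
        # Pass the Request object, not the bare URL, so the User-Agent header is actually sent
        data = urllib.request.urlopen(req).read()
        return data.decode("utf-8", "ignore")
    except urllib.error.URLError as e:
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
        # On a URLError, wait 10 seconds before continuing
        time.sleep(10)
    except Exception as e:
        print("exception: " + str(e))
        # On any other exception, wait 1 second before continuing
        time.sleep(1)
    return None

# Search keyword, percent-encoded once before the loop
# (quoting it again on every pass would double-encode non-ASCII keywords)
key = urllib.parse.quote("Python")
# Proxy server; it may have stopped working, so replace it with a currently valid one
proxy = "139.129.99.9:3128"
# Number of result pages to crawl
for i in range(0, 10):
    thispageurl = "http://weixin.sogou.com/weixin?type=2&query=" + key + "&page=" + str(i)
    thispagedata = use_proxy(proxy, thispageurl)
    if thispagedata is None:
        print("Page " + str(i) + " could not be fetched!")
        continue
    print(len(thispagedata))
    # This pattern matches every link on the result page, not only article links
    pat1 = '<a href="(.*?)"'
    # The re.S flag lets . also match newlines
    rs1 = re.compile(pat1, re.S).findall(thispagedata)
    if len(rs1) == 0:
        print("Page " + str(i) + " failed this time!")
        continue
    for j, link in enumerate(rs1):
        # Turn "&amp;" back into "&" and resolve relative links to get the real article address
        thisurl = urllib.parse.urljoin(thispageurl, html.unescape(link))
        # The target folder must already exist
        file = r"C:\Users\Mr.Ma\Desktop\Wei\page" + str(i) + "_article" + str(j) + ".html"
        thisdata = use_proxy(proxy, thisurl)
        if thisdata is None:
            print("Page " + str(i) + ", article " + str(j) + " failed!")
            continue
        print(len(thisdata))
        try:
            # thisdata is a str, so write in text mode with an explicit encoding
            with open(file, "w", encoding="utf-8") as fh:
                fh.write(thisdata)
            print("Page " + str(i) + ", article " + str(j) + " succeeded!")
        except Exception as e:
            print(e)
            print("Page " + str(i) + ", article " + str(j) + " failed!")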
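The URL-building and link-extraction steps of the loop can be checked without any network access by pulling them into small functions. This is a minimal sketch under assumptions: `build_search_url` and `extract_links` are hypothetical helper names, the query parameters `type`, `query` and `page` are copied from the script above rather than from any documented Sogou API, and the regex is the same catch-all `<a href="...">` pattern, so it still returns every link on the page, not just article links.

```python
import html
import re
from urllib.parse import urlencode, urljoin

# Hypothetical constants mirroring the script above
SEARCH_BASE = "http://weixin.sogou.com/weixin"
LINK_PATTERN = re.compile(r'<a href="(.*?)"', re.S)


def build_search_url(keyword, page):
    """Build the search URL; urlencode percent-encodes the keyword exactly once."""
    query = urlencode({"type": 2, "query": keyword, "page": page})
    return SEARCH_BASE + "?" + query


def extract_links(page_html, base_url):
    """Return unique absolute URLs from href attributes, in page order."""
    links = []
    seen = set()
    for raw in LINK_PATTERN.findall(page_html):
        # "&amp;" in the HTML source stands for "&" in the real URL;
        # urljoin turns relative hrefs into absolute URLs
        url = urljoin(base_url, html.unescape(raw))
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


if __name__ == "__main__":
    print(build_search_url("Python", 1))
    sample = '<a href="/link?url=abc&amp;type=2">A</a><a href="http://example.com/x">B</a>'
    for link in extract_links(sample, SEARCH_BASE):
        print(link)
```

Keeping these steps as pure functions means the fragile parts (encoding the keyword, unescaping and resolving links) can be tested on saved HTML, while only `use_proxy` touches the network.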
