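selenium; time; requests; json. Uses WeChat's own official account platform interface to scrape official account articles. A simple example: later you can add more accounts yourself, or build a list of account names to search for articles.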

WeChat Official Account Scraper in Practice


License: CC 4.0 BY-SA

Article tags:

#Scraper examples #Using Selenium #Scraping WeChat articles with Selenium


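This article presents a way to scrape WeChat official account articles with Python and Selenium. It logs in with Selenium, reuses the session cookies with requests to call the platform's backend interfaces, and pages through a chosen account's article history. It also discusses using a proxy pool to get past the platform's limit, controlling the request rate, and parsing the returned data. The proxy-pool version itself is covered in the follow-up article.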

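With this approach you can scrape roughly 60 articles per account before the platform starts limiting requests. To fetch every article, use the proxy-pool version from the next article, which adds a proxy to each request. You will have to experiment to find a sleep interval that works.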
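The follow-up article's proxy pool is not reproduced here. The idea described above, attaching a proxy to each request and sleeping between requests, can be sketched as follows. This is a minimal sketch under stated assumptions. `PROXY_POOL` holds placeholder addresses that you would replace with your own proxies, and `throttled_get` and `make_proxies` are hypothetical helper names. The 2 to 5 second delay range is only a starting point to tune.

```python
import random
import time

# Hypothetical proxy list: replace with addresses from your own proxy pool.
PROXY_POOL = [
    "http://127.0.0.1:8001",
    "http://127.0.0.1:8002",
]


def make_proxies(proxy_url):
    """Build the `proxies` mapping that requests expects (same proxy for http and https)."""
    return {"http": proxy_url, "https": proxy_url}


def throttled_get(session, url, params, cookies, proxy_pool=PROXY_POOL,
                  min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Sleep a random interval, then send the GET through a randomly chosen proxy.

    `session` is assumed to be a requests.Session (anything with a compatible .get works).
    The delay range is an assumption to tune against the platform's limit.
    """
    sleep(random.uniform(min_delay, max_delay))
    proxy = random.choice(proxy_pool)
    return session.get(url, params=params, cookies=cookies,
                       proxies=make_proxies(proxy), timeout=10)
```

With a real `requests.Session()`, you would call `throttled_get(session, appmsg_url, appmsg_data, cookie)` inside the paging loop in place of the plain `requests.get`. Randomizing the delay avoids sending requests at a perfectly regular interval.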

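The search keyword 'query' can be replaced with user input. Alternatively, you can prepare a list of official accounts and loop over it. If you do that, wrap the search and scraping steps in a function and call that function in the loop. This is clearer than keeping everything at the top level as this script does.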
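Following that suggestion, the search step can be wrapped in a function and called once per account name. This is a minimal sketch under stated assumptions. `find_fakeid` and `scrape_accounts` are hypothetical names. `session` is assumed to be a `requests.Session` that already carries the login cookies (for example via `session.cookies.update(cookie)`). `fetch_articles` stands for whatever paging function you extract from the loop in the script. The request parameters are the same ones the script sends to `searchbiz`.

```python
import random

SEARCH_URL = "https://mp.weixin.qq.com/cgi-bin/searchbiz"


def find_fakeid(session, token, query):
    """Search for an official account by name and return the first match's fakeid, or None."""
    params = {
        "action": "search_biz", "token": token, "lang": "zh_CN",
        "f": "json", "ajax": "1", "random": random.random(),
        "query": query, "begin": "0", "count": "5",
    }
    accounts = session.get(SEARCH_URL, params=params).json().get("list") or []
    return accounts[0].get("fakeid") if accounts else None


def scrape_accounts(session, token, names, fetch_articles):
    """Look up each account name and run the (hypothetical) per-account scrape on it."""
    results = {}
    for name in names:
        fakeid = find_fakeid(session, token, name)
        if fakeid is None:
            print("No official account found for", name)
            continue
        results[name] = fetch_articles(session, token, fakeid)
    return results
```

Keeping the per-account work inside a function means a failed lookup for one name is skipped instead of stopping the whole run.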

# -*- coding: utf-8 -*-
# @date: 2018\11\20 00200:39 
# @Author  : huangtao！！
# @FileName: get_cook.py
# @Software: PyCharm
# @Blog    ：https://blog.youkuaiyun.com/Programmer_huangtao
from selenium import webdriver
from selenium.webdriver.common.by import By
from fake_useragent import UserAgent
from pprint import pprint
import json
import random
import re
import time
import requests

cookie = {}
driver = webdriver.Chrome()
driver.get('https://mp.weixin.qq.com')
time.sleep(2)
# Fill in the login form (find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...))
account_input = driver.find_element(By.XPATH, '//input[@name="account"]')
account_input.clear()
account_input.send_keys('your official account username')
password_input = driver.find_element(By.XPATH, '//input[@name="password"]')
password_input.clear()
time.sleep(5)
password_input.send_keys('your password')
driver.find_element(By.XPATH, '//label[@class="frm_checkbox_label"]').click()  # tick the checkbox on the login form
driver.find_element(By.XPATH, '//a[@class="btn_login"]').click()
# Give the login time to finish; the page may ask the account administrator to confirm by scanning a QR code
time.sleep(15)
# Copy the logged-in browser's cookies into a dict and save them for reuse with requests
for item in driver.get_cookies():
    cookie[item.get('name')] = item.get('value')
driver.quit()
pprint(cookie)
with open('cookie.txt', 'w', encoding='utf-8') as f:
    f.write(json.dumps(cookie))
# Reload the saved cookies and reuse them with requests
headers = {'User-Agent': UserAgent().random}
with open('cookie.txt', 'r', encoding='utf-8') as f:
    cookie = json.loads(f.read())
url = 'https://mp.weixin.qq.com'
response = requests.get(url, headers=headers, cookies=cookie)
print(response.url)
# When the cookies are valid, the final URL after redirects contains token=...
token = re.findall(r'token=(\d*)', response.url)[0]
print(token)
# Search for the target official account by name ('query') to obtain its fakeid
search_url = 'https://mp.weixin.qq.com/cgi-bin/searchbiz?'
search_data = {
    'action': 'search_biz',
    'token': token,
    'lang': 'zh_CN',
    'f': 'json',
    'ajax': '1',
    'random': random.random(),
    'query': 'jikexueyuan00',
    'begin': '0',
    'count': '5'
}
search_response = requests.get(search_url, cookies=cookie, params=search_data)
result = search_response.json().get('list')[0]  # take the first matching account
fakeid = result.get('fakeid')
# Request the first page of the article list; the response also reports the total article count
appmsg_url = 'https://mp.weixin.qq.com/cgi-bin/appmsg?'
appmsg_data = {
    'token': token,
    'lang': 'zh_CN',
    'f': 'json',
    'ajax': '1',
    'random': random.random(),
    'action': 'list_ex',
    'begin': '0',
    'count': '5',
    'query': '',
    'fakeid': fakeid,
    'type': '9'
}
appmsg_response = requests.get(appmsg_url, cookies=cookie, params=appmsg_data)
total = int(appmsg_response.json().get('app_msg_cnt'))
# Page through the history 5 articles at a time: begin = 0, 5, 10, ... below total
for begin in range(0, total, 5):
    appmsg_data['begin'] = str(begin)
    appmsg_data['random'] = random.random()
    print('Page offset', begin)
    appmsg_response = requests.get(appmsg_url, cookies=cookie, params=appmsg_data)
    appmsg_response_list = appmsg_response.json().get('app_msg_list')
    if not appmsg_response_list:
        # A missing or empty list can mean the request was rejected (e.g. by the rate limit); stop here
        print(appmsg_response.text)
        break
    for item in appmsg_response_list:
        print('Title:', item.get('title'))
        print('Link:', item.get('link'))
    time.sleep(2)  # pause between pages to reduce the chance of hitting the rate limit
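The paging and the title/link extraction can also be factored into small functions, which makes the offset arithmetic easy to check. With `count` set to 5, an account reporting `app_msg_cnt` articles needs the offsets 0, 5, 10, ... strictly below that total. The helper names below are hypothetical. Saving to a JSON file is an assumed output format, since the script itself only prints the results. The field names `app_msg_list`, `title` and `link` are the ones the script already reads.

```python
import json

PAGE_SIZE = 5  # the same `count` value the script sends to cgi-bin/appmsg


def page_offsets(total, page_size=PAGE_SIZE):
    """Return every `begin` value needed to cover `total` articles."""
    return list(range(0, total, page_size))


def parse_appmsg(payload):
    """Extract (title, link) records from one decoded appmsg JSON response."""
    items = payload.get("app_msg_list") or []
    return [{"title": it.get("title"), "link": it.get("link")} for it in items]


def save_records(records, path):
    """Write the collected records as UTF-8 JSON, keeping Chinese titles readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

For example, `page_offsets(12)` gives `[0, 5, 10]` and `page_offsets(10)` gives `[0, 5]`, so no empty extra page is requested when the total is a multiple of 5.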