Using selenium, time, requests, and json against WeChat's own official-account backend interface to crawl public-account articles. This is a simple example; later you can add more accounts, or build a list of account names, to search for articles.

      With this approach the crawler can fetch roughly 60 articles per account before it gets rate-limited. To fetch everything you need the proxy pool covered in the next post: attach a proxy to each request, and tune the sleep interval through your own experiments.
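A minimal sketch of what "attach a proxy to the request" can look like. The `PROXY_POOL` addresses here are hypothetical placeholders; fill in addresses from your own pool, and tune the sleep interval yourself:

```python
import random

# Hypothetical pool; replace with addresses from your own proxy pool.
PROXY_POOL = ['http://127.0.0.1:8001', 'http://127.0.0.1:8002']

def pick_proxies(pool):
    """Pick one proxy at random, shaped the way requests expects."""
    proxy = random.choice(pool)
    return {'http': proxy, 'https': proxy}

# Usage with requests (interval below is just a starting point):
#   response = requests.get(url, cookies=cookie, proxies=pick_proxies(PROXY_POOL))
#   time.sleep(random.uniform(5, 15))
```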

      The search keyword 'query' can also be taken from user input, or you can loop over a prepared list of account names. If you do that, it is cleaner to wrap the logic in a function and call the function in a loop.
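One way to sketch that refactor, assuming a hypothetical ACCOUNTS list. The helper mirrors the search_data dict used in the script below, with the account name passed in as the 'query' field:

```python
import random

def build_search_params(token, query, begin=0, count=5):
    """Parameters for the searchbiz endpoint, one account name per call."""
    return {
        'action': 'search_biz',
        'token': token,
        'lang': 'zh_CN',
        'f': 'json',
        'ajax': '1',
        'random': random.random(),
        'query': query,
        'begin': str(begin),
        'count': str(count),
    }

ACCOUNTS = ['jikexueyuan00']  # hypothetical list; add your own account names

for name in ACCOUNTS:
    params = build_search_params('123456', name)  # token value is a placeholder
    # requests.get(search_url, cookies=cookie, params=params) ...
```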

# -*- coding: utf-8 -*-
# @date: 2018/11/20 0:39
# @Author  : huangtao!!
# @FileName: get_cook.py
# @Software: PyCharm
# @Blog    :https://blog.youkuaiyun.com/Programmer_huangtao
from selenium import webdriver
import time
import random
import re
import json
import requests
from pprint import pprint
from fake_useragent import UserAgent
cookie = {}
driver = webdriver.Chrome()
driver.get('https://mp.weixin.qq.com')
time.sleep(2)
driver.find_element_by_xpath('./*//input[@name="account"]').clear()
driver.find_element_by_xpath('./*//input[@name="account"]').send_keys('YOUR_ACCOUNT')    # your official-account login
driver.find_element_by_xpath('./*//input[@name="password"]').clear()
time.sleep(5)
driver.find_element_by_xpath('./*//input[@name="password"]').send_keys('YOUR_PASSWORD')  # your password
driver.find_element_by_xpath('//label[@class="frm_checkbox_label"]').click()  # tick "remember me"
driver.find_element_by_xpath('//a[@class="btn_login"]').click()
time.sleep(15)  # leave time to complete the verification step on your phone
cookies = driver.get_cookies()
for item in cookies:
    cookie[item.get('name')] = item.get('value')
pprint(cookie)
with open('cookie.txt', 'w', encoding='utf-8') as f:
    f.write(json.dumps(cookie))
headers = {'User-Agent':UserAgent().random}
with open('cookie.txt', 'r', encoding='utf-8') as f:
    cookie = json.loads(f.read())
url = 'https://mp.weixin.qq.com'
response = requests.get(url,headers=headers,cookies=cookie)
print(response.url)
# print(response.text)
search_url = 'https://mp.weixin.qq.com/cgi-bin/searchbiz?'
token = re.findall(r'token=(\d*)',response.url)[0]
print(token)
search_data = {
    'action': 'search_biz',
    'token': token,
    'lang': 'zh_CN',
    'f': 'json',
    'ajax': '1',
    'random': random.random(),
    'query': 'jikexueyuan00',
    'begin': '0',
    'count': '5'
}
# print(response.url)
search_response = requests.get(search_url,cookies=cookie,params=search_data)
# print(search_response.text)
result = search_response.json().get('list')[0]  # take the first matching account
fakeid = result.get('fakeid')
appmsg_data = {
        'token': token,
        'lang': 'zh_CN',
        'f': 'json',
        'ajax': '1',
        'random': random.random(),
        'action': 'list_ex',
        'begin': '0',
        'count': '5',
        'query': '',
        'fakeid': fakeid,
        'type': '9'
}
appmsg_url = 'https://mp.weixin.qq.com/cgi-bin/appmsg?'
appmsg_response = requests.get(appmsg_url,cookies=cookie,params=appmsg_data)
# print(appmsg_response.text)
page_num = int(int(appmsg_response.json().get('app_msg_cnt')) / 5)  # 5 articles per page; the loop runs page_num + 1 times
begin = 0
while page_num + 1 > 0:
    appmsg_data = {
        'token': token,
        'lang': 'zh_CN',
        'f': 'json',
        'ajax': '1',
        'random': random.random(),
        'action': 'list_ex',
        'begin': str(begin),
        'count': '5',
        'query': '',
        'fakeid': fakeid,
        'type': '9'}
    print('page offset', begin)
    appmsg_response = requests.get(appmsg_url, cookies=cookie, params=appmsg_data)
    appmsg_response_list = appmsg_response.json().get('app_msg_list')
    for item in appmsg_response_list:
        print('title:', item.get('title'))
        print('link:', item.get('link'))
    page_num -= 1
    begin += 5
    time.sleep(2)  # throttle between pages to delay the rate limit
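The paging arithmetic above (app_msg_cnt articles, five per page) can be stated explicitly. This is a sketch using math.ceil instead of the floor-plus-one loop in the script; the helper name is my own:

```python
import math

def page_offsets(total_count, per_page=5):
    """Begin offsets needed to cover total_count articles, per_page at a time."""
    pages = math.ceil(total_count / per_page)
    return [i * per_page for i in range(pages)]

# e.g. 63 articles -> 13 requests with begin = 0, 5, ..., 60
```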
