Boosting Article View Counts by Simulating Blog Visits with the requests Library

1. Overview

Some of us who write articles on CSDN look at our pitifully low view counts and wish we could push the numbers up quickly, not for any practical reason, just because it looks nicer.

Besides the honest route of improving article quality and growing your influence, there is a small trick. A bit of experimentation shows that visiting one of your own articles also increments its view count, though only about once every 30 seconds; repeated visits within that 30-second window do not raise the count.

So it seems feasible to write a tool that simulates blog visits: once we have each article's URL, a plain GET request does the job. Note that get() must be called with a headers argument, otherwise the server returns no data. Hence this article.

2. A Bare-Bones Version

The bare-bones version: each article's URL must be filled in by hand.

import requests
import time
import random

class CsdnVistor():
    # Manually fill in the URL of each blog post
    url_list = ['https://blog.youkuaiyun.com/hubing_hust/article/details/127882719',
                'https://blog.youkuaiyun.com/hubing_hust/article/details/127882697']

    def __init__(self):
        pass

    def visit_page(self):
        all_article_urls = self.get_all_article_urls()
        while True:
            for url in all_article_urls:
                headers = self.get_headers()
                print(f'visiting article: {url}')
                print(f'using header: {headers}')
                try:
                    res = requests.get(url, headers=headers, timeout=10)
                    print(f'status code: {res.status_code}')
                except Exception as e:
                    print(f'unexpected error happened: {e}')
                # Adjust the sleep duration to taste
                time.sleep(5)

    def get_headers(self):
        user_agent_list = [
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
            "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24"]

        userAgent = random.choice(user_agent_list)
        headers = {'User-Agent': userAgent}
        return headers

    def get_all_article_urls(self):
        '''
        Return the list of article URLs.
        '''
        return self.url_list

if __name__ == '__main__':
    visitor = CsdnVistor()
    visitor.visit_page()
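As a side note, the fixed time.sleep(5) produces a perfectly regular request pattern. One possible refinement (not part of the original script, just a sketch) is to add random jitter to the interval so the visits look less uniform:

```python
import random

def jittered_delay(base=5.0, jitter=3.0):
    """Return base seconds plus a random offset in [0, jitter]."""
    return base + random.uniform(0, jitter)

# In visit_page() one would call time.sleep(jittered_delay()) instead of
# time.sleep(5); here we just sample a few delays to show the range.
samples = [jittered_delay() for _ in range(5)]
print(all(5.0 <= d <= 8.0 for d in samples))  # → True
```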

3. An Upgraded Version

The bare-bones version works, but it is not very smart: the URL list has to be maintained by hand, which is tedious. Can we fetch it automatically?

We start from the blog's profile page and try to collect every article URL automatically.

Opening the profile page in a browser, we find that not all articles are shown at once: only 20 are displayed by default, and the rest stay hidden until the page is scrolled down. This is easy to confirm in the browser's developer tools.

Press F12 to open the developer tools and inspect the element matching the CSS selector article[class="blog-list-box"]: initially only 20 entries are present, not the full list.

So how do we get all the articles? The Network tab of the developer tools reveals the mechanism: scroll the page and watch how the hidden items get loaded. Filtering for GET requests of type json shows exactly what the browser asks for.

These requests show that the list is paginated via page and size query parameters, which makes things easy: we can lift the URL and tweak it slightly:

  • set page to 1 and size to 100 (anything larger than the total number of articles will do)
  • or keep size at 20 and loop over page when fetching
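For the second option, the per-page URLs can be generated from the observed endpoint with the standard library (a sketch; the parameter names come from the request captured in DevTools):

```python
from urllib.parse import urlencode

API_BASE = 'https://blog.youkuaiyun.com/community/home-api/v1/get-business-list'

def build_page_url(page, size=20, username='hubing_hust'):
    """Build the article-list API URL for one page of results."""
    params = {
        'page': page,
        'size': size,
        'businessType': 'blog',
        'orderby': '',
        'noMore': 'false',
        'year': '',
        'month': '',
        'username': username,
    }
    return f'{API_BASE}?{urlencode(params)}'

# A caller would fetch page 1, 2, 3, ... until the returned list is empty.
print(build_page_url(1))
```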

We take the first, quick-and-dirty option and simply edit the URL:

https://blog.youkuaiyun.com/community/home-api/v1/get-business-list?page=1&size=100&businessType=blog&orderby=&noMore=false&year=&month=&username=hubing_hust

With a small change to the bare-bones program, we can request this modified URL directly and print the response to see whether it returns what we expect:

import requests
import time
import random

class CsdnVistor():
    url_list = ['https://blog.youkuaiyun.com/community/home-api/v1/get-business-list?page=1&size=100&businessType=blog&orderby=&noMore=false&year=&month=&username=hubing_hust']

    def __init__(self):
        pass

    def visit_page(self):
        all_article_urls = self.get_all_article_urls()

        for url in all_article_urls:
            headers = self.get_headers()
            print(f'using header: {headers}')
            try:
                res = requests.get(url, headers=headers, timeout=10)
                print(f'status code: {res.status_code}')

                # Print the response to check whether all articles are returned
                print(res.json())
            except Exception as e:
                print(f'unexpected error happened: {e}')

    def get_headers(self):
        user_agent_list = [
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
            "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24"]

        userAgent = random.choice(user_agent_list)
        headers = {'User-Agent': userAgent}
        return headers

    def get_all_article_urls(self):
        '''
        Return the list of article URLs.
        '''
        return self.url_list

if __name__ == '__main__':
    visitor = CsdnVistor()
    visitor.visit_page()

Running this test program prints the JSON returned by the server. After pretty-printing it, the result is exactly as expected; opening the formatted JSON in a text editor and counting the entries confirms that all articles are there.
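From the formatted response, the part we care about is data.list, where each item carries the article's url field. A minimal sketch of the extraction, using a made-up two-article payload with the same shape as the real response:

```python
# Hypothetical miniature of the API response (same shape, fake content)
response_json = {
    'code': 200,
    'data': {
        'list': [
            {'title': 'post one',
             'url': 'https://blog.youkuaiyun.com/hubing_hust/article/details/127882719'},
            {'title': 'post two',
             'url': 'https://blog.youkuaiyun.com/hubing_hust/article/details/127882697'},
        ]
    }
}

# Pull out just the url field of every list entry
article_urls = [item['url'] for item in response_json['data']['list']]
print(len(article_urls))  # → 2
```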

At this point we are about 80% done: we have the data we need, and a little field extraction yields the URLs of every article:

import requests
import time
import random

class CsdnVistor():
    blog_homepage = 'https://blog.youkuaiyun.com/community/home-api/v1/get-business-list?page=1&size=100&businessType=blog&orderby=&noMore=false&year=&month=&username=hubing_hust'

    def __init__(self):
        pass

    def visit_page(self):
        all_article_urls = self.get_all_article_urls()
        while True:
            for url in all_article_urls:
                headers = self.get_headers()
                print(f'visiting article: {url}')
                print(headers)
                try:
                    res = requests.get(url, headers=headers, timeout=10)
                    print(f'status code: {res.status_code}')
                except Exception as e:
                    print(f'unexpected error happened: {e}')
                time.sleep(2)

    def get_all_article_urls(self):
        '''
        Fetch the profile-page API and return the URLs of all articles.
        '''
        all_article_list = []

        headers = self.get_headers()
        res = requests.get(url=self.blog_homepage, headers=headers)

        if res.status_code == 200:
            # On success, parse the response and collect every article URL
            for item in res.json()['data']['list']:
                all_article_list.append(item['url'])
        # print(all_article_list)
        return all_article_list

    def get_headers(self):
        user_agent_list = [
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
            "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24"]

        userAgent = random.choice(user_agent_list)
        headers = {'User-Agent': userAgent}
        return headers

if __name__ == '__main__':
    visitor = CsdnVistor()
    visitor.visit_page()

With that, the upgraded, fully automatic view-count booster is complete. Run it and it will quietly rack up views.

A closing thought: view counts can be inflated, but inflated counts are not real influence. What matters in writing is persistence.
