1. Overview
After publishing articles on CSDN, some authors watch their pitifully low view counts and wish they would climb faster, not for any practical reason, just because bigger numbers look nicer.
Besides the honest route of improving article quality and building real influence, there is a small trick. A bit of experimentation shows that visiting your own article also increments its view count, though only about once every 30 seconds; repeated visits within that 30-second window do not add views.
So it seems feasible to write a tool that simulates blog visits. Once we have each article's URL, the rest is easy: just fetch it with a GET request. One caveat: requests.get() must be called with a headers argument, otherwise you get no usable data back. Hence this article.
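As a quick illustration of that caveat, here is a minimal sketch that builds such a GET request with a User-Agent header. Using requests' prepare() lets us inspect the request without actually sending it; the URL is one of this blog's articles, and the User-Agent string is just an example:

```python
import requests

# One of the article URLs from this blog; any CSDN article URL works the same way.
url = 'https://blog.youkuaiyun.com/hubing_hust/article/details/127882719'

# Without a browser-like User-Agent header the server will not serve the page,
# so every request carries one.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

# Prepare the request without sending it, to see exactly what would go on the wire.
req = requests.Request('GET', url, headers=headers).prepare()
print(req.method, req.url)
print(req.headers['User-Agent'])
```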
2. The Basic Version
The basic version requires manually filling in the URL of every article:
```python
import requests
import time
import random


class CsdnVistor():
    # Manually fill in the URL of every blog post
    url_list = ['https://blog.youkuaiyun.com/hubing_hust/article/details/127882719',
                'https://blog.youkuaiyun.com/hubing_hust/article/details/127882697']

    def __init__(self):
        pass

    def visit_page(self):
        all_article_urls = self.get_all_article_urls()
        while True:
            for url in all_article_urls:
                headers = self.get_headers()
                print(f'visiting article: {url}')
                print(f'using header: {headers}')
                try:
                    res = requests.get(url, headers=headers)
                    print(f'status code: {res.status_code}')
                except Exception as e:
                    print(f'unexpected error happened: {e}')
                # Adjust the sleep duration as needed
                time.sleep(5)

    def get_headers(self):
        user_agent_list = [
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
            "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24"]
        userAgent = random.choice(user_agent_list)
        headers = {'User-Agent': userAgent}
        return headers

    def get_all_article_urls(self):
        '''
        Return the URLs of all articles.
        '''
        return self.url_list


if __name__ == '__main__':
    visitor = CsdnVistor()
    visitor.visit_page()
```
3. The Upgraded Version
The basic version works, but it is not very smart: the URL list has to be maintained by hand, which is tedious. Can the URLs be fetched automatically?
Let's start from the blog homepage and try to collect every article's URL automatically.
Opening the profile page in a browser shows that not all articles are rendered at once: only 20 are shown by default, and the rest stay hidden until you scroll down. This is easy to confirm in the browser's developer tools.
Press F12 to open developer tools and inspect the CSS selector article[class="blog-list-box"]: initially only 20 entries are present, not the full list.
So how do we get all the articles? The Network tab of the F12 developer tools gives us a clue.
Scroll the page and watch how the hidden items get loaded. Filtering for GET requests whose type is json reveals the browser's request.
It shows that the page content is paginated with page and size parameters, which makes things easy. We can take that URL and tweak it slightly:
- Set page to 1 and size to 100 (anything larger than the total number of articles works)
- Or keep size at 20 and loop over page when fetching
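For reference, the second option could be sketched roughly like this. The endpoint and the data/list JSON layout are taken from the response observed in this article; iter_article_urls is a hypothetical helper, and the fetch function is injectable so the paging loop can be exercised without network access:

```python
import requests

# Sketch of the second option: keep size=20 and walk page upward until the
# server returns an empty page. Endpoint and JSON layout match what the
# browser's Network tab showed.
API = ('https://blog.youkuaiyun.com/community/home-api/v1/get-business-list'
       '?page={page}&size=20&businessType=blog&orderby=&noMore=false'
       '&year=&month=&username=hubing_hust')


def iter_article_urls(get=requests.get):
    # `get` is injectable so the paging logic can be tested without the network.
    page = 1
    while True:
        res = get(API.format(page=page), headers={'User-Agent': 'Mozilla/5.0'})
        items = res.json()['data']['list']
        if not items:  # an empty page means we have walked past the last article
            return
        for item in items:
            yield item['url']
        page += 1
```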
We take the first, simple and crude option and modify the URL directly:
https://blog.youkuaiyun.com/community/home-api/v1/get-business-list?page=1&size=100&businessType=blog&orderby=&noMore=false&year=&month=&username=hubing_hust
With a small change to the basic program, we can hit the modified URL, print the response, and check whether it returns what we expect:
```python
import requests
import random


class CsdnVistor():
    url_list = ['https://blog.youkuaiyun.com/community/home-api/v1/get-business-list?page=1&size=100&businessType=blog&orderby=&noMore=false&year=&month=&username=hubing_hust']

    def __init__(self):
        pass

    def visit_page(self):
        all_article_urls = self.get_all_article_urls()
        for url in all_article_urls:
            headers = self.get_headers()
            print(f'using header: {headers}')
            try:
                res = requests.get(url, headers=headers)
                print(f'status code: {res.status_code}')
                # Print the response body to check whether all articles come back
                print(res.json())
            except Exception as e:
                print(f'unexpected error happened: {e}')

    def get_headers(self):
        user_agent_list = [
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
            "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24"]
        userAgent = random.choice(user_agent_list)
        headers = {'User-Agent': userAgent}
        return headers

    def get_all_article_urls(self):
        '''
        Return the URLs of all articles.
        '''
        return self.url_list


if __name__ == '__main__':
    visitor = CsdnVistor()
    visitor.visit_page()
```
Running the test program, we can see the JSON string the server returns.
Formatting that JSON string makes it readable.
It matches expectations exactly. To be extra safe, open the formatted JSON in a text editor and count the entries.
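The count itself can also be scripted instead of eyeballed in an editor. This small helper assumes the data -> list structure of the response used in this article; count_articles is a hypothetical name:

```python
def count_articles(payload):
    # The API nests the article entries under data -> list,
    # so the article count is simply the length of that list.
    return len(payload['data']['list'])


# A tiny hand-made payload in the same shape as the real response.
sample = {'data': {'list': [{'url': 'u1'}, {'url': 'u2'}, {'url': 'u3'}]}}
print(count_articles(sample))  # → 3
```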
At this point we are about 80% done: we have the data we want, and extracting one field from it gives us the URLs of all articles:
```python
import requests
import time
import random


class CsdnVistor():
    blog_homepage = 'https://blog.youkuaiyun.com/community/home-api/v1/get-business-list?page=1&size=100&businessType=blog&orderby=&noMore=false&year=&month=&username=hubing_hust'

    def __init__(self):
        pass

    def visit_page(self):
        all_article_urls = self.get_all_article_urls()
        while True:
            for url in all_article_urls:
                headers = self.get_headers()
                print(f'visiting article: {url}')
                print(headers)
                try:
                    res = requests.get(url, headers=headers)
                    print(res.status_code)
                except Exception as e:
                    print(f'unexpected error happened: {e}')
                time.sleep(2)

    def get_all_article_urls(self):
        '''
        Return the URLs of all articles.
        '''
        all_article_list = []
        headers = self.get_headers()
        res = requests.get(url=self.blog_homepage, headers=headers)
        if res.status_code == 200:
            # On success, parse every article's URL out of the response
            for item in res.json()['data']['list']:
                all_article_list.append(item['url'])
        # print(all_article_list)
        return all_article_list

    def get_headers(self):
        user_agent_list = [
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
            "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24"]
        userAgent = random.choice(user_agent_list)
        headers = {'User-Agent': userAgent}
        return headers


if __name__ == '__main__':
    visitor = CsdnVistor()
    visitor.visit_page()
```
That completes the upgraded program: run it and it quietly racks up views on its own.
A final word: view counts can be inflated, but inflated counts carry no real influence. What matters in writing is persistence.