Jay Chou dropped a new song, and Fang Wenshan's Weibo comment section blew up......
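1. Preparation
Log in to Weibo at https://m.weibo.cn/ and find the post you want to scrape.
Press F12 to open the developer tools and find this panel. Note the part highlighted in yellow; if it's empty, press F5 to refresh.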
2. Getting started
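import json
import re

import pymongo
import requests

# MongoDB handles. Note: the loop below only prints, nothing is written to the collection.
client = pymongo.MongoClient('localhost', 27017)
weibo = client['weibo']
comment_ = weibo['comment_fangwens']

headers = {
    # The HTTP request header is "Cookie" (singular), not "Cookies"
    "Cookie": 'your cookie, see the blurred-out part of the screenshot',
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36'
}

for i in range(0, 101):
    url_comment = 'https://m.weibo.cn/api/comments/show?id=4237925061907207&page=%d' % i
    wb_data = requests.get(url_comment, headers=headers).text
    data_comment = json.loads(wb_data)
    data = data_comment.get('data')
    if not data:
        # No "data" in the response (e.g. past the last page or a rejected cookie): skip this page
        continue
    for a in data['data']:
        # Strip HTML tags (emoji images, links) and print only the plain comment text
        print(re.sub('<[^>]*>', '', a['text']))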
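If you want to check the parsing step without hitting Weibo at all, here is a rough sketch. The helper name `extract_comment_texts` and the sample payload are made up for illustration; the only thing taken from the script above is the assumed response shape (`data` containing a `data` list of comments with a `text` field) and the tag-stripping regex.

```python
import json
import re

TAG_RE = re.compile(r'<[^>]*>')  # same pattern the script uses to drop HTML tags


def extract_comment_texts(raw_json):
    """Return the tag-free text of every comment on one response page.

    Hypothetical helper: assumes the {"data": {"data": [...]}} shape
    that the scraping loop relies on.
    """
    payload = json.loads(raw_json)
    inner = payload.get('data')
    if not isinstance(inner, dict):
        return []  # nothing usable on this page
    return [TAG_RE.sub('', c.get('text', '')) for c in inner.get('data', [])]


# Made-up sample payload, not real Weibo output
sample = json.dumps({
    "ok": 1,
    "data": {"data": [
        {"text": "hello <span class='url-icon'><img src='x.png'></span>world"},
        {"text": "plain"},
    ]},
})

print(extract_comment_texts(sample))
print(extract_comment_texts('{"ok": 0}'))
```

Keeping the parsing in its own function means a page with no data just yields an empty list instead of crashing the whole run.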
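3. Results
I only keep the comment text and ignore the commenter, timestamp, like count, and so on.
The result looks like this. It's pretty crude, but more or less good enough......
Turned into a word cloud:
Please, Mr. Fang Wenshan, save Jay!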
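The post doesn't say which tool drew the word cloud, so the following is only a sketch of the step underneath any word cloud: counting how often each word appears. The function `word_frequencies` and the sample comments are hypothetical. The default tokenizer just splits on whitespace, which is not enough for Chinese; for real comments you would pass in a proper segmenter (for example `jieba.lcut`, if you have jieba installed).

```python
from collections import Counter


def word_frequencies(comments, tokenize=str.split, top_n=50):
    """Count tokens across all comments and return the top_n most common.

    `tokenize` is any function mapping a string to a list of tokens;
    the whitespace default is a placeholder assumption.
    """
    counts = Counter()
    for comment in comments:
        counts.update(tok for tok in tokenize(comment) if tok.strip())
    return counts.most_common(top_n)


# Made-up comments for illustration
comments = ["save Jay", "save Jay please", "Jay"]
print(word_frequencies(comments, top_n=2))
```

The resulting (word, count) pairs are the kind of input a word-cloud renderer typically sizes its words by.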
over