[requests: Dynamic Web Page Scraping] MOOC Course Reviews

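This article shows one way to scrape the reviews of a specific online course with Python and save them to a CSV file for further analysis. The reviews are loaded dynamically, so the script defines the request headers, the URL, and the POST data parameters, sends the POST request for each page of reviews on the given course, and parses the JSON response.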
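In the hard-coded values used in this post, the `csrfKey` query parameter of the URL is identical to the `NTESSTUDYSI` value in the cookie. Assuming that relationship also holds for other sessions (an inference from these values, not documented behavior), the URL could be derived from the cookie string rather than pasted separately. The following is a minimal sketch; the helper names `csrf_key_from_cookie` and `build_url` are hypothetical and not part of the original script.

```python
# Endpoint path taken from the URL used in this post (without the query string).
BASE_URL = ('https://www.icourse163.org/web/j/'
            'mocCourseV2RpcBean.getCourseEvaluatePaginationByCourseIdOrTermId.rpc')


def csrf_key_from_cookie(cookie_string, name='NTESSTUDYSI'):
    """Return the value of the cookie called `name` from a raw Cookie header string.

    Assumption: the csrfKey expected by the endpoint equals this cookie value,
    as it does in the values shown in this post.
    """
    for pair in cookie_string.split(';'):
        key, _, value = pair.strip().partition('=')
        if key == name:
            return value
    raise KeyError(name)


def build_url(cookie_string):
    """Build the request URL with the csrfKey taken from the cookie string."""
    return BASE_URL + '?csrfKey=' + csrf_key_from_cookie(cookie_string)


# Shortened cookie string for illustration; only NTESSTUDYSI matters here.
sample_cookie = 'bpmns=1; NTESSTUDYSI=ac857d3f7aa1456f9fb9c3405297f4b9; hasVolume=true'
sample_url = build_url(sample_cookie)
```

With the full cookie from the script, `build_url(header['cookie'])` would reproduce the `url` string used there, so a refreshed cookie only has to be pasted in one place.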

import requests
import pandas as pd

pd.set_option("display.max_columns", None)
# Define the request headers, the URL, and the POST data parameters
header = {'cookie': 'EDUWEBDEVICE=bb5489f443964ee181e9a14c09814664; __yadk_uid=LqHB5kOEBbbL0kPIwxiZueVnLhai0zBk; WM_TID=tqU6%2FbJoxCZAVVFQFQJ6VjsKmzlbjpwz; bpmns=1; hasVolume=true; videoVolume=1; videoRate=2; NTESSTUDYSI=ac857d3f7aa1456f9fb9c3405297f4b9; Hm_lvt_77dc9a9d49448cf5e629e5bebaa5500b=1603701352,1603770381; WM_NI=q7QCsILAV4vfIImy2UAvO4mSrgpg1iZ8UugKWgYZn7COdjI8ycH8ubbHV2TXgaXDwXNRhprTmXmZK6eC4%2BGlDnFPBKjSNbwYiXTSzw3zQQdgMPPSjJuoXE9bs644ix58dEk%3D; WM_NIKE=9ca17ae2e6ffcda170e2e6eeb0ae49a5bab9b7db2592a88ab3c44f978a9faff842afed9aabb43f898b8cb1ec2af0fea7c3b92ab3e99fa4f243909f9ad5f56e8b97a39ab15f83aa81d5f059b2b9c088b4639ae89b96e64195b997bbed3d8191ac8bcc34bbbd88d9f86aa2989ad3d754a5bf98a5f6548e8ea891f160af88a3bbca3ea79dfcccc66888b28ab1c141aa8d8283f653ab889cb4f97df1a69fb2e76d8f90a0b4e942b7b198a6bc6ab1eafcd5ec74f7e8aca6ea37e2a3; Hm_lpvt_77dc9a9d49448cf5e629e5bebaa5500b=1603784565',
          'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36'}
url = 'https://www.icourse163.org/web/j/mocCourseV2RpcBean.getCourseEvaluatePaginationByCourseIdOrTermId.rpc?csrfKey=ac857d3f7aa1456f9fb9c3405297f4b9'
dat = {'courseId': '1002421002',
       'pageIndex': '48',
       'pageSize': '20',
       'orderBy': '3'}
# Define the scraping function: POST one page request and return its reviews as a DataFrame
def get_comment(dat):
    res = requests.post(url, headers=header, data=dat, timeout=10)
    res.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
    items = res.json()['result']['list']
    fields = ['mark', 'content', 'commentorId', 'userNickName', 'termId', 'gmtModified']
    # Keep only the fields of interest from each review record
    rows = [{field: item[field] for field in fields} for item in items]
    return pd.DataFrame(rows, columns=fields)
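# Run a quick test on a single page
test = get_comment(dat)
# Crawl the data: request pageIndex 0 through 48 and collect one DataFrame per page
frames = []
for i in range(49):
    dat_new = {'courseId': '1002421002',
               'pageIndex': i,
               'pageSize': '20',
               'orderBy': '3'}
    frames.append(get_comment(dat_new))
# Combine all pages into one DataFrame with a fresh 0..n-1 index
comments = pd.concat(frames, ignore_index=True)
# Save locally; utf-8-sig writes a BOM so that Excel detects the encoding of the Chinese text
comments.to_csv('comments.csv', encoding='utf-8-sig')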
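The script above hard-codes the number of pages. As a variation, the sketch below stops at the first empty page instead, assuming the endpoint returns an empty `list` once the page index runs past the last review (this has not been verified against the live API). The names `parse_page`, `crawl`, and `fake_fetch` are hypothetical, and the fetch step is passed in as a function so that the paging logic can be tried offline with made-up data.

```python
import pandas as pd

FIELDS = ['mark', 'content', 'commentorId', 'userNickName', 'termId', 'gmtModified']


def parse_page(payload):
    """Turn one decoded JSON response into a DataFrame with the FIELDS columns."""
    items = payload['result']['list'] or []
    # .get() leaves a missing field as None instead of raising KeyError
    rows = [{field: item.get(field) for field in FIELDS} for item in items]
    return pd.DataFrame(rows, columns=FIELDS)


def crawl(fetch_page, first_page=0, max_pages=100):
    """Call fetch_page(page_index) until a page comes back empty or max_pages is hit."""
    frames = []
    for page in range(first_page, first_page + max_pages):
        frame = parse_page(fetch_page(page))
        if frame.empty:
            break
        frames.append(frame)
    if not frames:
        return pd.DataFrame(columns=FIELDS)
    return pd.concat(frames, ignore_index=True)


def fake_fetch(page):
    """Stand-in for the real POST request; returns made-up records for pages 0 and 1."""
    pages = {
        0: [{'mark': 5, 'content': 'great', 'commentorId': 1,
             'userNickName': 'a', 'termId': 10, 'gmtModified': 1603700000000},
            {'mark': 4, 'content': 'good', 'commentorId': 2,
             'userNickName': 'b', 'termId': 10, 'gmtModified': 1603700001000}],
        1: [{'mark': 3, 'content': 'ok', 'commentorId': 3,
             'userNickName': 'c', 'termId': 10, 'gmtModified': 1603700002000}],
    }
    return {'result': {'list': pages.get(page, [])}}


demo = crawl(fake_fetch)
```

For a real run, `fetch_page` could wrap the request from the script, for example a function that posts `{'courseId': '1002421002', 'pageIndex': i, 'pageSize': '20', 'orderBy': '3'}` with `requests.post(url, headers=header, data=...)` and returns `res.json()`.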