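Downloading all assignments for course xxx from Blackboard
Motivation
Because of the COVID-19 pandemic, my university uses Blackboard as its online teaching platform. Its live streaming works reasonably well, but grading assignments on it is really inconvenient: the pages load slowly, and every submission has to be opened and downloaded by hand, which is tedious. The pages are almost entirely generated by JavaScript, so working out the download links directly is a hassle, with all sorts of odd fields and request headers involved. So I decided to let Chrome do the downloading automatically.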
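Tech stack
python + selenium + chromedriver. Together with Chrome's driver, selenium can drive the browser automatically; it is normally used for automated testing.
Selenium documentation in Chinese: Selenium with Python 中文翻译文档. Official documentation: Selenium with Python.
Taobao mirror for the Chrome driver: ChromeDriver Mirror.
Steps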
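- Download the driver and update the environment variables
Put the downloaded driver (chromedriver.exe) into Chrome's installation directory; mine is C:\Program Files (x86)\Google\Chrome\Application. Then add that directory to the PATH environment variable.
- Install selenium
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple selenium
- Write the code
The most commonly used calls are:
from selenium import webdriver
from selenium.webdriver.common.by import By

# Start the browser
browser = webdriver.Chrome()
# Open a page
browser.get("https://www.bb.ustc.edu.cn")
# Find a single element
loginElement = browser.find_element(By.ID, "xxx")
# Find a collection of elements
loginElements = browser.find_elements(By.TAG_NAME, "xxx")
# Simulate a click
loginElement.click()
# Simulate typing (userName is defined elsewhere)
loginElement.send_keys(userName)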
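Before running any of the calls above, it may be worth confirming that the first two steps took effect. The following is a minimal sketch using only the standard library; the function name `check_environment` is made up for illustration, and it assumes the driver executable is named `chromedriver`.

```python
import shutil
from importlib import metadata


def check_environment(driver_name="chromedriver"):
    """Return (driver_path, selenium_version); either may be None if missing."""
    # shutil.which searches the directories listed in PATH, like the shell does.
    driver_path = shutil.which(driver_name)
    try:
        selenium_version = metadata.version("selenium")
    except metadata.PackageNotFoundError:
        selenium_version = None
    return driver_path, selenium_version


if __name__ == "__main__":
    path, version = check_environment()
    print("chromedriver on PATH:", path or "not found")
    print("selenium version:", version or "not installed")
```

If the driver is reported as not found, the PATH change has probably not been picked up yet; opening a new terminal is usually enough.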
Problems and solutions
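- Waiting for elements to load
Dynamically loaded elements are only available once the browser has finished loading them. For an element that is already in the DOM but has display: none, check its visibility instead:
try:
    # Wait until the element is present in the DOM (it may still be hidden)
    WebDriverWait(browser, 5).until(
        EC.presence_of_element_located((By.ID, "id"))
    )
    # Or wait until the element is actually visible
    WebDriverWait(browser, 5).until(
        EC.visibility_of(
            browser.find_element(By.ID, "controlpanel.grade.center_groupContents"))
    )
except BaseException as exc:
    browser.quit()
    print("wait id timeout")
    print(exc)
    exit(0)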
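- Getting element attributes
An element's attribute and its property are slightly different things:
attribute: what is written in the HTML document, e.g. value
property: a value that has to be computed, e.g. childElementCount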
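The attribute/property distinction can be illustrated without a browser. The sketch below is only an analogy: it uses the standard library's `xml.etree.ElementTree` instead of a real DOM, and the markup is invented for the example rather than taken from Blackboard.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up fragment standing in for a page; not taken from Blackboard.
MARKUP = """
<form>
  <input id="username" value="alice" />
  <table><tbody id="rows"><tr /><tr /><tr /></tbody></table>
</form>
"""

root = ET.fromstring(MARKUP)

# "Attribute": read back exactly what the markup says.
username = root.find(".//input[@id='username']")
written_value = username.get("value")

# "Property"-like value: nothing in the markup says "3"; it has to be
# derived from the parsed tree, as childElementCount is in a browser.
tbody = root.find(".//tbody[@id='rows']")
child_count = len(tbody)

print(written_value, child_count)
```

With selenium, the corresponding calls are `element.get_attribute("value")` and `element.get_property("childElementCount")`; the script below uses the latter to count the rows of the submission list.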
Code
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import selenium.common.exceptions as se_comm_except
import os
userName = "xxxx"
passWord = "xxx"
# Index of the assignment in the filter drop-down
homeWorkIndex = 1

browser = webdriver.Chrome()
browser.get("https://www.bb.ustc.edu.cn")
browser.maximize_window()
try:
    # Authentication is required first
    goLoginElement = browser.find_element(By.TAG_NAME, "a")
    goLoginElement.click()
    # Login page
    loginElement = browser.find_element(By.ID, "username")
    passwdElement = browser.find_element(By.ID, "password")
    loginBtn = browser.find_element(By.ID, "login")
    loginElement.send_keys(userName)
    passwdElement.send_keys(passWord)
    loginBtn.click()
    # After logging in, one more click is needed
    loginForm = browser.find_element(By.ID, "login")
    loginForm.find_element(By.TAG_NAME, "a").click()
except se_comm_except.NoSuchElementException:
    print("already logged in")
# Enter Blackboard
myClass = browser.find_element(By.ID, "Courses.label")
myClass.find_element(By.TAG_NAME, "a").click()
try:
    WebDriverWait(browser, 5).until(
        EC.presence_of_element_located((By.CLASS_NAME, "coursefakeclass"))
    )
except BaseException as exc:
    browser.quit()
    print("wait coursefakeclass timeout")
    print(exc)
    exit(0)
# Go to my courses; coursefakeclass covers both the courses I take
# and the courses where I am a teaching assistant
myAssistantTeachClass = browser.find_elements(By.CLASS_NAME, "coursefakeclass")
myAssistantTeachClass = myAssistantTeachClass[1]
# Enter "Introduction to Information Security" (信息安全导论)
myAssistantTeachClass.find_element(By.TAG_NAME, "a").click()
try:
    WebDriverWait(browser, 5).until(
        EC.presence_of_element_located(
            (By.ID, "controlpanel.grade.center_groupExpanderLink"))
    )
except BaseException as exc:
    browser.quit()
    print("wait controlpanel.grade.center_groupExpanderLink timeout")
    print(exc)
    exit(0)
# Click "Grade Center"
browser.find_element(
    By.ID, "controlpanel.grade.center_groupExpanderLink").click()
try:
    WebDriverWait(browser, 5).until(
        EC.visibility_of(
            browser.find_element(By.ID, "controlpanel.grade.center_groupContents"))
    )
except BaseException as exc:
    browser.quit()
    print("wait controlpanel.grade.center_groupContents timeout")
    print(exc)
    exit(0)
ul = browser.find_element(By.ID, "controlpanel.grade.center_groupContents")
# Click "Needs Grading" (the first link in the list)
tmpAs = ul.find_elements(By.TAG_NAME, "a")
tmpAs[0].click()
try:
    WebDriverWait(browser, 5).until(
        EC.presence_of_element_located((By.ID, "itemFilter"))
    )
except BaseException as exc:
    browser.quit()
    print("wait itemFilter timeout")
    print(exc)
    exit(0)
# Select the x-th assignment
homeWorkSelect = webdriver.support.select.Select(
    browser.find_element(By.ID, "itemFilter"))
homeWorkSelect.select_by_index(homeWorkIndex)
browser.find_element(By.CLASS_NAME, "genericButton").click()
try:
    WebDriverWait(browser, 5).until(
        EC.presence_of_element_located((By.ID, "listContainer_itemcount"))
    )
except BaseException as exc:
    browser.quit()
    print("wait listContainer_itemcount timeout")
    print(exc)
    exit(0)
# Show all entries
div = browser.find_element(By.ID, "listContainer_itemcount")
div.find_elements(By.TAG_NAME, "a")[0].click()
# ------------------------- to be wrapped in a function ------------------------
try:
    # Wait for the list to finish loading
    WebDriverWait(browser, 5).until(
        EC.presence_of_element_located((By.ID, "listContainer_databody"))
    )
except BaseException as exc:
    browser.quit()
    print("wait listContainer_databody timeout")
    print(exc)
    exit(0)
# Number of submissions in the list
tbody = browser.find_element(By.ID, "listContainer_databody")
count = tbody.get_property("childElementCount")
# Open each submission, download it, then go back to the previous page
for i in range(count):
    try:
        # Wait for the list to finish loading
        WebDriverWait(browser, 5).until(
            EC.presence_of_element_located((By.ID, "listContainer_databody"))
        )
    except BaseException as exc:
        browser.quit()
        print("wait listContainer_databody timeout")
        print(exc)
        exit(0)
    # Find the link in row i and click it
    tmpId = "listContainer_row:" + str(i)
    tr = browser.find_element(By.ID, tmpId)
    aElement = tr.find_element(By.TAG_NAME, "th").find_element(By.TAG_NAME, "a")
    aElement.click()
    try:
        # Wait for the download button (dwnldBtn) to load
        WebDriverWait(browser, 5).until(
            EC.presence_of_element_located((By.CLASS_NAME, "dwnldBtn"))
        )
    except BaseException as exc:
        browser.quit()
        print("wait dwnldBtn timeout")
        print(exc)
        exit(0)
    browser.find_element(By.CLASS_NAME, "dwnldBtn").click()
    browser.back()
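# Keep the browser open until the downloads have finished
input("Press Enter after all downloads have finished...")
browser.quit()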
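The script above repeats the same try/except block around every wait, and its own comment suggests wrapping things in a function. One possible shape for such a helper is sketched below. The name `wait_or_quit` and its parameters are hypothetical, it only assumes that `wait` has an `until(condition)` method like selenium's `WebDriverWait`, and unlike the script it exits with status 1 on failure.

```python
import sys


def wait_or_quit(browser, wait, condition, label):
    """Block until `condition` holds; on any failure close the browser and exit.

    `wait` is expected to behave like selenium's WebDriverWait (an object with
    an `until(condition)` method); `label` is only used in the error message.
    """
    try:
        return wait.until(condition)
    except Exception as exc:
        print("wait " + label + " timeout")
        print(exc)
        browser.quit()
        sys.exit(1)
```

With the objects from the script, a call would look like `wait_or_quit(browser, WebDriverWait(browser, 5), EC.presence_of_element_located((By.ID, "itemFilter")), "itemFilter")`. For the visibility check, passing `EC.visibility_of_element_located((By.ID, ...))` keeps the element lookup inside the wait, so a missing element is also handled by the helper.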
Afterword
Pretty mindless, really.