Scraping School Grades (Harbin Engineering University)

Language: Python

1. The login POST endpoint

First, open the browser developer tools (F12), inspect the login button, and look at the URL it requests.
We get the following information:

Request URL: https://cas.hrbeu.edu.cn/cas/login
Request Method: POST
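
As a rough sketch of what this request looks like from Python, the snippet below builds (but does not send) a form-encoded POST to the URL above, using only the standard library. The field names `username`, `password`, and `captcha` are assumptions based on the element ids that appear in the page script; the real form may carry extra hidden fields, and `build_login_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

LOGIN_URL = "https://cas.hrbeu.edu.cn/cas/login"  # from the Network panel


def build_login_request(username, password, captcha):
    """Build (without sending) a form-encoded POST request for the login URL.

    The field names are assumptions taken from element ids in the page;
    the real form may need additional hidden fields.
    """
    fields = {"username": username, "password": password, "captcha": captcha}
    body = urlencode(fields).encode("ascii")
    return Request(
        LOGIN_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder credentials, for illustration only.
req = build_login_request("2020000000", "secret", "ab12")
print(req.get_method(), req.full_url)
print(parse_qs(req.data.decode("ascii")))
```

Building the request separately from sending it makes it easy to compare the body with what the browser shows before any real login attempt is made.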


2. How the captcha is generated and verified: code analysis

① Two sample captcha images, as data URIs
data:image/jpeg;base64,/9j/4AAQSkZJRgABAgAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0a%0AHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIy%0AMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAAeAFADASIA%0AAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQA%0AAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3%0AODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWm%0Ap6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEA%0AAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSEx%0ABhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElK%0AU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3%0AuLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD3+iii%0AgBskiRLuc4BOB3yaRJUdC6sNoyDnjGPX0pWQFg+0F1B2596rQokslwsqjzCw3r2wPu/WpbdyHJp2%0AJobmGckRuGI5Ixik+1wed5PmDfnGPf60wAz3gkwRHDkA9Mt0P4VA+E0YFGPAU5B6HIP86nmdjNzk%0Alf1/AvO6oBuIGSAPcmnVTvUXzLZ+dwlUde1XKtPVo1Um20FFFFMoKKKKAIpojJtZJDHIvRhz9eO9%0AN+zsIpAspEr43SbRmp6KXKiXBN3K0FvNCVH2gNGv8AjA/Wg2YJC+Y3kg7vLwMZzn8varNFLlWwvZ%0AxtYimh87y/mxscP064qWiinYqyvcKKKKYz//2Q==
data:image/jpeg;base64,/9j/4AAQSkZJRgABAgAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0a%0AHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIy%0AMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAAeAFADASIA%0AAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQA%0AAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3%0AODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWm%0Ap6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEA%0AAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSEx%0ABhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElK%0AU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3%0AuLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD3+iii%0AgChq+tafoVot1qVx5ELuI1bYzZYgnGFBPQGjSNa0/XbR7rTbjz4Ucxs2xlwwAOMMAehFVNZ0/Vbz%0AVNJnsLq2hhtXkklE8ZfLFCikAEE4DP8AxDrnnGKy7nxVf2dhrkL2kVzqumPEiiFWEc3nECMhcls8%0A8rk8jg88MR00N7BcXVzbRMxktmVZcowAJUMAGIwTgg8E4yM9aoHxRow1VdNN8oumlMCgo2wyAAlA%0A+Nu4ZAxnOSB14rI0aDV/Dd5pmm3EdjPaXu/zrmEOsv2nYXZnLMTJuCH5sD6KAAcC0l1Kz8D2Sajb%0AxQ6Gl1C/2uM7J/s29XR/LQ/KS20FgxYZJ2k/MSwXO71HXtN0qXyrudhJ5TTlI4nkKxjguwQHavuc%0ADr6VfiljniSWJ1kjdQyOhyGB5BB7is6/trixhvrzRbC2m1K5+aTzpCm8qmF5wc4wAFyo5PIyTVfw%0AcLAeEdNGmszWoi4Z1Kktk7yQScHdu4yR6HGKQzcooooAKKKKAMjVtFlvblL6wv20/UUiMK3CwpIG%0AQsrFWDDkfLxgjBJ9aqx+ELR9H1Cyv7ie8l1Fg91dPhXZgBjbgcKpGVU5A6ciuhooCxgaboOoR38V%0A5rGuS6m9vk26CBYEjYgqWIU/McEgZ6ZPrxQHgq5a3j0ufX7m40FNo+wywoXZVIIUyjBxuA6AYHAx%0A1rrqKdxWMO90K8fVWvdM1mfT1nZTdxLEsolKgAFd+QjbRgkAg4XjjnS03T7fStNt7C1XbDAgReAC%0AcdScADJPJPck1aopDCiiigD/2Q==
② The HTML of the captcha input field
<input data-v-4a4fe570="" id="captcha" name="captcha" placeholder="验证码" accesskey="c" type="text" value="" size="10" autocomplete="off" class="input1 required input4" style="width: 140px; margin-left: -3px;">
③ Recognizing the captcha

This uses a third-party Python library:

import muggle_ocr

The recognition code is as follows:

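import json
from base64 import b64decode

import requests  # muggle_ocr is imported above


def recognize_captcha():
    # Request a 4-character captcha; the JSON reply carries the image as base64 in "img".
    url = 'https://cas-443.wvpn.hrbeu.edu.cn/sso/apis/v2/open/captcha?captchaSize=4'
    response = requests.get(url).content.decode(encoding="utf-8")
    image_bytes = b64decode(json.loads(response)['img'])
    # Run the captcha model on the raw image bytes and return the predicted text.
    sdk = muggle_ocr.SDK(model_type=muggle_ocr.ModelType.Captcha)
    return sdk.predict(image_bytes)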
Response format: JSON, with the base64-encoded image in the img field.
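
The decoding step can be tried offline. The sketch below is a hypothetical helper, `decode_captcha_image`, that pulls the image bytes out of a JSON reply. It assumes the `img` value is base64 text that may carry a `data:image/...;base64,` prefix and URL-encoded newlines (`%0A`), as the two sample data URIs above do; whether the live endpoint returns the prefix is not confirmed here.

```python
import json
from base64 import b64decode
from urllib.parse import unquote


def decode_captcha_image(json_text):
    """Return the raw image bytes stored in the `img` field of a JSON reply.

    Assumption: `img` is base64 text, optionally prefixed with a
    `data:image/...;base64,` header and containing URL-encoded newlines.
    """
    img = json.loads(json_text)["img"]
    if img.startswith("data:"):
        img = img.split(",", 1)[1]  # drop the data-URI header
    img = "".join(unquote(img).split())  # undo %0A, then remove whitespace
    return b64decode(img)


# A shortened sample built from the first characters of the data URIs above.
sample = json.dumps({"img": "data:image/jpeg;base64,/9j/4AAQ%0ASkZJRgAB"})
image_bytes = decode_captcha_image(sample)
print(image_bytes[:3] == b"\xff\xd8\xff")  # JPEG data starts with FF D8 FF
```

Checking the JPEG signature before handing the bytes to an OCR library is a cheap way to catch a wrong field name or a changed response format.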

3. The logic behind login (hunting hard for bugs)

The button's click event:

!hn.a.prototype.$isServer && document.addEventListener("click", function(e) 
 t = o._wrapper = function(e)

In the end I found absolutely nothing.

4. Tracing the logic from the message constants

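USER_NOT_FOUND: "用户未找到",  // user not found
ACCOUNT_NOT_FOUND: "账号不存在",  // account does not exist
INVALID_PASSWORD: "账号或密码错误",  // wrong account or password
ACCOUNT_LOCK: "账号锁定",  // account locked
IN_NOTIFICATION_INTERVAL: "发送通知过于频繁,请1分钟之后再试",  // notifications sent too often, retry in 1 minute
INVALID_CAPTCHA: "无效的验证码",  // invalid captcha
VERIFICATION_FAILED: "验证失败",  // verification failed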

Then I found a function that displays the error message. Note that it picks the message according to the "errorcode" element:

showError: function() {
    var e = document.getElementById("errorcode").getAttribute("value")
      , t = "";
    t = e ? h.a.showError(e) : document.getElementById("errormes").getAttribute("value"),
    this.errorMessage = t,
    t && this.$message({
        type: "warning",
        message: this.errorMessage
    })
},
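
The same lookup can be reproduced on the scraper side to find out why a login attempt failed. The sketch below assumes the returned page contains elements with ids `errorcode` and `errormes` carrying a `value` attribute, as `showError` implies, and that `h.a.showError` simply maps a code to its message. The English messages are translations of the constants listed above, and `login_error` is a hypothetical helper name.

```python
from html.parser import HTMLParser

# English renderings of the message constants found in the page script.
ERROR_MESSAGES = {
    "USER_NOT_FOUND": "User not found",
    "ACCOUNT_NOT_FOUND": "Account does not exist",
    "INVALID_PASSWORD": "Wrong account or password",
    "ACCOUNT_LOCK": "Account locked",
    "IN_NOTIFICATION_INTERVAL": "Notifications sent too often, retry in 1 minute",
    "INVALID_CAPTCHA": "Invalid captcha",
    "VERIFICATION_FAILED": "Verification failed",
}


class _ValueById(HTMLParser):
    """Collect the `value` attribute of every element that has an `id`."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if "id" in attr:
            self.values[attr["id"]] = attr.get("value") or ""


def login_error(html):
    """Mirror showError: use the mapped errorcode, else fall back to errormes."""
    parser = _ValueById()
    parser.feed(html)
    code = parser.values.get("errorcode", "")
    if code:
        return ERROR_MESSAGES.get(code, code)  # unknown codes are returned as-is
    return parser.values.get("errormes", "")


page = '<input id="errorcode" value="INVALID_CAPTCHA"><input id="errormes" value="">'
print(login_error(page))
```

An empty string means neither element carried a message, which is a reasonable (though unverified) signal that the login went through.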

5. Recognizing the captcha (hand-written with OpenCV, or through an API)

6. Finding the grade-query event

function queryKscj() {
	document.forms["kscjQueryForm"].action = "/jsxsd/kscj/cjcx_list";
	document.forms["kscjQueryForm"].submit();
}
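
Since `queryKscj` only sets the form action to a site-relative path and submits, the scraper needs the absolute URL of that path. A minimal sketch with the standard library follows; the host `jw.example.edu` is a placeholder rather than the real academic-system address, and `grades_url` is a hypothetical helper name.

```python
from urllib.parse import urljoin

GRADES_PATH = "/jsxsd/kscj/cjcx_list"  # the action set by queryKscj()


def grades_url(current_page_url):
    """Resolve the grade-list path against the page the form lives on."""
    return urljoin(current_page_url, GRADES_PATH)


# Placeholder host; substitute the address of the logged-in page.
print(grades_url("https://jw.example.edu/jsxsd/framework/main.jsp"))
```

The request method and the fields of `kscjQueryForm` are not shown in the script, so they would still have to be read from the form's HTML.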

7. Analyzing and learning from parts of the site's code

Look through the code for the click event that pops up the message box:

submitClick: function() {
    if (!this.validate()) return !1;
    this.transform(), document.getElementById("login-form").submit()
},

Set the message to pop up: none of the three fields (account, password, captcha) may be empty.

validate: function() {
    var e = !0;
    return e && (e = this.valiFlag(h.a.checkNull(document.getElementById("username").value, "您的用户名不能为空"))),
        e && (e = this.valiFlag(h.a.checkNull(document.getElementById("password").value, "您的密码不能为空"))),
        this.ishasCode && e && (e = this.valiFlag(h.a.checkNull(document.getElementById("captcha").value, "您的验证码不能为空"))),
        e
},
transform: function() {
    for (var e = document.getElementsByClassName("for-form"), t = document.getElementById("login-form"), n = 0; n < e.length; n++) t.appendChild(e[n]);
    this.loginLoginWay()
},
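
The `validate` function can be mirrored in the scraper so that an empty field is caught before any request is sent. This is a sketch under one assumption: that `checkNull` only tests for an empty value (its real behavior, for example whether it trims whitespace, is not visible here). The messages are English translations of the originals, and `has_captcha` stands in for `this.ishasCode`.

```python
def validate(username, password, captcha, has_captcha=True):
    """Return the first validation error, or None when every field is filled.

    The checks run in the same order as in the page script: username,
    password, then captcha (only when the page shows a captcha).
    """
    checks = [
        (username, "Username must not be empty"),
        (password, "Password must not be empty"),
    ]
    if has_captcha:
        checks.append((captcha, "Captcha must not be empty"))
    for value, message in checks:
        if not value:
            return message
    return None


print(validate("2020000000", "", "ab12"))
print(validate("2020000000", "secret", "", has_captcha=False))
```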

(3) Set the login type (this seems to remove elements as a kind of initialization?)

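Single sign-on (SSO) lets a user authenticate once and then access multiple systems without logging in again.
CAS (Central Authentication Service) is a single sign-on protocol; an open-source server implementation of it is also available.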

 loginLoginWay: function() {
                        if ((document.querySelector("#login-type").getAttribute("value") ? document.querySelector("#login-type").getAttribute("value") : "").toLowerCase().indexOf("cas") >= 0) {
                            document.querySelector("form#login-form").setAttribute("action", "/cas/login");
                            var e = document.querySelector('input[name="pid"]');
                            e.parentNode.removeChild(e)
                        } else {
                            document.querySelector("form#login-form").setAttribute("action", "/sso/login");
                            var t = document.querySelector('input[name="execution"]');
                            t.parentNode.removeChild(t);
                            var n = document.querySelector('input[name="_eventId"]');
                            n.parentNode.removeChild(n);
                            var i = document.querySelector('input[name="lt"]');
                            i.parentNode.removeChild(i)
                        }
                    },
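Name | Usage
document.querySelector(".class") | returns the first element in the document that matches the given CSS selector
document.querySelector("#id") | returns the first element in the document with the given id
getAttribute | returns the value of the named attribute
condition ? a : b | ternary operator: evaluates a when the condition holds, otherwise b
toLowerCase() | converts a string to lowercase
indexOf | returns the position of the first occurrence of a substring, or -1 if it is absent
appendChild | appends a node as the last child of an element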

If the value of the element with id login-type contains "cas":

if ((document.querySelector("#login-type").getAttribute("value") ? document.querySelector("#login-type").getAttribute("value") : "").toLowerCase().indexOf("cas") >= 0)
Name | Usage
setAttribute | sets the value of the named attribute
removeChild | removes the given child node from an element
.parentNode | returns the parent node of an element
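
Putting `loginLoginWay` together: it chooses the form action from the login type and removes the hidden inputs that the chosen endpoint does not use. A scraper can apply the same rule to a dictionary of form fields. The sketch below mirrors that branch in Python; `prepare_login_form` is a hypothetical helper, and the sample field values are made up.

```python
def prepare_login_form(login_type, fields):
    """Return (action, fields) the way loginLoginWay rewrites the form.

    CAS logins post to /cas/login without `pid`; all other logins post to
    /sso/login without `execution`, `_eventId`, and `lt`.
    """
    if "cas" in (login_type or "").lower():
        action, dropped = "/cas/login", {"pid"}
    else:
        action, dropped = "/sso/login", {"execution", "_eventId", "lt"}
    kept = {name: value for name, value in fields.items() if name not in dropped}
    return action, kept


# Made-up field values, for illustration only.
form = {"username": "u", "pid": "1", "execution": "e1s1", "_eventId": "submit", "lt": "x"}
print(prepare_login_form("CAS", form))
print(prepare_login_form("", form))
```

Returning a new dictionary instead of mutating the input keeps the original field set available if the other endpoint has to be tried.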