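This article uses the job listings site of 58.com (58同城) as a learning example and scrapes the part of the page that is protected by font obfuscation.
As the screenshot shows, the obfuscated characters in the page source are all displayed as question marks (?), so a naive scrape only retrieves those question marks.
Right-click the page and choose "View page source".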
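Inside the <style> tag there is a long, complicated block of code. It contains the word base64, which suggests that the custom font is embedded in the page as a base64-encoded string.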
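It is worth reading up on how base64 works first. Note that base64 is an encoding rather than encryption: it represents binary data as ASCII text and can be reversed by anyone, with no key involved.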
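As a quick illustration of that round trip, here is a minimal sketch using Python's standard base64 module. The byte string is only a stand-in for real font data, not something taken from the 58.com page.

```python
import base64

# Stand-in for binary font data (an assumption for illustration only).
raw = b"pretend these are the bytes of a .woff file"

# Encoding turns arbitrary bytes into ASCII-safe text.
encoded = base64.b64encode(raw)

# Decoding recovers the original bytes exactly; no key is needed.
decoded = base64.b64decode(encoded)

print(encoded.decode("ascii"))
print(decoded == raw)
```

This is why the embedded font can be written straight to a file once the base64 text has been pulled out of the page.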
Now that we know the font is base64-encoded, we can decode it and build the corresponding character mapping.
import requests
import re
import base64
from fontTools.ttLib import TTFont
if __name__ == '__main__':
    url = "https://wh.58.com/searchjob/"
    # Headers copied from the browser's developer tools. The HTTP/2
    # pseudo-headers (:authority, :method, :path, :scheme) are not real
    # header fields, so they are left out here.
    headers = {
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        # "br" is omitted: requests can only decode Brotli responses when a
        # brotli package is installed, otherwise the text comes back garbled.
        "accept-encoding": "gzip, deflate",
        "accept-language": "zh-CN,zh;q=0.9",
        "cache-control": "max-age=0",
        # Session cookie from the author's browser; use your own when running this.
        "cookie": (
            'f=n; commontopbar_new_city_info=158%7C%E6%AD%A6%E6%B1%89%7Cwh; f=n; commontopbar_new_city_info=158%7C%E6%AD%A6%E6%B1%89%7Cwh; myLat=""; myLon=""; id58=ETqdRGE6++PYpQ6p3yNSQw==; mcity=wh; f=n; commontopbar_new_city_info=158%7C%E6%AD%A6%E6%B1%89%7Cwh; city=wh; 58home=wh; commontopbar_ipcity=wh%7C%E6%AD%A6%E6%B1%89%7C0; 58tj_uuid=47caedb4-55a5-4a44-bb57-91bb7ce7b1c8; als=0; wmda_uuid=1cdfdf35506967429f31843b6f1bd427; wmda_new_uuid=1; xxzl_deviceid=Sr88BtIOyaCrETs%2Btdz%2FOETVatRQ8IXBAmHknmK2TR%2FArCWLa1XEpB11ixkOtJEH; sessionid=ec391b2a-d1fc-43f9-bd13-0d1772279374; param8716kop=1; wmda_visited_projects=%3B11187958619315%3B1731916484865%3B10104579731767; www58com="UserID=64731485788170&UserName=a7nfk1acl"; 58cooper="userid=64731485788170&username=a7nfk1acl"; 58uname=a7nfk1acl; passportAccount="atype=0&bstate=0"; fzq_h=cb094e9752a76e0476beca234279f9ad_1631255925348_b4fcd7a5eb1849ee921d695bc765d9ce_974345439; xxzl_smartid=e5fa2f7b05d87c127291a8b9c869d017; Hm_lvt_5bcc464efd3454091cf2095d3515ea05=1631255952; gr_user_id=c1d86aed-2c52-438e-ae31-2b78bafda3b6; fzq_js_zhaopin_list_pc=9186665da5bdeef3325b098477c9314f_1631256409752_7; Hm_lpvt_5bcc464efd3454091cf2095d3515ea05=1631256410; fzq_js_infodetailweb=a262eaeae94a0e65caa2c2f5a37bb7ff_1631257205157_6; Hm_lvt_b2c7b5733f1b8ddcfc238f97b417f4dd=1631257205; Hm_lpvt_b2c7b5733f1b8ddcfc238f97b417f4dd=1631257205; ppStore_fingerprint=C335BE454FB81AD76CE728603E7B7AB9401A07A52F0B3EBF%EF%BC%BF1631257206262; Hm_lvt_a3013634de7e7a5d307653e15a0584cf=1631259334; isSmartSortTipShowed=true; param8616=0; ljrzfc=1; wmda_session_id_1731916484865=1631263358082-ff45bcc2-a08f-52b5; utm_source=; new_uv=3; init_refer=https%253A%252F%252Fwh.58.com%252F; spm=; new_session=0; '
            'PPU="UID=64731485788170&UN=a7nfk1acl&TT=69a7a0ac8dd2aadfd2c4fae1f8a8d5a4&PBODY=dFCDDZaugxF1PCdTPzBsEOwwvP98N8Eh0F3ukdBsvIvHs_rTw7-qXNvOcxnPe4ghLDGAhk7Y0tD6xi62ED9_zEe5elBOieXR677BUFJd6nrDbawquBCM9IqSxdGv7Z0RyNZ1R8jTu5COSveRJM290xuWxtlCNSih1bZnJHtSWjE&VER=1&CUID=a04Tf6vLsnilkXjPOsrH0Q"; JSESSIONID=D6B42D72127A83BC043DFEDA6EE25517; jl_list_left_banner=9; Hm_lpvt_a3013634de7e7a5d307653e15a0584cf=1631265120; xxzl_cid=20862149cfa1444ab8db2d95095022c9; xzuid=168585f4-7646-4762-b558-26c3057c9b32'
        ),
        "referer": "https://wh.58.com/searchjob/?param8616=0&PGTID=0d302409-0009-e9c1-b0be-53e53305b2b2&ClickID=5",
        "sec-ch-ua": '"Google Chrome";v="93", " Not;A Brand";v="99", "Chromium";v="93"',
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": '"Windows"',
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "same-origin",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36",
    }
    response = requests.get(url=url, headers=headers).text
    # Grab everything between "base64," and the closing ")" of the CSS url(...).
    match = re.search(r"base64,(.*?)\)", response, flags=re.S)
    if match is None:
        raise SystemExit("No base64-embedded font found in the response.")
    # Decode the base64 text back into the raw font bytes.
    b = base64.b64decode(match.group(1))
    with open("ztku2.woff", "wb") as f:
        f.write(b)
    # Dump the font as XML so the glyph outlines can be inspected.
    fonts = TTFont("ztku2.woff")
    fonts.saveXML("ztku2.xml")
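Running the script produces two files, a woff file and an xml file. The next step uses a font editor called FontCreator, which can be downloaded online.
Once it is installed, open FontCreator and load two woff files. Run the program twice so that it generates two different fonts. Because the script always writes to ztku2.woff and ztku2.xml, rename the output (or change the file names in the script) between runs so the second run does not overwrite the first.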
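After the files load you will see the glyph overview. Click the first glyph in the first woff file, the character 生, and look at the last line of the tooltip:
$E082
In the second file, the last line for the same character 生 is:
$EC04
Note down these two codes, then open the two xml files.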
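In each file, find the <contour> element that belongs to the corresponding code. The x and y attributes look just like coordinates.
Take the two points highlighted in the screenshot and subtract one from the other, x1 - x2 and y1 - y2, in both files. The result is a pleasant surprise.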
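The subtraction described above can be sketched in a few lines with the standard library's XML parser. This is only an illustration under stated assumptions: the glyph names (uniE082, uniEC04) and every coordinate in the two sample snippets are invented for the example, and first_point_delta is a hypothetical helper, not part of fontTools.

```python
import xml.etree.ElementTree as ET

def first_point_delta(ttx_xml, glyph_name):
    """Return (x1 - x2, y1 - y2) for the first two points of the
    first contour of the named glyph in a fontTools XML dump."""
    root = ET.fromstring(ttx_xml)
    for glyph in root.iter("TTGlyph"):
        if glyph.get("name") != glyph_name:
            continue
        contour = glyph.find("contour")
        if contour is None:
            raise ValueError(f"{glyph_name} has no contour")
        points = contour.findall("pt")[:2]
        if len(points) < 2:
            raise ValueError(f"{glyph_name} has fewer than two points")
        (x1, y1), (x2, y2) = [(int(p.get("x")), int(p.get("y"))) for p in points]
        return (x1 - x2, y1 - y2)
    raise KeyError(glyph_name)

# Invented sample data standing in for the two xml files.
SAMPLE_A = """<ttFont><glyf><TTGlyph name="uniE082"><contour>
<pt x="100" y="700" on="1"/><pt x="160" y="650" on="1"/>
</contour></TTGlyph></glyf></ttFont>"""

SAMPLE_B = """<ttFont><glyf><TTGlyph name="uniEC04"><contour>
<pt x="300" y="500" on="1"/><pt x="360" y="450" on="1"/>
</contour></TTGlyph></glyf></ttFont>"""

delta_a = first_point_delta(SAMPLE_A, "uniE082")
delta_b = first_point_delta(SAMPLE_B, "uniEC04")
print(delta_a, delta_b, delta_a == delta_b)
```

In the sample data the absolute coordinates differ between the two snippets while the differences match, which is the kind of comparison the manual check performs.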
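The two results are identical: the difference between the first two coordinate points of the glyph is the same in both fonts, even though the code assigned to the character is different. That is the pattern we were looking for. Work out this value for every glyph and store the results in a dictionary, so that each one is paired with its character (for example, the entry for 年).
Then use replace to swap the special codes in the page source for the real characters, and the obfuscation is broken!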
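A minimal sketch of that last step follows. Everything here is hypothetical: the signature value (-60, 50), the function names, and the assumption that the obfuscated characters appear in the page as HTML entities such as &#xec04; are chosen for illustration and are not confirmed by the 58.com page.

```python
import html

def build_code_map(glyph_signatures, signature_to_char):
    """Map each obfuscated character of the current font to the real
    character, using the coordinate difference as the lookup key.

    glyph_signatures: {code point (int): (dx, dy)} for the current font.
    signature_to_char: {(dx, dy): real character}, built once by hand.
    """
    return {
        chr(code): signature_to_char[sig]
        for code, sig in glyph_signatures.items()
        if sig in signature_to_char
    }

def decode_text(text, code_map):
    """Replace every obfuscated character in text with the real one."""
    text = html.unescape(text)  # turns "&#xec04;" into the character U+EC04
    for code, char in code_map.items():
        text = text.replace(code, char)
    return text

# Invented example values.
signature_to_char = {(-60, 50): "生"}
glyph_signatures = {0xEC04: (-60, 50)}

code_map = build_code_map(glyph_signatures, signature_to_char)
print(decode_text("&#xec04;活", code_map))
```

Keying the dictionary on the coordinate difference rather than on the code itself is what lets the same table be reused when the code for a character changes from one downloaded font to the next.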