Font Decryption on 58同城 (58.com), Part 1

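When writing crawlers you often run into anti-scraping measures, and font encryption is among the harder ones to get past. This post introduces a relatively simple way to decrypt it.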

1. First, find the encrypted font. Open a 58 listing page: https://zz.58.com/pinpaigongyu/?utm_source=market&spm=u-LlFBrx8a1luDwQM.sgppzq_zbt&PGTID=0d100000-0015-67a3-d744-1bb7d66dd6e2&ClickID=2, as shown in the figure below.

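2. The number 700 is displayed in the encrypted font. Find its tag in the page, then find the corresponding tag [1] and copy it. Open the page source and search for tag [1], as shown in the figure below: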

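3. Copy the content marked AAAAAA~AAAAAA (the base64-encoded font data), keeping it as one unbroken string, and then write the following code: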

import base64
from io import BytesIO

from fontTools.ttLib import TTFont
str = 'AAEAAAALAIAAAwAwR1NVQiCLJXoAAAE4AAAAVE9TLzL4XQjtAAABjAAAAFZjbWFwq8R/YwAAAhAAAAIuZ2x5ZuWIN0cAAARYAAADdGhlYWQT0/0FAAAA4AAAADZoaGVhCtADIwAAALwAAAAkaG10eC7qAAAAAAHkAAAALGxvY2ED7gSyAAAEQAAAABhtYXhwARgANgAAARgAAAAgbmFtZTd6VP8AAAfMAAACanBvc3QFRAYqAAAKOAAAAEUAAQAABmb+ZgAABLEAAAAABGgAAQAAAAAAAAAAAAAAAAAAAAsAAQAAAAEAAOs1n4RfDzz1AAsIAAAAAADYJlj6AAAAANgmWPoAAP/mBGgGLgAAAAgAAgAAAAAAAAABAAAACwAqAAMAAAAAAAIAAAAKAAoAAAD/AAAAAAAAAAEAAAAKADAAPgACREZMVAAObGF0bgAaAAQAAAAAAAAAAQAAAAQAAAAAAAAAAQAAAAFsaWdhAAgAAAABAAAAAQAEAAQAAAABAAgAAQAGAAAAAQAAAAEERAGQAAUAAAUTBZkAAAEeBRMFmQAAA9cAZAIQAAACAAUDAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFBmRWQAQJR2n6UGZv5mALgGZgGaAAAAAQAAAAAAAAAAAAAEsQAABLEAAASxAAAEsQAABLEAAASxAAAEsQAABLEAAASxAAAEsQAAAAAABQAAAAMAAAAsAAAABAAAAaYAAQAAAAAAoAADAAEAAAAsAAMACgAAAaYABAB0AAAAFAAQAAMABJR2lY+ZPJpLnjqeo59kn5Kfpf//AACUdpWPmTyaS546nqOfZJ+Sn6T//wAAAAAAAAAAAAAAAAAAAAAAAAABABQAFAAUABQAFAAUABQAFAAUAAAACQAHAAgABAAKAAEAAwAFAAIABgAAAQYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADAAAAAAAiAAAAAAAAAAKAACUdgAAlHYAAAAJAACVjwAAlY8AAAAHAACZPAAAmTwAAAAIAACaSwAAmksAAAAEAACeOgAAnjoAAAAKAACeowAAnqMAAAABAACfZAAAn2QAAAADAACfkgAAn5IAAAAFAACfpAAAn6QAAAACAACfpQAAn6UAAAAGAAAAAAAAACgAPgBmAJoAvgDoASQBOAF+AboAAgAA/+YEWQYnAAoAEgAAExAAISAREAAjIgATECEgERAhIFsBEAECAez+6/rs/v3IATkBNP7S/sEC6AGaAaX85v54/mEBigGB/ZcCcwKJAAABAAAAAAQ1Bi4ACQAAKQE1IREFNSURIQQ1/IgBW/6cAicBWqkEmGe0oPp7AAEAAAAABCYGJwAXAAApATUBPgE1NCYjIgc1NjMyFhUUAgcBFSEEGPxSAcK6fpSMz7y389Hym9j+nwLGqgHButl0hI2wx43iv5D+69b+pwQAAQAA/+YEGQYnACEAABMWMzI2NRAhIzUzIBE0ISIHNTYzMhYVEAUVHgEVFAAjIiePn8igu/5bgXsBdf7jo5CYy8bw/sqow/7T+tyHAQN7nYQBJqIBFP9uuVjPpf7QVwQSyZbR/wBSAAACAAAAAARoBg0ACgASAAABIxEjESE1ATMRMyERNDcjBgcBBGjGvv0uAq3jxv58BAQOLf4zAZL+bgGSfwP8/CACiUVaJlH9TwABA
AD/5gQhBg0AGAAANxYzMjYQJiMiBxEhFSERNjMyBBUUACEiJ7GcqaDEx71bmgL6/bxXLPUBEv7a/v3Zbu5mswEppA4DE63+SgX42uH+6kAAAAACAAD/5gRbBicAFgAiAAABJiMiAgMzNjMyEhUUACMiABEQACEyFwEUFjMyNjU0JiMiBgP6eYTJ9AIFbvHJ8P7r1+z+8wFhASClXv1Qo4eAoJeLhKQFRj7+ov7R1f762eP+3AFxAVMBmgHjLfwBmdq8lKCytAAAAAABAAAAAARNBg0ABgAACQEjASE1IQRN/aLLAkD8+gPvBcn6NwVgrQAAAwAA/+YESgYnABUAHwApAAABJDU0JDMyFhUQBRUEERQEIyIkNRAlATQmIyIGFRQXNgEEFRQWMzI2NTQBtv7rAQTKufD+3wFT/un6zf7+AUwBnIJvaJLz+P78/uGoh4OkAy+B9avXyqD+/osEev7aweXitAEohwF7aHh9YcJlZ/7qdNhwkI9r4QAAAAACAAD/5gRGBicAFwAjAAA3FjMyEhEGJwYjIgA1NAAzMgAREAAhIicTFBYzMjY1NCYjIga5gJTQ5QICZvHD/wABGN/nAQT+sP7Xo3FxoI16pqWHfaTSSgFIAS4CAsIBDNbkASX+lf6l/lP+MjUEHJy3p3en274AAAAAABAAxgABAAAAAAABAA8AAAABAAAAAAACAAcADwABAAAAAAADAA8AFgABAAAAAAAEAA8AJQABAAAAAAAFAAsANAABAAAAAAAGAA8APwABAAAAAAAKACsATgABAAAAAAALABMAeQADAAEECQABAB4AjAADAAEECQACAA4AqgADAAEECQADAB4AuAADAAEECQAEAB4A1gADAAEECQAFABYA9AADAAEECQAGAB4BCgADAAEECQAKAFYBKAADAAEECQALACYBfmZhbmdjaGFuLXNlY3JldFJlZ3VsYXJmYW5nY2hhbi1zZWNyZXRmYW5nY2hhbi1zZWNyZXRWZXJzaW9uIDEuMGZhbmdjaGFuLXNlY3JldEdlbmVyYXRlZCBieSBzdmcydHRmIGZyb20gRm9udGVsbG8gcHJvamVjdC5odHRwOi8vZm9udGVsbG8uY29tAGYAYQBuAGcAYwBoAGEAbgAtAHMAZQBjAHIAZQB0AFIAZQBnAHUAbABhAHIAZgBhAG4AZwBjAGgAYQBuAC0AcwBlAGMAcgBlAHQAZgBhAG4AZwBjAGgAYQBuAC0AcwBlAGMAcgBlAHQAVgBlAHIAcwBpAG8AbgAgADEALgAwAGYAYQBuAGcAYwBoAGEAbgAtAHMAZQBjAHIAZQB0AEcAZQBuAGUAcgBhAHQAZQBkACAAYgB5ACAAcwB2AGcAMgB0AHQAZgAgAGYAcgBvAG0AIABGAG8AbgB0AGUAbABsAG8AIABwAHIAbwBqAGUAYwB0AC4AaAB0AHQAcAA6AC8ALwBmAG8AbgB0AGUAbABsAG8ALgBjAG8AbQAAAAIAAAAAAAAAFAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACwECAQMBBAEFAQYBBwEIAQkBCgELAQwAAAAAAAAAAAAAAAAAAAAA'
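fanti = '餼麣麣'  # the encrypted text copied from the page

def make_font_file(base64_string):
    # Decode the base64-encoded font string into raw bytes
    bin_data = base64.decodebytes(base64_string.encode())
    # The data is a TrueType font (it starts with 0x00010000), despite the .woff file name
    with open('textwoff.woff', 'wb') as f:
        f.write(bin_data)
    return bin_data

# Option 1: dump the font to an XML file and look up the code mapping
def convert_font_to_xml(bin_data):
    # BytesIO lets an in-memory block of bytes be used like a file
    font = TTFont(BytesIO(bin_data))
    # Save the decoded font as XML
    font.saveXML("text.xml")

# `str` is the base64 string defined above (note that it shadows the built-in str)
s = make_font_file(base64_string=str)
convert_font_to_xml(s)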
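Steps 2 and 3 copy the base64 data out of the page source by hand. The same step could be automated with a regular expression. The following is a minimal sketch, assuming the font is embedded as a CSS data URI of the form `...;base64,<data>`; the exact MIME type, quoting, and `font-family` name on the live page are not confirmed here, and `sample_html` is a made-up stand-in for the real source.

```python
import base64
import re

# Assumed shape of the embedded font: a data URI ending in "base64,<data>".
# A page with other base64 payloads (e.g. images) would need a stricter pattern.
FONT_DATA_URI = re.compile(r"base64,([A-Za-z0-9+/=]+)")


def extract_font_bytes(page_source):
    """Return the decoded bytes of the first base64 data URI, or None."""
    match = FONT_DATA_URI.search(page_source)
    if match is None:
        return None
    return base64.b64decode(match.group(1))


# Hypothetical snippet standing in for the real page source; the payload is
# only the first 16 characters of the string from step 3.
sample_html = (
    "<style>@font-face{font-family:'fangchan-secret';"
    "src:url('data:application/font-ttf;charset=utf-8;base64,AAEAAAALAIAAAwAw')}"
    "</style>"
)
font_bytes = extract_font_bytes(sample_html)
print(font_bytes[:4])  # the first four bytes identify the font container format
```

The returned bytes can be passed to `convert_font_to_xml` in place of the value returned by `make_font_file`.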

4. Running the script generates an XML file (text.xml). Find the cmap table in that file:

<cmap>
    <tableVersion version="0"/>
    <cmap_format_4 platformID="0" platEncID="3" language="0">
      <map code="0x9476" name="glyph00009"/><!-- CJK UNIFIED IDEOGRAPH-9476 -->
      <map code="0x958f" name="glyph00007"/><!-- CJK UNIFIED IDEOGRAPH-958F -->
      <map code="0x993c" name="glyph00008"/><!-- CJK UNIFIED IDEOGRAPH-993C -->
      <map code="0x9a4b" name="glyph00004"/><!-- CJK UNIFIED IDEOGRAPH-9A4B -->
      <map code="0x9e3a" name="glyph00010"/><!-- CJK UNIFIED IDEOGRAPH-9E3A -->
      <map code="0x9ea3" name="glyph00001"/><!-- CJK UNIFIED IDEOGRAPH-9EA3 -->
      <map code="0x9f64" name="glyph00003"/><!-- CJK UNIFIED IDEOGRAPH-9F64 -->
      <map code="0x9f92" name="glyph00005"/><!-- CJK UNIFIED IDEOGRAPH-9F92 -->
      <map code="0x9fa4" name="glyph00002"/><!-- CJK UNIFIED IDEOGRAPH-9FA4 -->
      <map code="0x9fa5" name="glyph00006"/><!-- CJK UNIFIED IDEOGRAPH-9FA5 -->
    </cmap_format_4>
    <cmap_format_12 platformID="0" platEncID="4" format="12" reserved="0" length="136" language="0" nGroups="10">
      <map code="0x9476" name="glyph00009"/><!-- CJK UNIFIED IDEOGRAPH-9476 -->
      <map code="0x958f" name="glyph00007"/><!-- CJK UNIFIED IDEOGRAPH-958F -->
      <map code="0x993c" name="glyph00008"/><!-- CJK UNIFIED IDEOGRAPH-993C -->
      <map code="0x9a4b" name="glyph00004"/><!-- CJK UNIFIED IDEOGRAPH-9A4B -->
      <map code="0x9e3a" name="glyph00010"/><!-- CJK UNIFIED IDEOGRAPH-9E3A -->
      <map code="0x9ea3" name="glyph00001"/><!-- CJK UNIFIED IDEOGRAPH-9EA3 -->
      <map code="0x9f64" name="glyph00003"/><!-- CJK UNIFIED IDEOGRAPH-9F64 -->
      <map code="0x9f92" name="glyph00005"/><!-- CJK UNIFIED IDEOGRAPH-9F92 -->
      <map code="0x9fa4" name="glyph00002"/><!-- CJK UNIFIED IDEOGRAPH-9FA4 -->
      <map code="0x9fa5" name="glyph00006"/><!-- CJK UNIFIED IDEOGRAPH-9FA5 -->
    </cmap_format_12>
    <cmap_format_0 platformID="1" platEncID="0" language="0">
    </cmap_format_0>
    <cmap_format_4 platformID="3" platEncID="1" language="0">
      <map code="0x9476" name="glyph00009"/><!-- CJK UNIFIED IDEOGRAPH-9476 -->
      <map code="0x958f" name="glyph00007"/><!-- CJK UNIFIED IDEOGRAPH-958F -->
      <map code="0x993c" name="glyph00008"/><!-- CJK UNIFIED IDEOGRAPH-993C -->
      <map code="0x9a4b" name="glyph00004"/><!-- CJK UNIFIED IDEOGRAPH-9A4B -->
      <map code="0x9e3a" name="glyph00010"/><!-- CJK UNIFIED IDEOGRAPH-9E3A -->
      <map code="0x9ea3" name="glyph00001"/><!-- CJK UNIFIED IDEOGRAPH-9EA3 -->
      <map code="0x9f64" name="glyph00003"/><!-- CJK UNIFIED IDEOGRAPH-9F64 -->
      <map code="0x9f92" name="glyph00005"/><!-- CJK UNIFIED IDEOGRAPH-9F92 -->
      <map code="0x9fa4" name="glyph00002"/><!-- CJK UNIFIED IDEOGRAPH-9FA4 -->
      <map code="0x9fa5" name="glyph00006"/><!-- CJK UNIFIED IDEOGRAPH-9FA5 -->
    </cmap_format_4>
    <cmap_format_12 platformID="3" platEncID="10" format="12" reserved="0" length="136" language="0" nGroups="10">
      <map code="0x9476" name="glyph00009"/><!-- CJK UNIFIED IDEOGRAPH-9476 -->
      <map code="0x958f" name="glyph00007"/><!-- CJK UNIFIED IDEOGRAPH-958F -->
      <map code="0x993c" name="glyph00008"/><!-- CJK UNIFIED IDEOGRAPH-993C -->
      <map code="0x9a4b" name="glyph00004"/><!-- CJK UNIFIED IDEOGRAPH-9A4B -->
      <map code="0x9e3a" name="glyph00010"/><!-- CJK UNIFIED IDEOGRAPH-9E3A -->
      <map code="0x9ea3" name="glyph00001"/><!-- CJK UNIFIED IDEOGRAPH-9EA3 -->
      <map code="0x9f64" name="glyph00003"/><!-- CJK UNIFIED IDEOGRAPH-9F64 -->
      <map code="0x9f92" name="glyph00005"/><!-- CJK UNIFIED IDEOGRAPH-9F92 -->
      <map code="0x9fa4" name="glyph00002"/><!-- CJK UNIFIED IDEOGRAPH-9FA4 -->
      <map code="0x9fa5" name="glyph00006"/><!-- CJK UNIFIED IDEOGRAPH-9FA5 -->
    </cmap_format_12>
  </cmap>
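Rather than reading the cmap table by eye, the `code` to `name` pairs can be loaded into a dictionary. The sketch below uses only the standard library and an abbreviated copy of one subtable from the dump above (three of the ten entries); `read_cmap` and `SAMPLE_CMAP_XML` are names introduced here for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of one cmap subtable from the dump above.
SAMPLE_CMAP_XML = """
<cmap>
  <tableVersion version="0"/>
  <cmap_format_4 platformID="3" platEncID="1" language="0">
    <map code="0x993c" name="glyph00008"/>
    <map code="0x9ea3" name="glyph00001"/>
    <map code="0x9fa5" name="glyph00006"/>
  </cmap_format_4>
</cmap>
"""


def read_cmap(xml_text):
    """Map each code point (int) to its glyph name, merging all subtables."""
    root = ET.fromstring(xml_text.strip())
    return {
        int(entry.get("code"), 16): entry.get("name")
        for entry in root.iter("map")
    }


cmap = read_cmap(SAMPLE_CMAP_XML)
print(cmap[0x993C])
```

Because `root.iter("map")` walks every descendant element, the same function should also work on the full contents of text.xml, where the subtables repeat the same ten entries.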

5. Check the code point of the encrypted character:


print(fanti[0].encode('unicode-escape'))

# Output
b'\\u993c'
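An alternative that yields the value in the same form as the `code` attribute of the cmap entries is to format the result of `ord`. This is a small sketch and not part of the original script:

```python
ch = "\u993c"  # the first encrypted character, '餼'
# "#06x" produces lowercase hex with a 0x prefix, six characters wide in total.
code = format(ord(ch), "#06x")
print(code)
```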

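6. In the cmap table, find the name that corresponds to this code and take the number at the end of it. Subtract one, and the result is the real digit. Here 0x993c maps to glyph00008, which gives 7, and 0x9ea3 maps to glyph00001, which gives 0, so the text in this post decodes to 700.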
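The lookup in steps 5 and 6 can be put into a small function. The sketch below hard-codes the ten cmap entries from step 4 and applies the "trailing number minus one" rule observed for this particular font; a different font version may follow a different rule. `CMAP` and `decode_digits` are names introduced here.

```python
import re

# Code point -> glyph name, copied from the cmap dump in step 4.
CMAP = {
    0x9476: "glyph00009", 0x958F: "glyph00007", 0x993C: "glyph00008",
    0x9A4B: "glyph00004", 0x9E3A: "glyph00010", 0x9EA3: "glyph00001",
    0x9F64: "glyph00003", 0x9F92: "glyph00005", 0x9FA4: "glyph00002",
    0x9FA5: "glyph00006",
}


def decode_digits(text, cmap):
    """Replace each encrypted character with its digit; leave others as-is."""
    decoded = []
    for ch in text:
        name = cmap.get(ord(ch))
        if name is None:
            decoded.append(ch)
            continue
        # Rule from step 6: trailing number of the glyph name, minus one.
        number = int(re.search(r"(\d+)$", name).group(1))
        decoded.append(str(number - 1))
    return "".join(decoded)


print(decode_digits("\u993c\u9ea3\u9ea3", CMAP))  # input is '餼麣麣'
```

Characters that are not in the mapping pass through unchanged, so the function can be applied to a whole text node that mixes encrypted digits with ordinary text.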

 

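### Fixing Garbled Digits on 58同城

Garbled digits on 58同城 and similar sites usually come from font encryption. The technique uses a custom font file (woff, ttf, and so on) to map specific characters to the real values, which hides the actual information. A concrete approach follows.

#### How the font decryption works

The special characters shown in the page source are rendered by a custom font loaded through a CSS `@font-face` rule[^1]. They are valid Unicode code points, but the custom font draws them with the glyphs of other characters. For example, in the font decoded above, "餼" (U+993C) stands for the digit 7. To recover the real digits:

1. **Extract the font file**: find the custom font the page uses and save it.
2. **Parse the font mapping**: open the font with a tool such as FontForge or the fontTools `ttx` command and inspect the mapping between glyph IDs and characters.

#### Automating the decryption with Python

To save effort, a Python script can do this automatically. Below is a simple example that fetches a font file and parses its mapping:

```python
import requests
from fontTools.ttLib import TTFont


def download_font(font_url, save_path):
    """Download the font file and write it to save_path."""
    response = requests.get(font_url, timeout=10)
    response.raise_for_status()
    with open(save_path, 'wb') as f:
        f.write(response.content)


def parse_font_mapping(font_file):
    """Return the font's code point -> glyph name mapping."""
    font = TTFont(font_file)
    # Subtable (3, 1) is the Windows Unicode BMP cmap
    return dict(font['cmap'].getcmap(3, 1).cmap)


# Example usage
font_url = "http://example.com/path/to/font.woff"
save_path = "./font.woff"
download_font(font_url, save_path)
mapping = parse_font_mapping(save_path)
print(mapping)
```

This code does two things: first, it downloads the font file from a given URL; second, it reads and prints the character mapping table inside the font file[^2].

#### Dynamic detection and adaptation

The target site may change its font file version or restructure it frequently, so a crawler that is maintained over a long period should include dynamic detection logic. For example, compare the new font file with the old one at regular intervals and regenerate the mapping dictionary whenever they differ.

---

### Notes

- Comply with the relevant laws and regulations and with the target site's terms of service, and stay within legal bounds.
- For complex front-end frameworks (such as React or Vue), factors like JavaScript obfuscation may also need to be considered.

---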
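Returning to the dynamic detection idea above, one way to notice a changed font is to hash the font bytes and rebuild the mapping only when the hash differs. This is a minimal sketch using the standard library; `MappingCache` and `fake_builder` are hypothetical names, and the builder is a stand-in for whatever function parses the real font.

```python
import hashlib


class MappingCache:
    """Rebuild the digit mapping only when the font bytes change."""

    def __init__(self, build_mapping):
        # build_mapping is a caller-supplied function: font bytes -> mapping dict.
        self._build_mapping = build_mapping
        self._digest = None
        self._mapping = None
        self.rebuilds = 0

    def get(self, font_bytes):
        digest = hashlib.sha256(font_bytes).hexdigest()
        if digest != self._digest:
            self._mapping = self._build_mapping(font_bytes)
            self._digest = digest
            self.rebuilds += 1
        return self._mapping


# Stand-in builder for illustration; a real one would parse the font's cmap.
def fake_builder(font_bytes):
    return {"length": len(font_bytes)}


cache = MappingCache(fake_builder)
cache.get(b"font-v1")
cache.get(b"font-v1")  # same bytes: the cached mapping is reused
cache.get(b"font-v2")  # different bytes: the mapping is rebuilt
print(cache.rebuilds)
```

Hashing the raw bytes treats any change to the file as a new font, which is conservative but avoids having to compare the parsed tables.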