builtwith.parse raises UnicodeDecodeError

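This post shows how to use the builtwith library to identify the technologies a website is built with, and how to work around a UnicodeDecodeError raised when parsing one particular page. Replacing the page URL with the site's root URL returned the list of technologies the site uses.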


When using builtwith to identify the technologies behind a website, passing the URL of one specific page, as shown below,

cmd
pip install --upgrade builtwith
>>> import builtwith
>>> builtwith.parse('http://data.eastmoney.com/zjlx/300409.html')

raises the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "F:\TopQuant\zwPython\py35\python35\lib\site-packages\builtwith\__init__.py", line 65, in builtwith
    if contains(html, snippet):
  File "F:\TopQuant\zwPython\py35\python35\lib\site-packages\builtwith\__init__.py", line 110, in contains
    v = v.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 1249: invalid start byte
>>> help(builtwith.parse)
Help on function builtwith in module builtwith:

builtwith(url, headers=None, html=None, user_agent='builtwith')
    Detect the technology used to build a website

    >>> builtwith('http://wordpress.com')
    {u'blogs': [u'PHP', u'WordPress'], u'font-scripts': [u'Google Font API'], u'web-servers': [u'Nginx'], u'javascript-frameworks': [u'Modernizr'], u'programming-languages': [u'PHP'], u'cms': [u'WordPress']}
    >>> builtwith('http://webscraping.com')
    {u'javascript-frameworks': [u'jQuery', u'Modernizr'], u'web-frameworks': [u'Twitter Bootstrap'], u'web-servers': [u'Nginx']}
    >>> builtwith('http://microsoft.com')
    {u'javascript-frameworks': [u'jQuery'], u'mobile-frameworks': [u'jQuery Mobile'], u'operating-systems': [u'Windows Server'], u'web-servers': [u'IIS']}
    >>> builtwith('http://jquery.com')
    {u'cdn': [u'CloudFlare'], u'web-servers': [u'Nginx'], u'javascript-frameworks': [u'jQuery', u'Modernizr'], u'programming-languages': [u'PHP'], u'cms': [u'WordPress'], u'blogs': [u'PHP', u'WordPress']}
    >>> builtwith('http://joomla.org')
    {u'font-scripts': [u'Google Font API'], u'miscellaneous': [u'Gravatar'], u'web-servers': [u'LiteSpeed'], u'javascript-frameworks': [u'jQuery'], u'programming-languages': [u'PHP'], u'web-frameworks': [u'Twitter Bootstrap'], u'cms': [u'Joomla'], u'video-players': [u'YouTube']}
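
The traceback above ends in `v.decode()`, which is called without arguments and therefore assumes UTF-8. Byte 0xb5 cannot start a UTF-8 sequence, which suggests the page is served in a different encoding. For a Chinese site, GB2312/GBK is a plausible guess (0xb5 is, for example, the first byte of "的" in GB2312), but that is an assumption about this page, not something the traceback proves. The sketch below is a hypothetical helper, `decode_html`, that is not part of builtwith; it tries a list of candidate encodings and returns the first successful decode.

```python
def decode_html(raw, encodings=("utf-8", "gb18030")):
    """Decode raw page bytes, trying each candidate encoding in order.

    gb18030 is used as the fallback because it is a superset of
    GB2312 and GBK. The candidate list is an assumption; adjust it
    for the sites you are scanning.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Nothing matched: decode as UTF-8 and substitute U+FFFD
    # for the bytes that cannot be decoded.
    return raw.decode("utf-8", errors="replace")


# Bytes like these trigger the same "invalid start byte" error under UTF-8.
sample = "资金流向的".encode("gb2312")
print(decode_html(sample))
```

The signature in the help output above also accepts an `html` argument. Passing already-decoded text there (for example the result of `decode_html`) might sidestep the failing `decode()` call, but this depends on the installed builtwith version and is not tested here.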

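The examples in the help output all pass builtwith.parse a site-level URL rather than the URL of a specific page. Since the goal is to identify the technologies used by the site as a whole, it is enough to replace the page URL above with the site's main URL, as follows: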

>>> builtwith.parse('http://eastmoney.com')
{'analytics': ['TrackJs'], 'web-servers': ['Nginx'], 'javascript-frameworks': ['jQuery', 'RightJS']}
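
The replacement above was done by hand. When many page URLs need checking, the same step can be sketched with the standard library. The helper name `site_root` is hypothetical. Note that it keeps the subdomain (`data.eastmoney.com`), whereas the call above used the bare `eastmoney.com`; reducing a host to its registered domain is a separate problem that this sketch does not attempt.

```python
from urllib.parse import urlsplit, urlunsplit


def site_root(page_url):
    """Return scheme://host for a page URL, dropping path, query and fragment."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))


print(site_root("http://data.eastmoney.com/zjlx/300409.html"))
# prints: http://data.eastmoney.com
```

The result can then be passed to `builtwith.parse` in place of the page URL. Whether the subdomain's front page avoids the decoding error depends on how that page is encoded, so the outcome may differ from the one shown above.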

