Using selenium when a send_request call returns only script content (status code 412)


License: CC 4.0 BY-SA

Tags:

#selenium #python

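While scraping a web page, a send_request call came back with a 412 error and only script content. Using selenium to simulate a browser got around the problem and returned the complete HTML. This article covers installing selenium and the Firefox driver and provides a Python code example.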

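I recently ran into a rather odd situation while writing a crawler:
Requesting the page with the send_request method returned HTML whose body contained nothing but a script, and the printed status code was 412 (tip: 412 is Precondition Failed, meaning a precondition attached to the request was not met).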

The request code (request URL: http://www.nhc.gov.cn/zwgk/tian/ejlist.shtml):
[Image: the request code]

The content returned by the request looked roughly like this:
![Response content](https://img-blog.csdnimg.cn/20200912091229436.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2RfYXBwZW5k,size_16,color_FFFFFF,t_70#pic_center)
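The symptom described above can be checked programmatically. The sketch below is only an illustration: `send_request` from the post is not shown in text form, so `fetch` is a stand-in built on the standard library's `urllib`, and `is_script_only_response` is a hypothetical helper name. It flags a reply when the status is 412 or when the body holds only `<script>` tags.

```python
from html.parser import HTMLParser
import urllib.error
import urllib.request


class _BodyProbe(HTMLParser):
    """Counts <script> tags versus all other tags seen inside <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.script_tags = 0
        self.other_tags = 0

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            if tag == "script":
                self.script_tags += 1
            else:
                self.other_tags += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False


def is_script_only_response(status, html):
    """True for a 412 reply, or for a body that contains nothing but <script> tags."""
    probe = _BodyProbe()
    probe.feed(html)
    return status == 412 or (probe.script_tags > 0 and probe.other_tags == 0)


def fetch(url):
    """Plain HTTP GET (stand-in for send_request); returns (status, text) even on 4xx."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error object still carries the body.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Calling `is_script_only_response(*fetch(url))` before parsing gives an early signal that a plain HTTP request is not enough and a real browser may be needed.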

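After some experimenting, I found that using selenium to simulate a browser gets past that precondition, so the request returns the real content instead of only a script.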

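Selenium needs a browser driver to be installed. I used Firefox (Chrome works too; instructions are easy to find online). The article linked below can be used as an installation guide. (Configuring the environment variable still gave me some trouble, so in the crawler code (below) I set executable_path directly: browser = webdriver.Firefox(executable_path=r"C:\Program Files\Mozilla Firefox\geckodriver.exe"). The executable_path points to geckodriver.exe inside my Firefox installation directory.) https://blog.youkuaiyun.com/qq471011042/article/details/79514908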
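The flow described above could look roughly like the following. This is a sketch under stated assumptions, not the author's code: it assumes Selenium 4, where the driver path is passed through a `Service` object rather than the `executable_path` argument of `webdriver.Firefox`, and the names `make_firefox`, `fetch_rendered_html`, and the `marker` parameter are hypothetical. The browser loads the page, the function polls `page_source` until an expected piece of text appears or the timeout passes, and the browser is always closed afterwards.

```python
import time


def make_firefox(geckodriver_path=None):
    """Start Firefox through Selenium 4 (assumed); the path argument is optional."""
    # Imported here so the rest of the module loads without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.service import Service

    service = Service(executable_path=geckodriver_path) if geckodriver_path else Service()
    return webdriver.Firefox(service=service)


def fetch_rendered_html(driver, url, marker=None, timeout=10.0, poll=0.5):
    """Load url in the browser and return its HTML.

    marker is an optional substring expected in the real page (hypothetical,
    chosen by the caller); the function polls until it shows up or timeout passes.
    """
    try:
        driver.get(url)
        deadline = time.monotonic() + timeout
        html = driver.page_source
        while marker and marker not in html and time.monotonic() < deadline:
            time.sleep(poll)
            html = driver.page_source
        return html
    finally:
        # Close the browser even if loading the page raised an exception.
        driver.quit()
```

A call such as `fetch_rendered_html(make_firefox(r"C:\Program Files\Mozilla Firefox\geckodriver.exe"), "http://www.nhc.gov.cn/zwgk/tian/ejlist.shtml")` returns a string that can then be handed to `BeautifulSoup(html, "html.parser")`. Taking the driver as a parameter keeps the fetch logic independent of which browser is used.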

A few packages also need to be installed with pip (for the imports below: pip install selenium beautifulsoup4).

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selen
