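Requirement: scrape data from a company's official website.
Question 1: Some websites load their content dynamically with JavaScript. `requests.get(url)` only downloads the initial HTML the server sends and does not run any JavaScript, so for such sites the information you get back is incomplete. For example: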
```python
import requests

def main():
    r = requests.get('https://www.tee.com/index')
    print(r.text)

if __name__ == '__main__':
    main()
```
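Output (the body contains only an empty `<div id="app">` and a script tag, so everything visible on the page is inserted later by JavaScript):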
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>TEE</title>
</head>
<body>
<div id="app"></div>
<script src="/dist/build.js"></script>
</body>
</html>
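Before reaching for a browser, you can confirm this diagnosis in code by checking whether the mount point came back empty. The sketch below uses only Python's standard `html.parser`. The class and function names are hypothetical, and the assumption that the content lives under `<div id="app">` comes from this particular page; other sites may use a different container id.

```python
from html.parser import HTMLParser


class MountPointInspector(HTMLParser):
    """Records whether <div id=...> exists and whether anything is inside it."""

    def __init__(self, container_id: str = "app") -> None:
        super().__init__()
        self.container_id = container_id
        self.found = False       # did we see the container at all?
        self.div_depth = 0       # > 0 while parsing inside the container
        self.child_tags = 0      # number of start tags nested inside it
        self.has_text = False    # any non-whitespace text nested inside it

    def handle_starttag(self, tag, attrs):
        if self.div_depth:
            self.child_tags += 1
            if tag == "div":
                self.div_depth += 1  # track nested divs to find the right </div>
        elif tag == "div" and dict(attrs).get("id") == self.container_id:
            self.found = True
            self.div_depth = 1

    def handle_endtag(self, tag):
        if self.div_depth and tag == "div":
            self.div_depth -= 1

    def handle_data(self, data):
        if self.div_depth and data.strip():
            self.has_text = True


def is_empty_js_shell(html: str, container_id: str = "app") -> bool:
    """True if the page has the mount point but nothing rendered inside it."""
    inspector = MountPointInspector(container_id)
    inspector.feed(html)
    inspector.close()
    return inspector.found and inspector.child_tags == 0 and not inspector.has_text
```

Fed the `requests` output shown above, `is_empty_js_shell` returns `True`; fed HTML in which the container has been filled in, it returns `False`. A `True` result is a hint that the page needs a JavaScript-capable client.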
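Answer 1: There are many ways to solve this. Here I'll cover the one I personally find simplest: request the page through Selenium's webdriver, which drives a real browser so the page's JavaScript actually runs. (This requires installing selenium and a chromedriver that matches your Chrome version; search online for the setup steps.)
Straight to the code: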
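```python
from selenium import webdriver

def main():
    driver = webdriver.Chrome()  # needs Chrome plus a matching chromedriver
    try:
        driver.get('https://www.tee.com/index')
        # page_source is the browser's current DOM, i.e. after the page's
        # scripts have run, not the raw response the server sent
        html = driver.page_source
        print(html)
    finally:
        driver.quit()  # close the browser even if something above fails

if __name__ == '__main__':
    main()
```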
Output (this time `<div id="app">` is filled with the rendered navigation and page content):
<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" lang="en"><head>
<meta charset="utf-8" />
<title>TEE</title>
<style type="text/css">.el-breadcrumb:after,.el-breadcrumb:before,.el-button-group:after,.el-button-group:before,.el-form-item:after,.el-form-item:before,.el-form-item__content:after,.el-form...... (roughly ten thousand characters omitted here)
<body>
<div id="app"><div data-v-affeac48="" id="pageheader"><header data-v-affeac48="" class="container"><div data-v-c89fb01a="" data-v-affeac48="" id="pageNav"><nav data-v-c89fb01a=""><div data-v-c89fb01a="" class="nav_logo"><img data-v-c89fb01a="" src="/dist/TEE_logo.png?56cbbf3b79c907ee1c1c25b2cf175639" alt="" class="nav_logo_img" /> <p data-v-c89fb01a="" class="nav_logo_p">Trusted Execution Environment</p></div> <ul data-v-c89fb01a="" class="nav_ul"><li data-v-c89fb01a=""><a data-v-c89fb01a="" href="/index" class="router-link-exact-active router-link-active">首页</a></li> <li data-v-c89fb01a="" class="product"><a data-v-c89fb01a="" href="#">产品中心</a> <div data-v-c89fb01a="" id="product_detail1" class="product_detail"><div data-v-c89fb01a="" class="product_detail_out"><div data-v-c89fb01a="" class="product_detail_check"><h3 data-v-c89fb01a="" class="product_h3">机器视觉</h3> <ul data-v-c89fb01a="" class="check_ul"><li data-v-c89fb01a="" class="check_ul_li2"><a data-v-c89fb01a="" href="/pointsMachine" target="_blank">AI智能分板机</a></li></ul></div> <div data-v-c89fb01a="" class="product_detail_check"><h3 data-v-c89fb01a="" class="product_h3">AI教育应用</h3> <ul data-v-c89fb01a="" class="check_ul"><li data-v-c89fb01a="" class="check_ul_li2"><a data-v-c89fb01a="" href="/logistics" target="_blank">无人物流系统</a></li></ul></div> <div data-v-c89fb01a="" class="solve_detail_trafic"><h3 data-v-c89fb01a="" class="product_h3">交通安全</h3> <ul data-v-c89fb01a=""><li data-v-c89fb01a=""><a data-v-c89fb01a="" href="/DMS" target="_blank" class="headhover_a" style="padding: 0px 3px;">驾驶员状态监测</a></li></ul></div> <div data-v-c89fb01a="" class="product_detail_stick"><h3 data-v-c89fb01a="" class="product_h3">硬件类/服务类</h3> <ul data-v-c89fb01a=""><li data-v-c89fb01a=""><a data-v-c89fb01a="" href="/teeEssential" target="_blank">神经元计算棒</a></li> <li data-v-c89fb01a=""><a data-v-c89fb01a="" href="/greenrouter" target="_blank">绿盾路由器</a></li></ul></div> <div data-v-c89fb01a="" class="product_detail_check"><h3 
data-v-c89fb01a="" class="product_h3">内容审核</h3> <ul data-v-c89fb01a="" class="check_ul"><li data-v-c89fb01a=""><a data-v-c89fb01a="" href="/imagesensor" target="_blank">图像审核</a></li> <li data-v-c89fb01a="" class="check_ul_li2"><a data-v-c89fb01a="" href="/videosensor" target="_blank">视频审核</a></li></ul></div> <div data-v-c89fb01a="" class="product_detail_face"><h3 data-v-c89fb01a="" class="product_h3">人脸技术</h3> <ul data-v-c89fb01a=""><li data-v-c89fb01a=""><a data-v-c89fb01a="" href="/faceID" target="_blank">人证合一</a></li> <li data-v-c89fb01a=""><a data-v-c89fb01a="" href="/faceRecnitn" target="_blank">人脸识别</a></li></ul></div> <div data-v-c89fb01a="" class="solve_detail_buy"><h3 data-v-c89fb01a="" class="product_h3">智能导购</h3> <ul data-v-c89fb01a=""><li data-v-c89fb01a=""><a data-v-c89fb01a="" href="/snapshop&#