# Applying Selenium in Python Web Scraping
This is a minimal Selenium crawler example. The code below uses Chrome to open Baidu and search for "python":
```python
from selenium import webdriver

# point Selenium at a local ChromeDriver (raw string avoids backslash escapes)
driver = webdriver.Chrome(r'D:\chromedriver_win32\chromedriver.exe')
driver.get('https://www.baidu.com/')

# type "python" into Baidu's search box and click the search button
search_box = driver.find_element_by_xpath('//*[@id="kw"]')
search_box.send_keys('python')
submit = driver.find_element_by_xpath('//*[@id="su"]')
submit.click()
```
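One caveat: the `find_element_by_*` helpers used throughout this post were removed in Selenium 4. If your installed Selenium is newer, a minimal sketch of the same Baidu example looks like this (assuming the same local ChromeDriver path):

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 style: the driver path goes through a Service object
driver = webdriver.Chrome(service=Service(r'D:\chromedriver_win32\chromedriver.exe'))
driver.get('https://www.baidu.com/')

# elements are located with the generic find_element(By.XPATH, ...) API
search_box = driver.find_element(By.XPATH, '//*[@id="kw"]')
search_box.send_keys('python')
driver.find_element(By.XPATH, '//*[@id="su"]').click()
```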
## Hands-On Practice: Scraping Sina Weibo
- First, import the required modules
- Then write a function that saves each record, as shown below
```python
from selenium import webdriver
import csv
import time

def csv_writer(item):
    # append one row per record; gbk encoding as in the original
    # (suits Excel on Chinese Windows)
    with open('weibo.csv', 'a', encoding='gbk', newline='') as csvfile:
        writer = csv.writer(csvfile)
        try:
            writer.writerow(item)
        except Exception:
            print('Write failed')
```
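Because `csv_writer` opens `weibo.csv` in append mode, nothing ever writes a header. One way to handle that is to call it once with the column names before scraping starts (the names below simply mirror the fields collected later):

```python
# write the column header once, before any rows are appended
csv_writer(['pub_id', 'pub_id_url', 'pub_content'])
```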
## Writing the Login Function
```python
driver = webdriver.Chrome(r'D:\chromedriver_win32\chromedriver.exe')
driver.implicitly_wait(10)  # implicit wait of up to 10s for elements to appear

def login():
    driver.get('https://weibo.com')
    driver.set_window_size(1920, 1080)  # set the browser window size
    # locate the username input box and fill it in
    username = driver.find_element_by_xpath('//*[@id="loginname"]')
    username.send_keys('your username')
    userpassword = driver.find_element_by_xpath('//*[@id="pl_login_form"]/div/div[3]/div[2]/div/input')
    userpassword.send_keys('your password')
    submit = driver.find_element_by_xpath('//*[@id="pl_login_form"]/div/div[3]/div[6]/a')
    print('Ready to log in...')
    submit.click()  # click the login button
```
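The implicit wait above applies to every lookup. For a timing-sensitive step such as waiting for the login form to render, an explicit wait is usually more robust; here is a minimal sketch with a hypothetical helper, reusing the username XPath from `login`:

```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def wait_for_login_form(timeout=10):
    # hypothetical helper: block until the username box is present,
    # or raise TimeoutException after `timeout` seconds
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="loginname"]'))
    )
```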
## Writing the Spider Function
```python
def spider():
    driver.get('https://weibo.com')
    # each matched element is one post on the homepage feed
    all_weibo = driver.find_elements_by_xpath('//*[@id="v6_pl_content_homefeed"]/div/div[4]/div[1]/div[1]')
    for weibo in all_weibo:
        pub_id = weibo.find_elements_by_xpath('div[4]/div[1]/a[1]')[0].text
        pub_id_url = weibo.find_elements_by_xpath('div[4]/div[1]/a[1]')[0].get_attribute('href')
        pub_content = weibo.find_elements_by_xpath('div[4]/div[4]')[0].text
        item = [pub_id, pub_id_url, pub_content]
        print('Scraped successfully')
        csv_writer(item)
```
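Since the main loop below re-runs `spider` every 20 seconds, the same posts get appended to the CSV on every pass. A sketch of one way to skip duplicates, using a hypothetical `seen_urls` set keyed on the post URL:

```python
seen_urls = set()  # hypothetical: URLs already written to the CSV

def spider_dedup():
    driver.get('https://weibo.com')
    all_weibo = driver.find_elements_by_xpath('//*[@id="v6_pl_content_homefeed"]/div/div[4]/div[1]/div[1]')
    for weibo in all_weibo:
        pub_id_url = weibo.find_elements_by_xpath('div[4]/div[1]/a[1]')[0].get_attribute('href')
        if pub_id_url in seen_urls:
            continue  # already saved this post
        seen_urls.add(pub_id_url)
        pub_id = weibo.find_elements_by_xpath('div[4]/div[1]/a[1]')[0].text
        pub_content = weibo.find_elements_by_xpath('div[4]/div[4]')[0].text
        csv_writer([pub_id, pub_id_url, pub_content])
```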
## Writing the Main Function
```python
def main():
    login()
    while True:         # keep scraping the homepage feed
        spider()
        time.sleep(20)  # wait 20s between rounds
```
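`main` loops forever, so the browser is never shut down cleanly. A small sketch of an entry point that calls `main` and quits ChromeDriver on Ctrl+C:

```python
if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        pass  # stop scraping on Ctrl+C
    finally:
        driver.quit()  # close the browser and end the ChromeDriver process
```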