Basic Usage of Selenium

Latest recommended article published on 2025-08-04 14:54:10

彬小二

License: CC 4.0 BY-SA

Category: python_spider    Tags: Selenium, basic operations

Article link: https://blog.youkuaiyun.com/qq_39884947/article/details/89740534

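Selenium is a web automation testing tool. It runs in the browser, supports the mainstream browsers, and can make the browser load pages and extract data automatically. This article covers the essential basic operations: opening a page, quitting, locating elements, action chains, cookie handling, page waits, and switching between windows. A follow-up post will put them into practice by scraping a website.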

Installation:
pip install selenium

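Why use Selenium?
[Image: not included in the source]
What is Selenium?

Selenium is a web automation testing tool that was originally developed for automated website testing. It drives a real browser directly and supports all mainstream browsers. The original examples also used headless browsers such as PhantomJS; PhantomJS support has since been deprecated and was removed in Selenium 4, where headless Chrome or Firefox fills that role. Selenium accepts commands that make the browser load pages automatically, extract the data you need, and even take screenshots of the page.

Below are the essential basic operations. Save them for reference and try each one yourself: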

Opening a web page

from selenium import webdriver
browser = webdriver.Chrome()
browser.get('https://www.baidu.com')
print(browser.page_source)  # get the page source
browser.close()  # close the current window

Quitting

driver.close()  closes the current window
driver.quit()   quits the entire browser and ends the session
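
Because a forgotten quit() leaves a browser process running, it can help to guarantee the call. The following is a minimal sketch, not part of Selenium itself: managed_browser and make_driver are hypothetical names, and make_driver is assumed to be any zero-argument factory such as webdriver.Chrome.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(make_driver):
    """Yield a driver and always call quit() on it afterwards.

    make_driver is a hypothetical zero-argument factory, e.g. webdriver.Chrome.
    """
    driver = make_driver()
    try:
        yield driver
    finally:
        # quit() ends the whole session, even if the body raised an exception
        driver.quit()


# Intended usage (needs a real browser and driver, so it is left commented out):
# from selenium import webdriver
# with managed_browser(webdriver.Chrome) as browser:
#     browser.get('https://www.baidu.com')
#     print(browser.title)
```

With this pattern the browser is shut down even when the code inside the with block fails partway through.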

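Viewing page and request information:

driver.page_source    # HTML source of the current page
driver.get_cookies()  # all cookies of the current session (a method, so it must be called)
driver.current_url    # URL of the current page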

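Locating elements:

Usage (Selenium 3 helper methods):
find_element_by_id (returns a single element)
find_elements_by_xpath (returns a list)
find_elements_by_link_text
find_elements_by_partial_link_text
find_elements_by_tag_name
find_elements_by_class_name
find_elements_by_css_selector

*find_element_by_XXX locates an element through its XXX property. These helpers were removed in Selenium 4.3; the equivalent calls are find_element(By.XXX, value) and find_elements(By.XXX, value), with By imported from selenium.webdriver.common.by.
Notes:
find_element vs. find_elements: the first returns a single element, the second returns a list
by_link_text vs. by_partial_link_text: the first matches the full link text, the second matches links that contain the given text
Example selector for by_css_selector: #food span.dairy.aged
With by_xpath, the XPath must select an element, not an attribute or text node; read attributes and text from the located element with get_attribute() and .text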
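
The last note can be illustrated with a small helper that reads the text and one attribute from each located element. This is only a sketch: element_summary is a hypothetical name, and it assumes the elements come from a call such as find_elements.

```python
def element_summary(elements, attribute="href"):
    """Return (text, attribute value) pairs for a list of located elements."""
    rows = []
    for el in elements:
        # .text is the visible text; get_attribute() reads an attribute value
        # and returns None when the attribute is absent
        rows.append((el.text, el.get_attribute(attribute)))
    return rows


# Intended usage with a live browser (commented out because it needs one):
# from selenium.webdriver.common.by import By
# links = browser.find_elements(By.XPATH, '//a')
# for text, href in element_summary(links):
#     print(text, href)
```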

Form elements: input type='text/password/email/number'
checkbox: input type='checkbox'
button: type='submit'
select: drop-down list

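Input boxes (text/password/email/number):
from selenium.webdriver.common.by import By
inputTag = browser.find_element(By.ID, 'kw')  # locate the input box (find_element_by_id('kw') before Selenium 4.3)
inputTag.send_keys('python')  # type a value into it
inputTag.clear()  # clear the value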

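checkbox:
from selenium.webdriver.common.by import By
btn = browser.find_element(By.NAME, 'remember')  # find_element_by_name('remember') before Selenium 4.3
btn.click()  # check the checkbox that was found
btn.click()  # a second click() unchecks it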

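select drop-down list:
from selenium.webdriver.support.select import Select
Select provides three methods for choosing an option
1 select_by_index          # select by index
2 select_by_value          # select by the value attribute
3 select_by_visible_text   # select by the displayed text
index starts at 0;
value is an attribute of the option tag, not the text shown in the drop-down;
visible_text is the text between the option tags, which is what the drop-down displays;

Select provides three properties that return information about the options
1 options                  # all options of the select element
2 all_selected_options     # all currently selected options
3 first_selected_option    # the first selected option
options: a list of all option elements;
all_selected_options: a list of all selected option elements;
first_selected_option: the first selected option element;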
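
To show how the selection methods and the three properties fit together, here is a minimal sketch. pick_by_text is a hypothetical helper, and the id 'city' and the option text in the usage comment are made up for illustration.

```python
def pick_by_text(select, visible_text):
    """Select an option by its visible text and report the drop-down's state."""
    select.select_by_visible_text(visible_text)
    return {
        "all": [opt.text for opt in select.options],
        "selected": [opt.text for opt in select.all_selected_options],
        "first": select.first_selected_option.text,
    }


# Intended usage with a live browser (commented out because it needs one):
# from selenium.webdriver.common.by import By
# from selenium.webdriver.support.select import Select
# select = Select(browser.find_element(By.ID, 'city'))  # 'city' is a made-up id
# print(pick_by_text(select, 'Beijing'))
```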

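Action chains
When an interaction on the page involves several steps, a mouse action chain can carry them out

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
browser = webdriver.Chrome()
browser.get('https://www.baidu.com')
# print(browser.page_source)
inputTag = browser.find_element(By.ID, 'kw')  # search box
btn = browser.find_element(By.ID, 'su')  # search button
# inputTag.send_keys('python')
action = ActionChains(browser)
action.move_to_element(inputTag)
action.send_keys_to_element(inputTag, 'python')
action.move_to_element(btn)
action.click(btn)
action.perform()  # run the queued actions in order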

For more methods, consult the documentation
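
Each ActionChains method returns the chain itself, so the same steps can also be written as one fluent expression. The sketch below assumes `actions` is an ActionChains instance; search_with_actions is a hypothetical helper name.

```python
def search_with_actions(actions, input_el, button_el, text):
    """Queue move/type/click steps on an ActionChains-like object, then run them."""
    (
        actions
        .move_to_element(input_el)
        .send_keys_to_element(input_el, text)
        .move_to_element(button_el)
        .click(button_el)
        .perform()  # nothing is sent to the browser until perform() is called
    )


# Intended usage with a live browser (commented out because it needs one):
# from selenium.webdriver import ActionChains
# search_with_actions(ActionChains(browser), inputTag, btn, 'python')
```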

Cookie operations

1. Get all cookies
for cookie in browser.get_cookies():
    print(cookie)
2. Get a single cookie by its name:
value = browser.get_cookie('PSTM')
3. Delete all cookies
browser.delete_all_cookies()
4. Delete the cookie with a given name
browser.delete_cookie('PSTM')
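
get_cookies() returns a list of dictionaries, each with at least a "name" and a "value" key. A common follow-up step is flattening that list into a simple name-to-value mapping. The sketch below uses a hypothetical helper, cookies_to_dict, and the sample cookie values are made up.

```python
def cookies_to_dict(cookies):
    """Map cookie name -> value from the list of dicts that get_cookies() returns."""
    return {cookie["name"]: cookie["value"] for cookie in cookies}


# Made-up sample data in the same shape as browser.get_cookies()
sample = [
    {"name": "PSTM", "value": "1556600000", "domain": ".baidu.com", "path": "/"},
    {"name": "BAIDUID", "value": "ABC123", "domain": ".baidu.com", "path": "/"},
]
print(cookies_to_dict(sample))

# With a live browser: cookies_to_dict(browser.get_cookies())
```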

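Page waits
The program cannot know when a given element has finished loading. If the element takes too long to appear and is not yet present when the code looks for it, an exception is raised

1. Implicit wait: sets the maximum time to keep retrying whenever an element is being located
browser.implicitly_wait(20)
# element lookups now retry for up to 20 s and return as soon as the element is found
2. Explicit wait: waits until a specific condition holds; a TimeoutException is raised if the maximum time is exceeded
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
elem = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.ID, 'asdsd'))  # 'asdsd' is a placeholder id
)
print(elem)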
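
The idea behind an explicit wait is plain polling: check a condition repeatedly and stop as soon as it holds or the time limit passes. The following pure-Python sketch only illustrates that idea and is not Selenium's implementation; wait_until is a hypothetical name, and it raises the built-in TimeoutError where WebDriverWait raises its own TimeoutException.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # return as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)  # pause briefly before checking again
```

The key point is that the call returns immediately once the condition is met, so the timeout is an upper bound rather than a fixed delay.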

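Switching between windows

Sometimes a page opens several tabs, and you need to switch between them

browser.get('https://www.baidu.com')
browser.execute_script('window.open("https://www.douban.com/")')  # open Douban in a second window
# the driver is still focused on the first window (Baidu), not on Douban
# to work with another window you must switch to it explicitly
browser.switch_to.window(browser.window_handles[1])  # window_handles is indexed from 0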
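
Indexing window_handles with a fixed number works for two windows but becomes fragile with more. One alternative is to record the handles before opening the new window and then switch to whichever handle is new. This is a sketch under that assumption; switch_to_new_window is a hypothetical helper name.

```python
def switch_to_new_window(driver, handles_before):
    """Switch to the first window handle that was not in handles_before."""
    for handle in driver.window_handles:
        if handle not in handles_before:
            driver.switch_to.window(handle)
            return handle
    raise RuntimeError("no new window was opened")


# Intended usage with a live browser (commented out because it needs one):
# before = browser.window_handles
# browser.execute_script('window.open("https://www.douban.com/")')
# switch_to_new_window(browser, before)
# print(browser.current_url)
```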

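The next post will put Selenium to practical use for scraping: Selenium in practice -> job listings on Lagou

There is no overnight success, only persistent hard work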