The WebDriver Protocol
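In the early days, Selenium and WebDriver were two separate projects. Selenium originally automated the browser by translating operations into JavaScript and injecting it into the page. WebDriver instead wrapped each browser's native API in a more object-oriented API; because browser engines differ, a separate implementation had to be written for each browser. The two projects merged in Selenium 2, and what is usually called Selenium today mostly means the WebDriver API. At the time of writing, the stable major version of Selenium is 3, and a beta of Selenium 4 is on the way.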
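Selenium (or WebDriver) interacts with the browser through the WebDriver protocol, a W3C Recommendation. The protocol is a RESTful web service, independent of operating system and programming language, that runs over HTTP and uses JSON as its transfer format.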
The WebDriver protocol currently has two versions:
- WebDriver 1: https://www.w3.org/TR/webdriver1/
- WebDriver 2: https://www.w3.org/TR/webdriver/
Before the WebDriver protocol there was the now-obsolete JsonWireProtocol; see https://github.com/SeleniumHQ/selenium/wiki/JsonWireProtocol for details.
The WebDriver protocol model involves three parties:
- Client (client, or local end): roughly, the program or machine that calls the WebDriver API. The Python WebDriver API we commonly use is a Python language binding for the client side.
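- Server (server, or remote end): the machine running RemoteWebDriver, or a browser that supports the WebDriver protocol, such as Firefox driven through geckodriver or Chrome driven through chromedriver. The WebDriver protocol in fact only specifies the behavior of the server.
- WebDriver: the per-browser implementations of the WebDriver protocol, such as geckodriver and chromedriver. The driver acts as an intermediary between client and server and exposes the protocol's RESTful service; the client controls the server (the browser) through it. For example, the legacy Firefox driver relied on the webdriver.xpi extension to control the browser.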
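Once the client is running, it starts a session. Within that session the client sends requests to the server at the endpoints of the WebDriver protocol's RESTful service (an endpoint can be thought of simply as a URL); the server accepts each request and returns a response to the client.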
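The exchange within one session can be sketched at the HTTP level. The snippet below only builds the requests a client would send; it does not send them. The server address `http://127.0.0.1:4444`, the helper `build_request`, and the session id are assumptions made for illustration.

```python
import json

# Assumed address of a remote end (for example, a locally running driver).
SERVER = "http://127.0.0.1:4444"


def build_request(method, path, body=None):
    """Describe one WebDriver HTTP request as (method, url, JSON payload)."""
    payload = None
    if method == "POST":
        # POST commands carry a JSON object as the request body.
        payload = json.dumps(body or {})
    return method, SERVER + path, payload


# 1. Start a session. The remote end answers with a body shaped like
#    {"value": {"sessionId": "...", "capabilities": {...}}}.
new_session = build_request("POST", "/session", {"capabilities": {"alwaysMatch": {}}})

# 2. Use the session id from that response in later endpoints.
session_id = "example-session-id"  # hypothetical value returned by the server
navigate = build_request("POST", f"/session/{session_id}/url", {"url": "http://www.baidu.com"})
get_title = build_request("GET", f"/session/{session_id}/title")

# 3. End the session.
delete_session = build_request("DELETE", f"/session/{session_id}")

for request in (new_session, navigate, get_title, delete_session):
    print(request)
```

Every request after the first carries the session id in its path, which is how the server ties a command to a particular browser session.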
The WebDriver protocol maps endpoints to commands. The mapping is listed below (https://www.w3.org/TR/webdriver/#endpoints):
| Method | URI Template | Command |
|---|---|---|
| POST | /session | New Session |
| DELETE | /session/{session id} | Delete Session |
| GET | /status | Status |
| GET | /session/{session id}/timeouts | Get Timeouts |
| POST | /session/{session id}/timeouts | Set Timeouts |
| POST | /session/{session id}/url | Navigate To |
| GET | /session/{session id}/url | Get Current URL |
| POST | /session/{session id}/back | Back |
| POST | /session/{session id}/forward | Forward |
| POST | /session/{session id}/refresh | Refresh |
| GET | /session/{session id}/title | Get Title |
| GET | /session/{session id}/window | Get Window Handle |
| DELETE | /session/{session id}/window | Close Window |
| POST | /session/{session id}/window | Switch To Window |
| GET | /session/{session id}/window/handles | Get Window Handles |
| POST | /session/{session id}/window/new | New Window |
| POST | /session/{session id}/frame | Switch To Frame |
| POST | /session/{session id}/frame/parent | Switch To Parent Frame |
| GET | /session/{session id}/window/rect | Get Window Rect |
| POST | /session/{session id}/window/rect | Set Window Rect |
| POST | /session/{session id}/window/maximize | Maximize Window |
| POST | /session/{session id}/window/minimize | Minimize Window |
| POST | /session/{session id}/window/fullscreen | Fullscreen Window |
| GET | /session/{session id}/element/active | Get Active Element |
| POST | /session/{session id}/element | Find Element |
| POST | /session/{session id}/elements | Find Elements |
| POST | /session/{session id}/element/{element id}/element | Find Element From Element |
| POST | /session/{session id}/element/{element id}/elements | Find Elements From Element |
| GET | /session/{session id}/element/{element id}/selected | Is Element Selected |
| GET | /session/{session id}/element/{element id}/attribute/{name} | Get Element Attribute |
| GET | /session/{session id}/element/{element id}/property/{name} | Get Element Property |
| GET | /session/{session id}/element/{element id}/css/{property name} | Get Element CSS Value |
| GET | /session/{session id}/element/{element id}/text | Get Element Text |
| GET | /session/{session id}/element/{element id}/name | Get Element Tag Name |
| GET | /session/{session id}/element/{element id}/rect | Get Element Rect |
| GET | /session/{session id}/element/{element id}/enabled | Is Element Enabled |
| GET | /session/{session id}/element/{element id}/computedrole | Get Computed Role |
| GET | /session/{session id}/element/{element id}/computedlabel | Get Computed Label |
| POST | /session/{session id}/element/{element id}/click | Element Click |
| POST | /session/{session id}/element/{element id}/clear | Element Clear |
| POST | /session/{session id}/element/{element id}/value | Element Send Keys |
| GET | /session/{session id}/source | Get Page Source |
| POST | /session/{session id}/execute/sync | Execute Script |
| POST | /session/{session id}/execute/async | Execute Async Script |
| GET | /session/{session id}/cookie | Get All Cookies |
| GET | /session/{session id}/cookie/{name} | Get Named Cookie |
| POST | /session/{session id}/cookie | Add Cookie |
| DELETE | /session/{session id}/cookie/{name} | Delete Cookie |
| DELETE | /session/{session id}/cookie | Delete All Cookies |
| POST | /session/{session id}/actions | Perform Actions |
| DELETE | /session/{session id}/actions | Release Actions |
| POST | /session/{session id}/alert/dismiss | Dismiss Alert |
| POST | /session/{session id}/alert/accept | Accept Alert |
| GET | /session/{session id}/alert/text | Get Alert Text |
| POST | /session/{session id}/alert/text | Send Alert Text |
| GET | /session/{session id}/screenshot | Take Screenshot |
| GET | /session/{session id}/element/{element id}/screenshot | Take Element Screenshot |
| POST | /session/{session id}/print | Print Page |
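To make the table concrete, here is a minimal sketch of how a client might turn a command name into a request line. Only a handful of rows from the table are copied in; the dictionary `ENDPOINTS` and the function `resolve` are hypothetical names, not part of any WebDriver library.

```python
# A few rows of the endpoint table: command name -> (HTTP method, URI template).
ENDPOINTS = {
    "New Session": ("POST", "/session"),
    "Navigate To": ("POST", "/session/{session id}/url"),
    "Get Current URL": ("GET", "/session/{session id}/url"),
    "Get Element Text": ("GET", "/session/{session id}/element/{element id}/text"),
    "Delete Session": ("DELETE", "/session/{session id}"),
}


def resolve(command, **variables):
    """Return (method, path) for a command, filling in the URI template variables."""
    method, template = ENDPOINTS[command]
    path = template
    for name, value in variables.items():
        # Keyword arguments use underscores; the template variables use spaces.
        path = path.replace("{" + name.replace("_", " ") + "}", value)
    if "{" in path:
        raise ValueError(f"missing template variable in {path!r}")
    return method, path


print(resolve("Get Current URL", session_id="abc"))
print(resolve("Get Element Text", session_id="abc", element_id="e1"))
```

Note that Navigate To and Get Current URL share the same path and differ only in the HTTP method, so a command is identified by the pair of method and URI template, not by the path alone.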
WebDriver Python API execution flow (working backward)
A WebDriver client API is essentially a translation from properties and methods to commands, and from commands to WebDriver protocol endpoints.
For example, take the current_url property of the WebDriver class:
- The current_url property is simply the result of executing the command Command.GET_CURRENT_URL:

      @property
      def current_url(self):
          return self.execute(Command.GET_CURRENT_URL)['value']

- The execute method of the WebDriver class runs the command by calling command_executor.execute(driver_command, params). command_executor is either a string (the server URL) or a remote_connection.RemoteConnection object.
- The remote_connection.RemoteConnection class lives in selenium\webdriver\remote\remote_connection.py. Its _commands attribute defines the mapping between commands and endpoints, for example Command.GET_CURRENT_URL: ('GET', '/session/$sessionId/url'). Its execute method sends the request that makes the server carry out the command.
- When the WebDriver class is instantiated it receives the command_executor parameter, which is the server address.
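The chain traced above (property, then command, then endpoint) can be imitated in a few lines. This is a toy model, not Selenium's source: `FakeConnection` and `MiniDriver` are hypothetical classes, the command name is a plain string, and no HTTP request is sent; the connection only records what it would send and returns a canned response.

```python
from string import Template

GET_CURRENT_URL = "getCurrentUrl"  # stand-in for Command.GET_CURRENT_URL


class FakeConnection:
    """Plays the role of RemoteConnection: maps commands to endpoints."""

    _commands = {GET_CURRENT_URL: ("GET", "/session/$sessionId/url")}

    def __init__(self, server_url):
        self._url = server_url
        self.sent = []  # requests that would have gone over HTTP

    def execute(self, command, params):
        method, template = self._commands[command]
        # Fill $sessionId in the endpoint template from the parameters.
        path = Template(template).substitute(params)
        self.sent.append((method, self._url + path))
        # Canned response; a real remote end would return JSON from the browser.
        return {"value": "http://www.baidu.com/"}


class MiniDriver:
    """Plays the role of the WebDriver class."""

    def __init__(self, command_executor, session_id):
        self.command_executor = command_executor
        self.session_id = session_id

    def execute(self, driver_command, params=None):
        # Every command is executed in the context of the current session.
        params = dict(params or {}, sessionId=self.session_id)
        return self.command_executor.execute(driver_command, params)

    @property
    def current_url(self):
        return self.execute(GET_CURRENT_URL)["value"]


connection = FakeConnection("http://127.0.0.1:4444")  # assumed server address
driver = MiniDriver(connection, session_id="abc")
print(driver.current_url)
print(connection.sent)
```

Reading `driver.current_url` looks like a plain attribute access, yet it ends up as a GET request against `/session/abc/url`, which is the translation the list above describes.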
WebDriver Python API execution flow (working forward)
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://www.baidu.com")
# Server address
url = driver.command_executor._url
print(url)
The code above executes roughly as follows:
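- driver = webdriver.Firefox() instantiates the WebDriver class in selenium\webdriver\firefox\webdriver.py, which inherits from the WebDriver class in selenium\webdriver\remote\webdriver.py. The command_executor parameter of the constructor is the server address. At this point WebDriver starts the driver and the browser and begins listening for WebDriver protocol requests (the legacy Firefox driver did so by loading the webdriver.xpi extension; with geckodriver, the listener is geckodriver itself). The log file geckodriver.log contains a record like 1116855012551 geckodriver INFO Listening on 127.0.0.1:54499.
- driver.get("http://www.baidu.com") executes the Command.GET command. The WebDriver object relies on its command_executor, a remote_connection.RemoteConnection object, to execute it. RemoteConnection implements the mapping from command to protocol endpoint, Command.GET: ('POST', '/session/$sessionId/url'), so executing the command amounts to sending the corresponding request, which WebDriver forwards to the browser:

      def get(self, url):
          self.execute(Command.GET, {'url': url})

- After receiving the request, the browser performs the operation (opening the Baidu page) and returns a response. The server address printed by the script comes from driver.command_executor._url on the client side, not from this response.
- WebDriver passes the response back to the calling code.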
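The last two steps concern the response. In the W3C protocol a response body is a JSON object with a single `value` key; on failure, `value` is an object carrying `error`, `message`, and `stacktrace`. The sketch below shows how a client could unwrap such a body. `WebDriverError` and `unwrap` are hypothetical names, and the sample bodies are written by hand rather than captured from a real browser.

```python
import json


class WebDriverError(Exception):
    """Raised when the remote end reports an error (hypothetical class)."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def unwrap(http_status, body_text):
    """Return the command result, or raise if the response describes an error."""
    value = json.loads(body_text)["value"]
    if http_status >= 400 and isinstance(value, dict) and "error" in value:
        raise WebDriverError(value["error"], value.get("message", ""))
    return value


# Navigate To succeeds with a null value.
print(unwrap(200, '{"value": null}'))

# Get Current URL returns the URL as the value.
print(unwrap(200, '{"value": "http://www.baidu.com/"}'))

# An error response, e.g. for a session id the server does not know.
error_body = '{"value": {"error": "invalid session id", "message": "no such session", "stacktrace": ""}}'
try:
    unwrap(404, error_body)
except WebDriverError as exc:
    print("failed:", exc.code)
```

This mirrors the `['value']` lookup in the current_url property: the client keeps only the `value` member of the response and turns error responses into exceptions.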
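This article covered the history of Selenium and WebDriver, their merger, and how they work, with a focus on the WebDriver protocol: its status as a W3C Recommendation, its two versions, and the three-party model for interacting with the browser. It also presented the mapping between protocol commands and endpoints and used the execution flow of the Python WebDriver API to illustrate how the pieces work together.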