Scraping JavaScript: installing Selenium

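This article explains in detail how to use Selenium for automated testing: installing and configuring Selenium and ChromeDriver, controlling the browser from a Python script, and using WebDriverWait and XPath to locate elements, covering the more advanced techniques of waiting for a page to load and finding elements.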

Open an Anaconda Prompt.

Go to the Scripts folder and install Selenium: pip install selenium

Check your Chrome version.

Then go to http://npm.taobao.org/mirrors/chromedriver/ and download the chromedriver release that matches your Chrome version.
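
Picking the download mostly comes down to comparing major version numbers, since recent ChromeDriver releases are numbered after the Chrome major version they support. Below is a minimal sketch of that comparison. The helper names `major_version` and `driver_matches_chrome` are hypothetical, and the version strings are example values only.

```python
def major_version(version: str) -> int:
    """Return the leading number of a dotted version string."""
    return int(version.strip().split(".")[0])


def driver_matches_chrome(chrome_version: str, driver_version: str) -> bool:
    """True when both versions share the same major number."""
    return major_version(chrome_version) == major_version(driver_version)


# Example values only; read your real Chrome version from chrome://version
print(driver_matches_chrome("114.0.5735.199", "114.0.5735.90"))
print(driver_matches_chrome("114.0.5735.199", "113.0.5672.63"))
```

The first call compares two 114.x versions and the second compares 114.x with 113.x, so only the first pair counts as a match.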

Add the folder that contains chromedriver to the system environment variable "Path".
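
A quick way to confirm the Path change took effect is to ask Python where it finds the executable. This is a small sketch using only the standard library. The helper name `find_driver` is hypothetical, and you may need to open a new prompt before the updated Path is visible.

```python
import shutil


def find_driver(name: str = "chromedriver"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


location = find_driver()
if location is None:
    print("chromedriver is not on PATH; re-check the folder you added")
else:
    print("chromedriver found at", location)
```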

###########################################################################################

Example 1:



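from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

# driver = webdriver.PhantomJS(executable_path='D:/phantomjs-2.1.1-windows/bin/phantomjs')
# Selenium 4 takes the driver path through a Service object; the old
# executable_path keyword and the find_element_by_* helpers were removed.
driver = webdriver.Chrome(service=Service('D:/chromedriver/chromedriver'))

driver.get("http://pythonscraping.com/pages/javascript/ajaxDemo.html")
# Fixed pause so the Ajax call can finish; raise it if the old text is printed
time.sleep(1)
print(driver.find_element(By.ID, 'content').text)
driver.close()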

###########################################################################################

Example 2:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Selenium 4: the driver path goes through a Service object
driver = webdriver.Chrome(service=Service('D:/chromedriver/chromedriver'))

driver.get('http://pythonscraping.com/pages/javascript/ajaxDemo.html')
try:
    # WebDriverWait and expected_conditions (EC) are combined here
    # to form what Selenium calls an explicit wait.
    element = WebDriverWait(driver, 10).until(
        # A locator is an abstract query built with the By object,
        # which supports several strategies, including selectors.
        # Target: <button id="loadedButton">A button to click!</button>
        EC.presence_of_element_located((By.ID, 'loadedButton'))
    )
finally:
    # Target: <div id="content">
    print(driver.find_element(By.ID, 'content').text)
    driver.close()
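
Conceptually, an explicit wait is a polling loop: call a condition repeatedly until it returns something truthy or a deadline passes. The plain-Python sketch below illustrates that idea only. It is not Selenium's implementation, and `wait_until` is a hypothetical name. The 0.5 second default interval mirrors WebDriverWait's default poll frequency.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() until it returns a truthy value or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for a page whose content appears about 0.2 seconds after loading
ready_at = time.monotonic() + 0.2
print(wait_until(lambda: time.monotonic() >= ready_at, timeout=2, interval=0.05))
```

Compared with a fixed time.sleep, the loop returns as soon as the condition holds and fails loudly when it never does.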

# Improvement: wait for the page to redirect

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.common.exceptions import StaleElementReferenceException
import time


def waitForLoad(driver):
    # "Watch" the <html> element that exists when the page initially loads
    elem = driver.find_element(By.TAG_NAME, 'html')
    count = 0
    while True:
        count += 1
        if count > 20:  # time-out: 20 checks * 0.5 seconds = 10 seconds
            print('Timing out after 10 seconds and returning')
            return
        time.sleep(.5)  # check the page every half second
        try:
            # Repeatedly touch the watched element; this raises once it is
            # detached. Comparing it with a fresh lookup (==) would not raise.
            elem.is_enabled()
        except StaleElementReferenceException:
            # The element is no longer attached to the page's DOM,
            # so the site has redirected
            return


driver = webdriver.Chrome(service=Service('D:/chromedriver/chromedriver'))
driver.get('http://pythonscraping.com/pages/javascript/redirectDemo1.html')
waitForLoad(driver)
print(driver.page_source)

################################################################################

# My favorite: XPath

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome(service=Service('D:/chromedriver/chromedriver'))
driver.get('http://pythonscraping.com/pages/javascript/redirectDemo1.html')

try:
    # Same task as above, this time with a time-out of 15 seconds and an
    # XPath selector that looks for the body text of the redirected page.
    bodyElement = WebDriverWait(driver, 15).until(
        EC.presence_of_element_located(
            (By.XPATH, '//body[contains(text(), "This is the page you are looking for!")]')
        )
    )
    print(bodyElement.text)
except TimeoutException:
    print('Did not find the element')
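
The XPath expression is an ordinary string, so it can be built from a tag name and the text to look for. The sketch below shows one way to do that. `xpath_text_contains` is a hypothetical helper, not part of Selenium, and it simply refuses text containing double quotes rather than trying to escape them.

```python
def xpath_text_contains(tag: str, text: str) -> str:
    """Build //tag[contains(text(), "...")] for text without double quotes."""
    if '"' in text:
        raise ValueError("text must not contain double quotes")
    return f'//{tag}[contains(text(), "{text}")]'


print(xpath_text_contains("body", "This is the page you are looking for!"))
```

The returned string can be passed as the second item of a (By.XPATH, ...) locator tuple.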
