Python Crawler(4)Selenium

This post presents a simple Python script that uses Selenium to automate browser actions and scrape product links from the Walmart website. Driving the PhantomJS headless browser, the script simulates real user browsing and collects every product title link on the search results page.

>pip install selenium
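
A quick sanity check after the install is to print the package version (a minimal sketch; which version pip resolves depends on your environment, and newer releases drop PhantomJS support as noted further below):

import selenium
# Print the installed Selenium version so you know which API generation you have
print(selenium.__version__)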

Simple open_url.py
from selenium import webdriver

# Path to the PhantomJS binary installed on this machine
path2phantom = '/opt/phantomjs/bin/phantomjs'
browser = webdriver.PhantomJS(path2phantom)
browser.get('https://www.walmart.com/search/?grid=false&page=2&query=computer#searchProductResult')

# Each product title on the results page is rendered as <a class="product-title-link">
links = browser.find_elements_by_css_selector('a.product-title-link')

count = 0
for link in links:
    count = count + 1
    print str(count) + ' ' + link.get_attribute('href')

browser.quit()

>python open_url.py
1 https://www.walmart.com/ip/Dell-Inspiron-15-6-Laptop-AMD-A9-8GB-AMD-Radeon-R5-Graphics-1TB-HD-Red/54527141
2 https://www.walmart.com/ip/Dell-Inspiron-15-6-Laptop-AMD-A9-8GB-AMD-Radeon-R5-Graphics-1TB-HD-Red/54527141
3 https://www.walmart.com/ip/Refurbished-HP-15-f387wm-15-6-Laptop-Touchscreen-Windows-10-Home-AMD-Quad-Core-A8-7410-APU-Processor-4GB-RAM-500GB-Hard-Drive/54476821
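
PhantomJS has since been abandoned and its driver was removed from newer Selenium releases, so on Selenium 4.x the same scrape would normally run against headless Chrome instead. The sketch below is only an assumed port of open_url.py: it assumes Chrome and its driver are available on the machine and that Walmart still exposes results as a.product-title-link anchors.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument('--headless')   # run Chrome without opening a window

browser = webdriver.Chrome(options=options)
browser.get('https://www.walmart.com/search/?grid=false&page=2&query=computer#searchProductResult')

# Wait up to 10 seconds for the JavaScript-rendered product links to appear
wait = WebDriverWait(browser, 10)
links = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'a.product-title-link')))

for count, link in enumerate(links, start=1):
    print('{} {}'.format(count, link.get_attribute('href')))

browser.quit()

The explicit wait replaces the implicit "page is ready after get()" assumption in the PhantomJS version, which matters because the result grid is filled in by JavaScript after the initial page load.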


References:
http://www.marinamele.com/selenium-tutorial-web-scraping-with-selenium-and-python
https://gxnotes.com/article/52544.html
http://selenium-python.readthedocs.io/locating-elements.html
https://selenium-python.readthedocs.io/api.html#selenium.webdriver.remote.webdriver.WebDriver.find_element_by_class_name