51job_selenium Test 2

License: CC 4.0 BY-SA

Original link: http://www.cnblogs.com/webRobot/p/5302439.html

Tags: #testing #crawler #python

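This article presents a 51job crawler example implemented with Python and Selenium, showing how to scrape job listings from a web page and save them to an Excel file. It also explains the methods for locating elements and the operations available on WebElement objects.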

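# -*- coding: utf-8 -*-
"""
Scrape job listings from a 51job company page and save them
to an Excel workbook, using Selenium to drive the browser.
"""

import requests, bs4, openpyxl, time, selenium
# These helpers live in openpyxl.utils in current openpyxl releases
from openpyxl.utils import get_column_letter, column_index_from_string
from selenium import webdriver
from selenium.webdriver.common.by import By

excelName = "51job.xlsx"
sheetName = "Sheet1"
# The workbook must already exist and contain a sheet with this name
wb1 = openpyxl.load_workbook(excelName)
sheet = wb1[sheetName]
start = 1

charset = "gb2312"
site = "http://jobs.51job.com/all/co198308.html"
browser = webdriver.Firefox()
browser.get(site)
# Click the "下一页" (next page) link
linkElem = browser.find_element(By.LINK_TEXT, "下一页")
linkElem.click()
# find_element returns only the first match; its .text attribute holds the element's text
#elem = browser.find_element(By.CLASS_NAME, 'el')
#elem.text
# find_elements returns a list of every element with the CSS class 'el'
elem = browser.find_elements(By.CLASS_NAME, 'el')
div1 = elem[0].text
div2 = elem[1].text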



# Scrape the job listings from one page and write them to the worksheet
def Craw(site):
    res = requests.get(site)
    res.encoding = charset
    soup1 = bs4.BeautifulSoup(res.text, "lxml")
    # Each job row on the page carries the CSS class 'el'
    div = soup1.select('.el')
    for i in range(len(div)):
        content = div[i].getText()
        content_list = content.split('\n')

        # The row text starts with a newline, so the fields begin at index 1
        name = content_list[1]
        education = content_list[2]
        position = content_list[3]
        salary = content_list[4]
        date = content_list[5]

        # Data is written starting at row 2, leaving row 1 free
        sheet['A'+str(i+2)].value = name
        sheet['B'+str(i+2)].value = education
        sheet['C'+str(i+2)].value = position
        sheet['D'+str(i+2)].value = salary
        sheet['E'+str(i+2)].value = date

# Uncomment to run the scraper and save the workbook:
# Craw(site)
# wb1.save(excelName)
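
The Craw function above reads content_list[1] through content_list[5] directly, so any row with fewer lines of text would raise an IndexError. A minimal sketch of a more defensive parser is shown here. The helper name parse_row and the FIELDS tuple are hypothetical, the field order simply mirrors the variable names used in Craw, and the sample row is made-up data for illustration.

```python
# Hypothetical field names, in the order Craw assigns them
FIELDS = ("name", "education", "position", "salary", "date")


def parse_row(text):
    """Split one row's text into five fields, or return None if it is too short."""
    parts = text.split("\n")
    # Craw reads indexes 1..5, so at least six pieces are required
    if len(parts) < 6:
        return None
    return dict(zip(FIELDS, (p.strip() for p in parts[1:6])))


# Made-up sample row, shaped like the text Craw expects
sample = "\nEngineer\nBachelor\nShanghai\n10k\n03-21\n"
print(parse_row(sample))
print(parse_row("header only"))
```

Rows for which parse_row returns None can then be skipped instead of crashing the whole run.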

Finding Elements on the Page

WebDriver objects have quite a few methods for finding elements on a page. They are divided into the find_element_* and find_elements_* methods. The find_element_* methods return a single WebElement object, representing the first element on the page that matches your query. The find_elements_* methods return a list of WebElement objects for every matching element on the page.

Table 11-3 shows several examples of find_element_* and find_elements_* methods being called on a WebDriver object that’s stored in the variable browser.

Table 11-3. Selenium’s WebDriver Methods for Finding Elements

Method name	WebElement object/list returned
browser.find_element_by_class_name(name) browser.find_elements_by_class_name(name)	Elements that use the CSS class `name`
browser.find_element_by_css_selector(selector) browser.find_elements_by_css_selector(selector)	Elements that match the CSS `selector`
browser.find_element_by_id(id) browser.find_elements_by_id(id)	Elements with a matching `id` attribute value
browser.find_element_by_link_text(text) browser.find_elements_by_link_text(text)	`<a>` elements that completely match the `text` provided
browser.find_element_by_partial_link_text(text) browser.find_elements_by_partial_link_text(text)	`<a>` elements that contain the `text` provided
browser.find_element_by_name(name) browser.find_elements_by_name(name)	Elements with a matching `name` attribute value
browser.find_element_by_tag_name(name) browser.find_elements_by_tag_name(name)	Elements with a matching tag `name` (case insensitive; an `<a>` element is matched by `'a'` and `'A'`)

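Except for the *_by_tag_name() methods, the arguments to all the methods are case sensitive. If no element on the page matches what the method is looking for, the find_element_* methods raise a NoSuchElementException, while the find_elements_* methods return an empty list. If you do not want this exception to crash your program, add try and except statements to your code. Note that Selenium 4.3 and later removed the find_element_by_* and find_elements_by_* helpers; the equivalent calls there take a locator strategy, for example find_element(By.CLASS_NAME, name), with By imported from selenium.webdriver.common.by.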

Once you have the WebElement object, you can find out more about it by reading the attributes or calling the methods in Table 11-4.

Table 11-4. WebElement Attributes and Methods

Attribute or method	Description
`tag_name`	The tag name, such as `'a'` for an `<a>` element
`get_attribute(name)`	The value of the element’s attribute called `name` (for example, `get_attribute('href')`)
`text`	The text within the element, such as `'hello'` in `<span>hello</span>`
`clear()`	For text field or text area elements, clears the text typed into it
`is_displayed()`	Returns `True` if the element is visible; otherwise returns `False`
`is_enabled()`	For input elements, returns `True` if the element is enabled; otherwise returns `False`
`is_selected()`	For checkbox or radio button elements, returns `True` if the element is selected; otherwise returns `False`
`location`	A dictionary with keys `'x'` and `'y'` for the position of the element in the page
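
As a rough sketch of how the attributes and methods in Table 11-4 might be used together, the function below gathers a few of them into a dictionary. The names describe and FakeLink are hypothetical. FakeLink is only a stand-in that exposes the same attribute names as a WebElement so the snippet runs without a browser; in real use you would pass an element returned by one of the find methods.

```python
def describe(elem):
    """Collect a few commonly inspected WebElement properties into a dict."""
    return {
        "tag": elem.tag_name,
        "text": elem.text,
        "href": elem.get_attribute("href"),
        "visible": elem.is_displayed(),
        "position": (elem.location["x"], elem.location["y"]),
    }


class FakeLink:
    """Stand-in with WebElement-like attributes, for illustration only."""
    tag_name = "a"
    text = "hello"
    location = {"x": 10, "y": 20}

    def get_attribute(self, name):
        # Only the href attribute is defined on this stand-in
        return "http://example.com" if name == "href" else None

    def is_displayed(self):
        return True


print(describe(FakeLink()))
```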

Table 11-5. Commonly Used Variables in the selenium.webdriver.common.keys Module

Attributes	Meanings
`Keys.DOWN`, `Keys.UP`, `Keys.LEFT`, `Keys.RIGHT`	The keyboard arrow keys
`Keys.ENTER`, `Keys.RETURN`	The ENTER and RETURN keys
`Keys.HOME`, `Keys.END`, `Keys.PAGE_DOWN`, `Keys.PAGE_UP`	The HOME, END, PAGE DOWN, and PAGE UP keys
`Keys.ESCAPE`, `Keys.BACK_SPACE`, `Keys.DELETE`	The ESC, BACKSPACE, and DELETE keys
`Keys.F1`, `Keys.F2`,..., `Keys.F12`	The F1 to F12 keys at the top of the keyboard
`Keys.TAB`	The TAB key