51job_selenium Test 2

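This post walks through a 51job scraper written in Python with Selenium, showing how to pull job listings from a web page and save them to an Excel file. It also covers the methods for locating elements and the operations available on WebElement objects.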

 


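# -*- coding: utf-8 -*-
"""
Scrape job listings from a 51job company page with Selenium
and write them to an Excel workbook with openpyxl.
"""

import requests
import bs4
import openpyxl
from selenium import webdriver
from selenium.webdriver.common.by import By

excelName = "51job.xlsx"
sheetName = "Sheet1"
# The workbook must already exist; load_workbook() does not create it.
wb1 = openpyxl.load_workbook(excelName)
# wb[name] replaces the deprecated get_sheet_by_name().
sheet = wb1[sheetName]
start = 1

charset = "gb2312"
site = "http://jobs.51job.com/all/co198308.html"
browser = webdriver.Firefox()
browser.get(site)
# Selenium 4.3+ removed find_element_by_link_text();
# find_element(By.LINK_TEXT, ...) is the equivalent call.
linkElem = browser.find_element(By.LINK_TEXT, "下一页")  # the "next page" link
linkElem.click()
# find_element() returns only the first match:
# elem = browser.find_element(By.CLASS_NAME, 'el')
# elem.text  # the text inside the tag
# find_elements() returns a list of every element with class "el".
elem = browser.find_elements(By.CLASS_NAME, 'el')
div1 = elem[0].text
div2 = elem[1].text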



# Scrape the listing rows from one page with requests and BeautifulSoup
def Craw(site):
    res = requests.get(site)
    res.encoding = charset
    soup1 = bs4.BeautifulSoup(res.text, "lxml")
    div = soup1.select('.el')
    for i in range(len(div)):
        # Each row's text splits into one field per line
        content_list = div[i].getText().split('\n')
        if len(content_list) < 6:
            # Skip rows that do not carry all five fields
            continue

        name = content_list[1]
        education = content_list[2]
        position = content_list[3]
        salary = content_list[4]
        date = content_list[5]

        # Data starts at row 2 (i + 2), which leaves row 1 free
        sheet['A' + str(i + 2)].value = name
        sheet['B' + str(i + 2)].value = education
        sheet['C' + str(i + 2)].value = position
        sheet['D' + str(i + 2)].value = salary
        sheet['E' + str(i + 2)].value = date

# The requests-based run is disabled; uncomment to scrape and save:
# Craw(site)
# wb1.save(excelName)
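The field extraction inside Craw() can be tried without a network connection. The sketch below uses a hypothetical helper, parse_row(), that is not part of the original script. The sample row text is made up, and the real 51job markup may well produce a different number of lines per row, so treat the indexes as an assumption carried over from Craw().

```python
# Field names in the order Craw() reads them (indexes 1..5 of the split text).
FIELDS = ("name", "education", "position", "salary", "date")


def parse_row(text):
    """Split one listing row's text into the five named fields, or return None."""
    parts = [part.strip() for part in text.split("\n")]
    # As in Craw(), index 0 is skipped and indexes 1..5 hold the fields.
    if len(parts) < len(FIELDS) + 1:
        return None
    return dict(zip(FIELDS, parts[1:len(FIELDS) + 1]))


# Made-up sample row, shaped like the text Craw() expects.
sample = "\nAcme Co\nBachelor\nTester\n8000-10000\n03-21\n"
print(parse_row(sample))
```

Returning None for short rows lets the caller skip them instead of failing with an IndexError halfway through a page.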

  

 

Finding Elements on the Page

WebDriver objects have quite a few methods for finding elements on a page. They are divided into the find_element_* and find_elements_* methods. The find_element_* methods return a single WebElement object, representing the first element on the page that matches your query. The find_elements_* methods return a list of WebElement objects, one for every matching element on the page.

Table 11-3 shows several examples of find_element_* and find_elements_* methods being called on a WebDriver object that’s stored in the variable browser.

Table 11-3. Selenium’s WebDriver Methods for Finding Elements

 

| Method name | WebElement object/list returned |
| --- | --- |
| browser.find_element_by_class_name(name), browser.find_elements_by_class_name(name) | Elements that use the CSS class name |
| browser.find_element_by_css_selector(selector), browser.find_elements_by_css_selector(selector) | Elements that match the CSS selector |
| browser.find_element_by_id(id), browser.find_elements_by_id(id) | Elements with a matching id attribute value |
| browser.find_element_by_link_text(text), browser.find_elements_by_link_text(text) | <a> elements that completely match the text provided |
| browser.find_element_by_partial_link_text(text), browser.find_elements_by_partial_link_text(text) | <a> elements that contain the text provided |
| browser.find_element_by_name(name), browser.find_elements_by_name(name) | Elements with a matching name attribute value |
| browser.find_element_by_tag_name(name), browser.find_elements_by_tag_name(name) | Elements with a matching tag name (case insensitive; an <a> element is matched by 'a' and 'A') |
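The find_element_by_* and find_elements_by_* methods in Table 11-3 were removed in Selenium 4.3, where the same lookups go through find_element(by, value) and find_elements(by, value). The sketch below maps each method suffix from the table to its locator string. The names LOCATORS, find_first and find_all are hypothetical helpers made up for illustration; the strings mirror the constants on selenium.webdriver.common.by.By (for example, By.ID is "id").

```python
# Locator strategy strings, keyed by the method suffix used in Table 11-3.
# They mirror the constants on selenium.webdriver.common.by.By.
LOCATORS = {
    "class_name": "class name",
    "css_selector": "css selector",
    "id": "id",
    "link_text": "link text",
    "partial_link_text": "partial link text",
    "name": "name",
    "tag_name": "tag name",
}


def find_first(browser, suffix, value):
    """Selenium 4 stand-in for browser.find_element_by_<suffix>(value)."""
    return browser.find_element(LOCATORS[suffix], value)


def find_all(browser, suffix, value):
    """Selenium 4 stand-in for browser.find_elements_by_<suffix>(value)."""
    return browser.find_elements(LOCATORS[suffix], value)
```

With a live session, find_all(browser, "class_name", "el") would play the role of browser.find_elements_by_class_name('el') from the script above.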

 

 

Except for the *_by_tag_name() methods, the arguments to all the methods are case sensitive. If no element on the page matches what a find_element_* method is looking for, the selenium module raises a NoSuchElementException. (The find_elements_* methods return an empty list instead.) If you do not want this exception to crash your program, add try and except statements to your code.
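A minimal sketch of that try/except pattern follows. The helper name find_or_none is hypothetical, and it assumes the Selenium 4 style call find_element(by, value); on older versions the find_element_by_* calls raise the same exception.

```python
def find_or_none(browser, by, value):
    """Return the first matching WebElement, or None if nothing matches."""
    # Imported inside the function so the helper can be defined
    # before Selenium itself is loaded.
    from selenium.common.exceptions import NoSuchElementException

    try:
        return browser.find_element(by, value)
    except NoSuchElementException:
        # No element matched; report that with None instead of crashing.
        return None
```

In the 51job script, a call such as find_or_none(browser, "link text", "下一页") would return None on the last page, assuming that page has no "next page" link, which gives a paging loop a clean stopping condition.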

Once you have the WebElement object, you can find out more about it by reading the attributes or calling the methods in Table 11-4.

Table 11-4. WebElement Attributes and Methods

| Attribute or method | Description |
| --- | --- |
| tag_name | The tag name, such as 'a' for an <a> element |
| get_attribute(name) | The value for the element’s name attribute |
| text | The text within the element, such as 'hello' in <span>hello</span> |
| clear() | For text field or text area elements, clears the text typed into it |
| is_displayed() | Returns True if the element is visible; otherwise returns False |
| is_enabled() | For input elements, returns True if the element is enabled; otherwise returns False |
| is_selected() | For checkbox or radio button elements, returns True if the element is selected; otherwise returns False |
| location | A dictionary with keys 'x' and 'y' for the position of the element in the page |
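As a rough illustration of Table 11-4, the sketch below gathers several of those attributes into one dictionary. The function name describe and the choice of the href attribute are assumptions made for the example; any object with the WebElement interface can be passed in.

```python
def describe(elem):
    """Collect a few Table 11-4 attributes of a WebElement into a dict."""
    return {
        "tag": elem.tag_name,
        "text": elem.text,
        # get_attribute() returns None when the attribute is absent.
        "href": elem.get_attribute("href"),
        "visible": elem.is_displayed(),
        "enabled": elem.is_enabled(),
        "position": (elem.location["x"], elem.location["y"]),
    }
```

Calling describe() on each entry of the elem list from the script is a quick way to check which matches are visible listing rows before reading their text.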

 

 

Table 11-5. Commonly Used Variables in the selenium.webdriver.common.keys Module

| Attributes | Meanings |
| --- | --- |
| Keys.DOWN, Keys.UP, Keys.LEFT, Keys.RIGHT | The keyboard arrow keys |
| Keys.ENTER, Keys.RETURN | The ENTER and RETURN keys |
| Keys.HOME, Keys.END, Keys.PAGE_DOWN, Keys.PAGE_UP | The HOME, END, PAGEDOWN, and PAGEUP keys |
| Keys.ESCAPE, Keys.BACK_SPACE, Keys.DELETE | The ESC, BACKSPACE, and DELETE keys |
| Keys.F1, Keys.F2, ..., Keys.F12 | The F1 to F12 keys at the top of the keyboard |
| Keys.TAB | The TAB key |
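The values in Table 11-5 are passed to a WebElement's send_keys() method. The sketch below sends END and then HOME to an element, which is typically used to scroll a page to the bottom and back to the top. The function name jump_to_bottom_then_top is hypothetical, and whether the page actually scrolls depends on which element receives the keys.

```python
def jump_to_bottom_then_top(page):
    """Send END and then HOME to an element, e.g. the page's <html> element."""
    # Imported inside the function so the helper can be defined
    # before Selenium itself is loaded.
    from selenium.webdriver.common.keys import Keys

    page.send_keys(Keys.END)   # usually scrolls to the bottom of the page
    page.send_keys(Keys.HOME)  # usually scrolls back to the top
```

With a live session, the element could be obtained with browser.find_element("tag name", "html") and passed straight to the function.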

 

 

 

 

 

 

Reposted from: https://www.cnblogs.com/webRobot/p/5302439.html
