Use selenium + PhantomJS/Chrome to scrape Taobao pages and store the results in MongoDB
Keep the settings in a config module (config.py):
MONGO_URL = 'localhost'
MONGO_DB = 'taobao'
MONGO_TABLE = 'product'
# Do not download images; enable the disk cache
SERVICE_ARGS = ['--load-images=false', '--disk-cache=true']
# Search keyword ("Valentine's Day gifts")
KEYWORD = '情人节礼物'
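The MONGO_* settings are presumably used when a scraped item is written to the database. A minimal sketch of that step, assuming a hypothetical helper named save_to_mongo (not from the original post) that receives the target collection, for example db[MONGO_TABLE], and one product as a dict:

```python
def save_to_mongo(collection, product):
    """Insert one scraped product; return True on success, False on failure.

    `collection` is assumed to be a pymongo collection such as db[MONGO_TABLE].
    The helper name and the dict layout are illustrative assumptions.
    """
    try:
        collection.insert_one(product)
        return True
    except Exception as exc:  # with pymongo this would typically be a PyMongoError
        print('Failed to save to MongoDB:', exc)
        return False
```

With the config above it could be called as save_to_mongo(db[MONGO_TABLE], item), where item is whatever dict the parsing step produces. Passing the collection in as an argument keeps the helper easy to test without a running MongoDB server.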
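While writing the code, run it with Chrome first so you can watch what the browser does; once the code is fully debugged, switch to PhantomJS.
The code is as follows: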
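One way to make that switch a one-line change is to hide the driver choice behind a small factory. This is only a sketch: create_browser, its debug flag, and the webdriver_module parameter are assumptions added for illustration, and webdriver.PhantomJS is only available in Selenium versions that still ship the PhantomJS driver.

```python
def create_browser(debug, service_args=None, webdriver_module=None):
    """Return a Chrome driver while debugging, a PhantomJS driver otherwise.

    `webdriver_module` exists only so a stand-in can be injected in tests;
    by default the real selenium.webdriver module is imported lazily.
    """
    if webdriver_module is None:
        from selenium import webdriver as webdriver_module
    if debug:
        # Visible browser window: convenient for watching each step
        return webdriver_module.Chrome()
    # Headless run, forwarding options such as SERVICE_ARGS from config
    return webdriver_module.PhantomJS(service_args=service_args or [])
```

During development you would call create_browser(debug=True); for the final run, create_browser(debug=False, service_args=SERVICE_ARGS).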
import re
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from pyquery import PyQuery as pq
from config import *
import pymongo
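# Connect to MongoDB and select the database named in config
client = pymongo.MongoClient(MONGO_URL)
db = client[MONGO_DB]
# Pass SERVICE_ARGS so PhantomJS skips images and uses the disk cache
browser = webdriver.PhantomJS(service_args=SERVICE_ARGS)
# Wait up to 10 seconds for an expected condition before timing out
wait = WebDriverWait(browser, 10)
browser.set_window_size(1400, 900)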
def search():
&nb