Fixing the pyspider web preview pane being too small
Edit the first line of C:\Program Files\python3.6.5\Lib\site-packages\pyspider\webui\static\debug.min.css
After clearing the Chrome cache, the page size was finally normal.
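The stylesheet path above is specific to one Windows install. Here is a minimal sketch for locating the same file on another machine, assuming pyspider keeps the webui/static/debug.min.css layout shown above. The helper names debug_css_path and find_debug_css are made up for illustration and are not part of pyspider.

```python
from importlib.util import find_spec
from pathlib import Path
from typing import Optional


def debug_css_path(package_dir: Path) -> Path:
    """Build the path to debug.min.css inside a pyspider package directory."""
    return package_dir / "webui" / "static" / "debug.min.css"


def find_debug_css() -> Optional[Path]:
    """Return the stylesheet path for the installed pyspider, or None if absent."""
    spec = find_spec("pyspider")
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at pyspider/__init__.py; its parent is the package dir.
    return debug_css_path(Path(spec.origin).parent)


if __name__ == "__main__":
    print(find_debug_css())
```

Running the script prints the path to edit, or None when pyspider is not importable from the current interpreter.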
The CSS selector does not work; still to be solved:
pyspider debug.min.js:1 Uncaught DOMException: Failed to execute 'querySelectorAll' on 'Document':
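Saving the data crawled by pyspider to a local .csv:
1. MongoDB is already installed on Windows, so starting the "MongoDB Database Server" service in Services is all it takes.
2. Read the .csv file into dictionaries first, then insert them into MongoDB. A bit tedious, but it will do for now.
The whole run took only 1.7 s, of which writing to the database was about 1 s. Is this the legendary NoSQL? I have not timed MySQL yet and will try it another day, but it is probably not this fast?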
import pymongo
import time
import csv
import sys
# Works around: _csv.Error: field larger than field limit (131072)
# Works around: OverflowError: Python int too large to convert to C long
# Start from sys.maxsize and shrink it until csv accepts the limit.
maxINT = sys.maxsize
decrement = True
while decrement:
    decrement = False
    try:
        print(maxINT)
        csv.field_size_limit(maxINT)
    except OverflowError:
        maxINT = maxINT // 10
        decrement = True
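def read_csv(file):
    # The with block closes the file, so no explicit close() is needed
    with open(file, 'r', encoding='utf-8', newline='') as csvFile:
        reader = csv.reader(csvFile)
        for read in reader:
            # Drop the last item '...', which cannot be inserted into MongoDB
            yield read[:10]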
def insert2db():
    # Connect to MongoDB; database: qunar, travel-guide collection: strategy
    client = pymongo.MongoClient(host='localhost', port=27017)
    db = client.qunar
    collection = db.strategy
    # Get the generator; its first row (the header) supplies the keys
    file = 'qunar.csv'
    con = read_csv(file)
    keys = next(con)
    # Each remaining row holds the values for one document
    for value in con:
        # Zip keys and values into a dict and insert it into strategy
        item = dict(zip(keys, value))
        # insert() is deprecated and was removed in PyMongo 4; use insert_one()
        collection.insert_one(item)
    print('done.')
if __name__ == '__main__':
    start = time.time()
    insert2db()
    print("cost %.2f s" % (time.time() - start))
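The script above sends one document per call. As a possible variation, the rows could be grouped and sent with PyMongo's insert_many, which usually means fewer round trips to the server. This is only a sketch and has not been timed against the script above. The names iter_docs and insert_in_batches and the batch size of 1000 are assumptions for illustration. Keeping the first 10 fields per row follows the read[:10] slice in the original.

```python
import csv
from itertools import islice


def iter_docs(csv_file, max_fields=10):
    """Yield one dict per data row, keyed by the header row."""
    reader = csv.reader(csv_file)
    keys = next(reader, None)
    if keys is None:
        return  # empty file: nothing to yield
    keys = keys[:max_fields]
    for row in reader:
        yield dict(zip(keys, row[:max_fields]))


def insert_in_batches(collection, docs, batch_size=1000):
    """Pass documents to collection.insert_many in fixed-size batches.

    `collection` is any object with an insert_many(list) method,
    such as a pymongo collection. Returns the number of documents sent.
    """
    docs = iter(docs)
    total = 0
    while True:
        batch = list(islice(docs, batch_size))
        if not batch:
            return total
        collection.insert_many(batch)
        total += len(batch)
```

To use it with the setup above, open qunar.csv with encoding='utf-8' and newline='', then call insert_in_batches(client.qunar.strategy, iter_docs(f)).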