今天要做数据清洗的时候,要更新一个数据库字段,考虑到用多进程去更新数据库,也许程序会跑得快一些,结果开了64个进程,
结果是其他程序更新的时候,速度非常慢,最后发现的原因是,数据库中有64个SQL语句执行更新,这样就导致了对数据库进行增删改查的速度很慢。
这是一个血的教训,所有以后的操作尽量少用多进程更新数据库。即使是想用多进程进行SQL update,可以少开几个进程,提升效果比较明显
粘贴查来代码,以供以后学习参考
#-*-coding:utf-8-*-
from common.contest import *
import time
def spider(item):
print "正在清晰地url是:", item['item_url']
item_url = item['item_url']
item_lotnum1 = item['item_lotnum']
item_sold = item['item_sold']
artron_session_url = item['artron_session_url']
artfoxlive_session_url = item['artfoxlive_session_url']
print item_lotnum1
print item_sold
try:
item_lotnum2 = "@@@" + item_lotnum1 + "@@@"
item_lotnum = re.findall('@@@000(.*?)@@@',item_lotnum2)[0]
except:
try:
item_lotnum2 = "@@@" + item_lotnum1 + "@@@"
item_lotnum = re.findall('@@@00(.*?)@@@', item_lotnum2)[0]
except:
try:
item_lotnum2 = "@@@" + item_lotnum1 + "@@@"
item_lotnum = re.findall('@@@0(.*?)@@@', item_lotnum2)[0]
except:
item_lotnum = item_lotnum1
item_sold_cur_spider = ""
if '流拍' in item_sold:
item_sold = -2
item_sold_cur_spider = -2
elif '撤拍' in item_sold:
item_sold = -3
item_sold_cur_spider = -3
elif '落槌价' in item_sold:
item_sold1 = str(item_sold).replace('落槌价', '').replace(':', '').replace(',', '').replace(':', '').replace(' ', '').replace(' ', '')
item_sold = re.findall('\d+', item_sold1)[0]
item_sold_cur_spider = re.findall('[^\d]+', item_sold1)[0]
else:
pass
print item_sold
print item_sold_cur_spider
print artron_session_url
print artfoxlive_session_url
item_lotnum = item_lotnum.replace('@','')
print item_lotnum
sql = 'update spider_yachang_2017_2_update_sold_price set item_sold_price_spider2 = %s, item_sold_cur_spider2 = %s
where session_url=%s and item_lotnum= %s '
data = (str(item_sold), str(item_sold_cur_spider), str(artron_session_url), str(item_lotnum))
update_data1(sql, data=data)
if __name__ == "__main__":
time1 = time.time()
sql = """
SELECT
*
FROM
oversea_artfoxlive_2017_2_detail_info
"""
resultList = select_data(sql)
print len(resultList)
pool = multiprocessing.Pool(64)
for item in resultList:
# print "正在爬取的位置是:",resultList.index(item)
# spider(item)
pool.apply_async(spider, (item,))
pool.close()
pool.join()

被折叠的 条评论
为什么被折叠?



