python< 解决OperationalError (2006, 'MySQL server has gone away')>

本文介绍了一种在使用爬虫将数据写入MySQL过程中遇到的连接超时问题及解决方案。通过在代码中加入self.db.ping(True),可以有效避免因长时间爬取导致的MySQL连接断开问题。
部署运行你感兴趣的模型镜像

今天在将爬虫数据写入MySQL的时候,发现了一个很蛋疼的问题
在管道中写操作MySQL的逻辑的时候,有一部分数据能写入数据库,但是另外一部分数据写不进去,经过检查 发现并不是数据格式的问题。

 OperationalError (2006, 'MySQL server has gone away')

经过多方排查最终确定为由于爬虫的时候有些网页可能需要 很长时间才能打开,导致MySQL连接超时。

解决的办法:

在代码中加入:

self.db.ping(True)

仔细看这个方法:

    def ping(self): # real signature unknown; restored from __doc__
        """
        Checks whether or not the connection to the server is
        working. If it has gone down, an automatic reconnection is
        attempted.

        This function can be used by clients that remain idle for a
        long while, to check whether or not the server has closed the
        connection and reconnect if necessary.

        New in 1.2.2: Accepts an optional reconnect parameter. If True,
        then the client will attempt reconnection. Note that this setting
        is persistent. By default, this is on in MySQL<5.0.3, and off
        thereafter.

        Non-standard. You should assume that ping() performs an
        implicit rollback; use only when starting a new transaction.
        You have been warned.
        """
        pass

说的而是检测MySQL的连接是否断开,如果断开就自动获取新的连接。

管道的完整代码如下:

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
import MySQLdb
import sys

reload(sys)
sys.setdefaultencoding('utf-8')


class LetvShowPipeline(object):
    def __init__(self):
        self.db = MySQLdb.connect("IP地址", "用户名", "密码", "vboxDB", charset='utf8', port=端口)
        self.cursor = self.db.cursor()

    def process_item(self, item, spider):
        # 提取字段
        area = item['area']
        host = item['host']
        live_channel = item['live_channel']
        name = item['name']
        programs = item['programs']
        selfproduced = item['selfproduced']
        type = item['type']
        number = item['number']
        vv = item['vv']
        year = item['year']
        print "==============================================="
        allhost = ''
        for h in host:
            allhost += h + '$'
        print area, allhost, live_channel, name, programs, selfproduced, type, vv, year
        # 写进数据库
        self.db.ping(True)
        # 先查询数据库中有没有数据
        sql = "SELECT * FROM letv_show WHERE name='%s'" % name
        self.cursor.execute(sql)
        result = self.cursor.fetchall()
        if len(result) != 0:
            print 'updating data....'
            # 如果数据库中已经存在该卡通,更新vv和number字段
            sql = "UPDATE letv_show  SET vv= '%s',number='%s'  WHERE  name='%s'" % (vv, number, name)
        else:
            print 'insert new data ....'
            # 如果数据库中没有该卡通将数据写入MySQL
            sql = "INSERT INTO letv_show(area, host, live_channel, name, programs, selfproduced, type, vv, year,number) \
                                                VALUES ('%s','%s','%s','%s','%s','%s','%s','%s','%s','%s')" % \
                  (area, allhost, live_channel, name, programs, selfproduced, type, vv, year, number)
        try:
            # 执行sql语句
            self.cursor.execute(sql)
            # 提交到数据库执行
            self.db.commit()
        except:
            # Rollback in case there is any error
            self.db.rollback()

        return item

您可能感兴趣的与本文相关的镜像

Python3.9

Python3.9

Conda
Python

Python 是一种高级、解释型、通用的编程语言,以其简洁易读的语法而闻名,适用于广泛的应用,包括Web开发、数据分析、人工智能和自动化脚本

评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值