Tools-lqp-CSDN Blog

Original: ERROR: Get https://registry-1.docker.io/v2/: proxyconnect tcp: net/http: TLS handshake timeout

4. Restart Docker.

2024-05-28 14:07:18 577 1

Repost: Generating 13-digit and 16-digit timestamps in python2

import datetime
import time
def get_float_time_stamp():
    datetime_now = datetime.datetime.now()
    return datetime_now.timestamp()
def get_time_stamp16():
    # Generate a 16-digit timestamp, e.g. 1540281250399895
    ...

2020-02-28 14:41:05 1268
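
The preview of the timestamp post is cut off before the body of get_time_stamp16, so the following is only a minimal sketch of one way to produce both widths. It assumes a 13-digit timestamp means milliseconds since the Unix epoch and a 16-digit one means microseconds. The name get_time_stamp13 and the optional now parameter are hypothetical, added here for illustration. It relies on time.time(), which is available on both Python 2 and Python 3.

```python
import time


def get_time_stamp13(now=None):
    """Return a 13-digit Unix timestamp (milliseconds) as an int."""
    # `now` is a hypothetical parameter that lets callers pass a fixed time.
    now = time.time() if now is None else now
    return int(round(now * 1000))


def get_time_stamp16(now=None):
    """Return a 16-digit Unix timestamp (microseconds) as an int."""
    now = time.time() if now is None else now
    return int(round(now * 1000000))


if __name__ == "__main__":
    print(get_time_stamp13())
    print(get_time_stamp16())
```

Both values stay at 13 and 16 digits for any current date, since the seconds part of the Unix time remains 10 digits until the year 2286.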

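Original: Installing and configuring Privoxy

Scenario: the project is deployed on an intranet server but needs internet access, so a proxy service is set up on a server that does have internet access.
1. Install
   apt-get install privoxy
2. Configure
   Change the bind address: search for listen-address and set the IP to bind.
   listen-address 0.0.0.0:8118
   Set up SOCKS5 forwarding: search for forward-socks5t, uncomment it, and set the corresponding IP (note the trailing dot...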

2019-12-23 16:09:42 3532 1
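
The Privoxy post covers the server side only. As a rough illustration of the client side, the sketch below shows how a Python program on the intranet machine might route its requests through such a proxy using the standard library. The helper name build_proxy_opener and the address 192.0.2.10 are placeholders, not values from the post. Port 8118 matches the listen-address line shown above.

```python
import urllib.request


def build_proxy_opener(proxy_host, proxy_port=8118):
    """Build a urllib opener that sends HTTP and HTTPS traffic via the proxy."""
    proxy_url = "http://{}:{}".format(proxy_host, proxy_port)
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder for the server that runs Privoxy.
    opener = build_proxy_opener("192.0.2.10")
    # opener.open("https://example.com/") would now go through the proxy.
    print([type(h).__name__ for h in opener.handlers])
```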

Original: How to keep selenium and pyppeteer from being detected by the server

1. The selenium approach
from selenium import webdriver
from selenium.webdriver import ChromeOptions
def get_cookie():
    option = ChromeOptions()
    option.add_experimental_option('excludeSwitches', ['ena...

2019-11-27 15:33:46 2282

Original: Replacing MySQLdb with pymysql in python2

Install pymysql
pip install pymysql
import pymysql
pymysql.install_as_MySQLdb()

2019-11-07 17:27:33 3553

Original: AES decryption in python, AES decryption in JS

# -*- encoding: utf-8 -*-
from Cryptodome.Util.Padding import unpad, pad
import execjs
from Crypto.Cipher import AES
from binascii import b2a_hex, a2b_hex
def js_aes(text):
    jscode = """ var...

2019-10-17 20:08:54 399

Original: scrapy twisted.python.failure.Failure OpenSSL.SSL.Error

from OpenSSL import SSL
from scrapy.core.downloader.contextfactory import ScrapyClientContextFactory
class CustomContextFactory(ScrapyClientCo...

2019-09-29 19:41:29 3775 11

Original: Common redis.conf settings

Common redis.conf settings: https://blog.youkuaiyun.com/Calvin_1016280226/article/details/79683283

2018-06-15 14:53:28 248

Original: Overriding scrapy's image download method to record the storage location

from scrapy.pipelines.images import ImagesPipeline
class download_img(ImagesPipeline):
    def item_completed(self, results, item, info):
        # Check whether a URL was passed in
        if 'image_urls' in item:
            ...

2018-05-03 20:25:44 508 1

Original: Using a JsonItemPipline class in scrapy to write to a JSON file

Using the JsonItemPipline class
from scrapy.exporters import JsonItemExporter
class JsonExporterPipline(object):
    def __init__(self):
        self.file = open('article.json', 'wb')
        self.expore = JsonItemExporter...

2018-05-03 13:31:50 669 1

Original: Storing data in a MySQL database asynchronously with python

from twisted.enterprise import adbapi
class MysqlTwistedPipline(object):
    def __init__(self, dbpool):
        self.dbpool = dbpool
    @classmethod
    def from_settings(cls, settings):
        data_info...

2018-05-03 11:19:43 1936

Original: Handling times in a python crawler

# '1天前' is the scraped text for "1 day ago"
if publish_time == '1天前':
    today = datetime.date.today()
    yesterday = today - datetime.timedelta(days=1)
    list.append(str(yesterday))
elif publish_tim...

2018-05-03 11:16:30 2042
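
The preview above compares publish_time against one fixed string per branch. As an alternative sketch (not the author's code), a regular expression can cover any count of days, hours, or minutes in a single function. The function name parse_publish_time, the unit table, and the now parameter are assumptions made for illustration. The Chinese suffixes 天前, 小时前, and 分钟前 mean "days ago", "hours ago", and "minutes ago".

```python
import datetime
import re

# Matches strings such as '1天前' (1 day ago) or '3小时前' (3 hours ago).
_RELATIVE = re.compile(r"(\d+)\s*(天|小时|分钟)前")
# Maps each Chinese unit to the matching timedelta keyword argument.
_UNITS = {"天": "days", "小时": "hours", "分钟": "minutes"}


def parse_publish_time(text, now=None):
    """Convert a relative time string to a datetime, or None if unrecognized."""
    now = now or datetime.datetime.now()
    match = _RELATIVE.fullmatch(text.strip())
    if match is None:
        return None
    amount = int(match.group(1))
    unit = _UNITS[match.group(2)]
    return now - datetime.timedelta(**{unit: amount})


if __name__ == "__main__":
    reference = datetime.datetime(2018, 5, 3, 11, 0, 0)
    print(parse_publish_time("1天前", reference).date())
    print(parse_publish_time("3小时前", reference))
```

Passing a fixed reference time makes the function easy to test. In a crawler the default of the current time would normally be used.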

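Source: https://blog.youkuaiyun.com/qq_28205153/article/details/55798628
Introduction to AES
The Advanced Encryption Standard (AES) is one of the most common symmetric encryption algorithms (it is the algorithm used for encrypted transmission in WeChat mini programs). A symmetric encryption algorithm uses the same key for encryption and decryption. The encryption flow is shown in the figure below:
Encryption flow diagram
The following briefly describes the role of each part and...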

2018-04-12 18:53:05 822

Original: Classic uses of xpath and css in scrapy

test() function
from scrapy import Selector
doc = """ … """
sel = Selector(text=doc, type="html")
sel.xpath('//li//@href').extract()
[u'link1.html', u'link...

2018-04-06 18:43:27 611

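Repost: How to remove duplicate data when a crawler stores data in MongoDB

In this case, use MongoDB's update to update the data instead of inserting it with insert, as follows:
db.collection.update( query, update, { upsert: <boolean>, multi: <boolean>, writeConcern: <document> })
Parameters:
query: the query condition for the update, similar to what follows where in an SQL update statement.
upda...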

2018-03-29 19:41:41 4336

Original: Processing PDFs with pdfminer3k in python

*_encoding:utf-8_*
author: lqp
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LAParams
from pdfminer.pdfparser import PDFDocument,PDFParser
from pdfminer.pdfin...

2018-03-14 21:33:19 4374

Original: scrapy_redis distributed crawler: writing from redis into a MySQL database

import redis
import MySQLdb
import json
def process_item():
    # Create the redis database connection
    rediscli = redis.Redis(host = "127.0.0.1", port = 6379, db = 0)
    # Create the MySQL database connection
    mysqlcli = MySQLdb.connect(host ...

2018-02-24 13:57:48 2114

Original: scrapy_redis distributed crawler: writing from the redis database into MongoDB

import redis
import pymongo
import json
def process_item():
    # Create the redis database connection
    rediscli = redis.Redis(host = "127.0.0.1", port = 6379, db = "0")
    # Create the MongoDB database connection
    mongocli = pymongo.MongoClie...

2018-02-24 13:47:35 1644 1

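Original: Several ways to remove duplicate data before a python crawler inserts into a MySQL database

While storing data you may run into duplicate primary keys. They can be handled in the following ways:
1. Insert the row if it does not exist, update it if it does
2. Use the duplicate key keyword: if an insert hits a primary key conflict, update the row
3. Use the ignore keyword
4. Use the replace into keyword
I. Insert the row if it does not exist, update it if it does:
sql = "select name from table where name = ?";
if： ...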

2018-02-20 22:31:11 10786
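
The strategies listed above can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, so the SQL is SQLite's dialect and not MySQL's: INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE, and REPLACE INTO exists in both. MySQL's ON DUPLICATE KEY UPDATE has no identical SQLite spelling and is left out. The table name person, its columns, and all function names are hypothetical.

```python
import sqlite3


def make_connection():
    """Create an in-memory database with a primary key on name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (name TEXT PRIMARY KEY, age INTEGER)")
    return conn


def insert_or_update(conn, name, age):
    """Strategy 1: SELECT first, then INSERT if missing or UPDATE if present."""
    row = conn.execute("SELECT name FROM person WHERE name = ?", (name,)).fetchone()
    if row is None:
        conn.execute("INSERT INTO person (name, age) VALUES (?, ?)", (name, age))
    else:
        conn.execute("UPDATE person SET age = ? WHERE name = ?", (age, name))


def insert_ignore(conn, name, age):
    """Strategy 3: silently skip the row when the key already exists."""
    conn.execute("INSERT OR IGNORE INTO person (name, age) VALUES (?, ?)", (name, age))


def insert_replace(conn, name, age):
    """Strategy 4: delete any conflicting row, then insert the new one."""
    conn.execute("REPLACE INTO person (name, age) VALUES (?, ?)", (name, age))


def ages(conn):
    """Return the table contents as a {name: age} dict."""
    return dict(conn.execute("SELECT name, age FROM person"))


if __name__ == "__main__":
    conn = make_connection()
    insert_or_update(conn, "alice", 30)
    insert_or_update(conn, "alice", 31)   # existing key: row is updated
    insert_ignore(conn, "alice", 99)      # existing key: insert is skipped
    insert_replace(conn, "alice", 40)     # existing key: row is replaced
    print(ages(conn))
```

One design point to keep in mind: the SELECT-then-write approach takes two round trips and can race with other writers, while the single-statement forms let the database resolve the conflict atomically.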

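Original: Alternation (the "or" operator) in the re module

# '出生于' means "was born in/on"
line = 'xxx出生于2001年6月'
line1 = 'xxx出生于2001/6/1'
line2 = 'xxx出生于2001-6-1'
line3 = 'xxx出生于2001-06-01'
line4 = 'xxx出生于2001-06'
import re
# The final alternation is wrapped in parentheses (it may or may not be present)
pattern = '.*出生于(\d{4}[年/-]\d{1,2}([月/-]$|$|[月/...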

2018-02-16 21:15:43 173
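
The pattern in the preview is cut off, so it cannot be reproduced here. The sketch below uses a different pattern, written for this illustration, that handles the five sample strings from the post with one alternation group. The function name extract_birth_date is hypothetical.

```python
import re

# Year, a separator, a month, then optionally either
#   a separator followed by a day (with an optional trailing 日), or
#   a bare 月 (the "month" character).
# The "|" inside the non-capturing group is the alternation.
BIRTH_DATE = re.compile(r"出生于(\d{4}[年/-]\d{1,2}(?:[月/-]\d{1,2}日?|月)?)")


def extract_birth_date(line):
    """Return the date text that follows '出生于', or None if there is none."""
    match = BIRTH_DATE.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    samples = [
        "xxx出生于2001年6月",
        "xxx出生于2001/6/1",
        "xxx出生于2001-6-1",
        "xxx出生于2001-06-01",
        "xxx出生于2001-06",
    ]
    for sample in samples:
        print(extract_birth_date(sample))
```

The order of the alternatives matters: the longer branch (separator plus day) is tried first, and the bare 月 branch only applies when no day follows.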

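Original: Notes on urllib, pip

from urllib import parse
parse.urljoin() takes two arguments:
1. base: the base URL (domain)
2. the sub-URL
If the sub-URL already contains a domain, the first argument has no effect; otherwise the sub-URL is resolved against the base.
II. Installation
pip install -i https://pypi.douban.com/simple package-name...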

2018-02-09 21:16:07 457
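
The urljoin behavior described in the notes above can be checked with a few calls. This is a small sketch using only the standard library. The example.com and example.org URLs and the helper name absolutize are placeholders chosen for illustration.

```python
from urllib.parse import urljoin


def absolutize(base, links):
    """Resolve every link against base, as a crawler would for scraped hrefs."""
    return [urljoin(base, link) for link in links]


if __name__ == "__main__":
    base = "https://example.com/blog/index.html"
    links = [
        "post/1.html",                    # relative path: joined to base's directory
        "/about",                         # root-relative: keeps only scheme and host
        "https://other.example.org/x",    # absolute: base is ignored
    ]
    for url in absolutize(base, links):
        print(url)
```

In a scrapy spider the same result is usually obtained with response.urljoin(), which supplies the response URL as the base.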

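Original: Using find in MONGODB

The find method
The update method: db.collection.update() takes three arguments:
1. Which documents to update (the condition)
2. How to change them
3. Whether to insert when nothing matches (bool)
As shown in the figure below:
Other operators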

2018-02-07 09:31:23 330

Repost: Using python charts

The charts library is essentially a wrapper around calls to the Highcharts API; it generates Highcharts scripts from python.
Highcharts Chinese site: http://v1.hcharts.cn/demo/index.php?p=10
Highcharts official site: http://api.highcharts.com/highcharts/title
http://nbviewer.jupyter.org/gi

2018-02-06 15:20:14 2462

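Original: Using charts together with jupyter

This post is mainly about using the two libraries. Install them with pip. After installing jupyter, run jupyter notebook in cmd; if it succeeds, it opens in the browser. It really is handy and convenient.
When writing code in jupyter, the string module provides a collection of punctuation characters: from string import punctuation. It can serve as a filter condition for data: keep what is not punctuation, and after cleaning the data, save it with update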

2018-02-05 18:07:24 593
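
The punctuation filter mentioned above can be sketched in a few lines. This is an illustration only, and the function name strip_punctuation is hypothetical. Note that string.punctuation contains ASCII punctuation only, so full-width Chinese punctuation such as "，" or "。" would pass through unchanged.

```python
from string import punctuation


def strip_punctuation(text):
    """Drop every ASCII punctuation character from text."""
    return "".join(ch for ch in text if ch not in punctuation)


if __name__ == "__main__":
    print(strip_punctuation("Hello, world!"))
    print(strip_punctuation("C++ (or C#)?"))
```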

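Original: The fields method in a scrapy pipline

Using the fields method to process JSON data
def user_parse(self, response):
    # The downloaded JSON data
    results = json.loads(response.text)
    # Instantiate the item from the items file
    items = ZhihuuserItem()
    # Loop over the fields defined in the item, using fields
    for item in items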

2018-02-05 12:34:47 661

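Original: The essence of xpath matching in scrapy

Examples of matching rules
Follow an xpath match with a re match:
response.xpath(...).re_first()
Matching rule that finds a link by its text content ("汉字" stands for the Chinese text to look for):
response.xpath('//a[contains(.,"汉字")]//@href').extract_first()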

2018-02-05 12:19:06 904

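Repost: Using pyspider (1)

pyspider is so powerful that many people do not know what to do when they first open the pyspider console, especially those who have used scrapy. To help everyone get started quickly, here is a guide to using the pyspider console.
Home page:
Note: the queue statistics are a status display added to make it easier to check the crawler's state and tune its crawl speed. The number between each pair of components is the number of items waiting in the corresponding queue. It is usually 0 or a single digit. If it reaches several tens or even a hundred, a downstream component has hit a bottleneck or an error,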

2018-01-28 10:12:17 324

Qingpeng's Blog