Repost: SQL kaggle learn: WHERE AND
WHERE trip_start_timestamp BETWEEN '2017-01-01' AND '2017-07-01' AND trip_seconds > 0 AND trip_miles > 0
WHERE trip_start_timestamp > '2017-01-01' AND trip_start_timestamp <...
2019-04-04 16:49:00
229
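A minimal sketch of running those filters with the BigQuery Python client, the way the Kaggle course does; the table name is taken from the next entry below, and the client setup and column choice are assumptions:

    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT trip_start_timestamp, trip_seconds, trip_miles
        FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
        WHERE trip_start_timestamp BETWEEN '2017-01-01' AND '2017-07-01'
          AND trip_seconds > 0
          AND trip_miles > 0
        """
    df = client.query(query).to_dataframe()  # run the query and load the result into pandas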
Repost: SQL kaggle learn WITH ... AS exercise
rides_per_year_query = """
    SELECT EXTRACT(YEAR FROM trip_start_timestamp) AS year,
           COUNT(unique_key) AS num_trips
    FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
    GROUP BY year
    ORD...
2019-04-04 11:37:00
258
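Since the title is about WITH ... AS, here is a hedged sketch of the same aggregation wrapped in a CTE; the trailing ORDER BY is an assumption, because the excerpt above is cut off at ORD...

    rides_per_year_query = """
        WITH trips AS (
            SELECT EXTRACT(YEAR FROM trip_start_timestamp) AS year,
                   unique_key
            FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
        )
        SELECT year, COUNT(unique_key) AS num_trips
        FROM trips
        GROUP BY year
        ORDER BY year
        """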
Repost: SQL COUNT(1)
If you are ever unsure what to put inside a COUNT() aggregation, you can do COUNT(1) to count the rows in each group. Most people find it especially readable, because we know it's not focusing ...
2019-04-02 20:58:00
257
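A small illustration of the difference, reusing the taxi table from the entries above (the payment_type and tips columns are assumed): COUNT(1) counts every row in the group, while COUNT(column) skips rows where that column is NULL.

    count_query = """
        SELECT payment_type,
               COUNT(1)    AS num_rows,       -- all rows in the group
               COUNT(tips) AS num_with_tips   -- only rows where tips IS NOT NULL
        FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
        GROUP BY payment_type
        """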
Repost: string values in a SQL WHERE clause need quotes ' '
WHERE year >= 2010 AND year <= 2017 AND indicator_code = 'SE.XPD.TOTL.GD.ZS'
Reposted from: https://www.cnblogs.com/bamboozone/p/10644973.html
2019-04-02 20:09:00
219
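The rule being remembered here: numeric literals go bare, string literals need single quotes. A hedged sketch against the World Bank education table this indicator_code appears to come from; the exact table name is an assumption:

    query = """
        SELECT country_name, year, value
        FROM `bigquery-public-data.world_bank_intl_education.international_education`
        WHERE year >= 2010 AND year <= 2017            -- numbers: no quotes
          AND indicator_code = 'SE.XPD.TOTL.GD.ZS'     -- strings: quotes required
        """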
Repost: kaggle learn python
def has_lucky_number(nums):
    return any([num % 7 == 0 for num in nums])

def menu_is_boring(meals):
    """Given a list of meals served over some period of time, return True if the...
2019-03-24 21:47:00
330
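A quick usage sketch of the any() idiom in the first function; the test values are made up, and a generator expression works just as well as the list comprehension:

    def has_lucky_number(nums):
        # any() short-circuits on the first number divisible by 7
        return any(num % 7 == 0 for num in nums)

    print(has_lucky_number([5, 14, 3]))   # True  (14 is divisible by 7)
    print(has_lucky_number([1, 2, 3]))    # False
    print(has_lucky_number([]))           # False (any() of an empty iterable)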
Repost: pandas
df = reviews.loc[:99, ['country','variety']]  or  df = reviews.loc[[1,2,3,4], ['country','variety']]
df = reviews.loc[[0,1,10,100], ['country','province','region_1','region_2']]
The two axes cannot be swapped: the row index always comes first. iloc...
2019-03-24 19:17:00
100
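A hedged sketch of the loc/iloc point the note is making, with a tiny stand-in DataFrame: rows always come first and columns second, .loc is label-based and end-inclusive, .iloc is position-based and end-exclusive.

    import pandas as pd

    reviews = pd.DataFrame({'country': ['Italy', 'Portugal', 'US'],
                            'variety': ['White Blend', 'Red Blend', 'Pinot Gris'],
                            'points': [87, 87, 90]})

    by_label    = reviews.loc[:1, ['country', 'variety']]   # rows labelled 0..1, inclusive
    by_position = reviews.iloc[:2, [0, 1]]                   # first two rows, columns 0 and 1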
Repost: cracking the 实习僧 (shixiseng) font obfuscation
1. https://www.hitoy.org/tool/file_base64.php: convert the base64 string to a file, choosing ttf as the generated file format
2. https://fontdrop.info/: upload the ttf; hovering over a glyph shows every obfuscated character and its mapping
Reposted from: https://www.cnblogs.com/bamboozone/p/10555027.html...
2019-03-18 21:13:00
459
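A hedged local equivalent of those two web tools, assuming the obfuscated font arrives as a base64 string (font_b64 is a placeholder) and that fontTools is installed; the font's cmap maps each character code to the glyph that actually gets drawn:

    import base64
    from fontTools.ttLib import TTFont

    font_b64 = "..."  # placeholder: the base64 font string pulled from the page's CSS/JS

    with open("secret.ttf", "wb") as f:
        f.write(base64.b64decode(font_b64))

    font = TTFont("secret.ttf")
    for code, glyph_name in font.getBestCmap().items():
        print(hex(code), glyph_name)   # inspect which real character each code maps to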
Repost: cookiejar
Reference: https://www.cnblogs.com/why957/p/9297779.html. The article introduces four ways of simulating a login. yield Request() hands a new request back to the crawler to execute. For cookie handling when sending requests, meta={'cookiejar': 1} turns on cookie tracking and is written in the first Request(); on subsequent requests use meta={'cookiejar': response...
2019-03-09 11:54:00
808
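A minimal sketch of the cookiejar meta key described above (the URLs are placeholders): the first request opens jar 1, and later requests forward the same jar so the session cookies follow along.

    import scrapy

    class CookiejarSpider(scrapy.Spider):
        name = 'cookiejar_demo'

        def start_requests(self):
            yield scrapy.Request('https://example.com/login',
                                 meta={'cookiejar': 1},          # open cookie jar 1
                                 callback=self.after_login)

        def after_login(self, response):
            yield scrapy.Request('https://example.com/profile',
                                 meta={'cookiejar': response.meta['cookiejar']},  # reuse the jar
                                 callback=self.parse_profile)

        def parse_profile(self, response):
            self.logger.info(response.text[:200])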
Repost: 煎蛋 ooxx
pipeline.py
class Jiandanline(FilesPipeline):
    def get_media_requests(self, item, info):
        for file_url in item['file_urls']:
            yield scrapy.Request(file_url)
    de...
2019-03-08 20:04:00
391
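For a FilesPipeline like the one above to run at all, settings.py also has to register it and point at a storage directory; a hedged sketch, where the module path and folder are assumptions:

    # settings.py
    ITEM_PIPELINES = {
        'myproject.pipelines.Jiandanline': 300,   # assumed module path to the class above
    }
    FILES_STORE = './downloads'                   # where FilesPipeline saves the fetched files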
Repost: notes on the pitfalls in my head while learning to write a pipeline
I used FilesPipeline, but pointed the storage setting at the images path. Testing against the 煎蛋 ooxx front page, the shell returned plenty of list entries, but the actual crawl kept returning only one item. Very annoying; I tested again and again and it just would not work, and only later realised the front page had refreshed and genuinely had only one entry.... If def file_path is written badly, its files get filtered out by def item_completed as invalid; file_path only writes a path name, just a path name...
2019-03-08 15:58:00
209
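A hedged sketch of the two methods the note is about: file_path only returns a relative path string under FILES_STORE, and item_completed only sees successful downloads, so a broken file_path shows up as items with no files. The file_paths item field is an illustrative assumption:

    import os
    from scrapy.exceptions import DropItem
    from scrapy.pipelines.files import FilesPipeline

    class JiandanFilesPipeline(FilesPipeline):
        def file_path(self, request, response=None, info=None, *, item=None):
            # just a relative path name, nothing more
            return 'full/' + os.path.basename(request.url)

        def item_completed(self, results, item, info):
            # results is a list of (ok, file_info_or_failure) tuples
            paths = [x['path'] for ok, x in results if ok]
            if not paths:
                raise DropItem('no files downloaded')
            item['file_paths'] = paths
            return item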
Repost: scrapy flow diagram
Refer: https://blog.yongli1992.com/2015/02/08/python-scrapy-module/, which shows the Scrapy architecture diagram. The Scrapy Engine drives the whole run. The Scheduler schedules the URLs to be visited. The Downloader fetches responses from the network. The Spider analyses the responses, parses out the data we want from them, and is also responsible for finding the follow-up URLs to visit...
2019-03-08 13:36:00
139
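In code that division of labour stays mostly invisible: the spider only yields items and new requests, and the engine, scheduler and downloader handle everything else. A minimal sketch, using the quotes.toscrape.com sandbox as a stand-in site:

    import scrapy

    class QuotesSpider(scrapy.Spider):
        name = 'quotes'
        start_urls = ['https://quotes.toscrape.com/']

        def parse(self, response):
            # parsed data goes back to the engine as items
            for quote in response.css('div.quote'):
                yield {'text': quote.css('span.text::text').get()}
            # follow-up URLs go back to the engine and on to the scheduler
            next_page = response.css('li.next a::attr(href)').get()
            if next_page:
                yield response.follow(next_page, callback=self.parse)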
Repost: rewriting the pipeline
Why override the method get_media_requests, and what is the difference between the two?
def get_media_requests(self, item, info):  # the original
    return [Request(x) for x in item.get(self.images_urls_field, [])]
def get_media_requests(self, ...
2019-03-08 13:30:00
164
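One common reason to override it, sketched here as an assumption rather than as what the original post did: pass the item along in meta so a custom file_path can build per-item file names (the title field is hypothetical):

    import scrapy
    from scrapy.pipelines.images import ImagesPipeline

    class NamedImagesPipeline(ImagesPipeline):
        def get_media_requests(self, item, info):
            for url in item.get(self.images_urls_field, []):
                yield scrapy.Request(url, meta={'item': item})   # carry the item along

        def file_path(self, request, response=None, info=None, *, item=None):
            title = request.meta['item'].get('title', 'unnamed')  # hypothetical item field
            return f"{title}/{request.url.split('/')[-1]}"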
Repost: super()
From https://mozillazg.com/2016/12/python-super-is-not-as-simple-as-you-thought.html  # this author is really good
In single inheritance super works the way everyone expects: it is mainly used to call methods of the parent class.
class A:
    def __init__(self):
        self.n = 2
    ...
2019-03-07 10:15:00
207
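The article's point is that with multiple inheritance super() follows the MRO rather than simply "the parent"; a small sketch in the same spirit (the classes and increments are made up):

    class A:
        def __init__(self):
            self.n = 2

    class B(A):
        def __init__(self):
            super().__init__()   # follows the MRO, not simply "call A"
            self.n += 3

    class C(A):
        def __init__(self):
            super().__init__()
            self.n += 4

    class D(B, C):
        def __init__(self):
            super().__init__()   # runs B.__init__ -> C.__init__ -> A.__init__
            self.n += 5

    print(D.__mro__)   # D, B, C, A, object
    print(D().n)       # 2 + 4 + 3 + 5 = 14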
Repost: os.path.join
os.path.join(): joining restarts at an argument that begins with "/", and every argument before it is discarded; this rule takes precedence. Given that, an argument that begins with "./" does not reset anything; it is simply appended after the previous argument.
import os
print("1:", os.path.join('aaaa', '/bbbb', 'ccccc.txt'))
print("2:...
2019-03-06 21:51:00
120
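A quick check of that rule on a POSIX system; the second and third calls are assumed variations in the spirit of the truncated example, with the results shown in the comments:

    import os

    print("1:", os.path.join('aaaa', '/bbbb', 'ccccc.txt'))    # 1: /bbbb/ccccc.txt   ('/bbbb' resets the join)
    print("2:", os.path.join('aaaa', './bbbb', 'ccccc.txt'))   # 2: aaaa/./bbbb/ccccc.txt
    print("3:", os.path.join('aaaa', 'bbbb', 'ccccc.txt'))     # 3: aaaa/bbbb/ccccc.txt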
Repost: scrapy item pipeline
item pipeline
process_item(self, item, spider)  # the method every pipeline must have; the concrete handling is written inside it, and other methods can be added as well
open_spider(self, spider)  This method is called when the spider is opened.
close...
2019-03-05 21:05:00
252
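A hedged skeleton putting those three hooks together; writing items to a JSON-lines file is just an illustrative choice:

    import json

    class JsonWriterPipeline:
        def open_spider(self, spider):
            self.file = open('items.jl', 'w', encoding='utf-8')

        def process_item(self, item, spider):
            # the one required method: validate/transform/store, then return the item
            self.file.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
            return item

        def close_spider(self, spider):
            self.file.close()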
Repost: the process of learning to use the scrapy item pipeline
At first I did not understand it at all. From https://www.jianshu.com/p/18ec820fe706 I found a fairly complete example to borrow from, then wrote my own 煎蛋 pipeline. First, create the fields in items:
image_urls = scrapy.Field()  #
images = scrapy.Field()  # these two are required
image_paths = sc...
2019-03-05 20:16:00
106
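A hedged version of that items.py, with the field names from the excerpt; image_urls and images are the two fields ImagesPipeline expects by default:

    import scrapy

    class JiandanItem(scrapy.Item):
        image_urls = scrapy.Field()   # input: the list of image URLs for the pipeline
        images = scrapy.Field()       # output: download results filled in by the pipeline
        image_paths = scrapy.Field()  # optional extra field set by a custom item_completed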
Repost: dygod.net
# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class DgSpider(CrawlSpider):
    name = 'dg'
    # a...
2019-03-03 10:08:00
3424
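A hedged sketch of how such a CrawlSpider is usually filled out; the domain comes from the title, but the link patterns and the parse logic are assumptions, not the original post's rules:

    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import CrawlSpider, Rule

    class DgSpider(CrawlSpider):
        name = 'dg'
        allowed_domains = ['dygod.net']
        start_urls = ['https://www.dygod.net/']

        rules = (
            Rule(LinkExtractor(allow=r'index.*\.html')),                      # follow listing pages
            Rule(LinkExtractor(allow=r'/\d+\.html'), callback='parse_item'),  # parse detail pages
        )

        def parse_item(self, response):
            yield {'title': response.css('title::text').get(), 'url': response.url}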
Repost: https://scrapingclub.com/exercise/detail_sign/
def parse(self, response):
    # pattern1 = re.compile('token=(.*?);')
    # token = pattern1.findall(response.headers.getlist("set-cookie")[1].decode("utf-8"))[0]
    patt...
2019-03-02 11:21:00
188
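A hedged sketch of the idea in those commented-out lines: pull the token out of a Set-Cookie header and send it back as a cookie on the follow-up request. Which header position holds the token, and what the detail page contains, are assumptions:

    import re
    import scrapy

    class DetailSignSpider(scrapy.Spider):
        name = 'detail_sign'
        start_urls = ['https://scrapingclub.com/exercise/detail_sign/']

        def parse(self, response):
            set_cookie = response.headers.getlist('Set-Cookie')[0].decode('utf-8')
            token = re.search('token=(.*?);', set_cookie).group(1)
            yield scrapy.Request(response.url,
                                 cookies={'token': token},   # send the extracted token back
                                 callback=self.parse_detail,
                                 dont_filter=True)

        def parse_detail(self, response):
            yield {'text': response.css('div.card-body *::text').getall()}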
Repost: https://scrapingclub.com/exercise/basic_captcha/
def parse(self, response):
    # set_cookies = response.headers.getlist("set-cookie").decode("utf-8")
    pattern1 = re.compile('csrftoken=(.*?);')
    pattern2 = re.compil...
2019-03-01 16:52:00
626
Repost: https://scrapingclub.com/exercise/basic_login/
Problems I ran into: csrftoken and cfduid live in request.headers, and I kept looking for how to get the request header inside scrapy. From scrapy shell, fetch and then request.headers gives the right content, but in a scrapy project I did not know how to write it. Online I found response.request.headers, and that way of writing...
2019-03-01 11:21:00
303
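The missing piece, sketched: inside a callback the request that produced a response hangs off response.request, so its headers (including the Cookie header, when cookies are attached) are reachable like this:

    import scrapy

    class HeaderSpider(scrapy.Spider):
        name = 'basic_login_headers'
        start_urls = ['https://scrapingclub.com/exercise/basic_login/']

        def parse(self, response):
            self.logger.info(response.request.headers)                # headers of the request that was sent
            self.logger.info(response.request.headers.get('Cookie'))  # None if no cookie was attached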
Repost: Python scrapy - Login Authenication Issue
https://stackoverflow.com/questions/37841409/python-scrapy-login-authenication-issue
from scrapy.crawler import CrawlerProcess
import scrapy
from scrapy.http import Request

class FirstS...
2019-03-01 10:44:00
161
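The usual pattern behind answers like that one is FormRequest.from_response, which copies the form's hidden fields (such as csrfmiddlewaretoken) from the page itself; a hedged sketch with placeholder credentials and URL:

    import scrapy

    class LoginSpider(scrapy.Spider):
        name = 'form_login'
        start_urls = ['https://scrapingclub.com/exercise/basic_login/']

        def parse(self, response):
            yield scrapy.FormRequest.from_response(
                response,
                formdata={'username': 'user', 'password': 'pass'},   # placeholders
                callback=self.after_login,
            )

        def after_login(self, response):
            if 'logout' in response.text.lower():
                self.logger.info('logged in')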
Repost: https://scrapingclub.com/exercise/detail_cookie/
def parse(self, response):
    pattern = re.compile('token=(.*?);')
    token = pattern.findall(response.headers.get("set-cookie").decode("utf-8"))[0]
    cookie = {
        ...
2019-02-27 14:47:00
349
Repost: scrapy: get cookie from response
scrapy shell
fetch('your_url')
response.headers.getlist("Set-Cookie")
https://stackoverflow.com/questions/46543143/scrapy-get-cookies-from-response-request-headers
response.headers returns...
2019-02-27 10:04:00
527
Repost: css selectors tips
From https://saucelabs.com/resources/articles/selenium-tips-css-selectors ...
2019-02-24 18:30:00
410
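A few of the selector patterns that article covers, tried on a tiny made-up snippet with scrapy's Selector:

    from scrapy import Selector

    sel = Selector(text='<ul class="products"><li><a href="/exercise/a.jpg">A</a></li>'
                        '<li><a href="/other/b.png">B</a></li></ul>')

    sel.css('ul.products > li')                    # direct children: both <li> elements
    sel.css('a[href^="/exercise/"]::text').get()   # attribute starts-with -> 'A'
    sel.css('a[href$=".png"]::text').get()         # attribute ends-with   -> 'B'
    sel.css('li:nth-child(2) a::text').get()       # positional            -> 'B'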
Repost: a CSS selection question
<div class="col-lg-4 col-md-6 mb-4">
  <div class="card">
    <a href="/exercise/list_basic_detail/90008-E/">
      <img class="card-img-top img-fluid" src="/static/img/90008-E.jpg"...
2019-02-23 19:32:00
165
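A hedged sketch of pulling the link and the image out of that markup with scrapy selectors; the closing tags are assumed, since the excerpt is cut off:

    from scrapy import Selector

    html = '''
    <div class="col-lg-4 col-md-6 mb-4">
      <div class="card">
        <a href="/exercise/list_basic_detail/90008-E/">
          <img class="card-img-top img-fluid" src="/static/img/90008-E.jpg">
        </a>
      </div>
    </div>'''

    sel = Selector(text=html)
    print(sel.css('div.card a::attr(href)').get())   # /exercise/list_basic_detail/90008-E/
    print(sel.css('div.card img::attr(src)').get())  # /static/img/90008-E.jpg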
Repost: extracting data from js
<script language="JavaScript" type="text/javascript+gk-onload">
    SKART = (SKART) ? SKART : {};
    SKART.analytics = SKART.analytics || {};
    SKART.analytics["category"] = "tele...
2019-02-21 12:35:00
1106
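One hedged way to pull such values out of an inline script: select the <script> text and regex the assignment out. The HTML here is a stand-in, and the category value is a placeholder because the excerpt above is truncated:

    import re
    from scrapy import Selector

    html = '''<script language="JavaScript" type="text/javascript+gk-onload">
        SKART = (SKART) ? SKART : {};
        SKART.analytics = SKART.analytics || {};
        SKART.analytics["category"] = "some-category";
    </script>'''

    script = Selector(text=html).xpath('//script/text()').get()
    category = re.search(r'SKART\.analytics\["category"\]\s*=\s*"(.*?)"', script).group(1)
    print(category)   # some-category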
Repost: materials
http://interactivepython.org/runestone/static/pythonds/index.html
https://blog.michaelyin.info/scrapy-exercises-make-you-prepared-for-web-scraping-challenge/
https://scrapingclub.com/
https://...
2019-02-21 09:00:00
207
Repost: xpath, css
https://docs.scrapy.org/en/latest/intro/tutorial.html
xpath: @ selects an attribute, . selects relative to the current node, // selects at any depth
/bookstore/book[position()<3] selects the first two book elements that are children of bookstore
css: span.text::text
response.css("...
2019-02-13 20:32:00
100
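A small side-by-side of those bits of syntax, using scrapy selectors on a made-up XML stand-in:

    from scrapy import Selector

    sel = Selector(text='''
        <bookstore>
          <book category="web"><title>One</title></book>
          <book category="db"><title>Two</title></book>
          <book category="web"><title>Three</title></book>
        </bookstore>''', type='xml')

    sel.xpath('//book/@category').getall()                  # @ : attribute values -> ['web', 'db', 'web']
    sel.xpath('/bookstore/book[position()<3]')              # the first two book children of bookstore
    sel.xpath('.//title/text()').getall()                   # // and . : any depth, relative to the current node
    sel.css('book[category="web"] title::text').getall()    # css equivalent with ::text -> ['One', 'Three']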
Repost: chromedriver full screen, paging, errors
from selenium import webdriver
from selenium.common.exceptions import TimeoutException, StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.sup...
2019-01-29 14:51:00
248
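A hedged sketch along the lines of those imports: maximise the window, page through results, and tolerate the two imported exceptions. The URL, the item selector and the next-page link text are placeholders:

    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException, StaleElementReferenceException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()
    driver.maximize_window()                      # "full screen"
    driver.get('https://example.com/list')        # placeholder

    wait = WebDriverWait(driver, 10)
    while True:
        try:
            items = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.item')))
            print([i.text for i in items])
            wait.until(EC.element_to_be_clickable((By.LINK_TEXT, '下一页'))).click()   # "paging"
        except (TimeoutException, StaleElementReferenceException):
            break                                  # no next page, or the DOM refreshed mid-read

    driver.quit()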
Repost: the road of learning python with Pycharm
If a module is grey after import, it has never been referenced. If lxml cannot be found, reinstall it from the anaconda prompt: pip uninstall lxml, then install again. When using requests, if the regex you wrote cannot parse the page correctly, print the page first and then write the regex. pyquery's attr() returning nothing useful is because it only returns the value from the first match; see https://www.cnblogs.com/airnew/p/10056551...
2019-01-29 14:01:00
349
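On the last point, a hedged pyquery sketch: .attr() only reads the first matched element, so to read them all you iterate with .items(); the HTML is a made-up stand-in:

    from pyquery import PyQuery as pq

    doc = pq('<div><a href="/a">A</a><a href="/b">B</a></div>')

    print(doc('a').attr('href'))                        # /a  -- only the first match
    print([a.attr('href') for a in doc('a').items()])   # ['/a', '/b'] -- every match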