shengblue - CSDN Blog

Original: Common DOS commands

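// <drive letter>: - switch to the specified drive
dir - list all files and folders in the console's current directory
cd <path> - change to the specified path
cd .. - go back to the parent directory
cd \ - go back to the root directory of the current drive
md <folder name> - create a folder
rd <folder name> - delete an empty folder. Note: it cannot be used to delete a non-empty folder e...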

2019-08-12 16:10:46 95
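
The DOS commands in the entry above have rough counterparts in Python's standard `os` module. The following is a minimal sketch, not taken from the original post: the folder name `demo_dir` and the helper `demo_dos_equivalents` are hypothetical, and everything runs inside a temporary directory. It also shows the `rd` caveat, since `os.rmdir` likewise refuses to delete a non-empty folder.

```python
import os
import tempfile


def demo_dos_equivalents(base):
    """Mirror dir / cd / md / rd with os calls inside the directory `base`."""
    results = {}
    start = os.getcwd()
    try:
        os.chdir(base)                        # cd <path>
        os.mkdir("demo_dir")                  # md demo_dir
        results["listing"] = os.listdir(".")  # dir
        # Put a file inside so the folder is not empty.
        with open(os.path.join("demo_dir", "note.txt"), "w") as fh:
            fh.write("x")
        try:
            os.rmdir("demo_dir")              # rd on a non-empty folder
            results["removed_non_empty"] = True
        except OSError:
            results["removed_non_empty"] = False
        os.remove(os.path.join("demo_dir", "note.txt"))
        os.rmdir("demo_dir")                  # rd succeeds once it is empty
        results["exists_after"] = os.path.exists("demo_dir")
    finally:
        os.chdir(start)                       # restore the original directory
    return results


with tempfile.TemporaryDirectory() as tmp:
    outcome = demo_dos_equivalents(tmp)
```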

Original: Java basics

// single-line comment
/* … */ multi-line comment
/** … */ documentation comment

2019-08-12 13:44:08 91

Original: Commands

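pip install xxx - install the xxx dependency package
pip list - list all installed packages
pip freeze - list installed packages in requirements format
pip uninstall xxx - uninstall the xxx package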

2019-08-12 13:27:32 99
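
To inspect installed packages from inside a script rather than from the command line, the standard library's `importlib.metadata` (Python 3.8+) can produce output that resembles `pip freeze`. This is only an approximation of what pip prints (pip applies extra rules, for example for editable installs), and the helper name `freeze_lines` is an assumption made for this sketch.

```python
from importlib import metadata


def freeze_lines():
    """Return sorted 'name==version' strings for installed distributions."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata has no name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


installed = freeze_lines()
```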

Original: Headless operation with the selenium library in Python crawlers

from selenium import webdriver
from selenium.webdriver import ChromeOptions
co = ChromeOptions()  # create an options object
co.add_argument('--headless')  # add the argument
dirver = webdriver.Chrome(chrome_options=co)
dirver.get("h...

2019-03-22 19:16:24 718

Original: Using the selenium library in Python crawlers

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
broesor = webdriver.Chrome()  # create a Chrome browser instance for debugging
broesor.get('http://www.baidu.com/index.html')
broesor.find_element_by_nam...

2019-03-22 17:52:47 135

Original: Crawling with the urllib library in Python

from urllib import request
# create an HTTP handler object
http_handler = request.HTTPHandler(debuglevel=0)
# create an opener
opener = request.build_opener(http_handler)
req = request.Request('http://www.baidu.com')
res = opener.op...

2019-03-21 20:31:07 114
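
The handler and opener pattern from the urllib entry above can be wrapped in two small helpers. This is a minimal sketch: the function names and the `example-crawler/0.1` User-Agent string are assumptions made for illustration, and no request is actually sent, so it runs without network access.

```python
from urllib import request


def build_debug_opener(debuglevel=0):
    """Create an opener whose HTTP handler uses the given debug level."""
    http_handler = request.HTTPHandler(debuglevel=debuglevel)
    return request.build_opener(http_handler)


def make_request(url, user_agent="example-crawler/0.1"):
    """Build a Request with a custom User-Agent; nothing is sent yet."""
    return request.Request(url, headers={"User-Agent": user_agent})


opener = build_debug_opener()
req = make_request("http://www.baidu.com")
# Sending the request would be: res = opener.open(req, timeout=10)
```

Setting `debuglevel=1` makes the handler print the HTTP exchange, which can help when a site rejects the crawler.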

Original: Using XPath in Python crawlers

import lxml
from lxml import etree
htmlfiel = ''' <html> <body> <ul> <li class="item-0"><a href="link0.html">first item1</a>&...

2019-03-21 20:28:03 386
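
The idea behind the XPath entry above can be tried without installing lxml. The sketch below swaps in the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath and requires well-formed markup, whereas lxml offers full XPath 1.0 and tolerant HTML parsing. The `SAMPLE` markup is a hypothetical stand-in, not the post's original string.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not the post's original string).
SAMPLE = """
<html>
  <body>
    <ul>
      <li class="item-0"><a href="link0.html">first item</a></li>
      <li class="item-1"><a href="link1.html">second item</a></li>
    </ul>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE)

# All link targets under any <li> element.
hrefs = [a.get("href") for a in root.findall(".//li/a")]

# Text of the link inside the <li> whose class attribute is "item-1".
item1_text = root.find(".//li[@class='item-1']/a").text
```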

Original: Some frequently used commands

Command to exit a virtual environment: deactivate

2019-03-21 19:15:18 89

Original: Code for writing crawled data to a MySQL database

def save_to_database(object):
    db = pymysql.connect(host="127.0.0.1", database="company", user='root', password='123456')  # connect to the database
    cursor = db.cursor()  # create a cursor
    for job in job_list:
        n...

2019-03-21 19:00:57 1041
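
The connect, cursor, insert loop, and commit flow from the database entry above can be tried without a MySQL server. The sketch below deliberately swaps pymysql for the standard library's `sqlite3` with an in-memory database. The table layout, the `name` and `salary` fields, and the sample records are all made up for illustration. Note that pymysql uses `%s` placeholders where sqlite3 uses `?`.

```python
import sqlite3


def save_to_database(conn, job_list):
    """Insert each job dict into the jobs table and return the row count."""
    cursor = conn.cursor()
    cursor.execute(
        "CREATE TABLE IF NOT EXISTS jobs (name TEXT, salary TEXT)"
    )
    # Placeholders let the driver escape values instead of string formatting.
    cursor.executemany(
        "INSERT INTO jobs (name, salary) VALUES (?, ?)",
        [(job["name"], job["salary"]) for job in job_list],
    )
    conn.commit()
    cursor.execute("SELECT COUNT(*) FROM jobs")
    return cursor.fetchone()[0]


conn = sqlite3.connect(":memory:")
# Made-up sample records for illustration only.
sample_jobs = [
    {"name": "crawler engineer", "salary": "10k-15k"},
    {"name": "data analyst", "salary": "8k-12k"},
]
row_count = save_to_database(conn, sample_jobs)
```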

Original: Using proxies in crawlers

import requests
from fake_useragent import UserAgent
from bs4 import BeautifulSoup
proxy_get = requests.get('http://193.112.219.93:5000/get')
proxy = { 'http': 'http://' + proxy_get.text }
ua = UserAgent(...

2019-03-20 22:05:55 249
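
The proxy entry above builds a scheme-to-URL mapping from the text returned by a proxy pool. A minimal sketch of that step follows, assuming the pool returns plain `host:port` text. The helper names are hypothetical, and the address comes from the documentation-only range 192.0.2.0/24 rather than a real proxy. The same mapping can be passed to `requests.get(url, proxies=proxies)`; here it is attached to a standard-library opener so the sketch has no third-party dependencies.

```python
from urllib import request


def build_proxies(address):
    """Turn 'host:port' text into a scheme-to-proxy-URL mapping."""
    address = address.strip()  # API responses often end with a newline
    if not address:
        raise ValueError("empty proxy address")
    return {"http": "http://" + address}


def build_proxy_opener(proxies):
    """Create a urllib opener that routes requests through the proxies."""
    return request.build_opener(request.ProxyHandler(proxies))


# Hypothetical address from the documentation-only range 192.0.2.0/24.
proxies = build_proxies("192.0.2.10:8080\n")
opener = build_proxy_opener(proxies)
```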

Original: Using beautifulsoup

from bs4 import BeautifulSoup
# create a soup object
doc_html = """<title class='title'><a></a></title>"""
soup = BeautifulSoup(doc_html, 'lxml')
print(soup, type(soup))
print(soup.head)  # 文...

2019-03-20 21:45:24 222

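Original: Crawling pages with a depth-first strategy; topics covered include using re, checking the returned status code, recursive crawling, and using dictionaries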

import re
import requests
from fake_useragent import UserAgent
ua = UserAgent()
headers = { 'user-agent': ua.random }
def getHTML(url):
    # get the response for this URL
    res = requests.get(url=url, headers=header...

2019-03-20 20:04:09 169
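
The topics named in the depth-first entry above (re, status codes, recursion, and a dictionary of visited pages) fit together roughly as sketched below. This is not the post's code: the `crawl` signature, the link pattern, and the in-memory `PAGES` site are assumptions chosen so the example runs without network access. The `fetch` argument is injected, so a real crawler could pass a function that calls `requests.get` and returns `(res.status_code, res.text)`.

```python
import re

LINK_RE = re.compile(r'href="(http[^"]+)"')


def crawl(url, fetch, max_depth, depth=0, visited=None):
    """Depth-first crawl; returns a dict mapping each visited URL to its depth.

    `fetch(url)` must return a (status_code, html_text) pair.
    """
    if visited is None:
        visited = {}
    if depth > max_depth or url in visited:
        return visited
    visited[url] = depth
    status, html = fetch(url)
    if status != 200:  # do not follow links on pages that failed to load
        return visited
    for link in LINK_RE.findall(html):
        crawl(link, fetch, max_depth, depth + 1, visited)  # go deeper first
    return visited


# Hypothetical in-memory "site" so the sketch runs without network access.
PAGES = {
    "http://a.example/": (
        200,
        '<a href="http://a.example/b">b</a> <a href="http://a.example/c">c</a>',
    ),
    "http://a.example/b": (200, '<a href="http://a.example/d">d</a>'),
    "http://a.example/c": (404, ""),
    "http://a.example/d": (200, '<a href="http://a.example/">home</a>'),
}


def fake_fetch(url):
    """Look the URL up in PAGES; unknown URLs behave like a 404."""
    return PAGES.get(url, (404, ""))


result = crawl("http://a.example/", fake_fetch, max_depth=2)
```

The `visited` dictionary doubles as the loop guard: the link from page `d` back to the start page is ignored because that URL is already recorded.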

Original: Downloading cookies with a crawler

from urllib import request
import urllib
from http import cookiejar
my_cookiejar = cookiejar.LWPCookieJar(filename='baidu.txt')
cookie_handler = request.HTTPCookieProcessor(my_cookiejar)
opener = request....

2019-03-20 19:16:53 316
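
The cookie entry above pairs an `LWPCookieJar` with an `HTTPCookieProcessor`. The sketch below shows one way that setup might be completed and saved to disk. The helper name `build_cookie_opener` and the file name `cookies.txt` are assumptions, and no request is sent, so the saved file contains only the LWP header line.

```python
import os
import tempfile
from http import cookiejar
from urllib import request


def build_cookie_opener(filename):
    """Return (opener, jar); cookies set by responses are collected in jar."""
    jar = cookiejar.LWPCookieJar(filename=filename)
    handler = request.HTTPCookieProcessor(jar)
    return request.build_opener(handler), jar


tmp_dir = tempfile.mkdtemp()
cookie_path = os.path.join(tmp_dir, "cookies.txt")  # hypothetical file name
opener, jar = build_cookie_opener(cookie_path)

# After a call to opener.open(...) the jar would hold the server's cookies.
# Saving works even while the jar is empty; it writes just the file header.
jar.save(ignore_discard=True, ignore_expires=True)

with open(cookie_path) as fh:
    first_line = fh.readline().strip()
```

Passing `ignore_discard=True` keeps session cookies that would otherwise be dropped on save, which is usually what a crawler wants when reusing a login.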

shengblue's blog