
Python Web Crawling and Information Extraction
Crawler-related
cycy小陈
Each step forward brings its own joy.
Articles in this column
Python Crawler: Fetching a web page
>>> import requests
>>> r = requests.get("https://item.jd.com/2967929.html")
>>> r.status_code
200
>>> r.encoding
'gbk'
Original · 2018-09-11 18:16:38 · 2147 views · 0 comments
Python Crawler: Targeted crawler for Taobao product listings
Code:
import requests
import re
def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
...
Original · 2018-09-13 17:25:50 · 775 views · 1 comment
Greedy matching with re
Original · 2018-09-13 16:46:48 · 290 views · 0 comments
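A minimal sketch of greedy versus non-greedy matching with `re` (the example string is my own):

```python
import re

# Greedy: '.*' grabs as much as possible, so the match runs to the LAST 'N'.
greedy = re.match(r'PY.*N', 'PYANBNCNDN')
print(greedy.group(0))  # PYANBNCNDN

# Non-greedy: '.*?' grabs as little as possible, stopping at the FIRST 'N'.
lazy = re.match(r'PY.*?N', 'PYANBNCNDN')
print(lazy.group(0))    # PYAN
```

Python's quantifiers are greedy by default; appending `?` (`*?`, `+?`, `??`, `{m,n}?`) switches each one to the minimal match.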
Python Crawler: The Match object
Original · 2018-09-13 16:34:01 · 395 views · 0 comments
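A short sketch of what a Match object carries, using the same postcode pattern as the regex examples below:

```python
import re

m = re.search(r'[1-9]\d{5}', 'BIT 100081')
# A Match object records what matched and where.
print(m.string)    # 'BIT 100081'  -- the text that was searched
print(m.group(0))  # '100081'      -- the matched substring
print(m.start())   # 4             -- start index of the match
print(m.end())     # 10            -- index just past the match
print(m.span())    # (4, 10)       -- (start, end) as a tuple
```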
Python Crawler: Regular expressions (re)
>>> import re
>>> match = re.search(r'[1-9]\d{5}', 'BIT 100081')
>>> if match:
        print(match.group(0))
100081
# re.match only matches at the starting position
>>> match = re.ma...
Original · 2018-09-13 16:27:34 · 247 views · 0 comments
Python Crawler: Common methods of the requests library
The post function:
>>> payload = {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
>>> r = requests.post('http://httpbin.org/post', data=payload)
>>> print(r.text)
{
  "args": {},
...
Original · 2018-09-11 18:00:42 · 849 views · 0 comments
Python Crawler: A universal code framework
import requests
def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
...
Original · 2018-09-10 21:50:48 · 1701 views · 0 comments
Python Crawler: Chinese university ranking crawler
Example:
import requests
from bs4 import BeautifulSoup
import bs4
def getHTMLText(url):  # fetch the Best Chinese University Ranking page
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.enco...
Original · 2018-09-12 23:05:28 · 2500 views · 1 comment
Python Crawler: Scraping stock data with Scrapy
PS G:\pycourse> scrapy startproject BaiduStocks
New Scrapy project 'BaiduStocks', using template directory
'c:\\python\\python37\\lib\\site-packages\\scrapy\\templates\\project', created in:
    G...
Original · 2018-09-15 17:20:24 · 1063 views · 0 comments
scrapy crawl douban_spider fails with: def write(self, data, async=False)
...
from twisted.conch import manhole, telnet
  File "d:\jsuk\python37\lib\site-packages\twisted\conch\manhole.py", line 154
    def write(self, data, async=False):
...
(In Python 3.7, async became a reserved keyword, so this line in an old Twisted release is a SyntaxError; upgrading Twisted resolves it.)
Original · 2018-09-15 16:08:07 · 219 views · 0 comments
Python Crawler: The HTML tag tree
>>> r = requests.get("https://python123.io/ws/demo.html")
>>> from bs4 import BeautifulSoup
>>> demo = r.text
>>> soup = BeautifulSoup(demo, "html.parser")
Original · 2018-09-11 22:50:01 · 1333 views · 0 comments
Python Crawler: BeautifulSoup
>>> import requests
>>> r = requests.get("https://python123.io/ws/demo.html")
>>> r.text
'<html><head><title>This is a python demo page...
Original · 2018-09-11 22:38:16 · 280 views · 0 comments
Python Crawler: IP address lookup
Original · 2018-09-11 20:33:32 · 1171 views · 0 comments
Python Crawler: Downloading a web image and saving it locally
Code:
import requests
import os
url = "https://gss0.baidu.com/-Po3dSag_xI4khGko9WTAnF6hhy/zhidao/wh%3D600%2C800/sign=bc75fc5640a7d933bffdec759d7bfd2b/d009b3de9c82d1587f799ff3820a19d8bd3e42fd.jpg"
root...
Original · 2018-09-11 18:41:35 · 1931 views · 1 comment
Python Crawler: Searching Baidu/360 and crawling the results
Submit a search keyword to Baidu and crawl the returned page.
import requests
keyword = "Python"
try:
    kv = {'wd': keyword}
    r = requests.get('https://www.baidu.com/s', params=kv)
    print(r.request.url)
    r.raise_for_status()
...
Original · 2018-09-11 18:27:53 · 2076 views · 0 comments
Python Crawler: Targeted stock-data crawler
import requests
from bs4 import BeautifulSoup
import traceback
import re
def getHTMLText(url, code='utf-8'):
    try:
        r = requests.get(url)
        r.raise_for_status()
        r.encoding =...
Original · 2018-09-13 21:06:09 · 339 views · 0 comments
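The core of this crawler is pulling stock codes out of link hrefs with a regex. A self-contained sketch, assuming list-page links shaped like `.../sh600018.html` (the sample hrefs are my own):

```python
import re

# Each stock link ends with a code: 'sh'/'sz' prefix plus six digits.
hrefs = [
    'http://quote.eastmoney.com/sh600018.html',
    'http://quote.eastmoney.com/sz000001.html',
    'http://quote.eastmoney.com/about.html',   # no code: skipped
]

codes = []
for href in hrefs:
    m = re.search(r's[hz]\d{6}', href)  # 'sh' or 'sz' + 6 digits
    if m:
        codes.append(m.group(0))
print(codes)  # ['sh600018', 'sz000001']
```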