Python Monitor Water Falls(1)Ideas from Scrapy and Twisted


>python -V
Python 2.7.13

>pip --version
pip 9.0.1 from /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages (python 2.7)

Make sure Scrapy and Scrapyd are installed:
>pip install scrapy
>pip install scrapyd

>scrapy version
Scrapy 1.4.0

>scrapyd --version
twistd (the Twisted daemon) 17.5.0

The page to monitor is https://hydromet.lcra.org/riverreport

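The table content is rendered by JavaScript, so a basic way to load it is Selenium driving PhantomJS, open_url.py
from selenium import webdriver

# PhantomJS renders the JavaScript-driven page headlessly
path2phantom = '/opt/phantomjs/bin/phantomjs'
browser = webdriver.PhantomJS(path2phantom)
try:
    browser.get('https://hydromet.lcra.org/riverreport')

    # the report is split across several condensed tables;
    # the sixth one (index 5) contains the Marble Falls row
    tables = browser.find_elements_by_css_selector('table.table-condensed')

    tbody = tables[5].find_element_by_tag_name("tbody")
    for row in tbody.find_elements_by_tag_name("tr"):
        cells = row.find_elements_by_tag_name("td")
        # first cell: site name, second cell: release status
        if cells and cells[0].text == 'Marble Falls (Starcke)':
            print(cells[1].text)
finally:
    # always shut down the PhantomJS process, even on errors
    browser.quit()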
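The row lookup in open_url.py can be pulled out into a plain function so it can be tried without a browser. This is a minimal sketch: `find_release_status` is a hypothetical name, and the "Example Dam" row is made-up sample data; only the Marble Falls status comes from this post.

```python
# Hypothetical helper: the row-matching loop from open_url.py, working on
# plain lists of cell texts instead of live Selenium elements.
def find_release_status(rows, site_name):
    """Return the second-column text of the first row whose first cell
    equals site_name, or None if no row matches."""
    for cells in rows:
        # skip rows without at least a name and a status cell
        # (e.g. header rows that use <th> instead of <td>)
        if len(cells) >= 2 and cells[0].strip() == site_name:
            return cells[1].strip()
    return None


if __name__ == "__main__":
    sample = [
        [],                                # header row, no <td> cells
        ["Example Dam (made up)", "n/a"],  # invented sample row
        ["Marble Falls (Starcke)", "Jan 21 2018: No releases"],
    ]
    print(find_release_status(sample, "Marble Falls (Starcke)"))
```

With Selenium, the rows could be collected as `[[td.text for td in tr.find_elements_by_tag_name("td")] for tr in tbody.find_elements_by_tag_name("tr")]` and passed straight in.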

Running open_url.py directly prints the current status for Marble Falls:
>python open_url.py
Jan 21 2018: No releases
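To turn that line into something a monitor can act on, it could be split into a date and a message. A sketch, assuming every status keeps the `Mon DD YYYY: text` shape of the line above; `parse_status`, `has_release` and the rule "anything other than No releases counts as a release" are all hypothetical.

```python
from datetime import datetime


def parse_status(line):
    """Split 'Jan 21 2018: No releases' into (date, 'No releases').
    Assumes the 'Mon DD YYYY: text' format seen on the river report."""
    date_part, _, message = line.partition(":")
    date = datetime.strptime(date_part.strip(), "%b %d %Y").date()
    return date, message.strip()


def has_release(message):
    # Hypothetical rule: any message except "No releases" means water is released
    return message.lower() != "no releases"
```

A monitor could run open_url.py on a schedule and only alert when `has_release` flips to True.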

Twisted RESTful
Download the latest version:
>wget https://twistedmatrix.com/Releases/Twisted/17.9/Twisted-17.9.0.tar.bz2
Unpack the archive, go into that directory and install:
>python setup.py install

Check the version
>>> import twisted
>>> print twisted.version
[Twisted, version 17.5.0]

It still reports 17.5.0, probably because an older 17.5.0 install from a long time ago is the one being imported.
For the RESTful routing I use txrestapi, which is a little old:
https://github.com/iancmcc/txrestapi

RESTful with Twisted
BookAPI.py
from txrestapi.methods import GET, POST, PUT, ALL, DELETE
from txrestapi.resource import APIResource
import json


class BookResource(APIResource):

    # GET /book/<id>: the named group "id" is passed to the handler as a keyword argument
    @GET('^/book/(?P<id>[^/]+)')
    def getBook(self, request, id):
        return 'Pick up one book with id %s' % id

    # GET /book/: list books ("$" keeps /book/<id> from matching this route)
    @GET('^/book/$')
    def books(self, request):
        return "books"

    # PUT /book/<id>: read the JSON body, attach the id from the URL, echo it back
    @PUT('^/book/(?P<id>[^/]+)')
    def updateBook(self, request, id):
        book = json.loads(request.content.read())
        book['id'] = id
        return json.dumps(book)

    # POST /book/: create a book; the id is hard-coded to 1 in this demo
    @POST('^/book/$')
    def saveBook(self, request):
        book = json.loads(request.content.read())
        book['id'] = 1
        return json.dumps(book)

    # DELETE /book/<id>
    @DELETE('^/book/(?P<id>[^/]+)')
    def deleteBook(self, request, id):
        return "Delete book with id %s" % id

    # fallback for any method on any other path
    @ALL('^/')
    def default_view(self, request):
        return "I fail to match other URLs."
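The decorators above map an HTTP method plus a regular expression to a handler. The idea can be sketched with nothing but `re`: this tiny router is hypothetical and is not txrestapi's implementation. It checks routes in registration order and returns the first match; I have not verified which order txrestapi itself checks routes in, which is why the catch-all `^/` is worth keeping as the last resort.

```python
import re

# Hypothetical mini router: each entry is (HTTP method, compiled regex, handler)
ROUTES = []


def route(method, pattern):
    def register(func):
        ROUTES.append((method, re.compile(pattern), func))
        return func
    return register


@route("GET", r"^/book/(?P<id>[^/]+)$")
def get_book(id):
    return "Pick up one book with id %s" % id


@route("GET", r"^/book/$")
def list_books():
    return "books"


@route("ALL", r"^/")
def default_view():
    return "I fail to match other URLs."


def dispatch(method, path):
    """Call the first handler whose method and regex match; named groups
    such as (?P<id>...) become keyword arguments."""
    for route_method, regex, handler in ROUTES:
        if route_method in (method, "ALL"):
            match = regex.search(path)
            if match:
                return handler(**match.groupdict())
    return None
```

For example, `dispatch("GET", "/book/42")` reaches `get_book` with `id="42"`, while a `DELETE` on an unknown path falls through to the catch-all.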

RestAPI.py
from twisted.web.resource import Resource
from twisted.web.server import Site
from twisted.internet import reactor

from BookAPI import BookResource

bookResource = BookResource()

# mount the book API under /book (a b"" literal works on Python 2 and 3)
rootResource = Resource()
rootResource.putChild(b"book", bookResource)

# timeout=None disables the idle-connection timeout
site = Site(rootResource, timeout=None)
reactor.listenTCP(8888, site)  # serve on http://localhost:8888
reactor.run()
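To exercise the service, a small client can send each verb. This is a sketch assuming Python 3's `urllib.request` (the post itself runs Python 2.7, where `urllib2.Request` has no `method` argument); `build_request`, `call` and `BASE_URL` are hypothetical names.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8888"  # the port RestAPI.py listens on


def build_request(method, path, body=None):
    """Build (but do not send) a request against the book API,
    JSON-encoding the body when one is given."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


def call(method, path, body=None):
    # Requires RestAPI.py to be running
    with urlopen(build_request(method, path, body)) as response:
        return response.read().decode("utf-8")
```

With the server running, `call("PUT", "/book/7", {"title": "Twisted"})` should echo the JSON body back with an added `"id": "7"`, and `call("DELETE", "/book/7")` should return the delete message.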


References:
https://scrapy.org/
https://hydromet.lcra.org/riverreport

Twisted
http://sillycat.iteye.com/blog/2243749
http://sillycat.iteye.com/blog/2243775
http://sillycat.iteye.com/blog/2244167
http://sillycat.iteye.com/blog/2244169

Python JSON
https://code.tutsplus.com/tutorials/how-to-work-with-json-data-using-python--cms-25758