Requests
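A package for fetching network resources (URLs)
Improves on the shortcomings of urllib2 and makes fetching network resources more convenient
Supports REST operations (POST, PUT, GET, DELETE) for accessing network resources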
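A minimal sketch of how the four operations map onto requests. The endpoint `http://example.com/api/items`, the helper `build_request`, and the payloads are hypothetical and only serve as illustration. The requests are prepared but not sent, so the snippet runs without network access.

```python
import requests

# Hypothetical endpoint, used only for illustration.
BASE_URL = "http://example.com/api/items"


def build_request(method, url, data=None):
    """Prepare (but do not send) an HTTP request for the given method."""
    return requests.Request(method, url, data=data).prepare()


prepared = {
    "GET": build_request("GET", BASE_URL),
    "POST": build_request("POST", BASE_URL, data={"name": "demo"}),
    "PUT": build_request("PUT", BASE_URL + "/1", data={"name": "renamed"}),
    "DELETE": build_request("DELETE", BASE_URL + "/1"),
}

# Show the method, URL, and form-encoded body of each prepared request.
for method, req in prepared.items():
    print(method, req.url, req.body)
```

Against a real service, the shortcut functions `requests.get`, `requests.post`, `requests.put`, and `requests.delete` build and send the same kinds of requests in one call.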
Using requests
import requests
res = requests.get('http://news.sina.com.cn/china/')
res.encoding = 'utf-8'
print(res.text)
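The same fetch can be wrapped in a small function, sketched here. The name `fetch_text` and the 10-second default timeout are assumptions, not part of the original notes. It adds a timeout and an HTTP status check before decoding the body.

```python
import requests


def fetch_text(url, encoding="utf-8", timeout=10):
    """Download a page and return its body decoded with the given encoding."""
    res = requests.get(url, timeout=timeout)
    res.raise_for_status()    # raises requests.HTTPError on 4xx/5xx responses
    res.encoding = encoding   # override the encoding guessed from the headers
    return res.text
```

A call such as `fetch_text('http://news.sina.com.cn/china/')` would return the page text. Wrapping the call in `try` / `except requests.RequestException` handles connection errors and timeouts.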
BeautifulSoup4 example
from bs4 import BeautifulSoup
html_sample = '\
<html> \
<body> \
<h1 id = "title">Hello world</h1> \
<a href = "#" class = "link">This is link1</a> \
<a href = "# link2" class = "link">This is link2</a> \
</body> \
</html>'
soup = BeautifulSoup(html_sample, 'html.parser')
print(soup.text)
Use select to find elements with the h1 tag
soup = BeautifulSoup(html_sample, 'html.parser')
header = soup.select('h1')
print(header)
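`select` returns a list of matching elements, so a single element is reached by index. The sketch below uses a shortened copy of the sample HTML so that it runs on its own.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample document, kept self-contained.
html_sample = '<html><body><h1 id="title">Hello world</h1></body></html>'
soup = BeautifulSoup(html_sample, 'html.parser')

header = soup.select('h1')   # list of matching Tag objects
first = header[0]            # the first (here the only) h1 element
print(first)                 # the whole tag
print(first.text)            # only the text inside the tag
print(first['id'])           # the value of the id attribute
```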
Use select to find elements with the a tag
soup = BeautifulSoup(html_sample, 'html.parser')
alink = soup.select('a')
print(alink)
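As a further sketch, each matched link can be looped over to read its text and `href` attribute. `select` also accepts CSS id and class selectors. The variable names `links`, `by_id`, and `by_class` are illustrative, and the sample HTML is restated so the snippet is self-contained.

```python
from bs4 import BeautifulSoup

# Same content as the sample document, written as one concatenated string.
html_sample = (
    '<html><body>'
    '<h1 id="title">Hello world</h1>'
    '<a href="#" class="link">This is link1</a>'
    '<a href="# link2" class="link">This is link2</a>'
    '</body></html>'
)
soup = BeautifulSoup(html_sample, 'html.parser')

# Collect (text, href) pairs for every a element.
links = [(a.text, a['href']) for a in soup.select('a')]
for text, href in links:
    print(text, href)

by_id = soup.select('#title')    # elements whose id is "title"
by_class = soup.select('.link')  # elements whose class is "link"
print(by_id)
print(by_class)
```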