BeautifulSoup is well suited to parsing the HTML of web pages.
See: http://www.crummy.com/software/BeautifulSoup/bs3/documentation.zh.html
Example: soup.py, which scrapes the entries on the CSDN Geek Headlines (极客头条) page
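# Python 2 with BeautifulSoup 3
import urllib2
from BeautifulSoup import BeautifulSoup

# Download the page and parse it
page = urllib2.urlopen("http://geek.youkuaiyun.com/new")
soup = BeautifulSoup(page)

# Each entry is an <h4> that wraps a link
for h4 in soup.findAll('h4'):
    # Skip headings that do not contain a link
    if h4.a is not None:
        text = h4.a.text
        href = h4.a.get('href')
        print text
        print href

page.close()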
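
The same h4/link extraction can also be tried offline on an inline HTML string. The following is only a minimal sketch under some assumptions: it uses Python 3 and the newer bs4 package rather than the bs3 module used in soup.py, and the sample markup, the example.com URLs, and the helper name extract_links are made up for illustration.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 (bs4) is installed

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<html><body>
  <h4><a href="http://example.com/a">First post</a></h4>
  <h4>No link here</h4>
  <h4><a href="http://example.com/b">Second post</a></h4>
</body></html>
"""


def extract_links(html):
    """Return (text, href) pairs for every <h4> that contains an <a>."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for h4 in soup.find_all("h4"):
        # Skip headings that do not contain a link
        if h4.a is not None:
            links.append((h4.a.get_text(), h4.a.get("href")))
    return links


if __name__ == "__main__":
    for text, href in extract_links(SAMPLE_HTML):
        print(text)
        print(href)
```

Returning the pairs from a function instead of printing them inside the loop makes the extraction step easy to check without network access.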