Python Basics 7 (A Simple Crawler)

Reposted · published 2013-04-18 21:18:00


Licensed under CC 4.0 BY-SA

Original article: https://my.oschina.net/OQKuDOtsbYT2/blog/123741

Tags: #python #crawler

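This post presents a simple Python script that uses a regular expression to find every jpg image referenced on a given page and download it to the local disk. The user enters the target URL, and the script fetches the page source and extracts the image links from it.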
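To see what the regular expression actually captures before any downloading happens, here is a minimal sketch that runs the pattern used by the script in this post against a small HTML string. The markup and the example.com addresses are made up for illustration; real pages may lay out their `img` tags differently, in which case the pattern would need adjusting.

```python
import re

# Hypothetical sample markup, invented for this illustration only.
sample_html = (
    '<img src="http://example.com/a.jpg" alt="a" width="100">\n'
    '<img src="http://example.com/b.jpg" alt="b" width="200">\n'
    '<img src="http://example.com/c.png" alt="c" width="300">\n'
)

# Same pattern as in the post: the non-greedy group (.*?\.jpg) stops at the
# first ".jpg", and the tag must also carry a later width attribute.
pattern = re.compile(r'src="(.*?\.jpg)" .* width')

# findall returns only the captured group, i.e. the image addresses.
jpg_links = pattern.findall(sample_html)
print(jpg_links)
```

Only the two jpg addresses are captured; the png line is skipped because the group requires a `.jpg` ending. Note that the pattern also requires a `width` attribute somewhere after `src` on the same line, so images without one would be missed.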

Example: downloading the jpg images from a URL

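#!/usr/bin/python
# Python 2 script: urllib.urlopen, urllib.urlretrieve and raw_input are Python 2 APIs.
import re      # regular expression module
import urllib  # URL handling module

def getHtml(url):  # fetch the source of the page at url
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getImg(html):  # download the jpg images referenced in the page
    reg = r'src="(.*?\.jpg)" .* width'  # the .*? inside the parentheses is a non-greedy match
    imgre = re.compile(reg)  # compile the regex to speed up matching
    imgList = re.findall(imgre, html)
    x = 1
    for imgurl in imgList:
        urllib.urlretrieve(imgurl, '%s.jpg' % x)  # download each jpg and name it 1.jpg, 2.jpg, ...
        x = x + 1

url = raw_input("please input your download url:")  # read the target url from the user
html = getHtml(url)
getImg(html)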
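The script above is written for Python 2. As a rough sketch of how the same idea might look on Python 3, the version below uses `urllib.request` in place of the old `urllib` functions. The function names `get_html`, `find_jpg_urls` and `download_all` are hypothetical, the UTF-8 decoding is an assumption about the page encoding, and the `urljoin` step for relative links is an addition that the original script does not have.

```python
import re
import urllib.request
from urllib.parse import urljoin

# Same pattern as the original script.
IMG_RE = re.compile(r'src="(.*?\.jpg)" .* width')


def get_html(url):
    """Fetch the page and decode it to text (assumes UTF-8, an assumption)."""
    with urllib.request.urlopen(url) as page:
        return page.read().decode("utf-8", errors="replace")


def find_jpg_urls(html, base_url):
    """Return the jpg links found in html, resolved against base_url."""
    # urljoin leaves absolute links unchanged and resolves relative ones.
    return [urljoin(base_url, link) for link in IMG_RE.findall(html)]


def download_all(img_urls):
    """Save each image as 1.jpg, 2.jpg, ... in the current directory."""
    for index, img_url in enumerate(img_urls, start=1):
        urllib.request.urlretrieve(img_url, "%d.jpg" % index)
```

A typical run would read the address with `url = input("please input your download url:")` and then call `download_all(find_jpg_urls(get_html(url), url))`. Keeping the link extraction separate from the download step makes it possible to inspect the list of links before anything is written to disk.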
