A Simple Python Crawler: Fetching an Entire Page

The_theme

Published 2020-10-26 18:06:19

Category: python. Tags: python, crawler

Article link: https://blog.youkuaiyun.com/weixin_51343683/article/details/109295084

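This article presents a simple Python crawler that fetches the full HTML of a given URL and saves it to a local file. It provides a complete code example and notes that some websites may have anti-crawling mechanisms.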

A simple crawler that fetches an entire page. Change the url in the code to fetch a different site.

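import urllib.request  # standard-library module for opening URLs


def getHtml(url):  # fetch the raw HTML of a page
    # read() returns the response body as bytes; without it, html would be
    # an http.client.HTTPResponse object rather than the page content
    html = urllib.request.urlopen(url).read()
    return html


def saveHtml(fileName, fileContent):
    with open(fileName, "wb") as f:  # open in binary write mode, since the content is bytes
        f.write(fileContent)  # write the bytes to disk


def main():
    url = "https://www.zhihuishu.com/"  # the page to fetch; change this to target another site
    html = getHtml(url)  # fetch the page as bytes
    saveHtml("theHtml.html", html)  # save it to a local file
    print("Page saved")  # completion message


if __name__ == "__main__":  # run main() only when executed as a script
    main()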
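
Some sites reject requests that carry urllib's default User-Agent, which is one common form of anti-crawling check. The following is a minimal sketch of one possible workaround, not something taken from the original code: it sends a browser-like User-Agent header, sets a timeout, and reports HTTP or network errors instead of crashing. The names `DEFAULT_HEADERS`, `build_request`, and `get_html_with_headers`, the User-Agent string, and the 10-second timeout are all assumptions made for illustration.

```python
import urllib.error
import urllib.request

# Assumption: a generic browser-like User-Agent string chosen for illustration.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; simple-crawler-example)"}


def build_request(url, headers=None):
    """Build a Request that carries the default headers plus any overrides."""
    merged = dict(DEFAULT_HEADERS)
    if headers:
        merged.update(headers)
    return urllib.request.Request(url, headers=merged)


def get_html_with_headers(url, timeout=10):
    """Return the page body as bytes, or None if the request fails."""
    request = build_request(url)
    try:
        # The with-statement closes the connection once the body is read.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        print(f"Server returned HTTP {err.code} for {url}")
    except urllib.error.URLError as err:
        # The server could not be reached (DNS failure, refused connection, ...).
        print(f"Could not reach {url}: {err.reason}")
    return None
```

This function could stand in for `getHtml` in `main()`, as long as the caller checks for `None` before passing the result to `saveHtml`. A custom header will not get past every anti-crawling measure, and a site's terms of use and robots.txt still apply.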