Python Challenge Level 4 Walkthrough: follow the chain

License: CC 4.0 BY-SA

Tags: #Python #Python Challenge #Walkthrough

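By reading the hints in the page and using Python's urllib library to make up to 400 chained requests, this walkthrough unravels the linked-list puzzle step by step and finds the clue that leads to the next level.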

Puzzle URL
http://www.pythonchallenge.com/pc/def/linkedlist.php

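Puzzle content

Solution

The page title is "follow the chain".
The page URL ends in linkedlist, i.e. a linked list.
The picture on the page also shows a chain.

First, view the page source, which contains this comment: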

<!-- urllib may help. DON'T TRY ALL NOTHINGS, since it will never 
end. 400 times is more than enough. -->

The comment suggests using the urllib library and warns against trying every "nothing", because the chain never ends; 400 requests are more than enough.
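Instead of reading the source by hand, the comment can also be pulled out programmatically. The snippet below is only a rough sketch: the helper names `extract_comments` and `fetch_comments` are made up for illustration, the 10-second timeout is an arbitrary choice, and the offline sample simply reuses the comment quoted above.

```python
import re
from urllib.request import urlopen

PAGE_URL = 'http://www.pythonchallenge.com/pc/def/linkedlist.php'


def extract_comments(html):
    """Return the text of every HTML comment in `html`, whitespace-normalized."""
    raw = re.findall(r'<!--(.*?)-->', html, flags=re.DOTALL)
    return [' '.join(comment.split()) for comment in raw]


def fetch_comments(url=PAGE_URL, timeout=10):
    """Download the page and list its comments (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        html = response.read().decode('utf-8', errors='replace')
    return extract_comments(html)


# Offline check using the comment quoted above, so no request is sent here.
sample = """<html><body>
<!-- urllib may help. DON'T TRY ALL NOTHINGS, since it will never 
end. 400 times is more than enough. -->
</body></html>"""
print(extract_comments(sample))
```

Calling `fetch_comments()` would run the same extraction against the live page, assuming the site is reachable.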

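The page has a link inside an a tag; clicking the picture leads to this URL:
http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345
The page body reads: and the next nothing is 44827
Changing the value of nothing should therefore lead to the next page. The script below uses urllib to follow the chain for up to 400 requests and see what comes back.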

from urllib.request import urlopen
import re

suffix = '12345'
contents = [suffix + '\n']  # every value visited, kept for later inspection
url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'

for i in range(400):
    response = urlopen(url)
    # str() of the bytes body yields "b'...'"; the regex below strips that wrapper
    html = str(response.read())
    content = re.search(r"'(.+)'", html).group(1)
    try:
        suffix = re.search(r'\d+', content).group()
    except AttributeError:
        # re.search returned None: the page holds no number, so show it and stop
        print(content)
        break
    contents.append(suffix + '\n')
    url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'
    print(suffix)

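Partway through, the script raised an error. After adding exception handling and printing the offending page, the message turned out to be:
Yes. Divide by two and keep going.

So the next URL should use the previous number divided by two, i.e. 16044 / 2 = 8022, and the loop continues from there.
The code is modified accordingly: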

from urllib.request import urlopen
import re

suffix = '12345'
contents = [suffix + '\n']  # every value visited, kept for later inspection
url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'

for i in range(400):
    response = urlopen(url)
    # str() of the bytes body yields "b'...'"; the regex below strips that wrapper
    html = str(response.read())
    content = re.search(r"'(.+)'", html).group(1)
    try:
        suffix = re.search(r'\d+', content).group()
    except AttributeError:
        # No number on the page: record the message, then halve the last suffix
        print(content)
        contents.append(content + '\n')
        suffix = str(int(suffix) // 2)
    contents.append(suffix + '\n')
    url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'
    print(suffix)
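The halving step in the `except` branch can be looked at in isolation. This is only an illustrative sketch, and the function name `halve` is hypothetical. It uses floor division so the result stays an integer; true division would put a stray `.0` into the URL.

```python
def halve(suffix):
    """Return half of a numeric suffix as a string, as asked for by the
    'Divide by two and keep going' message."""
    return str(int(suffix) // 2)


print(halve('16044'))   # 8022
print(str(16044 / 2))   # 8022.0, because true division yields a float
```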

Running the modified script fails again. The last suffix before the error is 82683, so visit this address:
http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=82683
It shows the message: You've been misleaded to here. Go to previous one and check.
Following the hint, go back to the previous page: http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=82682
That page reads: There maybe misleading numbers in the text. One example is 82683. Look only for the next nothing and the next nothing is 63579
In other words, the text may contain misleading numbers, and 82683 is one of them.
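To see why the bare `\d+` pattern goes wrong here, the two patterns can be compared on the page text quoted above. This is a small offline sketch; the variable names are chosen only for illustration.

```python
import re

text = ('There maybe misleading numbers in the text. One example is 82683. '
        'Look only for the next nothing and the next nothing is 63579')

# The bare pattern stops at the first run of digits it finds.
first_number = re.search(r'\d+', text).group()
# Anchoring on the phrase picks only the number that follows it.
next_nothing = re.search(r'next nothing is (\d+)', text).group(1)

print(first_number)   # 82683, the decoy
print(next_nothing)   # 63579, the real link
```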

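So, to extract the right number, the regular expression has to be changed. I also added code that writes the results to a text file for later reference. The modified code: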

from urllib.request import urlopen
import re

suffix = '12345'
contents = [suffix + '\n']  # every value and message seen, written to disk at the end
url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'

for i in range(400):
    response = urlopen(url)
    # str() of the bytes body yields "b'...'"; the regex below strips that wrapper
    html = str(response.read())
    content = re.search(r"'(.+)'", html).group(1)
    try:
        # Only accept the number that follows "next nothing is", ignoring decoys
        suffix = re.search(r'next nothing is (\d+)', content).group(1)
    except AttributeError:
        # No link on the page: record the message, then halve the last suffix
        print(content)
        contents.append(content + '\n')
        suffix = str(int(suffix) // 2)
    contents.append(suffix + '\n')
    url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'
    print(suffix)

# Save the whole trail for later inspection
with open('level4.txt', 'w', encoding='utf-8') as fp:
    fp.writelines(contents)
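As a possible refinement, and not the approach used above, the loop could be packaged as a function that decodes the response instead of calling `str()` on the bytes. With `str()`, a body containing an apostrophe, such as the "You've been misleaded" page, is rendered with double quotes, so the `'(.+)'` pattern finds nothing. The sketch below rests on several assumptions: the names `fetch` and `follow_chain` are hypothetical, any page with neither a link nor the divide-by-two message is treated as the end of the chain, and the demonstration runs on made-up pages rather than real server data.

```python
import re
from urllib.request import urlopen

BASE_URL = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='


def fetch(nothing):
    """Download one page of the chain and decode it to text (needs network)."""
    with urlopen(BASE_URL + nothing, timeout=10) as response:
        return response.read().decode('utf-8', errors='replace')


def follow_chain(start, get_page=fetch, max_steps=400):
    """Follow 'next nothing' links starting from `start`.

    Returns (visited, final_text): the suffixes visited, and the first page
    text that has neither a link nor the divide-by-two instruction
    (None if max_steps ran out first).
    """
    suffix = str(start)
    visited = [suffix]
    for _ in range(max_steps):
        text = get_page(suffix)
        match = re.search(r'next nothing is (\d+)', text)
        if match:
            suffix = match.group(1)
        elif 'Divide by two' in text:
            suffix = str(int(suffix) // 2)
        else:
            return visited, text  # assumed to be the end of the chain
        visited.append(suffix)
    return visited, None


# Offline demonstration with a made-up three-page chain (not real server data).
fake_pages = {
    '1': 'and the next nothing is 8',
    '8': 'Yes. Divide by two and keep going.',
    '4': "You've reached the end of the fake chain.",
}
visited, final_text = follow_chain('1', get_page=fake_pages.__getitem__)
print(visited)
print(final_text)
```

Calling `follow_chain('12345')` with the default `get_page` would send real requests to the site; passing a different `get_page` keeps the logic testable without network access.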