Python Challenge Level 4 Walkthrough: follow the chain
Problem URL
http://www.pythonchallenge.com/pc/def/linkedlist.php
Problem

Solution
- The page title is "follow the chain".
- The page URL is linkedlist, i.e. a linked list.
- The image also shows a chain.
First, view the page source, which contains this comment:
<!-- urllib may help. DON'T TRY ALL NOTHINGS, since it will never
end. 400 times is more than enough. -->
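The comment suggests using the urllib library. It also says not to try all the nothings, because the chain never ends, and that 400 times is more than enough.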
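Reading the hint does not require a browser. The following is a minimal sketch of pulling HTML comments out of a page source with a regular expression. The helper name `extract_comments` and the `sample` string are made up for illustration; only the comment text itself comes from the page.

```python
import re
from typing import List


def extract_comments(html: str) -> List[str]:
    """Return the text of every <!-- ... --> comment in html (hypothetical helper)."""
    # DOTALL lets '.' match newlines, since the hint spans two lines.
    return [c.strip() for c in re.findall(r'<!--(.*?)-->', html, flags=re.DOTALL)]


# Assumed stand-in for the real page source; only the comment is from the page.
sample = """<html><body>
<!-- urllib may help. DON'T TRY ALL NOTHINGS, since it will never
end. 400 times is more than enough. -->
</body></html>"""

comments = extract_comments(sample)
print(comments[0])
```

To run it against the live page, the source could be fetched with `urlopen(...).read().decode('utf-8')` and passed to the same helper.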
The a tag contains a link. Clicking the image jumps to the following URL:
http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345
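The page content is: and the next nothing is 44827
Changing the value of nothing to that number should continue the chain. Below, urllib is used to see what responses come back over 400 requests.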
from urllib.request import urlopen
import re

suffix = '12345'
contents = []
contents.append(suffix + '\n')
url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'
for i in range(400):
    response = urlopen(url)
    # str() of the bytes body looks like b'...'; keep the text between the quotes
    html = str(response.read())
    content = re.search(r"'(.+)'", html).group(1)
    try:
        suffix = re.search(r'\d+', content).group()
    except AttributeError:
        # No number in the reply: show the message and stop
        print(content)
        break
    contents.append(suffix + '\n')
    url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'
    print(suffix)
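Partway through, the run raised an error. After adding exception handling and printing the content that caused it, the message was:
Yes. Divide by two and keep going.
So the next URL should use the previous number divided by two, i.e. 16044 / 2 = 8022, and the loop continues from there.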
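As a small aside on the halving step: in Python 3, `/` always produces a float, so the result has to be turned back into an integer string before it goes into the URL. A minimal sketch (the variable names are illustrative only):

```python
suffix = '16044'

# True division returns a float, which would put "8022.0" into the URL.
as_float = str(int(suffix) / 2)

# Floor division keeps the value an int, giving a clean URL parameter.
halved = str(int(suffix) // 2)

print(as_float, halved)
```

Wrapping the float in `int(...)` also works; `//` just avoids the round trip through a float.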
The code is modified accordingly:
from urllib.request import urlopen
import re

suffix = '12345'
contents = []
contents.append(suffix + '\n')
url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'
for i in range(400):
    response = urlopen(url)
    # str() of the bytes body looks like b'...'; keep the text between the quotes
    html = str(response.read())
    content = re.search(r"'(.+)'", html).group(1)
    try:
        suffix = re.search(r'\d+', content).group()
    except AttributeError:
        # No number in the reply: record the message and halve the previous number
        print(content)
        contents.append(content + '\n')
        suffix = str(int(suffix) // 2)
    contents.append(suffix + '\n')
    url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'
    print(suffix)
It raised an error again. The last number printed before the error was 82683, so the next step was to visit this URL directly:
http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=82683
The page shows this hint: You've been misleaded to here. Go to previous one and check.
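The traceback is not shown here, so the following is only an assumption about why this particular page crashed the script. The code extracts the text between single quotes from `str(response.read())`, and when a bytes value contains an apostrophe, Python's repr switches to double quotes, so that pattern no longer matches. A minimal sketch of the effect, using the message above as a hard-coded stand-in for the response body:

```python
import re

# Stand-in for response.read() on the 82683 page.
body = b"You've been misleaded to here. Go to previous one and check."

# str() on bytes returns its repr; with an apostrophe inside, the repr is
# wrapped in double quotes and contains only a single ' character.
as_repr = str(body)
print(as_repr)

# The pattern used in the script needs two single quotes, so it finds nothing,
# and calling .group(1) on None would raise AttributeError outside the try.
quoted = re.search(r"'(.+)'", as_repr)
print(quoted)

# Decoding the bytes avoids depending on the repr at all.
text = body.decode('utf-8')
print(text)
```

Replacing `str(response.read())` with `response.read().decode('utf-8')` would sidestep this, at the cost of changing how the scripts in this walkthrough behave on that page.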
Following the hint, go back to the previous page: http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=82682
The page shows: There maybe misleading numbers in the text. One example is 82683. Look only for the next nothing and the next nothing is 63579
In other words, the text can contain decoy numbers, and 82683 is one of them.
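To see why the first pattern went wrong on this page, the two regular expressions can be compared on the exact message quoted above. This is a small illustration, with variable names chosen only for the example:

```python
import re

text = ("There maybe misleading numbers in the text. One example is 82683. "
        "Look only for the next nothing and the next nothing is 63579")

# r'\d+' returns the first run of digits, which here is the decoy.
first_number = re.search(r'\d+', text).group()

# Anchoring on the phrase "next nothing is" picks the intended number.
next_nothing = re.search(r'next nothing is (\d+)', text).group(1)

print(first_number, next_nothing)
```

The anchored pattern still works on the ordinary pages, since they all use the same "next nothing is" wording.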
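So to extract the correct number, the regular expression has to change. The version below also writes the collected output to a text file for later review: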
from urllib.request import urlopen
import re

suffix = '12345'
contents = []
contents.append(suffix + '\n')
url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'
for i in range(400):
    response = urlopen(url)
    # str() of the bytes body looks like b'...'; keep the text between the quotes
    html = str(response.read())
    content = re.search(r"'(.+)'", html).group(1)
    try:
        # Only accept the number that follows "next nothing is"
        suffix = re.search(r'next nothing is (\d+)', content).group(1)
    except AttributeError:
        # No "next nothing" in the reply: record the message and halve the previous number
        print(content)
        contents.append(content + '\n')
        suffix = str(int(suffix) // 2)
    contents.append(suffix + '\n')
    url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'
    print(suffix)

# Save everything that was seen so it can be inspected later
with open('level4.txt', 'w', encoding='utf-8') as fp:
    fp.writelines(contents)
Checking the output reveals peak.html. Change the URL to that page to enter the next level:
http://www.pythonchallenge.com/pc/def/peak.html
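For reference, the whole procedure can be condensed into one function. This is a hedged sketch rather than the script used above: `follow_chain`, `fetch_live`, and `fake_pages` are names invented for this example, the loop stops at the first message it does not recognise (the scripts above keep going), and `fake_pages` is a toy chain that reuses a few messages from this walkthrough but is not the real sequence of pages.

```python
import re
from typing import Callable, List
from urllib.request import urlopen

NEXT_RE = re.compile(r'next nothing is (\d+)')


def fetch_live(nothing: str) -> str:
    """Fetch one page of the real chain (not called in this example)."""
    url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={nothing}'
    with urlopen(url) as response:
        return response.read().decode('utf-8')


def follow_chain(fetch: Callable[[str], str], start: str,
                 max_steps: int = 400) -> List[str]:
    """Walk the chain from start and return every non-numeric message seen."""
    messages = []
    nothing = start
    for _ in range(max_steps):
        text = fetch(nothing)
        match = NEXT_RE.search(text)
        if match:
            nothing = match.group(1)
        elif 'Divide by two' in text:
            # Assumed handling: halve the current number and keep going.
            messages.append(text)
            nothing = str(int(nothing) // 2)
        else:
            # Anything else is treated as the final answer.
            messages.append(text)
            break
    return messages


# Toy data for an offline run; the links between pages are NOT the real chain.
fake_pages = {
    '12345': 'and the next nothing is 44827',
    '44827': 'and the next nothing is 16044',
    '16044': 'Yes. Divide by two and keep going.',
    '8022': 'and the next nothing is 82682',
    '82682': ('There maybe misleading numbers in the text. One example is 82683. '
              'Look only for the next nothing and the next nothing is 63579'),
    '63579': 'peak.html',
}

result = follow_chain(lambda n: fake_pages[n], '12345')
print(result)
```

Passing the fetch function in as a parameter keeps the chain logic testable without network access; `follow_chain(fetch_live, '12345')` would run it against the live site.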

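By reading the page hints and using Python's urllib to make up to 400 chained requests, the linked-list puzzle is unraveled step by step until the clue to the next level appears.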