question:
Training: WWW-Robots (HTTP, Training)
WWW-Robots
In this little training challenge, you are going to learn about the Robots Exclusion Standard.
The robots.txt file is used by web crawlers to check if they are allowed to crawl and index your website or only parts of it.
Sometimes these files reveal the directory structure instead of protecting the content from being crawled.
Enjoy!
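
Before the solution, here is a minimal sketch of how a compliant crawler consults these rules, using Python's standard-library urllib.robotparser. The URL is the one from this challenge; the checked paths are just illustrative:

# Check robots.txt the way a well-behaved crawler would.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://www.wechall.net/robots.txt")
rp.read()  # fetch and parse the rules

# Ask whether a generic crawler ("*") may fetch a given path.
for path in ["/", "/challenge/training/www/robots/"]:
    verdict = "allowed" if rp.can_fetch("*", path) else "disallowed"
    print(path, "->", verdict)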
solution:
Go to www.WeChall.net/robots.txt, and we see this:
User-agent: *
Disallow: /challenge/training/www/robots/T0PS3CR
Copy the disallowed path, append it after www.WeChall.net, and open the resulting URL. SUCCESS!
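
The same lookup can be scripted. This is a minimal sketch that fetches robots.txt and prints every Disallow entry, which is exactly where hidden paths like this one show up (assuming the site serves robots.txt over HTTPS at the root):

# Fetch robots.txt and list the paths the site asks crawlers to skip.
import urllib.request

with urllib.request.urlopen("https://www.wechall.net/robots.txt") as resp:
    body = resp.read().decode("utf-8", errors="replace")

for line in body.splitlines():
    if line.lower().startswith("disallow:"):
        print("Hidden path:", line.split(":", 1)[1].strip())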