Blocking junk spider UA crawling on IIS 6/7+, Nginx, and Apache to reduce server load (and how to deny a specific UserAgent on IIS 7.5)

The site was slow and CPU/server load was high because unidentified spiders kept crawling it. After writing rules to block them, the load dropped. This post collects methods for blocking unidentified spider UAs under IIS, Nginx, and Apache, plus a list of the major spiders' names and a list of common junk UAs.


Recently the site has been extremely slow, CPU usage very high, and overall server load very high. Opening the logs showed a lot of unidentified spiders constantly crawling the site, and experience said that was the problem. I wrote rules for my situation and blocked them, and afterwards the load came back down. Below is a summary of how to block unidentified spider UAs under IIS, Nginx, and Apache.

Note: adjust the UA list to your own situation by removing or adding entries. The rules I provide include some rarely seen spider UAs that you will almost never need; if your site is special and needs particular spiders to crawl it, go through the rules carefully and simply remove those UAs from the pattern.

Tested OK on IIS 7.5

Block UAs matching the specified keywords and return a 403:

<rule name="NoUserAgent" stopProcessing="true">
<match url=".*" />
<conditions>
<add input="{HTTP_USER_AGENT}" pattern="|特征1|特征2|特征3" />
</conditions>
<action type="CustomResponse" statusCode="403" statusReason="Forbidden: Access is denied." statusDescription="You did not present a User-Agent header which is required for this site" />
</rule>
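For reference, this <rule> element does not stand on its own: it has to sit inside the <rewrite>/<rules> section of the site's web.config, the same layout used by the full example in section 2 below. A minimal wrapper sketch:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<system.webServer>
<rewrite>
<rules>
<!-- paste the "NoUserAgent" rule from above here -->
</rules>
</rewrite>
</system.webServer>
</configuration>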

For example, to block only requests with an empty UA:

<add input="{HTTP_USER_AGENT}" pattern="|^$|特征2|特征3" />

For example, to block specific UAs plus empty UAs:

<add input="{HTTP_USER_AGENT}" pattern="^$|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot" />

Block specific spiders (and/or IP addresses)

<rewrite>  
<rules>  
<rule name="Block Some Ip Adresses OR Bots" stopProcessing="true">  
<match url="(.*)" />  
<conditions logicalGrouping="MatchAny">  
<add input="{HTTP_USER_AGENT}" pattern="蜘蛛名称" ignoreCase="true" /> <!-- 来禁止特定蜘蛛 -->  
<add input="{HTTP_USER_AGENT}" pattern="^$" /> <!-- 禁止空 UA 访问 -->  
<add input="{REMOTE_ADDR}" pattern="单独IP或使用正则表达的IP地址" />  
</conditions>  
<!--  你也可以使用<action type="AbortRequest" />来直接代替下面这段代码  -->  
<action type="CustomResponse" statusCode="403" statusReason="Access is forbidden." statusDescription="Access is forbidden." />  
</rule>  
</rules>  
</rewrite>  
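Once the rule is in place, a quick command-line spot-check can confirm it works (example.com is a hypothetical domain, and MJ12bot stands in for whichever spider name you actually configured above):

curl -I -A "MJ12bot" http://example.com/
curl -I -A "" http://example.com/
curl -I -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" http://example.com/

The first two requests (a blocked spider UA and an empty/absent UA) should come back as 403 Forbidden, while the normal browser UA should still get a 200.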

Block all requests except one specified file (with negate="true", anything other than robotssss.txt gets a 403; the full web.config example in section 2 builds on this same skeleton):

<rule name="Block spider">  
      <match url="(^robotssss.txt$)" ignoreCase="false" negate="true" /> <!-- 禁止浏览某文件 -->  
      <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden" />  
</rule>






1. Nginx: to block junk spiders, put the following into your nginx configuration (inside the relevant server block).
# Block crawling by tools such as Scrapy

if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {
return 403;
}
# Block the listed UAs as well as requests with an empty UA
if ($http_user_agent ~ "opensiteexplorer|MauiBot|FeedDemon|SemrushBot|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|semrushbot|alphaseobot|semrush|Feedly|UniversalFeedParser|webmeup-crawler|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms|^$" ) {
return 403;
}
# Block request methods other than GET, HEAD, and POST
if ($request_method !~ ^(GET|HEAD|POST)$) {
return 403;
}
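These if blocks are only valid inside a server or location context (not directly under http), so a minimal sketch of where they go looks like this (example.com and the web root are assumptions):

server {
listen 80;
server_name example.com;  # hypothetical domain

# the UA / request-method filters from above go here
if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {
return 403;
}

location / {
root /var/www/html;  # assumed web root
index index.html;
}
}

After editing, run nginx -t to check the syntax and nginx -s reload to apply the change.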



2. IIS 7/IIS 8/IIS 10 and later: create a web.config file in the site root and add the following (the <rewrite> rules require the IIS URL Rewrite module to be installed):

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<system.webServer>
<rewrite>
<rules>
<rule name="Block spider">
<match url="(^robots.txt$)" ignoreCase="false" negate="true" />
<conditions>
<add input="{HTTP_USER_AGENT}" pattern="MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$"
ignoreCase="true" />
</conditions>
<action type="AbortRequest" />
</rule>
</rules>
</rewrite>
</system.webServer>
</configuration>



3. Apache: add the following rules to your .htaccess file:

<IfModule mod_rewrite.c>
RewriteEngine On
#Block spider
RewriteCond %{HTTP_USER_AGENT} "MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$" [NC]
RewriteRule !(^robots\.txt$) - [F]
</IfModule>
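If mod_rewrite is not available on your host, a roughly equivalent sketch using mod_setenvif (shortened bot list for illustration, and assuming Apache 2.4's Require directives are permitted in .htaccess) looks like this:

<IfModule mod_setenvif.c>
# tag requests whose User-Agent matches junk spiders (shortened list; extend as needed)
SetEnvIfNoCase User-Agent "MJ12bot|AhrefsBot|SemrushBot|Bytespider" bad_bot
# Apache 2.4: deny tagged requests, allow everything else
<RequireAll>
Require all granted
Require not env bad_bot
</RequireAll>
</IfModule>

Note that unlike the mod_rewrite rule above, this sketch does not exempt robots.txt.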



Note: by default the rules above block a set of unidentified spiders; to block additional spiders, just add them to the pattern in the same way.

Appendix: names of the major spiders:

Google spider: googlebot

Baidu spider: baiduspider

Baidu mobile spider: baiduboxapp

Yahoo spider: slurp

Alexa spider: ia_archiver

MSN spider: msnbot

Bing spider: bingbot

AltaVista spider: scooter

Lycos spider: lycos_spider_(t-rex)

AllTheWeb spider: fast-webcrawler

Inktomi spider: slurp

Youdao spiders: YodaoBot and OutfoxBot

Retu (热土) spider: Adminrtspider

Sogou spider: sogou spider

SOSO spider: sosospider

360 Search spider: 360spider




Common junk UAs seen around the web

Content scraping:
FeedDemon
Java
Jullo
Feedly
UniversalFeedParser

SQL injection:
BOT/0.1 (BOT for JCE)
CrawlDaddy

Useless crawlers:
EasouSpider
Swiftbot
YandexBot
AhrefsBot
jikeSpider
MJ12bot
YYSpider
oBot

CC (HTTP flood) attack tools:
ApacheBench
WinHttp

TCP attack:
HttpClient

Scanners:
Microsoft URL Control
ZmEu (phpMyAdmin scanner)
jaunty



