block_crawler
# save at /etc/nginx/block_crawler
# then load it with `include /etc/nginx/block_crawler;` inside a `server` block
set $fbd 0;
if ($http_user_agent ~* "yandex|Ahref|MJ12bot|XoviBot|SemrushBot|AhrefsBot|Twitterbot|Claritybot|Crawler|Python") {
set $fbd 1;
}
location ~* \/(plus|data|trust|include|shtml|bbs|rank|rxcq|tager) {
set $fbd 1;
}
location ~ ^/(wp-admin|wp-login\.php) {
set $fbd 1;
}
if ($fbd = 1) {
return 403;
}
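Loaded from a `server` block, the include might look like this (a minimal sketch; the `server_name` and `root` values are placeholders, not from the original post):

```nginx
server {
    listen 80;
    server_name example.com;   # placeholder domain

    # pulls in the $fbd checks and the `return 403` rule defined above
    include /etc/nginx/block_crawler;

    location / {
        root /var/www/html;    # placeholder document root
    }
}
```

Because the file contains `location` blocks, it must be included at `server` level, not inside another `location`.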
This article describes a way to block specific crawlers with an Nginx configuration file: by testing the User-Agent request header and certain URL paths, it returns 403 to crawlers such as Yandex and Ahrefs and to requests for sensitive paths, protecting the site's resources.
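To see which requests the rules above would reject, the same regexes can be replayed outside Nginx. The sketch below mirrors the config's patterns in Python; the `is_blocked` helper is illustrative only and not part of Nginx:

```python
import re

# User-Agent list from the config; nginx `~*` is case-insensitive
UA_PATTERN = re.compile(
    r"yandex|Ahref|MJ12bot|XoviBot|SemrushBot|AhrefsBot|Twitterbot|"
    r"Claritybot|Crawler|Python",
    re.IGNORECASE,
)
# blocked path fragments; also matched case-insensitively (`~*`)
PATH_PATTERN = re.compile(
    r"/(plus|data|trust|include|shtml|bbs|rank|rxcq|tager)",
    re.IGNORECASE,
)
# WordPress admin paths; nginx `~` is case-sensitive, so no IGNORECASE here
WP_PATTERN = re.compile(r"^/(wp-admin|wp-login\.php)")

def is_blocked(user_agent: str, path: str) -> bool:
    """True if the request would receive 403 under the config above."""
    return bool(
        UA_PATTERN.search(user_agent)
        or PATH_PATTERN.search(path)
        or WP_PATTERN.search(path)
    )

print(is_blocked("Mozilla/5.0 (compatible; AhrefsBot/7.0)", "/index.html"))  # True
print(is_blocked("Mozilla/5.0 Chrome", "/wp-login.php"))                     # True
print(is_blocked("Mozilla/5.0 Chrome", "/blog/post-1"))                      # False
```

Note that `Python` in the User-Agent list also blocks tools like python-requests, which may be undesirable if legitimate scripts consume the site.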
