OpenResty in Practice

am540

Original article: https://zhuanlan.zhihu.com/p/83209234

Blacklist

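To keep malicious users or crawlers from flooding the server and degrading service for normal requests, a common measure is to put them on a blacklist and deny them access. In OpenResty, the access_by_lua* directives run in the access phase of a request, which is meant for access control, so we run the blacklist code with access_by_lua. This article presents the following three ways to implement a blacklist: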

Static blacklist

Nginx configuration example:

location /lua {
    default_type 'text/html';
    access_by_lua_file /path/to/access.lua;
    content_by_lua 'ngx.say("hello world")';
}

Lua code example:

-- Entries to block (here, an IP blacklist)
local blacklist = {
    ["10.10.76.111"] = true,
    ["10.10.76.112"] = true,
    ["10.10.76.113"] = true
}

local ip = ngx.var.remote_addr
if blacklist[ip] then
    return ngx.exit(ngx.HTTP_FORBIDDEN)
end

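The approach above keeps the blacklist directly in a Lua table. Every request checks whether its IP is in the table and, if so, is blocked from reaching the backend with HTTP_FORBIDDEN (403). However, every addition or removal requires editing the file and reloading Nginx, which makes this approach unsuitable for frequent changes. The next section presents a dynamic blacklist.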
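The static blacklist works because Lua table keys give a constant-time hash lookup. As a minimal sketch (the helpers `build_set` and `is_blocked` are hypothetical, not OpenResty APIs), the same table can be built from a plain list of IPs, which is easier to maintain than a hand-written table literal. This part is plain Lua and runs outside Nginx:

```lua
-- Hypothetical helpers: turn a list of IPs into a set-like table and
-- test membership the same way the static blacklist does.
local function build_set(list)
    local set = {}
    for _, item in ipairs(list) do
        set[item] = true
    end
    return set
end

local function is_blocked(set, ip)
    -- keys that are not listed yield nil, so unlisted IPs are allowed
    return ip ~= nil and set[ip] == true
end

local blacklist = build_set({ "10.10.76.111", "10.10.76.112", "10.10.76.113" })
print(is_blocked(blacklist, "10.10.76.112"))
print(is_blocked(blacklist, "10.10.76.200"))
```

Inside an access_by_lua file you would call `is_blocked(blacklist, ngx.var.remote_addr)` and return `ngx.exit(ngx.HTTP_FORBIDDEN)` on a match.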

Dynamic blacklist (1)

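Here the blacklist is stored on a Redis server. For each request, we read a request parameter and check whether it exists in Redis; if it does, we return HTTP_FORBIDDEN just as in the previous approach.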

Lua code example:

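local redis = require "resty.redis"
local redis_host = "127.0.0.1"
local redis_port = 6379

local cid = ngx.var.arg_cid
if not cid then
    return  -- no cid parameter: nothing to check
end

local red = redis:new()
red:set_timeout(100)  -- 100 ms
local ok, err = red:connect(redis_host, redis_port)
if not ok then
    -- fail open: let the request through if Redis is unavailable
    ngx.log(ngx.ERR, "redis connection error: ", err)
    return
end

local res, err = red:get(cid)
if not res then
    ngx.log(ngx.ERR, "redis get error: ", err)
end
-- return the connection to the pool (10 s max idle time, pool size 100)
red:set_keepalive(10000, 100)

if res and res ~= ngx.null then
    return ngx.exit(ngx.HTTP_FORBIDDEN)
end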

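This approach uses the lua-resty-redis module to check whether the value of the cid parameter is present on the Redis server that holds the blacklist. It avoids editing the configuration and reloading Nginx whenever the blacklist changes, but every request needs a round trip to Redis. The next approach optimizes this.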

Dynamic blacklist (2)

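This approach uses the shared memory dictionary ngx.shared.DICT provided by OpenResty to hold the whole blacklist and refreshes it from Redis periodically, so requests no longer have to query Redis each time, which improves performance.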

Nginx configuration example:

http {

    lua_shared_dict blacklist 1m;
    init_worker_by_lua_file "/path/to/init_redis_blacklist.lua";
    server {
        listen       2019;

        location /redis/blacklist/dynamic {
            default_type 'text/html';
            access_by_lua_file "/path/to/redis_blacklist_dynamic.lua";
            content_by_lua_block {
                ngx.say('hello 2019')
            }
        }
    }
}

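First, the lua_shared_dict directive declares a shared dictionary named blacklist backed by 1 MB of shared memory to store the blacklist (size the shared memory according to your actual data).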

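Next, the init_worker_by_lua_file directive runs init_redis_blacklist.lua, which starts a timer that periodically pulls the blacklist from Redis. Note that both lua_shared_dict and init_worker_by_lua_file must be placed in the http block.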

The code of init_redis_blacklist.lua is as follows:

local redis = require "resty.redis"
local redis_host = "127.0.0.1"
local redis_port = 6379
local delay = 5  -- refresh interval in seconds

local redis_key = "blacklist"
local blacklist = ngx.shared.blacklist

-- run `worker` every `delay` seconds with a self re-arming ngx.timer.at timer
local function timer_work(delay, worker)
    local handler

    handler = function(premature)
        -- premature is true when the Nginx worker is exiting (e.g. on reload)
        if not premature then
            local rst, err_msg = pcall(worker)
            if not rst then
                ngx.log(ngx.ERR, "timer work:", err_msg)
            end
            ngx.timer.at(delay, handler)
        end
    end
    ngx.timer.at(delay, handler)
end

local function update_blacklist()
    local red = redis:new()
    red:set_timeout(1000)  -- 1 s
    local ok, err = red:connect(redis_host, redis_port)
    if not ok then
        ngx.log(ngx.ERR, "redis connection error: ", err)
        return
    end

    local new_blacklist, err = red:smembers(redis_key)
    if not new_blacklist then
        ngx.log(ngx.ERR, "Redis read error: ", err)
        return
    end
    red:set_keepalive(10000, 100)

    -- replace the whole blacklist with the current members of the Redis set
    blacklist:flush_all()
    for _, k in ipairs(new_blacklist) do
        blacklist:set(k, true)
    end
end

-- one timer in worker 0 is enough: the shared dict is visible to all workers
if 0 == ngx.worker.id() then
    timer_work(delay, update_blacklist)
end

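The timer_work function uses the OpenResty API ngx.timer.at to create a timer that re-arms itself after each run. The timer is started only in the worker whose ngx.worker.id() is 0, because the shared dictionary is visible to all workers. Every delay seconds the timer runs update_blacklist, which connects to Redis, calls SMEMBERS to fetch all members of the set redis_key, clears the shared dictionary blacklist with ngx.shared.DICT.flush_all, and then iterates over new_blacklist, adding each entry to blacklist.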
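The self re-arming pattern in timer_work can be exercised outside OpenResty if the timer API is passed in instead of being referenced as ngx.timer.at. The sketch below is a restructuring for illustration only: `start_periodic` and the fake scheduler are hypothetical, and in OpenResty you would pass ngx.timer.at plus a function that calls ngx.log. It shows the three properties the original relies on: errors in the worker are caught by pcall, the timer is re-armed after every run, and a premature run (worker exiting) stops the loop. Newer OpenResty releases also provide ngx.timer.every, which does the re-arming itself.

```lua
-- Sketch: timer_work's re-arming loop with the timer API injected.
-- timer_at(delay, handler) must schedule handler(premature) after `delay` s.
local function start_periodic(timer_at, delay, worker, log_err)
    local handler
    handler = function(premature)
        if premature then
            return                  -- worker is exiting: do not re-arm
        end
        local ok, err = pcall(worker)
        if not ok then
            log_err("timer work: " .. tostring(err))
        end
        timer_at(delay, handler)    -- schedule the next run
    end
    return timer_at(delay, handler)
end

-- Usage with a fake scheduler that only queues handlers.
local queue = {}
local function fake_timer_at(_, fn)
    queue[#queue + 1] = fn
    return true
end

local runs, errors = 0, {}
start_periodic(fake_timer_at, 5, function()
    runs = runs + 1
    if runs == 2 then error("redis down") end
end, function(msg) errors[#errors + 1] = msg end)

for _ = 1, 3 do
    table.remove(queue, 1)(false)   -- fire the next pending timer
end
```

After three ticks the worker has run three times, the simulated Redis failure on the second run was logged instead of killing the loop, and exactly one timer is still pending.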

Finally, add an access_by_lua_file directive to the corresponding location to run redis_blacklist_dynamic.lua. The code below is similar to the static blacklist code above:

local blacklist = ngx.shared.blacklist

local function run()
    local cid = ngx.var.arg_cid
    -- requests without a cid parameter are not checked
    if not cid then
        return
    end
    if blacklist:get(cid) then
        return ngx.exit(ngx.HTTP_FORBIDDEN)
    end
end

run()

Summary

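The three approaches above can be chosen according to your situation:

The first, the static blacklist, is simple to configure and does not depend on Redis, but it is unsuitable for frequent additions and changes;
The second, dynamic blacklist (1), queries Redis on every request. Compared with the third approach, blacklist changes take effect immediately, but every request pays a network round trip to Redis;
The third, dynamic blacklist (2), keeps the blacklist in shared memory and refreshes it periodically, which avoids a Redis lookup on every request and improves performance, at the cost of changes taking effect only after the next refresh (up to delay seconds later).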

Rate limiting

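Rate limiting serves many purposes. In this article it is used mainly to defend against attacks by illegitimate (non-user) traffic and to protect the service from legitimate traffic bursts.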

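Nginx provides the modules ngx_http_limit_req_module and ngx_http_limit_conn_module for limiting the request rate and the number of concurrent connections respectively; see the official documentation for details. Below we describe how we use the lua-resty-limit-traffic and lua-resty-redis-ratelimit modules in production.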

lua-resty-limit-traffic
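We use lua-resty-limit-traffic to limit the request rate of a location, keeping the rate of requests sent to the upstream cluster within its estimated capacity so that traffic bursts do not overwhelm it. (lua-resty-limit-traffic contains four modules, resty.limit.req, resty.limit.count, resty.limit.conn and resty.limit.traffic; the example below uses only resty.limit.req.)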

Nginx configuration example for request-rate limiting:

http {
    lua_shared_dict location_limit_req_store 1m;
    server {
        listen       2019;
        location /limit/traffic {
            access_by_lua_file "/path/to/limit_traffic.lua";
            default_type 'text/html';
            content_by_lua_block {
                ngx.say('hello 2019')
            }
        }
    }
}

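First, the lua_shared_dict directive declares a shared dictionary named location_limit_req_store for rate limiting, backed by 1 MB of shared memory (size it according to your actual needs). Then limit_traffic.lua is added to the location to be limited and run with the access-control directive access_by_lua_file.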

Example code for limit_traffic.lua:

local limit_req = require "resty.limit.req"
local json = require "cjson"

local rate = 1   -- average rate, requests per second
local burst = 1  -- excess requests per second allowed to be delayed

local function do_limit()
    local message = {
        message = "Too Fast",
    }
    ngx.header.content_type = "application/json;charset=utf-8"
    ngx.say(json.encode(message))
    return ngx.exit(ngx.HTTP_OK)
end

local function location_limit_traffic()
    local reject = false

    local lim, err = limit_req.new("location_limit_req_store", rate, burst)
    if not lim then
        ngx.log(ngx.ERR, "init failed! err: ", err)
        return reject
    end

    -- the same key for every request: limits the location as a whole
    local limit_key = "location_limit_key"
    local delay, err = lim:incoming(limit_key, true)
    if not delay then
        if err == "rejected" then
            reject = true
        else
            ngx.log(ngx.ERR, "failed to limit req: ", err)
        end
        return reject
    end

    -- within the burst: delay the request so the traffic conforms to `rate`
    if delay > 0 then
        ngx.sleep(delay)
    end

    return reject
end

local reject = location_limit_traffic()
if reject then
    do_limit()
end

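For demonstration, both the average rate rate and the bucket capacity burst are set to 1; in production, set them according to your actual situation. limit_key is a fixed string, the same for every request, because the goal in this scenario is to keep the total rate of requests that Nginx proxies to the backend within a given range. You can also limit along other dimensions, such as the client IP, in which case limit_key can be set to ngx.var.binary_remote_addr.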
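resty.limit.req implements a leaky bucket: each request adds one unit of "excess", the excess drains at `rate` units per second, a request is rejected if the excess would exceed `burst`, and otherwise it is delayed by excess / rate seconds. The pure-Lua model below is a simplified sketch of that bookkeeping based on our reading of the module, not its actual code: `LeakyBucket` is a hypothetical name, state lives in a plain table instead of a lua_shared_dict, and it counts whole requests rather than the module's internal milli-request units. With rate = 1 and burst = 1 as in the example, it shows why a second simultaneous request is delayed and a third is rejected:

```lua
-- Simplified leaky-bucket model (assumption: approximates the "excess"
-- bookkeeping of resty.limit.req; state lives in a plain table here).
local LeakyBucket = {}
LeakyBucket.__index = LeakyBucket

function LeakyBucket.new(rate, burst)
    -- rate: requests per second; burst: excess requests allowed to be delayed
    return setmetatable({ rate = rate, burst = burst }, LeakyBucket)
end

-- now: current time in seconds.
-- Returns the delay in seconds, or nil plus "rejected".
function LeakyBucket:incoming(now)
    local excess = 0
    if self.last then
        local elapsed = now - self.last
        -- drain at `rate` per second, then add this request
        excess = math.max(self.excess - self.rate * elapsed + 1, 0)
        if excess > self.burst then
            return nil, "rejected"  -- state is not updated on rejection
        end
    end
    self.excess, self.last = excess, now
    return excess / self.rate
end

local bucket = LeakyBucket.new(1, 1)
print(bucket:incoming(0))   -- first request: no delay
print(bucket:incoming(0))   -- second request at the same instant: delayed
print(bucket:incoming(0))   -- third request: rejected
print(bucket:incoming(2))   -- two seconds later the bucket has drained
```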

When resty.limit.req limits a request, we do not immediately return ngx.HTTP_FORBIDDEN; instead, to suit our business requirements, we return HTTP_OK (200) with a JSON message as a hint.

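Note: when Nginx is used as a proxy, several Nginx instances deployed on different machines usually form a cluster. The instances are independent of one another, so when resty.limit.req limits a location across the cluster, the limiting state is not shared between instances. rate should therefore be set to roughly N/M, where N is the maximum load that the upstream cluster behind the location can handle and M is the number of Nginx instances.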
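As a worked example with hypothetical numbers: if the upstream cluster behind the location can sustain N = 1000 requests per second and traffic is spread evenly over M = 4 Nginx instances, each instance's resty.limit.req rate would be set to about

```latex
r_{\text{instance}} \approx \frac{N}{M} = \frac{1000\ \text{r/s}}{4} = 250\ \text{r/s}
```

The even-spread assumption matters: if one instance is taken out of service, the remaining three admit at most 3 × 250 = 750 r/s in total until rate is recomputed with M = 3.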

lua-resty-redis-ratelimit
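Suppose each request carries a parameter, for example unique_id, that identifies the user, and requests are distributed evenly across the Nginx cluster, so that one user's requests are proxied by different Nginx instances. This calls for rate limiting across machines, which resty.limit.req cannot provide. We use the lua-resty-redis-ratelimit module to limit users: it stores its state in Redis, so all Nginx instances share the limiting state and cross-machine rate limiting becomes possible.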

Nginx configuration example for per-user rate limiting:

http {
    server {
        listen       2019;
        location /redis/ratelimit {
            access_by_lua_file "/path/to/redis_ratelimit.lua";
            default_type 'text/html';
            content_by_lua_block {
                ngx.say('hello 2019')
            }
        }
    }
}

The code of redis_ratelimit.lua is as follows:

local ratelimit = require "resty.redis.ratelimit"
local json = require "cjson"

local redis = {
    host = "127.0.0.1",
    port = 6379,
    timeout = 0.02
}

local rate = "1r/s"
local burst = 0
local duration = 1

local function do_limit()
    local message = {
        message = "Too Fast",
    }
    ngx.header.content_type = "application/json;charset=utf-8"
    ngx.say(json.encode(message))
    return ngx.exit(ngx.HTTP_OK)
end

local function user_rate_limit()
    -- choose the limiting parameter according to your actual needs
    local limit_key = ngx.var.arg_unique_id
    if limit_key == nil then
        return false
    end

    local reject = false

    local lim, err = ratelimit.new("user-rate", rate, burst, duration)
    if not lim then
        ngx.log(ngx.ERR, "failed to instantiate, err: ", err)
        return reject
    end

    local rds = {
        host = redis.host,
        port = redis.port,
        timeout = redis.timeout
    }

    local delay, err = lim:incoming(limit_key, rds)
    if not delay then
        if err == "rejected" then
            reject = true
        else
            ngx.log(ngx.ERR, "failed to limit user: ", err)
        end
        return reject
    end

    if delay >= 0.001 then
        ngx.sleep(delay)
    end

    return reject
end

local reject = user_rate_limit()
if reject then
    do_limit()
end

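For demonstration, the average rate rate, the bucket capacity burst and the delay duration duration are set to "1r/s", 0 and 1 respectively; in production, set them according to your actual situation. The value of the URL parameter unique_id is assigned to limit_key, and user-rate:limit_key is stored in Redis as the unique identifier. When a user's request rate exceeds the configured rate, a JSON message is returned telling the user they are refreshing too fast.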
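Why shared state matters can be shown with a small pure-Lua model. Note that this sketch deliberately swaps in a simpler technique, a fixed one-second-window counter, instead of the algorithm lua-resty-redis-ratelimit actually uses. The `store` table stands in for Redis (one store shared by all instances) or for a per-instance lua_shared_dict (one store each), and `make_limiter` is a hypothetical helper. Keys use the zone:key prefix described above plus a window suffix:

```lua
-- Illustration only: a fixed-window counter, NOT lua-resty-redis-ratelimit's
-- algorithm. `store` plays the role of the backing storage.
local function make_limiter(store, zone, limit_per_sec)
    return function(key, now)
        local window = math.floor(now)              -- one-second window
        local k = zone .. ":" .. key .. ":" .. window
        local count = (store[k] or 0) + 1
        store[k] = count
        return count <= limit_per_sec               -- true = allowed
    end
end

-- Two Nginx instances sharing one store (like Redis): the limit holds globally.
local redis_like = {}
local node_a = make_limiter(redis_like, "user-rate", 1)
local node_b = make_limiter(redis_like, "user-rate", 1)
print(node_a("42", 0.1), node_b("42", 0.2))   -- the second call is rejected

-- Two instances with separate stores: each admits the user once per second.
local node_c = make_limiter({}, "user-rate", 1)
local node_d = make_limiter({}, "user-rate", 1)
print(node_c("42", 0.1), node_d("42", 0.2))   -- both calls are allowed
```

With separate stores the user effectively gets M times the intended rate, which is exactly the cross-machine problem the Redis-backed module avoids.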

ABTest

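OpenResty makes A/B testing easy. The example below uses set_by_lua*, the Lua counterpart of the Nginx set directive, to assign an Nginx variable from Lua logic, dispatching requests to different upstream clusters depending on the cid parameter in the URL.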

Nginx configuration example:

http {

    upstream pool_1{
        server 0.0.0.0:2020;
    }

    upstream pool_2{
        server 0.0.0.0:2021;
    }

    server {
        listen       2019;
        location /select/upstream/according/cid {
            set_by_lua_file $selected_upstream "/path/to/select_upstream_by_cid.lua" "pool_1" "pool_2";
            if ( $selected_upstream = "" ){
                proxy_pass http://pool_1;
            }
            proxy_pass http://$selected_upstream;
        }
    }
}

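The set_by_lua_file directive passes "pool_1" and "pool_2" as arguments to select_upstream_by_cid.lua, and the script's return value initializes the variable $selected_upstream. When the script returns an empty string, the nginx conf handles it specially and proxies to pool_1 by default; otherwise it proxies to $selected_upstream.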

The logic of select_upstream_by_cid.lua is as follows:

local first_upstream  = ngx.arg[1]
local second_upstream = ngx.arg[2]

local cid = ngx.var.arg_cid
if cid == nil then
    return ""
end

local id = tonumber(cid)

if id == nil then
    return ""
end

if id % 2 == 0 then
    return first_upstream
end

return second_upstream

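The script reads the cid parameter, converts it to a number and takes it modulo 2, returning one of its two arguments depending on the result (even cid to pool_1, odd cid to pool_2), so requests are split across the upstreams by cid.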
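Modulo 2 always splits traffic 50/50. A common variation, sketched below as a hypothetical extension rather than part of the original setup, buckets the numeric cid into 100 slots so that any percentage can be sent to the first pool, assuming cids are spread evenly over their last two digits. The function is plain Lua so the rule can be tested outside Nginx; in set_by_lua_file you would call it with ngx.var.arg_cid and ngx.arg[1], ngx.arg[2], and `percent` is a value you would choose:

```lua
-- Hypothetical extension: send `percent`% of cids to the first upstream.
-- Returns "" for a missing or non-numeric cid, like the original script.
local function select_upstream(cid, percent, first_upstream, second_upstream)
    if cid == nil then
        return ""
    end
    local id = tonumber(cid)
    if id == nil then
        return ""
    end
    -- the same cid always lands in the same bucket (0..99)
    if id % 100 < percent then
        return first_upstream
    end
    return second_upstream
end

-- 20% of cids to pool_1, 80% to pool_2
print(select_upstream("42", 20, "pool_1", "pool_2"))
print(select_upstream("7", 20, "pool_1", "pool_2"))
```

Because the bucket depends only on cid, a given user stays in the same group across requests, which keeps A/B results consistent.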

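OpenResty's many modules and directives allow far more elaborate A/B testing. For example, lua-upstream-nginx-module[5] and balancer provide APIs for managing upstreams dynamically, so upstream information can be kept in Redis, Consul or similar services and used with OpenResty's APIs for dynamic traffic splitting; see the open-source project ABTestingGateway.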

Service quality monitoring

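Monitoring Nginx's quality of service with OpenResty's APIs is real-time and uses few resources. Nginx built-in variables such as request_time, upstream_response_time and upstream_status can be read through ngx.var.VARIABLE; the full list is in the nginx documentation page "Alphabetical index of variables". This article collects only a few of them; customize as needed. The statistics code lives in the directory nginx_metric and consists of three files: nginx_metric.lua, nginx_metric_output.lua and metric.lua.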

Nginx configuration example:

http {

    lua_package_path "/path/to/nginx_metric/?.lua;;";
    lua_shared_dict nginx_metric 1m;
    log_by_lua_file "/path/to/nginx_metric/nginx_metric.lua";

    upstream test{
        server 0.0.0.0:2020;
    }

    server {
        listen       2019;

        location /metric {
            proxy_pass http://test;
        }

        location /nginx/metric/output {
            default_type 'application/json';
            content_by_lua_file "/path/to/nginx_metric/nginx_metric_output.lua";
        }
    }

    server {
        listen       2020;
        location /metric {
            default_type 'text/html';
            content_by_lua_block{
                ngx.sleep(1)
                ngx.say('hello 2020')
            }
        }
    }
}

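The configuration has four steps:

The lua_package_path directive adds the nginx_metric directory to the Lua module search path;
The lua_shared_dict directive defines a shared dictionary that stores the statistics;
The log_by_lua_file directive runs nginx_metric.lua, which collects statistics for every request in the log phase. This code is the core of the statistics. log_by_lua* runs in the last phase of a request, so the statistics code only records data and does not affect normal request handling;
A location such as /nginx/metric/output is added for viewing the statistics. nginx_metric_output.lua reads the data from the shared dictionary, groups it by upstream and returns it as JSON.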

The code of nginx_metric.lua is as follows:

local nginx_metric = require "metric"

local dict = ngx.shared.nginx_metric
local item_sep = "|"
local exptime = 3600 * 24 -- seconds

local metric_prefix = ngx.var.proxy_host
if metric_prefix == nil then
    return
end

nginx_metric = nginx_metric:new(dict, item_sep, metric_prefix, exptime)
nginx_metric:record()

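Nginx is commonly used as a reverse proxy, and this example collects statistics for every request in locations that proxy to an upstream. ngx.var.proxy_host holds the name and port of the proxied server as specified in the proxy_pass directive (here, the upstream name test). In locations without proxy_pass it is not set, so those requests are not counted. metric_prefix can be customized as needed. The statistics themselves are implemented in metric.lua: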

local _M = {}
local mt = { __index = _M }

function _M.new(_, dict, item_sep, metric_prefix, exptime)
    local self = {
        dict = dict,
        item_sep = item_sep,
        metric_prefix = metric_prefix,
        exptime = exptime,
    }
    return setmetatable(self, mt)
end

-- build a key of the form "<metric_prefix><item_sep><name>"
function _M.req_sign(self, t)
    return self.metric_prefix .. self.item_sep .. t
end

-- increment a counter, creating it with an expiry time if it does not exist
local function dict_safe_incr(dict, metric, value, exptime)
    if tonumber(value) == nil then
        return
    end

    local newval, err = dict:incr(metric, value)
    if not newval and err == "not found" then
        local ok, err = dict:safe_add(metric, value, exptime)
        if err == "exists" then
            -- another worker created the key first: increment it instead
            dict:incr(metric, value)
        elseif err == "no memory" then
            ngx.log(ngx.ERR, "no memory for nginx_metric add kv: " .. metric .. ":" .. value)
        end
    end
end

local function str_split(inputstr, sep)
    if sep == nil then
        sep = "%s"
    end
    local t = {}
    local i = 1
    for str in string.gmatch(inputstr, "([^" .. sep .. "]+)") do
        t[i] = str
        i = i + 1
    end
    return t
end

local function add(dict, metric, value, exptime)
    dict_safe_incr(dict, metric, tonumber(value), exptime)
end

-- request count (requests with status < 400)
function _M.request_count(self)
    local status_code = tonumber(ngx.var.status)
    if status_code < 400 then
        local metric = self:req_sign("request_count")
        add(self.dict, metric, 1, self.exptime)
    end
end

-- request time
function _M.request_time(self)
    local metric = self:req_sign("request_time")
    local req_t = tonumber(ngx.var.request_time) or 0
    add(self.dict, metric, req_t, self.exptime)
end

-- http error status stat, one counter per status code
function _M.err_count(self)
    local status_code = tonumber(ngx.var.status)
    if status_code >= 400 then
        local metric_err_qc = self:req_sign("err_count")
        local metric_err_detail = metric_err_qc .. self.item_sep .. status_code
        add(self.dict, metric_err_detail, 1, self.exptime)
    end
end

-- upstream time and count
function _M.upstream(self)
    -- Times of several responses are separated by commas and colons
    local upstream_response_time_s = ngx.var.upstream_response_time or ""
    upstream_response_time_s = string.gsub(string.gsub(upstream_response_time_s, ":", ","), " ", "")

    if upstream_response_time_s == "" then
        return
    end

    local resp_time_arr = str_split(upstream_response_time_s, ",")

    local metric_upstream_count = self:req_sign("upstream_count")
    add(self.dict, metric_upstream_count, #resp_time_arr, self.exptime)

    local duration = 0.0
    for _, t in ipairs(resp_time_arr) do
        if tonumber(t) then
            duration = duration + tonumber(t)
        end
    end

    local metric_upstream_response_time = self:req_sign("upstream_response_time")
    add(self.dict, metric_upstream_response_time, duration, self.exptime)
end

function _M.record(self)
    self:request_count()
    self:err_count()
    self:request_time()
    self:upstream()
end

return _M

The main logic reads variable values through ngx.var.VARIABLE, converts their types and accumulates them. The nginx_metric module code is based on falcon-ngx_metric.
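The upstream() function depends on the format of $upstream_response_time: when several upstream servers are tried for one request their times are separated by commas, and times from different upstream groups (after an internal redirect) are separated by colons. The plain-Lua sketch below (`parse_upstream_times` is a hypothetical name) reproduces that parsing step outside Nginx and returns the number of upstream attempts and their total time. Non-numeric entries such as "-" count as attempts but add no time, matching metric.lua:

```lua
-- Hypothetical helper mirroring metric.lua's upstream() parsing step.
local function parse_upstream_times(s)
    s = s or ""
    -- treat ":" like ",", then drop spaces
    s = (s:gsub(":", ",")):gsub(" ", "")
    if s == "" then
        return 0, 0
    end
    local count, total = 0, 0.0
    for part in s:gmatch("[^,]+") do
        count = count + 1
        total = total + (tonumber(part) or 0)
    end
    return count, total
end

print(parse_upstream_times("0.010, 0.020 : 0.005"))  -- three attempts
print(parse_upstream_times("-"))                     -- one attempt, no time
```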

The code of nginx_metric_output.lua is as follows:

local json = require("cjson")
local nginx_metric = ngx.shared.nginx_metric

local function output()
    -- get_keys() returns only the first 1024 keys by default; 0 means all keys
    local keys = nginx_metric:get_keys(0)
    local res = {}

    for _, k in ipairs(keys) do
        local value = nginx_metric:get(k)
        -- keys look like "<upstream>|<metric>": split at the first "|"
        local s, e = string.find(k, "|", 1, true)

        if value ~= nil and s then
            local upst_name = string.sub(k, 1, s - 1)
            local metric = string.sub(k, e + 1)

            if res[upst_name] == nil then
                res[upst_name] = {}
            end

            res[upst_name][metric] = value

            -- sum the per-status "err_count|<status>" entries into err_count
            if string.find(metric, "err_count", 1, true) then
                res[upst_name]["err_count"] = (res[upst_name]["err_count"] or 0) + value
            end
        end
    end

    ngx.status = ngx.HTTP_OK
    ngx.print(json.encode(res))
    return ngx.exit(ngx.HTTP_OK)
end

output()

Demo

Send a request to the proxied location:

curl -v http://127.0.0.1:2019/metric

Then query the statistics:

curl http://127.0.0.1:2019/nginx/metric/output
{
    "test":{
        "request_count":1,
        "upstream_count":1,
        "upstream_response_time":1.001,
        "request_time":1.001
    }
}

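request_count is the number of requests with a status below 400 (requests with status 400 or above are counted under err_count), request_time is the response time, upstream_count is the number of requests proxied to the upstream, and upstream_response_time is the response time of the upstream servers; all values are cumulative totals. To use this data for monitoring or alerting, fetch it every N seconds and take the difference between two consecutive snapshots: dividing the increase in the request count by N gives the QPS, and dividing the increase in request_time by the increase in the request count gives the average response time.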
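A minimal sketch of that calculation, where `compute_rates` and the sample numbers are hypothetical, takes two decoded snapshots of one upstream's entry taken `interval` seconds apart. Two caveats follow from the code above: request_time is accumulated for every request while request_count covers only statuses below 400, so this sketch adds err_count back in; and the counters live in a shared dictionary whose keys expire after exptime and are lost when Nginx restarts, so a negative difference is treated as a counter reset:

```lua
-- Hypothetical helper: derive QPS and average latency from two snapshots
-- of one upstream's counters taken `interval` seconds apart.
local function compute_rates(prev, curr, interval)
    local function total_requests(s)
        -- request_count only covers status < 400; err_count covers the rest
        return (s.request_count or 0) + (s.err_count or 0)
    end

    local d_req = total_requests(curr) - total_requests(prev)
    local d_time = (curr.request_time or 0) - (prev.request_time or 0)
    if d_req < 0 or d_time < 0 then
        return nil, "counter reset"   -- keys expired or Nginx restarted
    end

    local qps = d_req / interval
    local avg = d_req > 0 and d_time / d_req or 0
    return qps, avg
end

local prev = { request_count = 100, err_count = 0, request_time = 50.0 }
local curr = { request_count = 150, err_count = 10, request_time = 80.0 }
print(compute_rates(prev, curr, 10))
```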