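Problem
Customers recently reported requests returning 502 errors. Checking the Nginx log files in production turned up a large number of error entries: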
no live upstreams while connecting to upstream
upstream prematurely closed connection while reading upstream
upstream timed out (110: Connection timed out) while reading response header from upstream
recv() failed (104: Connection reset by peer) while reading response header from upstream
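To see which of these messages dominates, it can help to count them per category. The following is a minimal sketch, not the procedure used in the original investigation; the function name count_upstream_errors is hypothetical, and the log path is whatever your error.log happens to be.

```shell
# Count how often each upstream error message appears in an Nginx error log.
# $1: path to the error log (the caller supplies it; no default is assumed)
count_upstream_errors() (
    log_file="$1"
    for pattern in \
        "no live upstreams" \
        "upstream prematurely closed connection" \
        "upstream timed out" \
        "Connection reset by peer"; do
        # -c prints the number of matching lines, -F matches a fixed string.
        # grep exits non-zero on zero matches, so "|| true" keeps the loop going.
        count=$(grep -cF -- "$pattern" "$log_file" || true)
        printf '%s\t%s\n' "$count" "$pattern"
    done
)
```

Calling count_upstream_errors with the path to error.log prints one line per message type, with the count first, which makes it easy to compare the numbers before and after a config change.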
Research
Looked up what these log errors actually mean.
Reference: https://blog.youkuaiyun.com/zzhongcy/article/details/89090193
Nginx parameter tuning
Reference: https://yq.aliyun.com/articles/585433
Nginx load balancing
Reference: https://www.nginx-cn.net/article/221
Reference: https://zhuanlan.zhihu.com/p/34943332
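Approach
Company environment
The company uses SLB (Server Load Balancer) machines to run the Nginx cluster, with requests distributed round-robin.
The Nginx load-balancing pool has 4 machines, round-robin, all with weight 1. The machines are not configured the same, though: two have 16 GB of memory and two have 8 GB, and the 8 GB machines also handle other tasks. As a result those two machines kept going down, which caused the errors above.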
upstream server01 {
server 172.01.01.213:8080;
server 172.01.01.226:8080;
server 01.16.13.208:8080;
server 01.16.13.216:8080;
keepalive 32;
}
upstream server02 {
server 01.16.13.208:8080;
server 01.16.13.216:8080;
server 172.01.13.231:8080;
server 172.01.13.230:8080;
keepalive 32;
}
Fix
1. Adjust the weights
2. Remove the two 8 GB machines from server02
upstream server01 {
server 172.01.01.213:8080 weight=2;
server 172.01.01.226:8080 weight=2;
server 01.16.13.208:8080;
server 01.16.13.216:8080;
keepalive 32;
}
upstream server02 {
#server 01.16.13.208:8080;
#server 01.16.13.216:8080;
server 172.01.13.231:8080;
server 172.01.13.230:8080;
keepalive 32;
}
Checking the Nginx config after the change
# Go to Nginx's sbin directory
./nginx -t
nginx: the configuration file /alidata1/opt/nginx-1.14.0/conf/nginx.conf syntax is ok
nginx: configuration file /alidata1/opt/nginx-1.14.0/conf/nginx.conf test is successful
Reloading Nginx
# Reload the config file without stopping Nginx
./nginx -s reload
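The check and the reload can also be chained so that a broken config never gets reloaded. A small sketch under the assumption that the nginx binary is in the current directory, as in the commands above; the function name safe_reload is hypothetical.

```shell
# Reload Nginx only if the configuration test passes.
# $1: path to the nginx binary (defaults to ./nginx, as used above)
safe_reload() {
    nginx_bin="${1:-./nginx}"
    if "$nginx_bin" -t; then
        # Config is valid: signal the running master process to reload it
        "$nginx_bin" -s reload
    else
        echo "configuration test failed; reload skipped" >&2
        return 1
    fi
}
```

Running safe_reload from the sbin directory does the same as the two commands above, but skips the reload and returns a non-zero status if the -t check fails.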
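Remaining issue
Reference: https://cloud.tencent.com/developer/article/1076274
After the adjustment, 502s still occurred. The config turned out to have worker_processes 3;
Checking the processors on the Nginx machine showed only two, so worker_processes was changed to 2. After reloading Nginx and monitoring for a while, no new errors appeared in error.log. Problem solved.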
[root@ngx1 sbin]# cat /proc/cpuinfo | grep processor
processor : 0
processor : 1
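The same comparison can be scripted so the mismatch is easy to spot on other machines. This is a sketch, not a definitive check: the helper names configured_workers and cpu_count are hypothetical, and it assumes worker_processes appears as a plain top-level directive on a single line of the config file.

```shell
# Print the worker_processes value from an Nginx config file.
# $1: path to nginx.conf
configured_workers() {
    # Match e.g. "worker_processes 3;" and print the value without the semicolon
    awk '$1 == "worker_processes" { sub(/;$/, "", $2); print $2; exit }' "$1"
}

# Count processors listed in a cpuinfo-style file.
# $1: optional path, defaults to /proc/cpuinfo
cpu_count() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}
```

For example, configured_workers /alidata1/opt/nginx-1.14.0/conf/nginx.conf alongside cpu_count shows both numbers at once. Nginx also accepts worker_processes auto; which sets the worker count from the number of available CPU cores and avoids having to keep the two in sync by hand.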