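Fixing a Redis connection exception
I have been working on a high-concurrency architecture lately and found that the program occasionally throws this error: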
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at redis.clients.util.Pool.getResource(Pool.java:50) ~[Pool.class:na]
at redis.clients.jedis.JedisPool.getResource(JedisPool.java:86) ~[JedisPool.class:na]
at com.qingmayun.common.dbConfig.JedisManager.getDbJedis(JedisManager.java:56) ~[JedisManager.class:na]
at com.qingmayun.common.dbConfig.PipelineManager.getDbPipeline(PipelineManager.java:21) [PipelineManager.class:na]
at com.qingmayun.common.dbConfig.RedisDao.smsMonitWhenSend(RedisDao.java:1227) [RedisDao.class:na]
at com.qingmayun.mqconsumer.channel.base.ChannelWatcher.sendMonit(ChannelWatcher.java:79) [ChannelWatcher.class:na]
at com.qingmayun.mqconsumer.channel.base.proxy.SmsProxy.sendMsg(SmsProxy.java:71) [SmsProxy.class:na]
at com.qingmayun.mqconsumer.TemplateSmsConsumer.sendTemplateSms(TemplateSmsConsumer.java:205) [TemplateSmsConsumer.class:na]
at com.qingmayun.mqconsumer.TemplateSmsConsumer.access$100(TemplateSmsConsumer.java:50) [TemplateSmsConsumer.class:na]
at com.qingmayun.mqconsumer.TemplateSmsConsumer$1.handleDelivery(TemplateSmsConsumer.java:82) [TemplateSmsConsumer$1.class:na]
at com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:144) [rabbitmq-client.jar:na]
at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:99) [rabbitmq-client.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: connect timed out
at redis.clients.jedis.Connection.connect(Connection.java:155) ~[Connection.class:na]
at redis.clients.jedis.BinaryClient.connect(BinaryClient.java:83) ~[BinaryClient.class:na]
at redis.clients.jedis.BinaryJedis.connect(BinaryJedis.java:1643) ~[BinaryJedis.class:na]
at redis.clients.jedis.JedisFactory.makeObject(JedisFactory.java:85) ~[JedisFactory.class:na]
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861) ~[commons-pool2-2.3.jar:2.3]
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435) ~[commons-pool2-2.3.jar:2.3]
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363) ~[commons-pool2-2.3.jar:2.3]
at redis.clients.util.Pool.getResource(Pool.java:48) ~[Pool.class:na]
... 14 common frames omitted
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.7.0_71]
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) ~[na:1.7.0_71]
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) ~[na:1.7.0_71]
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) ~[na:1.7.0_71]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[na:1.7.0_71]
at java.net.Socket.connect(Socket.java:579) ~[na:1.7.0_71]
at redis.clients.jedis.Connection.connect(Connection.java:149) ~[Connection.class:na]
... 21 common frames omitted
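I searched Baidu for Redis's "Could not get a resource from the pool" and "java.net.SocketTimeoutException: connect timed out" and never found a satisfying answer. The advice online boils down to: use a connection pool, raise the connection timeout, and increase the pool size. I had already done all of that and it made no difference. The pressure was on; if I couldn't fix this I wouldn't be able to face anyone.
So I was on my own. I checked: CPU, 30%; memory, plenty free; network, client and Redis on the same machine; disk, no read/write bottleneck; Redis config, the default limit of 10000 connections, with only a little over 1K currently active; swap, unused, and Redis had already been restarted with its data cleared; JVM, -Xmx set to 4G with only 1G in use; tcpdump, packets captured, but I couldn't make sense of them. One team online said that packet capture eventually led them to a faulty graphics card driver, and they had the vendor upgrade it...
I'm just a lowly Java developer. I can track down basic problems, but I don't have the skills to go much deeper, and supposedly the cause could even be something as exasperating as the Redis version or a compatibility issue.
I wrote a small program that uses a for loop to start 2000 threads, each of which takes a connection from the pool, sleeps 10 seconds once it has one, and prints success or failure. Over many runs, the first 1800+ acquisitions always succeeded and the later ones failed. The number of failures was random, and on one lucky run every thread succeeded. With the thread count lowered to 1000, a lucky run had every thread succeed, but the most common outcome was still a few dozen failures at the tail. So whether there are more threads or fewer, some of them fail to get a connection. I remember a saying that the JVM is unstable when there are a great many threads, and it is quite common for the network to drop a packet or stall for a few milliseconds. As programmers we need to plan for these extreme conditions rather than trusting the machine and the logic too much.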
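The experiment described above can be sketched roughly as follows. This is not the original test program: PoolStressTest, run, and Result are hypothetical names, and the acquire callback stands in for whatever call borrows a connection from the pool, so the sketch compiles and runs without Redis.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical harness; "acquire" is a stand-in for a call such as pool.getResource().
class PoolStressTest {

    /** Outcome of one run: how many threads got a connection and how many did not. */
    static final class Result {
        final int succeeded;
        final int failed;

        Result(int succeeded, int failed) {
            this.succeeded = succeeded;
            this.failed = failed;
        }
    }

    /** Starts one thread per borrower; each borrows once, holds for holdMillis, then releases. */
    static Result run(int threads, long holdMillis, Callable<AutoCloseable> acquire) {
        AtomicInteger succeeded = new AtomicInteger();
        AtomicInteger failed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(threads);

        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                AutoCloseable conn = null;
                try {
                    conn = acquire.call();       // the step under test
                    succeeded.incrementAndGet();
                    Thread.sleep(holdMillis);    // keep the connection busy, as in the post
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (Exception e) {
                    failed.incrementAndGet();    // only acquire.call() can reach this branch
                } finally {
                    if (conn != null) {
                        try {
                            conn.close();        // hand the connection back
                        } catch (Exception ignored) {
                            // a failed release does not change the acquisition counts
                        }
                    }
                    done.countDown();
                }
            }).start();
        }

        try {
            done.await();                        // wait until every borrower has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new Result(succeeded.get(), failed.get());
    }
}
```

Against a real pool this might be invoked as PoolStressTest.run(2000, 10_000, pool::getResource), assuming pool is a JedisPool and a Jedis version in which Jedis implements Closeable. Comparing succeeded and failed across several runs reproduces the kind of measurement described above.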
OK, so I reworked the code that acquires a connection.
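// Needs: redis.clients.jedis.Jedis, redis.clients.jedis.exceptions.JedisConnectionException,
// java.net.SocketTimeoutException
public static Jedis getJedis()
{
    int timeoutCount = 0;
    while (true) // retry a few times if the failure is a network timeout
    {
        try
        {
            return RedisPool.getResource();
        } catch (Exception e)
        {
            // The root cause is a SocketTimeoutException, but Jedis catches it and rethrows a
            // JedisConnectionException, which wraps it as the cause rather than extending it.
            if (e instanceof JedisConnectionException || e instanceof SocketTimeoutException)
            {
                timeoutCount++;
                log.warn("getJedis timeoutCount={}", timeoutCount);
                if (timeoutCount > 3)
                {
                    // Four attempts have failed: log the last exception before giving up.
                    log.error("getJedis gave up after {} timeouts", timeoutCount, e);
                    break;
                }
            } else
            {
                // Not a timeout: dump the pool state and do not retry.
                log.warn("jedisInfo: NumActive=" + RedisPool.getBalancePool().getNumActive() + ", NumIdle="
                        + RedisPool.getNumIdle() + ", NumWaiters="
                        + RedisPool.getNumWaiters() + ", isClosed="
                        + RedisPool.isClosed());
                log.error("getJedis error", e);
                break;
            }
        }
    }
    return null; // every attempt failed; callers must handle null
}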
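At the same time, I raised the connection timeout configured on the pool to 5 seconds (the default is 2 seconds). After deploying to the server, there were no more ERRORs. The log occasionally prints getJedis timeoutCount=1 and, very rarely, getJedis timeoutCount=2. Done. Off to collect my boxed lunch.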
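For reuse elsewhere, the same bounded-retry idea can be pulled into a small helper. The following is only a sketch under assumptions: Retry and get are hypothetical names, not part of Jedis, and unlike getJedis above it pauses briefly between attempts and rethrows the last exception instead of returning null.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper, not part of Jedis or of the code in this post.
class Retry {

    /**
     * Calls action up to maxAttempts times. A failure is retried only when
     * retryable accepts it; otherwise, or once the attempts are used up,
     * the exception is rethrown to the caller.
     */
    static <T> T get(Supplier<T> action, int maxAttempts, long pauseMillis,
                     Predicate<RuntimeException> retryable) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (!retryable.test(e) || attempt >= maxAttempts) {
                    throw e; // not worth retrying, or out of attempts
                }
                try {
                    // Linear backoff: wait a little longer after each failure.
                    Thread.sleep(pauseMillis * attempt);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e; // stop retrying if the thread is being shut down
                }
            }
        }
    }
}
```

With the pool from this post it might be called as Retry.get(RedisPool::getResource, 4, 50, e -> e instanceof JedisConnectionException). Four attempts matches the timeoutCount > 3 cutoff used above; the 50 ms pause is an arbitrary illustrative value. Throwing on final failure spares callers a null check, at the cost of requiring them to handle the exception.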