Alibaba Middleware: Sentinel

Latest recommended article published on 2025-05-07 17:29:23

田青钊


Article link: https://blog.youkuaiyun.com/weixin_44963129/article/details/112136910


1. Sentinel: Rate Limiting

Problems rate limiting solves

a. Malicious traffic
b. Protecting the system

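Rate-limiting scenarios

During Golden Week or public holidays, some tourist sites cap the number of visitors per day.

When a database reaches its maximum number of connections, further connection requests are blocked, which protects the database.

A thread pool has a maximum thread count; when it is exceeded (and the work queue is full), the rejection policy kicks in, or an exception is returned directly.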

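Common rate-limiting algorithms

Sliding window
Leaky bucket
Token bucket

Sentinel's rate limiting is built on the sliding window (covered in the source-code walkthrough below).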

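Sliding-window rate limiting

The sliding window comes from TCP flow control. The sender and the receiver each maintain a window over a sequence of packets. The sender's window size is set by the receiver (this is handled by TCP): if the receiver has not yet finished processing the data it already has, the window must shrink, so the sender sends less.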

Interactive sliding-window (Selective Repeat protocol) animation: https://media.pearsoncmg.com/aw/ecs_kurose_compnetwork_7/cw/content/interactiveanimations/selective-repeat-protocol/index.html
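As the animation shows, with a receiver window of 5, once the window is full of data that has not been processed yet, the receiver cannot accept more, and the sender stops sending.
"Sliding" refers to the window rolling forward; the window itself is how much data can be sent at one time.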

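A packet being sent is in the Packet state. When the receiver gets it, it moves to the Received state and the receiver sends an Ack back to the sender; once the sender receives that Ack, the packet moves to the Ack Received state.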

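Throughout this process, the receiver's window slides forward each time it acknowledges a packet, and the sender's window slides forward each time it receives an Ack.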
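The animation above is TCP's sliding window for flow control. For rate limiting, the same idea is applied along the time axis: the last second is split into a few buckets, and the window slides forward as time passes. Below is a minimal sketch, assuming a hypothetical SlidingWindowLimiter class that takes the current time as a parameter (which keeps it deterministic and testable). It is not Sentinel's code, although Sentinel's LeapArray, covered later, rests on the same principle.

```java
import java.util.Arrays;

/**
 * Hypothetical sliding-window limiter: the last intervalMs milliseconds are
 * split into sampleCount buckets; a request passes if the sum of all buckets
 * still inside the interval is below the limit.
 */
class SlidingWindowLimiter {
    private final int limit;
    private final long bucketLengthMs;
    private final long[] bucketStart;   // start time of each bucket (-1 = never used)
    private final int[] bucketCount;    // passed requests recorded in each bucket

    SlidingWindowLimiter(int limit, long intervalMs, int sampleCount) {
        this.limit = limit;
        this.bucketLengthMs = intervalMs / sampleCount;
        this.bucketStart = new long[sampleCount];
        this.bucketCount = new int[sampleCount];
        Arrays.fill(bucketStart, -1L);
    }

    /** Returns true if a request arriving at nowMillis is allowed. */
    synchronized boolean tryAcquire(long nowMillis) {
        int idx = (int) ((nowMillis / bucketLengthMs) % bucketStart.length);
        long start = nowMillis - nowMillis % bucketLengthMs;
        if (bucketStart[idx] != start) {
            // The slot still holds an old bucket: recycle it for the current time slice
            bucketStart[idx] = start;
            bucketCount[idx] = 0;
        }
        long intervalMs = bucketLengthMs * bucketStart.length;
        int total = 0;
        for (int i = 0; i < bucketStart.length; i++) {
            // Only buckets that started within the last interval are counted
            if (bucketStart[i] >= 0 && nowMillis - bucketStart[i] < intervalMs) {
                total += bucketCount[i];
            }
        }
        if (total >= limit) {
            return false;
        }
        bucketCount[idx]++;
        return true;
    }
}
```

With a limit of 10 per second and five 200 ms buckets, ten requests at 100 ms pass, further requests are rejected until the [0, 200) bucket leaves the window, and at 1000 ms requests are accepted again.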

Leaky-bucket rate limiting

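Suppose a client calls the server's sayHello() endpoint, which is protected by a leaky bucket. When a request arrives, the limiter checks whether the current traffic has reached the configured threshold. If it has, the request is rejected immediately, either by returning right away or by throwing an exception; a familiar example is a website showing "System busy, please try again later!". If not, the threshold has not been reached and the request can be handled. Deciding which requests pass is the job of the rate-limiting algorithm.

Suppose 1,000 requests arrive within one second, but sayHello() can only handle 100 requests per second; the rest have to be rejected. Requests reaching the server pass through a gate that lets them out at a fixed rate, so the endpoint is well protected. That is roughly how leaky-bucket rate limiting works.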
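A minimal sketch of the leaky bucket described above, assuming a hypothetical LeakyBucketLimiter: each request pours one unit of "water" into the bucket, the bucket drains at a fixed rate, and a request that would overflow it is rejected. This is the "meter" variant that rejects instead of queueing; the level is kept in thousandths of a request so the arithmetic stays exact.

```java
/**
 * Hypothetical leaky-bucket limiter ("meter" variant): every request pours one
 * unit of water into the bucket, the bucket leaks at a fixed rate, and a
 * request that would overflow the bucket is rejected.
 * Water is tracked in thousandths of a request so all arithmetic is exact.
 */
class LeakyBucketLimiter {
    private final long capacityMilli;    // bucket size, in milli-requests
    private final long leakPerSecond;    // requests drained per second (= milli-requests per ms)
    private long waterMilli;             // current level, in milli-requests
    private long lastLeakMillis;

    LeakyBucketLimiter(int capacity, int leakPerSecond, long startMillis) {
        this.capacityMilli = capacity * 1000L;
        this.leakPerSecond = leakPerSecond;
        this.lastLeakMillis = startMillis;
    }

    /** Returns true if a request arriving at nowMillis fits into the bucket. */
    synchronized boolean tryAcquire(long nowMillis) {
        // First drain whatever leaked out since the last call
        long elapsed = Math.max(0L, nowMillis - lastLeakMillis);
        waterMilli = Math.max(0L, waterMilli - elapsed * leakPerSecond);
        lastLeakMillis = Math.max(lastLeakMillis, nowMillis);
        if (waterMilli + 1000 > capacityMilli) {
            return false;                // would overflow: "system busy, try again later"
        }
        waterMilli += 1000;
        return true;
    }
}
```

With a capacity of 100 and a drain rate of 100 per second, a burst of 1,000 simultaneous requests lets 100 in and rejects the rest; afterwards one more request fits roughly every 10 ms. This meter variant admits an initial burst up to its capacity; the queueing variant instead buffers requests and releases them at exactly the drain rate, which matches the fixed-rate gate described above.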

By contrast, the RateLimiter class in Google's Guava library is a token-bucket-based solution.

Token-bucket rate limiting

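Token-bucket and leaky-bucket limiting look similar on the surface. A leaky bucket simply enforces a fixed rate on requests, while a token bucket adds a step: when a request arrives, it must first take a token from the bucket (possibly according to some identifier of the request, such as a token string). For example, suppose the bucket is refilled at 10 tokens per second and has accumulated 100 tokens; if 1,000 requests suddenly arrive, the first 100 get tokens and reach the endpoint, and the rest find no token and are limited. Some systems also cap how many tokens a user may obtain per day, after which that user is limited; WeChat's access_token, which has a daily quota, is a well-known example.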
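The example above (10 tokens per second, 1,000 simultaneous requests, 100 getting through) works if the bucket holds up to 100 tokens and starts full. A minimal sketch under that assumption; TokenBucketLimiter is a hypothetical name, not Guava's or Redisson's API, and time is passed in explicitly.

```java
/**
 * Hypothetical token-bucket limiter: tokens are refilled at a fixed rate up to
 * the bucket capacity; each request takes one token or is rejected.
 * Tokens are tracked in thousandths so all arithmetic is exact.
 */
class TokenBucketLimiter {
    private final long capacityMilli;      // bucket size, in milli-tokens
    private final long refillPerSecond;    // tokens per second (= milli-tokens per ms)
    private long tokensMilli;
    private long lastRefillMillis;

    TokenBucketLimiter(int capacity, int refillPerSecond, long startMillis) {
        this.capacityMilli = capacity * 1000L;
        this.refillPerSecond = refillPerSecond;
        this.tokensMilli = capacityMilli;  // assumption: the bucket starts full
        this.lastRefillMillis = startMillis;
    }

    /** Returns true if a token was available at nowMillis. */
    synchronized boolean tryAcquire(long nowMillis) {
        // Add the tokens generated since the last call, never exceeding capacity
        long elapsed = Math.max(0L, nowMillis - lastRefillMillis);
        tokensMilli = Math.min(capacityMilli, tokensMilli + elapsed * refillPerSecond);
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);
        if (tokensMilli < 1000) {
            return false;                  // no whole token left: request is limited
        }
        tokensMilli -= 1000;
        return true;
    }
}
```

Because the bucket starts full, a burst up to its capacity passes immediately, after which throughput settles at the refill rate of 10 per second (one token every 100 ms).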

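Rate-limiting implementations

java.util.concurrent (JUC) Semaphore: each call to the method must first acquire a permit from the semaphore; this bounds the number of concurrent calls.

Guava (RateLimiter): token bucket (or leaky bucket) based

Redisson (RRateLimiter): token bucket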
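To illustrate the Semaphore approach from the list above, here is a minimal sketch using java.util.concurrent.Semaphore; SemaphoreLimiter and call() are hypothetical names. Note that a semaphore bounds how many callers are inside the method at the same time, not how many requests arrive per second, so it corresponds to Sentinel's thread-count grade rather than its QPS grade.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/**
 * Hypothetical wrapper around java.util.concurrent.Semaphore: at most
 * maxConcurrent callers may run the protected task at the same time.
 */
class SemaphoreLimiter {
    private final Semaphore permits;

    SemaphoreLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the task if a permit is free; otherwise returns the fallback without blocking. */
    String call(Supplier<String> task, String fallback) {
        if (!permits.tryAcquire()) {
            return fallback;
        }
        try {
            return task.get();
        } finally {
            permits.release();           // always hand the permit back
        }
    }
}
```

With a single permit, a call made while another call is still inside the protected task gets the fallback; once the outer call finishes and releases its permit, the next call goes through again.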

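Rate limiting is only the most basic requirement of a service-governance / quality-of-service system. Others include:

Traffic switching
Different rate-limiting policies for different channels
Traffic monitoring
Circuit breaking
Dynamic rate limiting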

Let's look at Sentinel's architecture.

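At the bottom layer, Ctrip's Apollo can serve as configuration, Redis as storage, ZooKeeper for configuration and storage, and Nacos for configuration; together these components give Sentinel its dynamic data-source support.

The middle layer provides service-level rate limiting for gRPC, Dubbo, Spring Cloud and RocketMQ,

and gateway-level rate limiting for Netflix Zuul, Tengine and Nginx.

The top layer provides global rate limiting for cloud-native environments such as Istio, Envoy and service meshes.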

Sentinel in practice

Official Sentinel docs: https://sentinelguard.io/zh-cn/docs/quick-start.html

GitHub: https://github.com/alibaba/Sentinel/

The dashboard jar can be downloaded from the Releases page on GitHub; the latest version at the time of writing is 1.8.0.
After downloading, start it directly with java -jar, optionally passing a few parameters:

java -jar -Dserver.port=8888 -Dcsp.sentinel.dashboard.server=localhost:8888 -Dproject.name=sentinel-dashboard sentinel-dashboard-1.8.0.jar

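The first parameter sets the dashboard's port.

The second tells the Sentinel client which dashboard to report to; pointing it at the dashboard's own address brings the dashboard itself under monitoring.

The third is the application name under which this process appears in the dashboard.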

Wait for the dashboard to finish starting.

Then open ip:port in a browser.
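The default username and password are both sentinel.

After logging in, the left-hand menu shows the monitoring details of the current sentinel-dashboard application.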

Next, let's write a demo based on Sentinel's native API:

package com.tqz.sentinel;

import com.alibaba.csp.sentinel.Entry;
import com.alibaba.csp.sentinel.SphU;
import com.alibaba.csp.sentinel.slots.block.BlockException;
import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

import java.util.ArrayList;
import java.util.List;

/**
 * @Author: tian
 * @Date: 2020/12/21 23:48
 * @Desc:
 */
public class SentinelDemo {

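    // Resource name
    private static final String resource = "doTest";

    public static void main(String[] args) {
        // 1. Initialize the flow rules
        initFlowRules();
        // 2. Keep calling the resource; Sentinel applies the rules on every entry
        while (true) {
            Entry entry = null;
            try {
                // The name passed here must match the resource name used in the rule
                entry = SphU.entry(resource);
                System.out.println("Hello World!");
            } catch (BlockException e) {
                // Thrown when the call is rejected by a rule (i.e. it was rate-limited)
                e.printStackTrace();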
            } finally {
                if (null != entry) {
                    entry.exit();
                }
            }
        }
    }

    private static void initFlowRules() {
        // List of flow rules
        List<FlowRule> flowRuleList = new ArrayList<>();
        FlowRule flowRule = new FlowRule();
        // Protected resource (an API or a method name; the name must be unique)
        flowRule.setResource(resource);

        // Threshold type:
        // limit by the number of concurrent threads
//        flowRule.setGrade(RuleConstant.FLOW_GRADE_THREAD);
        // limit by QPS
        flowRule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        // QPS threshold of 10
        flowRule.setCount(10);
        // Add the rule to the list
        flowRuleList.add(flowRule);
        // Load the rules
        FlowRuleManager.loadRules(flowRuleList);
    }
}

Add the following JVM arguments before starting the demo:

-Dcsp.sentinel.dashboard.server=localhost:8888 -Dproject.name=sentinel-demo
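Once the demo is running, the dashboard shows an application named after the project.name JVM argument. Requests beyond the limit are blocked; here the BlockException is simply printed, but a real project should handle it according to its business needs. In the Sentinel console, the passed QPS for this application stays steady at the configured 10. (For the demo to show up in the dashboard, the sentinel-transport-simple-http module must be on the classpath alongside sentinel-core.)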


A quick look at the source code behind Sentinel's rate limiting

Start from entry = SphU.entry(resource);, which lands in this method of SphU:

public static Entry entry(String name) throws BlockException {
    return Env.sph.entry(name, EntryType.OUT, 1, OBJECTS0);
}

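The name parameter is the resource name we want to protect. The EntryType enum has two values: IN for inbound traffic (requests coming into this application) and OUT for outbound traffic; this overload uses OUT. (System rules only apply to IN entries.)

From there the call goes into the Sph interface. Its implementation, CtSph, wraps the resource name and entry type into a StringResourceWrapper: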

@Override
public Entry entry(String name, EntryType type, int count, Object... args) throws BlockException {
    StringResourceWrapper resource = new StringResourceWrapper(name, type);
    return entry(resource, count, args);
}

It then calls the overloaded entry() method:

public Entry entry(ResourceWrapper resourceWrapper, int count, Object... args) throws BlockException {
        return entryWithPriority(resourceWrapper, count, false, args);
}

Step into entryWithPriority(); this is the key method:

private Entry entryWithPriority(ResourceWrapper resourceWrapper, int count, boolean prioritized, Object... args)
        throws BlockException {
        Context context = ContextUtil.getContext();
        if (context instanceof NullContext) {
            // The {@link NullContext} indicates that the amount of context has exceeded the threshold,
            // so here init the entry only. No rule checking will be done.
            return new CtEntry(resourceWrapper, null, context);
        }

        if (context == null) {
            // Using default context.
            context = InternalContextUtil.internalEnter(Constants.CONTEXT_DEFAULT_NAME);
        }

        // Global switch is close, no rule checking will do.
        if (!Constants.ON) {
            return new CtEntry(resourceWrapper, null, context);
        }

        ProcessorSlot<Object> chain = lookProcessChain(resourceWrapper);

        /*
         * Means amount of resources (slot chain) exceeds {@link Constants.MAX_SLOT_CHAIN_SIZE},
         * so no rule checking will be done.
         */
        if (chain == null) {
            return new CtEntry(resourceWrapper, null, context);
        }

        Entry e = new CtEntry(resourceWrapper, chain, context);
        try {
            chain.entry(context, resourceWrapper, null, count, prioritized, args);
        } catch (BlockException e1) {
            e.exit(count, args);
            throw e1;
        } catch (Throwable e1) {
            // This should not happen, unless there are errors existing in Sentinel internal.
            RecordLog.info("Sentinel unexpected exception", e1);
        }
        return e;
    }

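The method first obtains the current Context from ContextUtil (it is held in a ThreadLocal; if no context has been entered, the default context named by Constants.CONTEXT_DEFAULT_NAME is created), then performs a few checks on it. The key line is ProcessorSlot<Object> chain = lookProcessChain(resourceWrapper);, which returns the slot chain for this resource. Slot chains are cached per resource: inside lookProcessChain(), when no chain exists yet, one is created with chain = SlotChainProvider.newSlotChain();. Stepping into that method shows that the slot chain is assembled with the builder pattern: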

public static ProcessorSlotChain newSlotChain() {
    if (slotChainBuilder != null) {
        return slotChainBuilder.build();
    }

    // Resolve the slot chain builder SPI.
    slotChainBuilder = SpiLoader.loadFirstInstanceOrDefault(SlotChainBuilder.class, DefaultSlotChainBuilder.class);

    if (slotChainBuilder == null) {
        // Should not go through here.
        RecordLog.warn("[SlotChainProvider] Wrong state when resolving slot chain builder, using default");
        slotChainBuilder = new DefaultSlotChainBuilder();
    } else {
        RecordLog.info("[SlotChainProvider] Global slot chain builder resolved: "
                       + slotChainBuilder.getClass().getCanonicalName());
    }
    return slotChainBuilder.build();
}
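newSlotChain() is only reached when lookProcessChain() has no cached chain for the resource, and once the number of resources reaches a cap (Constants.MAX_SLOT_CHAIN_SIZE), null is returned, which is the chain == null case in entryWithPriority(). The article does not show lookProcessChain() itself, so the following is a hypothetical, simplified sketch of such a per-resource cache: double-checked locking over a copy-on-write map. To our understanding Sentinel uses a similar pattern, but check the source for your version; ChainCache and lookup() are made-up names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical per-resource chain cache: one chain per resource, created
 * lazily by a factory, with a hard cap on the number of resources.
 */
class ChainCache<C> {
    private final int maxSize;
    private final Function<String, C> factory;
    // Replaced wholesale on every write (copy-on-write), so reads need no lock
    private volatile Map<String, C> chains = new HashMap<>();

    ChainCache(int maxSize, Function<String, C> factory) {
        this.maxSize = maxSize;
        this.factory = factory;
    }

    C lookup(String resource) {
        C chain = chains.get(resource);           // fast path, lock-free
        if (chain == null) {
            synchronized (this) {
                chain = chains.get(resource);     // double-check under the lock
                if (chain == null) {
                    if (chains.size() >= maxSize) {
                        return null;              // too many resources: caller skips rule checks
                    }
                    chain = factory.apply(resource);
                    Map<String, C> copy = new HashMap<>(chains);
                    copy.put(resource, chain);
                    chains = copy;                // publish the new map
                }
            }
        }
        return chain;
    }
}
```

Reads vastly outnumber writes (a chain is built once per resource), which is why copying the map on the rare write and reading it without locks is a reasonable trade-off.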

build() is declared in the SlotChainBuilder interface and implemented by DefaultSlotChainBuilder:

public class DefaultSlotChainBuilder implements SlotChainBuilder {

    @Override
    public ProcessorSlotChain build() {
        ProcessorSlotChain chain = new DefaultProcessorSlotChain();

        // Note: the instances of ProcessorSlot should be different, since they are not stateless.
        List<ProcessorSlot> sortedSlotList = SpiLoader.loadPrototypeInstanceListSorted(ProcessorSlot.class);
        for (ProcessorSlot slot : sortedSlotList) {
            if (!(slot instanceof AbstractLinkedProcessorSlot)) {
                RecordLog.warn("The ProcessorSlot(" + slot.getClass().getCanonicalName() + ") is not an instance of AbstractLinkedProcessorSlot, can't be added into ProcessorSlotChain");
                continue;
            }

            chain.addLast((AbstractLinkedProcessorSlot<?>) slot);
        }

        return chain;
    }
}

Note that every slot is cast to AbstractLinkedProcessorSlot, so let's see which subclasses this abstract class has.
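Because every slot is an AbstractLinkedProcessorSlot, the slot chain is a chain of responsibility: each slot does its own work and then calls fireEntry() to hand control to the next one. Below is a hypothetical, heavily simplified sketch of that structure; LinkedSlot, SlotChain, RecordingSlot and RejectingSlot are made-up names, and an IllegalStateException stands in for BlockException.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical analogue of AbstractLinkedProcessorSlot. */
abstract class LinkedSlot {
    private LinkedSlot next;

    void setNext(LinkedSlot next) {
        this.next = next;
    }

    abstract void entry(String resource, List<String> trace);

    // Pass control to the next slot in the chain, if there is one
    protected void fireEntry(String resource, List<String> trace) {
        if (next != null) {
            next.entry(resource, trace);
        }
    }
}

/** Singly linked chain with addLast(), in the spirit of DefaultProcessorSlotChain. */
class SlotChain {
    private LinkedSlot first;
    private LinkedSlot last;

    void addLast(LinkedSlot slot) {
        if (first == null) {
            first = slot;
        } else {
            last.setNext(slot);
        }
        last = slot;
    }

    // Runs the whole chain and returns the names of the slots that did their work
    List<String> entry(String resource) {
        List<String> trace = new ArrayList<>();
        if (first != null) {
            first.entry(resource, trace);
        }
        return trace;
    }
}

/** A slot that only records that it ran, then continues. */
class RecordingSlot extends LinkedSlot {
    private final String name;

    RecordingSlot(String name) {
        this.name = name;
    }

    @Override
    void entry(String resource, List<String> trace) {
        trace.add(name);
        fireEntry(resource, trace);
    }
}

/** A rule-checking slot that rejects one resource (stand-in for throwing BlockException). */
class RejectingSlot extends LinkedSlot {
    private final String blockedResource;

    RejectingSlot(String blockedResource) {
        this.blockedResource = blockedResource;
    }

    @Override
    void entry(String resource, List<String> trace) {
        if (resource.equals(blockedResource)) {
            throw new IllegalStateException("blocked: " + resource);
        }
        trace.add("rule-check");
        fireEntry(resource, trace);
    }
}
```

In Sentinel itself, StatisticSlot calls fireEntry() first and records the pass only after the later rule-checking slots have let the request through; if they throw BlockException, it records a block instead.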
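Focus on the following slots, which together make up Sentinel's flow-control pipeline (this part is based on a flow chart from the official website).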

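The first, TreeNodeBuilder, builds the tree nodes and is implemented by NodeSelectorSlot. This slot collects the invocation paths of resources and stores them as a tree, which is used for path-based flow control. The Javadoc on the NodeSelectorSlot class explains how it works with an example.
In that example, ContextUtil.enter() creates a context named entrance1 with appA as the caller (origin); SphU.entry() then requests a token, and if it returns without throwing BlockException, the token was acquired. Each DefaultNode is identified by both the resource ID and the input (context) name. In other words, one resource ID can have several DefaultNodes, one per entrance.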

The resulting in-memory structure can be displayed by calling

curl http://localhost:8719/tree?type=root

(the second line of the output explains what each column means).

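The second, ClusterNodeBuilder (implemented by ClusterBuilderSlot), may remind you of the cluster concept in Dubbo, which handles fault tolerance. It builds a ClusterNode for each resource created in the first step; the ClusterNode aggregates the resource's statistics, such as QPS and thread count, across all contexts.
The third, StatisticSlot, is also central: it records statistics for every entry, and these numbers are the inputs for deciding whether to trigger rate limiting or circuit breaking.
The fourth, FlowSlot, works with FlowRule: it compares the metrics collected by the third slot against the configured rules, and the comparison yields a yes-or-no result.
The fifth, DegradeSlot, applies the circuit-breaking and degradation strategies, i.e. it decides whether the resource should be automatically broken for the next period of time.
The sixth, AuthoritySlot, handles authorization, i.e. black and white lists.
The seventh, SystemSlot, checks system rules: based on the overall state of the current system, it regulates traffic at entry resources, aiming for a dynamic balance between inbound traffic and system load.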

Now back to CtSph.entryWithPriority(). Near the end of the method is this line:

chain.entry(context, resourceWrapper, null, count, prioritized, args);

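Here chain is the DefaultProcessorSlotChain built above, so entry() is passed from slot to slot; the slot that matters for counting is StatisticSlot.
Its entry() method first lets the remaining slots run their checks and, if the request passes, records it with node.addPassRequest(count);.
Stepping into that method leads to DefaultNode: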

@Override
    public void addPassRequest(int count) {
        super.addPassRequest(count);
        this.clusterNode.addPassRequest(count);
    }

Going one level deeper leads to StatisticNode, which updates both the per-second and the per-minute counters:

@Override
    public void addPassRequest(int count) {
        rollingCounterInSecond.addPass(count);
        rollingCounterInMinute.addPass(count);
    }

addPass() is declared in the Metric interface; its implementation is ArrayMetric:

  @Override
    public void addPass(int count) {
        WindowWrap<MetricBucket> wrap = data.currentWindow();
        wrap.value().addPass(count);
    }

The call data.currentWindow() hints that Sentinel's rate limiting is window-based, i.e. the sliding window described earlier. Step into LeapArray.currentWindow():

  public WindowWrap<T> currentWindow() {
        return currentWindow(TimeUtil.currentTimeMillis());
    }

This reads the current time in milliseconds and passes it to the overloaded currentWindow(long timeMillis):

public WindowWrap<T> currentWindow(long timeMillis) {
        if (timeMillis < 0) {
            return null;
        }

        int idx = calculateTimeIdx(timeMillis);
        // Calculate current bucket start time.
        long windowStart = calculateWindowStart(timeMillis);

        /*
         * Get bucket item at given time from the array.
         *
         * (1) Bucket is absent, then just create a new bucket and CAS update to circular array.
         * (2) Bucket is up-to-date, then just return the bucket.
         * (3) Bucket is deprecated, then reset current bucket and clean all deprecated buckets.
         */
        while (true) {
            WindowWrap<T> old = array.get(idx);
            if (old == null) {
                /*
                 *     B0       B1      B2    NULL      B4
                 * ||_______|_______|_______|_______|_______||___
                 * 200     400     600     800     1000    1200  timestamp
                 *                             ^
                 *                          time=888
                 *            bucket is empty, so create new and update
                 *
                 * If the old bucket is absent, then we create a new bucket at {@code windowStart},
                 * then try to update circular array via a CAS operation. Only one thread can
                 * succeed to update, while other threads yield its time slice.
                 */
                WindowWrap<T> window = new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
                if (array.compareAndSet(idx, null, window)) {
                    // Successfully updated, return the created bucket.
                    return window;
                } else {
                    // Contention failed, the thread will yield its time slice to wait for bucket available.
                    Thread.yield();
                }
            } else if (windowStart == old.windowStart()) {
                /*
                 *     B0       B1      B2     B3      B4
                 * ||_______|_______|_______|_______|_______||___
                 * 200     400     600     800     1000    1200  timestamp
                 *                             ^
                 *                          time=888
                 *            startTime of Bucket 3: 800, so it's up-to-date
                 *
                 * If current {@code windowStart} is equal to the start timestamp of old bucket,
                 * that means the time is within the bucket, so directly return the bucket.
                 */
                return old;
            } else if (windowStart > old.windowStart()) {
                /*
                 *   (old)
                 *             B0       B1      B2    NULL      B4
                 * |_______||_______|_______|_______|_______|_______||___
                 * ...    1200     1400    1600    1800    2000    2200  timestamp
                 *                              ^
                 *                           time=1676
                 *          startTime of Bucket 2: 400, deprecated, should be reset
                 *
                 * If the start timestamp of old bucket is behind provided time, that means
                 * the bucket is deprecated. We have to reset the bucket to current {@code windowStart}.
                 * Note that the reset and clean-up operations are hard to be atomic,
                 * so we need a update lock to guarantee the correctness of bucket update.
                 *
                 * The update lock is conditional (tiny scope) and will take effect only when
                 * bucket is deprecated, so in most cases it won't lead to performance loss.
                 */
                if (updateLock.tryLock()) {
                    try {
                        // Successfully get the update lock, now we reset the bucket.
                        return resetWindowTo(old, windowStart);
                    } finally {
                        updateLock.unlock();
                    }
                } else {
                    // Contention failed, the thread will yield its time slice to wait for bucket available.
                    Thread.yield();
                }
            } else if (windowStart < old.windowStart()) {
                // Should not go through here, as the provided time is already behind.
                return new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
            }
        }
    }

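The source comments explain the logic clearly. The first task is to find which time window (bucket) of the sampling interval (the LeapArray) the current time falls into, that is, the index into LeapArray's AtomicReferenceArray<WindowWrap<T>> array. It works as follows:

Divide the current time by the window length (windowLengthInMs); the quotient n says how many whole windows have elapsed.
Picture the windows numbered from 0 and rolling forward: n identifies the window the current timestamp belongs to. A LeapArray holds only sampleCount buckets, so the array index is n % sampleCount.
long windowStart = calculateWindowStart(timeMillis); computes the start time of the window containing the timestamp, i.e. the windowStart of the WindowWrap: the largest multiple of windowLengthInMs that does not exceed the current timestamp. Sentinel computes it as timeMillis - timeMillis % windowLengthInMs: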

protected long calculateWindowStart(/*@Valid*/ long timeMillis) {
    return timeMillis - timeMillis % windowLengthInMs;
}

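Next, look up the element at that index in LeapArray's WindowWrap array.
If it is empty, create a new WindowWrap; its MetricBucket comes from the abstract method newEmptyBucket(timeMillis), which subclasses implement.
The slot is filled with CAS because several threads may ask for the current window at the same timestamp before it exists; CAS ensures only one bucket is created, so statistics are not overwritten. The thread whose CAS succeeds returns the new WindowWrap, while the others yield and retry.
If the element exists and its start time equals the computed start time, it is returned directly.
If the existing window's start time is earlier than the computed start time, the bucket is deprecated and is reset to the new start time under updateLock.
Finally, the code handles a computed start time earlier than the existing window's. This should not normally happen, but it can if the system clock is turned back; in that case a fresh, standalone WindowWrap is returned. It is impressive that even clock rollback has been accounted for.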
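The index formula, the windowStart formula and the four cases above can be condensed into a hypothetical, stripped-down MiniLeapArray that only counts passed requests. It keeps the structure of Sentinel's currentWindow() (CAS for an empty slot, an update lock for a stale bucket, a throwaway bucket when the clock goes backwards), but it replaces a stale bucket with a new object instead of resetting it in place, and re-checks the slot under the lock. It assumes intervalInMs is a multiple of sampleCount. The test reuses the 200 ms windows from the source comments; those ASCII diagrams are schematic, so their bucket labels need not match the computed indices.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical, stripped-down LeapArray: a circular array of buckets counting passes. */
class MiniLeapArray {
    static final class Bucket {
        final long windowStart;
        final AtomicLong pass = new AtomicLong();

        Bucket(long windowStart) {
            this.windowStart = windowStart;
        }
    }

    private final int sampleCount;
    private final long intervalInMs;
    private final long windowLengthInMs;
    private final AtomicReferenceArray<Bucket> array;
    private final ReentrantLock updateLock = new ReentrantLock();

    MiniLeapArray(int sampleCount, long intervalInMs) {
        this.sampleCount = sampleCount;
        this.intervalInMs = intervalInMs;
        this.windowLengthInMs = intervalInMs / sampleCount;
        this.array = new AtomicReferenceArray<>(sampleCount);
    }

    // n = timeMillis / windowLengthInMs; index = n % sampleCount
    static int timeIdx(long timeMillis, long windowLengthInMs, int sampleCount) {
        return (int) ((timeMillis / windowLengthInMs) % sampleCount);
    }

    // Largest multiple of windowLengthInMs that is <= timeMillis
    static long windowStart(long timeMillis, long windowLengthInMs) {
        return timeMillis - timeMillis % windowLengthInMs;
    }

    Bucket currentWindow(long timeMillis) {
        int idx = timeIdx(timeMillis, windowLengthInMs, sampleCount);
        long start = windowStart(timeMillis, windowLengthInMs);
        while (true) {
            Bucket old = array.get(idx);
            if (old == null) {
                // Empty slot: only the thread that wins the CAS installs its bucket
                Bucket fresh = new Bucket(start);
                if (array.compareAndSet(idx, null, fresh)) {
                    return fresh;
                }
                Thread.yield();
            } else if (start == old.windowStart) {
                return old;                        // bucket is up to date
            } else if (start > old.windowStart) {
                // Stale bucket: replace it under a small lock, re-checking the slot first
                if (updateLock.tryLock()) {
                    try {
                        if (array.get(idx) == old) {
                            Bucket fresh = new Bucket(start);
                            array.set(idx, fresh);
                            return fresh;
                        }
                    } finally {
                        updateLock.unlock();
                    }
                } else {
                    Thread.yield();
                }
            } else {
                // Clock moved backwards: return a throwaway bucket that is not stored
                return new Bucket(start);
            }
        }
    }

    void addPass(long timeMillis, int count) {
        currentWindow(timeMillis).pass.addAndGet(count);
    }

    // Sum of passes in buckets that still lie within the interval ending at timeMillis
    long pass(long timeMillis) {
        currentWindow(timeMillis);                 // recycle the current slot if it is stale
        long sum = 0;
        for (int i = 0; i < sampleCount; i++) {
            Bucket b = array.get(i);
            if (b != null && timeMillis - b.windowStart < intervalInMs) {
                sum += b.pass.get();
            }
        }
        return sum;
    }
}
```

With five 200 ms buckets, passes recorded at 100 ms and 888 ms both count toward the one-second total at 900 ms; by 1100 ms the [0, 200) bucket has been recycled, and by 2000 ms both have dropped out of the window.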

That roughly covers how Sentinel implements rate limiting.

2. Sentinel: Circuit Breaking and Degradation

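Circuit breaking offers the following strategies (as described before Sentinel 1.8.0; starting with 1.8.0 the RT strategy became a slow-call ratio and a half-open state was added):

RT (average response time)
If N requests (100 in this example) arrive in a row within 1 s and their average response time exceeds a configurable threshold, circuit breaking (degradation) is triggered, and calls to the method are automatically broken for the following time window. Note that by default Sentinel caps recorded RT at 4900 ms (anything higher counts as 4900 ms); the cap can be changed with the startup option -Dcsp.sentinel.statistic.max.rt=xxx.
Exception ratio
When a resource receives at least N requests per second (5 by default) and the ratio of exceptions to passed requests in that second exceeds the threshold, the resource enters the degraded state, and calls to the method return immediately for the following time window. The ratio threshold lies in [0.0, 1.0].
Exception count
When the number of exceptions for a resource in the last minute exceeds the threshold, the circuit breaks. Because the statistics window is one minute long, if the break duration (timeWindow) is shorter than 60 s, the resource may break again right after recovering.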
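The exception-ratio strategy above boils down to a small state machine. Here is a minimal sketch, assuming a hypothetical ExceptionRatioBreaker that keeps per-second counters and stays open for a fixed period; it is not Sentinel's DegradeSlot or circuit-breaker code, and it has no half-open probing.

```java
/**
 * Hypothetical exception-ratio circuit breaker (not Sentinel's implementation).
 * Calls are counted per 1-second window; once a window has at least
 * minRequests calls and the error ratio exceeds ratioThreshold, the breaker
 * opens and short-circuits every call for openMillis.
 */
class ExceptionRatioBreaker {
    private final int minRequests;
    private final double ratioThreshold;   // in [0.0, 1.0]
    private final long openMillis;

    private long windowStart = -1;
    private int total;
    private int errors;
    private long openUntil = Long.MIN_VALUE;

    ExceptionRatioBreaker(int minRequests, double ratioThreshold, long openMillis) {
        this.minRequests = minRequests;
        this.ratioThreshold = ratioThreshold;
        this.openMillis = openMillis;
    }

    /** Returns false while the breaker is open, i.e. the call should go to the fallback. */
    synchronized boolean tryPass(long nowMillis) {
        return nowMillis >= openUntil;
    }

    /** Records the outcome of a call that was allowed through. */
    synchronized void onComplete(long nowMillis, boolean failed) {
        long start = nowMillis - nowMillis % 1000;
        if (start != windowStart) {            // a new 1-second window: reset counters
            windowStart = start;
            total = 0;
            errors = 0;
        }
        total++;
        if (failed) {
            errors++;
        }
        if (total >= minRequests && (double) errors / total > ratioThreshold) {
            openUntil = nowMillis + openMillis; // trip: reject calls until openUntil
            total = 0;
            errors = 0;
        }
    }
}
```

With minRequests = 5 and a 0.5 threshold, five consecutive failures within one second open the breaker for 10 seconds, while 2 failures out of 5 (a 0.4 ratio) leave it closed.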

Now a simple circuit-breaking example with Spring Boot:

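import com.alibaba.csp.sentinel.annotation.SentinelResource;
import com.alibaba.csp.sentinel.slots.block.BlockException;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("sentinel")
public class SentinelController {

   @GetMapping("test")
   @SentinelResource(value = "test", blockHandler = "handleBlock", fallback = "handleFallback")
   public String test() {
       // Deliberately throws ArithmeticException so that the fallback is triggered
       int i = 1 / 0;
       return "test";
   }

   // blockHandler: same parameters as test() plus a trailing BlockException;
   // called when the request is rejected by a flow or degrade rule
   public String handleBlock(BlockException ex) {
       return "Blocked by Sentinel, returning fallback data...";
   }

   // fallback: same parameters as test(), optionally plus a Throwable;
   // called when test() itself throws
   public String handleFallback(Throwable t) {
       return "Circuit-breaker fallback data...";
   }
}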

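This is done with the @SentinelResource annotation. value is the resource name and is required (it must not be empty); the annotation plays a role similar to Hystrix's @HystrixCommand.
blockHandler / blockHandlerClass: blockHandler is the name of the method that handles BlockException; optional. The blockHandler method must be public, its return type must match the original method's, and its parameters must match the original method's plus one extra trailing parameter of type BlockException. By default it must be in the same class as the original method. To use a method from another class, set blockHandlerClass to that class's Class object; the method must then be static, otherwise it cannot be resolved.

fallback: the name of the fallback method; optional. It provides fallback logic when an exception is thrown and can handle all exception types (except those excluded by exceptionsToIgnore). Signature and location requirements for the fallback method:
- The return type must be the same as the original method's.
- The parameter list must match the original method's, optionally with one extra Throwable parameter to receive the exception.
- By default the fallback method must be in the same class as the original method. To use a method from another class, set fallbackClass to that class's Class object; the method must then be static, otherwise it cannot be resolved.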