Sentinal

What is Sentinal?

Sentinel is a powerful flow control component that takes “flow” as the breakthrough point and covers multiple fields including flow control, concurrency limiting, circuit breaking, and adaptive system protection to guarantee the reliability of microservices.

Sentinal在流控上支持的特性比Hystrix多很多,也有很多其他Hystrix不具有的特性。而且”takes “flow” as the breakthrough(突破) point”。

Flow Control

设计的时候Sentinal就是以“flow”作为breakthrough的,那么自然流控的方式会比较多。

Flow Control by Thread(Concurrency)

虽然Hystrix也支持信号量的方式控制,而且Hystrix推荐线程池方式,只是在“For circuits that wrap very low-latency requests (such as those that primarily hit in-memory caches) the overhead can be too high and in those cases you can use another method such as tryable semaphores”推荐信号量,但Sentinal只支持信号量的方式控制,个人认为这是两者区别中最重要的一个。

在Hystrix的官方介绍中:

You can use semaphores (or counters) to limit the number of concurrent calls to any given dependency, instead of using thread pool/queue sizes. This allows Hystrix to shed load without using thread pools but it does not allow for timing out and walking away.

在Sentinal的官方介绍中:

Besides concurrency (aka. semaphore isolation), there are other ways to achieve this: thread pool isolation.

  • Thread pool isolation: Allocate managed thread pools to handle different invocations. When there are no more idle threads in the pool, the request is rejected without affecting other resources. When an invocation exceeds the timeout, the invocation will be cut down.
  • Semaphore isolation: Limit the concurrency count of the invocation. The benefit of using thread pools is that you can isolate business logic completely by pre-allocating thread pools. But it also brings us extra costs of context switch and additional threads. If the incoming request is already handled in a separate thread, for instance, a servlet request, it will almost double the thread overhead if using the thread pool mode.

So we recommend using concurrent thread count flow control, which represents lightweight semaphore isolation.

sentinal的开发认为线程池方式比较重,在Hystrix中有给出过一些数据:cost-of-threads,简单的一个结果,没有具体说明基准测试的条件:

At the median (and lower) there is no cost to having a separate thread.

At the 90th percentile there is a cost of 3ms for having a separate thread.

这里只是从延迟上去分析,而没有考虑资源上消耗。个人感觉,这消耗到底是大是小,从体量上考虑,绝大多数公司如日活百万的,不是个问题。

这种信号量的方式,Sentinal还是没有解决那最重要的问题,超时,另外丢失了一个附加的特性(虽然很多情况下用不到)异步。

Circuit Breaking

可以触发跳闸的参数包括Average Response Time、Exception Ratio、Exception Count三个,相对与Hystrix只有Exception Ratio一个多了两种策略。

Adaptive System Protection

Load Protection

The request will be blocked under the condition:

  • Current system load (load1) exceeds the threshold (highestSystemLoad);
  • Current concurrent requests exceed the estimated capacity (thread count > minRt * maxQps)

Global Metrics Protection

We have a global statistic node ENTRY_NODE which records global metrics (e.g. inbound QPS, average RT and thread count).

“The idea of TCP BBR gives us inspiration. ”,BBR是什么?Google BBR是什么?以及在 CentOS 7 上如何部署,简单说就是在拥塞算法的时候以丢包率来进行流量控制,但是后来发现丢包的原因并不是因为快到到负荷上限,还有一些乱起八糟的原因。后来就通过时延来进行控制。

同样的,在调控流量对于系统负载的影响的时候,没有只考虑系统负载,不是只要系统负载达到了预先设定的值就立刻降低流量。如上引用所述,还会有另外一个条件:thread count > minRt * maxQps。

这minRt*maxQPS是个什么意思?maxQPS是一秒内处理的最大请求数,minRt意思是一个请求需要花的时间,相乘的意思是一秒内的请求,总共需要多长时间处理完成,这是右边的意思,右边其实有个隐含的时间1秒,如果左边也加上个1秒,thread count * 1秒,意思也是这么多线程1秒的工作时间,与右边一个意思。这个公式意思是,通过minRt * maxQps来估计的线程的数目是不是已经大到了会导致系统负载过高的数目了。言外之意,系统负载过高并不一定是由于业务的线程数目过多导致的,可能是其他的原因,比如操作系统的磁盘整理什么的。

参考

Sentinel: The Flow Sentinel of Your Services

Table of Contents