
What is Hystrix?

In a distributed environment, inevitably some of the many service dependencies will fail. Hystrix is a library that helps you control the interactions between these distributed services by adding latency tolerance and fault tolerance logic. Hystrix does this by isolating points of access between the services, stopping cascading failures across them, and providing fallback options, all of which improve your system’s overall resiliency.

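According to the GitHub wiki, Hystrix is just a library (in the same sense that Vue and React are libraries). It mainly addresses latency and errors when calling services, prevents cascading failures, and improves the system's overall resiliency.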

What Is Hystrix For?

Hystrix is designed to do the following:

  • Give protection from and control over latency and failure from dependencies accessed (typically over the network) via third-party client libraries.
  • Stop cascading failures in a complex distributed system.
  • Fail fast and rapidly recover.
  • Fallback and gracefully degrade when possible.
  • Enable near real-time monitoring, alerting, and operational control.

The most important feature is stopping cascading failures. Hystrix also supports monitoring, alerting, and online configuration changes.


What Design Principles Underlie Hystrix?

  • Preventing any single dependency from using up all container (such as Tomcat) user threads.
  • Shedding load and failing fast instead of queueing.
  • Providing fallbacks wherever feasible to protect users from failure.
  • Using isolation techniques (such as bulkhead, swimlane, and circuit breaker patterns) to limit the impact of any one dependency.
  • Optimizing for time-to-discovery through near real-time metrics, monitoring, and alerting.
  • Optimizing for time-to-recovery by means of low latency propagation of configuration changes and support for dynamic property changes in most aspects of Hystrix, which allows you to make real-time operational modifications with low latency feedback loops.
  • Protecting against failures in the entire dependency client execution, not just in the network traffic.


How Does Hystrix Accomplish Its Goals?

Hystrix does this by:

Wrapping all calls to external systems (or “dependencies”) in a HystrixCommand or HystrixObservableCommand object which typically executes within a separate thread (this is an example of the command pattern).

“typically executes within a separate thread”: even the official documentation says so, which suggests that semaphore-based isolation is the less common choice.

Timing-out calls that take longer than thresholds you define. There is a default, but for most dependencies you custom-set these timeouts by means of “properties” so that they are slightly higher than the measured 99.5th percentile performance for each dependency.
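The two points above (running a call on a separate thread and timing it out) can be sketched with the JDK alone. This is a minimal illustration, not Hystrix's implementation; the class TimedCall and its execute method are hypothetical names introduced here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Hystrix: runs a call on a worker thread
// and stops waiting for it after a caller-defined timeout.
class TimedCall {
    private final ExecutorService pool;

    TimedCall(ExecutorService pool) {
        this.pool = pool;
    }

    // Returns the call's result, or throws TimeoutException if it is too slow.
    <T> T execute(Callable<T> call, long timeoutMillis) throws Exception {
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the caller walks away
            throw e;
        } catch (ExecutionException e) {
            // Unwrap so callers see the exception thrown by the call itself.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }
}
```

Because the caller only waits on the Future, a slow dependency costs the caller at most timeoutMillis, which is what makes the thread-based approach able to "walk away".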


Maintaining a small thread-pool (or semaphore) for each dependency; if it becomes full, requests destined for that dependency will be immediately rejected instead of queued up.

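“each dependency” here can be read as one service, but it can also be read as a single interface of a service. Within a service that exposes many interfaces, if one of them has a very high call volume, it can be given its own “small thread-pool (or semaphore)”.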
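A rough sketch of the "small pool per dependency, reject when full" idea, using only java.util.concurrent. The class BulkheadPools and the keys used with it are hypothetical; the pool size is a constructor argument rather than anything Hystrix prescribes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical registry, not Hystrix code: one small pool per dependency key.
class BulkheadPools {
    private final Map<String, ThreadPoolExecutor> pools = new ConcurrentHashMap<>();
    private final int threadsPerDependency;

    BulkheadPools(int threadsPerDependency) {
        this.threadsPerDependency = threadsPerDependency;
    }

    // The key can name a whole service or one hot interface of it;
    // the choice of granularity is up to the caller.
    ThreadPoolExecutor poolFor(String dependencyKey) {
        return pools.computeIfAbsent(dependencyKey, k -> new ThreadPoolExecutor(
                threadsPerDependency, threadsPerDependency,
                1, TimeUnit.MINUTES,
                new SynchronousQueue<Runnable>(),        // no queueing: hand off or reject
                new ThreadPoolExecutor.AbortPolicy()));  // throws RejectedExecutionException when full
    }
}
```

With a SynchronousQueue there is nowhere for excess work to wait, so once every thread of one dependency's pool is busy, further submissions fail immediately while other dependencies keep their own threads.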

Measuring successes, failures (exceptions thrown by client), timeouts, and thread rejections.


Tripping a circuit-breaker to stop all requests to a particular service for a period of time, either manually or automatically if the error percentage for the service passes a threshold.
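To make the tripping rule concrete, here is a deliberately simplified breaker. It is an assumption-laden sketch: SimpleCircuitBreaker and all of its parameters are hypothetical, it counts every request since the last reset instead of using a rolling window, and it simply closes again once the open period has elapsed.

```java
import java.util.function.LongSupplier;

// Hypothetical, simplified breaker; not Hystrix's implementation.
class SimpleCircuitBreaker {
    private final int errorThresholdPercent;
    private final int minimumRequests;
    private final long openMillis;
    private final LongSupplier clock; // injected so that time can be controlled
    private int total;
    private int failures;
    private long openedAt = -1;       // -1 means the circuit is closed
    private boolean forcedOpen;

    SimpleCircuitBreaker(int errorThresholdPercent, int minimumRequests,
                         long openMillis, LongSupplier clock) {
        this.errorThresholdPercent = errorThresholdPercent;
        this.minimumRequests = minimumRequests;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized boolean allowRequest() {
        if (forcedOpen) return false;          // tripped manually
        if (openedAt < 0) return true;         // closed: let traffic through
        if (clock.getAsLong() - openedAt >= openMillis) {
            total = 0;                         // simplification: close and start counting afresh
            failures = 0;
            openedAt = -1;
            return true;
        }
        return false;                          // still inside the open period
    }

    synchronized void record(boolean success) {
        total++;
        if (!success) failures++;
        boolean enoughVolume = total >= minimumRequests;
        boolean tooManyErrors = failures * 100 >= errorThresholdPercent * total;
        if (openedAt < 0 && enoughVolume && tooManyErrors) {
            openedAt = clock.getAsLong();      // trip automatically
        }
    }

    synchronized void forceOpen(boolean open) {
        forcedOpen = open;
    }
}
```

The minimumRequests guard keeps a single early failure from opening the circuit before there is enough traffic for the error percentage to be meaningful.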


Performing fallback logic when a request fails, is rejected, times-out, or short-circuits.
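The four fallback triggers listed above can be pictured as one small wrapper. This is only a sketch of the control flow; FallbackRunner and its parameters are hypothetical and do not mirror Hystrix's API.

```java
import java.util.concurrent.Callable;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical helper, not Hystrix code.
class FallbackRunner {
    static <T> T run(BooleanSupplier circuitAllows, Callable<T> primary, Supplier<T> fallback) {
        if (!circuitAllows.getAsBoolean()) {
            return fallback.get(); // short-circuited: the primary call is never invoked
        }
        try {
            return primary.call();
        } catch (Exception e) {
            // A failure, a rejection (for example RejectedExecutionException) or
            // a timeout (for example TimeoutException) all end up here.
            return fallback.get();
        }
    }
}
```

A fallback is usually something cheap and local, such as a cached or default value, so that it cannot itself become a new point of failure.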


Monitoring metrics and configuration changes in near real-time.


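PS: A real (electrical) circuit breaker, according to the wiki entry, has no half-open state. It also generally does not recover automatically; it is reset by hand.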



Detailed Steps: Flow Chart


Circuit breaker workflow diagram. Reference: Circuit Breaker











Thread Pool or Semaphore?

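Hystrix supports two isolation strategies: thread pool and semaphore. Thread pool advantages: it supports timeouts and asynchronous execution (the caller can walk away). Semaphore advantages: better performance, with no thread pool overhead.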


You can use semaphores (or counters) to limit the number of concurrent calls to any given dependency, instead of using thread pool/queue sizes. This allows Hystrix to shed load without using thread pools but it does not allow for timing out and walking away. If you trust the client and you only want load shedding, you could use this approach.
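A minimal sketch of semaphore-style isolation, assuming a hypothetical SemaphoreIsolation class built on java.util.concurrent.Semaphore. It shows why this approach sheds load but cannot time out: the call runs on the caller's own thread.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper, not Hystrix code.
class SemaphoreIsolation {
    private final Semaphore permits;

    SemaphoreIsolation(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    <T> T execute(Supplier<T> call, Supplier<T> onRejected) {
        if (!permits.tryAcquire()) {
            return onRejected.get(); // limit reached: shed load without blocking
        }
        try {
            // Runs on the caller's thread, so the caller cannot walk away
            // if the dependency hangs.
            return call.get();
        } finally {
            permits.release();
        }
    }
}
```

tryAcquire() returns immediately instead of waiting for a permit, which is what turns the semaphore into a load shedder rather than a queue.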



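The thread pool defaults to 10 threads: coreSize and maximumSize are both 10, and the thread keep-alive time is 1 minute. These defaults are defined in the following class: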

public abstract class HystrixThreadPoolProperties {

    /* defaults */
    static int default_coreSize = 10;            // core size of thread pool
    static int default_maximumSize = 10;         // maximum size of thread pool
    static int default_keepAliveTimeMinutes = 1; // minutes to keep a thread alive
    static int default_maxQueueSize = -1;        // size of queue (this can't be dynamically changed so we use 'queueSizeRejectionThreshold' to artificially limit and reject)
                                                 // -1 turns it off and makes us use SynchronousQueue
    static boolean default_allow_maximum_size_to_diverge_from_core_size = false; //should the maximumSize config value get read and used in configuring the threadPool
                                                                                 //turning this on should be a conscious decision by the user, so we default it to false

    static int default_queueSizeRejectionThreshold = 5; // number of items in queue
    static int default_threadPoolRollingNumberStatisticalWindow = 10000; // milliseconds for rolling number
    static int default_threadPoolRollingNumberStatisticalWindowBuckets = 10; // number of buckets in rolling number (10 1-second buckets)

    // ... (rest of the class omitted in this excerpt)
}


static int default_maxQueueSize = -1; // size of queue (this can’t be dynamically changed so we use ‘queueSizeRejectionThreshold’ to artificially limit and reject) // -1 turns it off and makes us use SynchronousQueue

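As the comment above says, the capacity of the blocking queue (maxQueueSize) “can’t be dynamically changed so we use ‘queueSizeRejectionThreshold’”. When a queue is in use, queueSizeRejectionThreshold (default_queueSizeRejectionThreshold, 5 by default) acts as the effective maximum number of queued items, and requests beyond it are rejected. With the default maxQueueSize of -1, a SynchronousQueue is used, so there is no queue for the threshold to limit.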
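The reason for a separate threshold can be shown with a plain ThreadPoolExecutor: the queue's capacity is fixed at construction, while a threshold checked before each submission can be changed at any time. ThresholdedPool below is a hypothetical sketch under that assumption, not a copy of Hystrix's thread pool code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper, not Hystrix code.
class ThresholdedPool {
    private final ThreadPoolExecutor executor;
    private volatile int queueSizeRejectionThreshold; // may be changed at runtime

    ThresholdedPool(int coreSize, int maxQueueSize, int queueSizeRejectionThreshold) {
        // The queue's capacity is fixed here and cannot be resized later.
        BlockingQueue<Runnable> queue = maxQueueSize <= 0
                ? new SynchronousQueue<Runnable>()
                : new LinkedBlockingQueue<Runnable>(maxQueueSize);
        this.executor = new ThreadPoolExecutor(coreSize, coreSize, 1, TimeUnit.MINUTES, queue);
        this.queueSizeRejectionThreshold = queueSizeRejectionThreshold;
    }

    void setQueueSizeRejectionThreshold(int threshold) {
        this.queueSizeRejectionThreshold = threshold;
    }

    void submit(Runnable task) {
        boolean queueInUse = !(executor.getQueue() instanceof SynchronousQueue);
        // Artificial limit: reject before the real capacity is reached.
        if (queueInUse && executor.getQueue().size() >= queueSizeRejectionThreshold) {
            throw new RejectedExecutionException("queue size reached the rejection threshold");
        }
        executor.execute(task);
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

In this sketch the real capacity (maxQueueSize) is an upper bound chosen once, and the threshold is the knob an operator can turn while the system is running.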



public class HystrixRuntimeException extends RuntimeException {

    private static final long serialVersionUID = 5219160375476046229L;

    private final Class<? extends HystrixInvokable> commandClass;
    private final Throwable fallbackException;
    private final FailureType failureCause;

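    public static enum FailureType {
        // ... (enum constants omitted in this excerpt)
    }

    // ... (rest of the class omitted in this excerpt)
}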

All exceptions thrown from the run() method except for HystrixBadRequestException count as failures and trigger getFallback() and circuit-breaker logic.
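The rule can be pictured as two catch branches. In the sketch below, BadRequestException is a hypothetical stand-in for HystrixBadRequestException and FailureAccounting is an illustrative class, so neither should be read as Hystrix's actual code.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for HystrixBadRequestException.
class BadRequestException extends RuntimeException {
    BadRequestException(String message) {
        super(message);
    }
}

// Hypothetical helper, not Hystrix code.
class FailureAccounting {
    private int failureCount; // would feed the circuit breaker's error percentage

    <T> T execute(Supplier<T> run, Supplier<T> fallback) {
        try {
            return run.get();
        } catch (BadRequestException e) {
            throw e; // caller error: no fallback, not counted as a failure
        } catch (RuntimeException e) {
            failureCount++;        // counts towards tripping the circuit
            return fallback.get(); // and triggers the fallback
        }
    }

    int failureCount() {
        return failureCount;
    }
}
```

The distinction matters because invalid input from the caller says nothing about the health of the dependency, so it should not push the circuit towards opening.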


Command Group And Command Thread-Pool

Hystrix uses the command group key to group together commands such as for reporting, alerting, dashboards, or team/library ownership.

By default Hystrix uses this to define the command thread-pool unless a separate one is defined.

The thread-pool key represents a HystrixThreadPool for monitoring, metrics publishing, caching, and other such uses. A HystrixCommand is associated with a single HystrixThreadPool as retrieved by the HystrixThreadPoolKey injected into it, or it defaults to one created using the HystrixCommandGroupKey it is created with.

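A service does not necessarily map to exactly one Command Group (or Command Thread-Pool). Two interfaces of the same service can share a Command Group while using different Command Thread-Pools; keeping different thread pools under the same Command Group makes aggregated statistics easier.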

The reason why you might use HystrixThreadPoolKey instead of just a different HystrixCommandGroupKey is that multiple commands may belong to the same “group” of ownership or logical functionality, but certain commands may need to be isolated from each other.
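The relationship between the two keys can be sketched as a lookup in which the thread-pool key, when absent, falls back to the group key. CommandPools, its resolve method, and the pool size of 10 are illustrative assumptions, not Hystrix's API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical registry, not Hystrix code.
class CommandPools {
    private final Map<String, ExecutorService> poolsByKey = new ConcurrentHashMap<>();

    // threadPoolKey may be null, in which case the group key names the pool.
    ExecutorService resolve(String groupKey, String threadPoolKey) {
        String key = (threadPoolKey != null) ? threadPoolKey : groupKey;
        return poolsByKey.computeIfAbsent(key, k -> Executors.newFixedThreadPool(10));
    }

    void shutdownAll() {
        poolsByKey.values().forEach(ExecutorService::shutdown);
    }
}
```

Two commands created with the same group key and no thread-pool key share one pool; giving one of them its own thread-pool key isolates it while it still reports under the same group.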



