Thursday, May 12, 2022

Pattern: Circuit Breaker

Problem

  • How to prevent a network or service failure from cascading to other services?

Solution
  • A service client should invoke a remote service via a proxy that functions in a similar fashion to an electrical circuit breaker. 
  • When the number of consecutive failures crosses a threshold, the circuit breaker trips, and for the duration of a timeout period, all attempts to invoke the remote service will fail immediately. 
  • After the timeout expires, the circuit breaker allows a limited number of test requests to pass through. If those requests succeed, the circuit breaker resumes normal operation; otherwise, the timeout period begins again.

The different States of the Circuit Breaker

1. Closed

  • When everything is normal, the circuit breaker remains in the Closed state and all calls pass through to the services. When the number of failures exceeds a predetermined threshold, the breaker trips and goes into the Open state.

2. Open 

  • The circuit breaker returns an error for calls without executing the function. 

3. Half-Open

  • After a timeout period, the circuit switches to a half-open state to test if the underlying problem still exists. If a single call fails in this half-open state, the breaker is once again tripped. If it succeeds, the circuit breaker resets back to the normal, Closed state. A minimal sketch of these state transitions is shown below.
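The following is a minimal, illustrative sketch of the three states and their transitions in plain Java. The class and field names are hypothetical; in practice you would normally use a library such as Hystrix or Resilience4j rather than hand-rolling this.

enum State { CLOSED, OPEN, HALF_OPEN }

class SimpleCircuitBreaker {

   private final int failureThreshold;
   private final long openTimeoutMillis;
   private State state = State.CLOSED;
   private int consecutiveFailures = 0;
   private long openedAt = 0;

   SimpleCircuitBreaker(int failureThreshold, long openTimeoutMillis) {
      this.failureThreshold = failureThreshold;
      this.openTimeoutMillis = openTimeoutMillis;
   }

   synchronized boolean allowRequest() {
      if (state == State.OPEN) {
         // After the timeout, let a trial request through (Half-Open).
         if (System.currentTimeMillis() - openedAt >= openTimeoutMillis) {
            state = State.HALF_OPEN;
            return true;
         }
         return false;            // Fail fast while Open.
      }
      return true;                // Closed or Half-Open: allow the call.
   }

   synchronized void recordSuccess() {
      consecutiveFailures = 0;
      state = State.CLOSED;       // A successful trial call closes the breaker.
   }

   synchronized void recordFailure() {
      consecutiveFailures++;
      if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
         state = State.OPEN;      // Trip the breaker and restart the timeout.
         openedAt = System.currentTimeMillis();
      }
   }
}
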
Use Case of Circuit Breaker Pattern

Let’s take an example to understand where we can apply the Circuit Breaker pattern in a microservices architecture.

Scenario:

  • Assume there are 5 different services in a microservices application. Whenever a request arrives, the web server allocates one thread to call the particular service. Due to some failure, that service is slightly delayed and the thread keeps waiting. If only one thread is waiting for that service, it is not a big problem.
  • But if the service is a high-demand service that receives many requests, holding threads is not good, because more and more threads will be allocated to this service over time, and all of them will have to wait.

  • As a result, the remaining requests that come to your service will be blocked or queued. Even after the failing service recovers, the web server is still busy processing the requests already in the queue, and it never catches up because new requests keep arriving.
  • Eventually this can lead to cascading failures throughout the application. This kind of scenario can crash your services and even the whole application.


Solution:

  • The above scenario is a perfect example of where to apply the Circuit Breaker pattern. Assume you have defined a threshold for a particular service: it should respond within 200ms. As mentioned above, that service is a high-demand service that continuously receives requests. If 75% of those requests are approaching the upper threshold (150ms to 200ms), it means the service is going to fail soon.
  • If several requests exceed the maximum threshold (200ms), it means the service is not responding anymore. In that case, the call fails back to the consumer and informs it that this particular service is not available. Recalling the states described above, we are now moving from the “Closed” state to the “Open” state.
  • As a result, requests coming to that service no longer wait. After a timeout, the Circuit Breaker sends test requests to the service in the background; this is the “Half-Open” state. If these requests succeed, the Circuit Breaker allows requests to flow to that service again.
  • You can use the Circuit Breaker pattern to improve the fault tolerance and resilience of a microservices architecture and to prevent failures from cascading to other microservices.

Different Aspects we can use for the Circuit Breaker pattern
  1. Circuit Breaker
  2. Retry
  3. Rate Limiter
  4. Bulkhead
  5. Time Limiter

1. Circuit Breaker

  • The circuit breaker has three distinct states: Closed, Open, and Half-Open.
  • You can implement the circuit breaker pattern with Netflix Hystrix. The following code can better explain the solution. 

Example

The microservice below recommends the reading list to the customer:

package hello;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.bind.annotation.RequestMapping;

@RestController
@SpringBootApplication
public class BookstoreApplication {

   @RequestMapping(value = "/recommended")
   public String readingList() {
      return "Spring in Action";
   }

   public static void main(String[] args) {
      SpringApplication.run(BookstoreApplication.class, args);
   }
}

Client application code which will call the reading list recommendation service:

package hello;
import java.net.URI;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;

@Service
public class BookService {
     private final RestTemplate restTemplate;
     public BookService(RestTemplate rest) {
          this.restTemplate = rest;
     }
     @HystrixCommand(fallbackMethod = "reliable")
     public String readingList() {
          URI uri = URI.create("http://localhost:8090/recommended");
          return this.restTemplate.getForObject(uri, String.class);
     }
     
    public String reliable() {
      return "Cloud Native Java (O'Reilly)";
     }
}

  • In the above code, the readingList method calls the remote microservice API to get the reading list recommendation.
  • Note the @HystrixCommand annotation in the code above: we have provided a fallback method, "reliable". If the remote API does not respond in time, the "reliable" method will be called and will serve the request.
  • In the fallback method, you can return either a default output or even call some other remote or local API to serve the request. For the client to actually run, it also needs a RestTemplate bean and Hystrix enabled on its main class, as sketched below.
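A minimal sketch of the client's main application class. The class name ReadingApplication is an assumption (any Spring Boot main class wired the same way would do); @EnableCircuitBreaker turns on Hystrix command processing for @HystrixCommand, and the RestTemplate bean satisfies BookService's constructor:

package hello;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.cloud.client.circuitbreaker.EnableCircuitBreaker;
import org.springframework.context.annotation.Bean;
import org.springframework.web.client.RestTemplate;

@EnableCircuitBreaker   // enables Hystrix so @HystrixCommand is honored
@SpringBootApplication
public class ReadingApplication {

   // BookService's constructor needs a RestTemplate; expose one as a bean.
   @Bean
   public RestTemplate rest(RestTemplateBuilder builder) {
      return builder.build();
   }

   public static void main(String[] args) {
      SpringApplication.run(ReadingApplication.class, args);
   }
}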

Update application.properties:

resilience4j.circuitbreaker.instances.getInvoiceCB.failure-rate-threshold=80
resilience4j.circuitbreaker.instances.getInvoiceCB.sliding-window-size=10
resilience4j.circuitbreaker.instances.getInvoiceCB.sliding-window-type=COUNT_BASED
resilience4j.circuitbreaker.instances.getInvoiceCB.minimum-number-of-calls=5
resilience4j.circuitbreaker.instances.getInvoiceCB.automatic-transition-from-open-to-half-open-enabled=true
resilience4j.circuitbreaker.instances.getInvoiceCB.permitted-number-of-calls-in-half-open-state=4
resilience4j.circuitbreaker.instances.getInvoiceCB.wait-duration-in-open-state=1s

  • ‘failure-rate-threshold=80’
    • Indicates that if 80% of requests fail, the circuit opens, i.e. the Circuit Breaker moves to the Open state.
  • ‘sliding-window-size=10’
    • Indicates that the failure rate is calculated over the last 10 calls, so if 80% of those 10 (i.e. 8) fail, the circuit opens.
  • ‘sliding-window-type=COUNT_BASED’
    • Indicates that we are using a COUNT_BASED sliding window. The other type is TIME_BASED.
  • ‘minimum-number-of-calls=5’
    • Indicates that we need at least 5 calls before the failure rate is calculated.
  • ‘automatic-transition-from-open-to-half-open-enabled=true’
    • Indicates that the breaker moves from the Open state to the Half-Open state automatically once the wait duration has passed, rather than jumping straight back to Closed.
  • ‘permitted-number-of-calls-in-half-open-state=4’
    • Indicates that 4 test calls are permitted in the Half-Open state. If 80% of them fail, the Circuit Breaker switches back to the Open state.
  • ‘wait-duration-in-open-state=1s’
    • Indicates how long the breaker waits in the Open state before it can move to the Half-Open state.
These attributes are the important part of a Circuit Breaker implementation. We can configure the values as per our requirements and test the functionality accordingly. A Resilience4j-based sketch that uses this getInvoiceCB instance is shown below.
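Note that the Hystrix code above does not reference the getInvoiceCB instance; these properties apply when the call is guarded with Resilience4j's own annotation instead. A minimal sketch, assuming a hypothetical InvoiceService with a local /invoice endpoint (the service, method, and URL are illustrative, not part of the original example):

import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;

@Service
public class InvoiceService {

   private final RestTemplate restTemplate;

   public InvoiceService(RestTemplate restTemplate) {
      this.restTemplate = restTemplate;
   }

   // "getInvoiceCB" must match the instance name used in application.properties.
   @CircuitBreaker(name = "getInvoiceCB", fallbackMethod = "getInvoiceFallback")
   public String getInvoice() {
      return restTemplate.getForObject("http://localhost:8090/invoice", String.class);
   }

   // Resilience4j passes the triggering exception to the fallback method.
   public String getInvoiceFallback(Exception ex) {
      return "Invoice service is unavailable, please try again later.";
   }
}
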
2. Retry

  • Suppose Microservice ‘A’ depends on another Microservice ‘B’. Let’s assume Microservice ‘B’ is a faulty service and its success rate is only up to 50-60%.
  • The fault may be due to any reason, such as the service being unavailable, a buggy service that sometimes responds and sometimes doesn’t, or an intermittent network failure.
  • In this case, if Microservice ‘A’ retries the request 2 to 3 times, the chances of getting a response increase. We can achieve this functionality with the @Retry annotation provided by Resilience4j, without writing the logic explicitly.

  • Here, we implement the retry mechanism in Microservice ‘A’. We call Microservice ‘A’ fault tolerant, as it participates in tolerating the fault. Retries take place only on failure, not on success.
  • By default, the retry happens 3 times, and we can configure how many times to retry as per our requirements.

Example:

In the Circuit Breaker example, replace
  • @HystrixCommand(fallbackMethod = "reliable") with
  • @Retry(name = "getInvoiceRetry", fallbackMethod = "reliable")

Update application.properties.

resilience4j.retry.instances.getInvoiceRetry.max-attempts=5
resilience4j.retry.instances.getInvoiceRetry.wait-duration=2s
resilience4j.retry.instances.getInvoiceRetry.retry-exceptions=org.springframework.web.client.ResourceAccessException

  • By default, the retry mechanism makes 3 attempts if the first call fails.
  • Here we have configured 5 attempts, with a 2-second wait between attempts.
  • Additionally, if the business requires retrying only when a specific exception occurs, that can be configured as shown above.
  • If we want Resilience4j to retry on any type of exception, we simply omit the ‘retry-exceptions’ property. A sketch of a retry-guarded call using this configuration is shown below.
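A minimal sketch of a method guarded by this retry instance, reusing the hypothetical invoice endpoint from the circuit breaker sketch above (the class name and URL are assumptions):

import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;
import io.github.resilience4j.retry.annotation.Retry;

@Service
public class InvoiceRetryService {

   private final RestTemplate restTemplate;

   public InvoiceRetryService(RestTemplate restTemplate) {
      this.restTemplate = restTemplate;
   }

   // Retries up to max-attempts (5 as configured) with a 2-second wait,
   // then falls back if the call still fails.
   @Retry(name = "getInvoiceRetry", fallbackMethod = "getInvoiceFallback")
   public String getInvoice() {
      return restTemplate.getForObject("http://localhost:8090/invoice", String.class);
   }

   public String getInvoiceFallback(Exception ex) {
      return "Invoice service did not respond after several retries.";
   }
}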

3. Rate Limiter

  • A Rate Limiter limits the number of requests for a given period. Let’s assume we want to cap the number of requests to a REST API within a particular duration.
  • There are various reasons to limit the number of requests an API can handle, such as protecting resources from abuse, minimizing overhead, and meeting a service level agreement.
  • We can achieve this functionality with the @RateLimiter annotation provided by Resilience4j, without writing the logic explicitly.

Example:

In the Circuit Breaker example, replace
  • @HystrixCommand(fallbackMethod = "reliable") with
  • @RateLimiter(name = "getMessageRateLimit", fallbackMethod = "reliable")

Update application.properties.

resilience4j.ratelimiter.instances.getMessageRateLimit.limit-for-period=2
resilience4j.ratelimiter.instances.getMessageRateLimit.limit-refresh-period=5s
resilience4j.ratelimiter.instances.getMessageRateLimit.timeout-duration=0

  • The above properties state that only 2 requests are allowed within each 5-second window.
  • Also, the timeout duration is zero, so a rejected call does not wait for a permit; once the 5-second period refreshes, the user can send requests again. A sketch of a rate-limited endpoint using this instance is shown below.
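A minimal sketch of an endpoint guarded by this rate limiter instance; the controller name and /getMessageRL path are hypothetical:

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import io.github.resilience4j.ratelimiter.annotation.RateLimiter;

@RestController
public class RateLimiterController {

   // Only 2 calls are permitted per 5-second window (see application.properties);
   // further calls go to the fallback.
   @GetMapping("/getMessageRL")
   @RateLimiter(name = "getMessageRateLimit", fallbackMethod = "getMessageFallback")
   public String getMessage() {
      return "Hello from the rate-limited endpoint!";
   }

   public String getMessageFallback(Exception ex) {
      return "Too many requests, please try again after some time.";
   }
}
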
4. Bulkhead

  • In the context of fault tolerance, if we want to limit the number of concurrent requests, we can use Bulkhead as an aspect.
  • Note the difference between Bulkhead and Rate Limiting: a Rate Limiter limits the number of requests within a particular period, while a Bulkhead limits the number of requests that may run concurrently.
  • We can achieve this functionality easily with the @Bulkhead annotation, without writing specific code.

Example:

In the Circuit Breaker example, replace
  • @HystrixCommand(fallbackMethod = "reliable") with
  • @Bulkhead(name = "getMessageBH", fallbackMethod = "reliable")

Update application.properties.

resilience4j.bulkhead.instances.getMessageBH.max-concurrent-calls=5
resilience4j.bulkhead.instances.getMessageBH.max-wait-duration=0

  • ‘max-concurrent-calls=5’ indicates that if the number of concurrent calls exceeds 5, the fallback method is activated.
  • ‘max-wait-duration=0’ indicates that an extra call should not wait for a slot to free up; it falls back immediately. A sketch of a bulkhead-guarded endpoint is shown below.
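A minimal sketch of an endpoint guarded by this bulkhead instance; the controller name and /getMessageBH path are hypothetical:

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import io.github.resilience4j.bulkhead.annotation.Bulkhead;

@RestController
public class BulkheadController {

   // At most 5 calls may run concurrently (see application.properties);
   // because max-wait-duration=0, additional concurrent calls fall back immediately.
   @GetMapping("/getMessageBH")
   @Bulkhead(name = "getMessageBH", fallbackMethod = "getMessageFallback")
   public String getMessage() {
      return "Hello from the bulkhead-protected endpoint!";
   }

   public String getMessageFallback(Exception ex) {
      return "Too many concurrent requests, please try again later.";
   }
}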

5. Time Limiter

  • Time limiting is the process of setting a time limit for a microservice to respond. When Microservice ‘A’ sends a request to Microservice ‘B’, it sets a time limit within which Microservice ‘B’ must respond.
  • If Microservice ‘B’ doesn’t respond within that time limit, it is considered to have failed. We can achieve this functionality easily with the @TimeLimiter annotation, without writing specific code.

Example:

import java.util.concurrent.CompletableFuture;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import io.github.resilience4j.timelimiter.annotation.TimeLimiter;

@RestController
public class TimeLimiterController {

      Logger logger = LoggerFactory.getLogger(TimeLimiterController.class);

      @GetMapping("/getMessageTL")
      @TimeLimiter(name = "getMessageTL")
      public CompletableFuture<String> getMessage() {
         return CompletableFuture.supplyAsync(this::getResponse);
      }

      private String getResponse() {

         if (Math.random() < 0.4) {       // ~40% of the time, respond within the time limit; otherwise delay past it
             return "Executing Within the time Limit...";
         } else {
             try {
                 logger.info("Getting Delayed Execution");
                 Thread.sleep(1000);
             } catch (InterruptedException e) {
                 e.printStackTrace();
             }
         }
         return "Exception due to Request Timeout.";
      }
}

Update application.properties.

resilience4j.timelimiter.instances.getMessageTL.timeout-duration=1ms
resilience4j.timelimiter.instances.getMessageTL.cancel-running-future=false

  • ‘timeout-duration=1ms’ indicates that the maximum amount of time a request can take to respond is 1 millisecond.
  • ‘cancel-running-future=false’ indicates that the running CompletableFuture should not be cancelled after the timeout.
