Configuring SmallRye Retry for a Kubernetes deployment

We needed to change the SmallRye Retry configuration on the fly for a product that runs as a Kubernetes deployment, but could not easily find documentation for this particular case. So I decided to share our solution, which also turned into a small tutorial on Quarkus and Kubernetes.

For this example, you should have some understanding of Java, Maven, Docker and Kubernetes. In the end it is quite simple, and you (hopefully) should not have many issues replicating it.

Skip straight to a Kubernetes configuration example

Requirements

Rancher (for Kubernetes and Docker), Maven, and Java 17

Setting up the project

To create a suitable test environment, we will need a small Quarkus project. To set this up, I decided to create a new project following the SmallRye Fault Tolerance guide. This allows us to create a project with a single resource that fails 50% of the time it is called. This gives us the necessary starting point to create our test case with a slight modification.

Remove the additional retry configurations from the annotation in CoffeeResource.java. This allows the global configuration changes to take effect and not be overwritten by the annotation.

@GET
@Retry
public List<Coffee> coffees() {

Creating a Kubernetes deployment

Creating a Kubernetes deployment requires us to build a Docker image from the project and use it as the deployment's image.

Creating a Docker image

To create a Docker image of the project, let's first change the Quarkus package type to an uber jar. This can be done by adding the following line to the resources/application.properties file:

quarkus.package.type=uber-jar

The next step will be adding a Dockerfile to the project root, in which we specify which image we want to use and where to find our jar to deploy.

# Using Java 17
FROM eclipse-temurin:17-jre  

# Copy the jar
COPY ./target/microprofile-fault-tolerance-quickstart-1.0.0-SNAPSHOT-runner.jar /app/app.jar

EXPOSE 8080  
CMD ["java", "-jar", "/app/app.jar"]

With the configuration set and the Dockerfile in place, we can build the image. The following commands create a local image named "retry" with the tag "latest".

# Creates the target jar that will be copied to the image
mvn clean package

# Creates the image based on the Dockerfile
docker build --no-cache -t retry:latest .

Creating the Kubernetes deployment

Let's first create a namespace for our test deployment called "dev". This is not necessary, but I like to avoid using the "default" namespace.

kubectl create namespace dev

# Change the current namespace to dev
kubectl config set-context --current --namespace=dev

The next step is to create the deployment yaml, in which we will configure the deployment. Ours will look like this:

apiVersion: apps/v1  
kind: Deployment  
metadata:  
  # Name of the deployment  
  name: retry-deployment  
  labels:  
    app: retry  
spec:  
  replicas: 1  
  selector:  
    matchLabels:  
      app: retry  
  template:  
    metadata:  
      labels:  
        app: retry  
    spec:  
      containers:  
        - name: retry  
          # Image to be used for the deployment  
          image: retry:latest  
          imagePullPolicy: IfNotPresent  
          ports:  
            - containerPort: 8080

Applying the yaml creates a deployment named "retry-deployment" in our current namespace:

# Create the deployment
kubectl apply -f ./deployment.yaml

# Show the deployment
kubectl get deployment
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
retry-deployment   1/1     1            1           3s

To access and test the endpoint, we have to forward the 8080 port from the deployment to our host:

# Forward the 8080 port
kubectl port-forward deployments/retry-deployment 8080

# Curl the endpoint
curl localhost:8080/coffee
[{"id":1,"name":"Fernandez Espresso","countryOfOrigin":"Colombia","price":23},{"id":2,"name":"La Scala Whole Beans","countryOfOrigin":"Bolivia","price":18},{"id":3,"name":"Dak Lak Filter","countryOfOrigin":"Vietnam","price":25}]

# You can follow the retries from the application logs
kubectl logs -f deployment/retry-deployment

# First call failed and was retried
2023-10-21 09:11:15,564 ERROR [org.acm.mic.fau.CoffeeResource] (executor-thread-1) CoffeeResource#coffees() invocation #1 failed
# Second succeeded and was returned
2023-10-21 09:11:25,509 INFO  [org.acm.mic.fau.CoffeeResource] (executor-thread-1) CoffeeResource#coffees() invocation #2 returning successfully

With these resources in place, we can continue to the retry configurations.

Resetting configurations

If you need to reset the Kubernetes configuration, you can delete and recreate the deployment, which makes it use the configuration in the deployment.yaml file again.

kubectl delete -f ./deployment.yaml
kubectl apply -f ./deployment.yaml

The image might still contain some configuration from the resources/application.properties file. To clear it, remove the unnecessary lines from the file, repackage the project, update the image and recreate the deployment:

# Repackage the project
mvn clean package

# Update the image
docker build --no-cache -t retry:latest .

# Recreate the deployment
kubectl delete -f ./deployment.yaml
kubectl apply -f ./deployment.yaml

Retry configurations

The documentation for the retry configurations can be found here.

Through application.properties

To configure SmallRye retries, we would normally use the resources/application.properties file, which would look like this:

# These are the default values that apply when nothing is specified

# Delay between retries, in milliseconds
Retry/delay=0
# Number of retries
Retry/maxRetries=3
# We can also enable/disable the retries from the configuration
Retry/enabled=true
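
To make the meaning of delay and maxRetries more concrete, here is a minimal plain-Java sketch of the behaviour they control. It is not SmallRye's implementation (which handles more, such as jitter and unlimited retries); the class RetrySketch and the helper callWithRetry are hypothetical names used only for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySketch {

    // Hypothetical helper: one initial attempt plus up to maxRetries further attempts,
    // waiting delayMillis before each retry.
    static <T> T callWithRetry(Callable<T> action, int maxRetries, long delayMillis) throws Exception {
        if (maxRetries < 0) {
            // Assumption of this sketch: unlimited retries are not modelled here.
            throw new IllegalArgumentException("maxRetries must be >= 0 in this sketch");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (attempt > 0) {
                Thread.sleep(delayMillis);
            }
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again if attempts remain
            }
        }
        throw last; // all attempts failed
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger invocations = new AtomicInteger();
        // Fails on the first two invocations, succeeds on the third.
        String result = callWithRetry(() -> {
            if (invocations.incrementAndGet() < 3) {
                throw new RuntimeException("Resource failure.");
            }
            return "coffee";
        }, 3, 0);
        // prints: coffee after 3 invocations
        System.out.println(result + " after " + invocations.get() + " invocations");
    }
}
```

With maxRetries=3 the method is invoked at most four times in total, and with maxRetries=1 at most twice.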

Through Kubernetes

If we wanted to configure a Kubernetes deployment through the resources/application.properties file, we would need to rebuild and redeploy the application, which is a bit clunky.

Instead, we can change the configurations by editing the deployment's environment variables:

# Edit the deployment
kubectl edit deployments/retry-deployment

You can locate the environment variables under the containers section. The following changes set the retry delay to 10 seconds and the number of retries to 1:

    spec:
      containers:
      - env:
        - name: Retry_delay
          value: "10000"
        - name: Retry_maxRetries
          value: "1"
        image: retry:latest
        imagePullPolicy: IfNotPresent
        name: retry
        ports:
        - containerPort: 8080
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File

Alternatively, the same variables can be added to the deployment.yaml file, which is then applied again.

Granular configurations

Sometimes we want to control and change configurations per class on the fly. Thankfully, these variables can be configured per class and even per method.

Let's add another endpoint to our system using another resource class. We can do this by creating a new file "FakeTeaResource.java", copying the contents of "CoffeeResource.java" into it and making the following changes:

@Path("/tea")  
public class FakeTeaResource {

After rebuilding the application and updating the deployment, you should be able to access two different endpoints:

localhost:8080/coffee
[{"id":1,"name":"Fernandez Espresso","countryOfOrigin":"Colombia","price":23},{"id":2,"name":"La Scala Whole Beans","countryOfOrigin":"Bolivia","price":18},{"id":3,"name":"Dak Lak Filter","countryOfOrigin":"Vietnam","price":25}]
localhost:8080/tea
[{"id":1,"name":"Fernandez Espresso","countryOfOrigin":"Colombia","price":23},{"id":2,"name":"La Scala Whole Beans","countryOfOrigin":"Bolivia","price":18},{"id":3,"name":"Dak Lak Filter","countryOfOrigin":"Vietnam","price":25}]

Now, let's disable retries exclusively for the "tea" endpoint. With one of the configurations shown below in place, the logs confirm that calls to it are no longer retried, while calls to "coffee" still are:

# No retry:
2023-10-21 09:56:21,814 ERROR [org.acm.mic.fau.FakeTeaResource] (executor-thread-1) TeaResource#coffees() invocation #0 failed
2023-10-21 09:56:21,819 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] (executor-thread-1) HTTP Request to /tea failed, error id: efd3ddc6-d1eb-4c29-99af-8b0386236b8a-1: java.lang.RuntimeException: Resource failure.

# Retry:
2023-10-21 09:58:49,500 ERROR [org.acm.mic.fau.CoffeeResource] (executor-thread-1) CoffeeResource#coffees() invocation #0 failed
2023-10-21 09:58:49,653 ERROR [org.acm.mic.fau.CoffeeResource] (executor-thread-1) CoffeeResource#coffees() invocation #1 failed
2023-10-21 09:58:49,758 INFO  [org.acm.mic.fau.CoffeeResource] (executor-thread-1) CoffeeResource#coffees() invocation #2 returning successfully

Using application.properties

# This applies the change only to the FakeTeaResource class
org.acme.microprofile.faulttolerance.FakeTeaResource/Retry/enabled=false

# It can also be configured per method. The following disables retries only for the coffees method in the FakeTeaResource class:
org.acme.microprofile.faulttolerance.FakeTeaResource/coffees/Retry/enabled=false

Using Kubernetes

Like in the previous section, we can also adjust the configurations for the Kubernetes deployment:

# Disable retry for the FakeTeaResource class
containers:
  - env:
    - name: org_acme_microprofile_faulttolerance_FakeTeaResource_Retry_enabled
      value: "false"
    
# Disable retry for the coffees method in the FakeTeaResource class
containers:
  - env:
    - name: org_acme_microprofile_faulttolerance_FakeTeaResource_coffees_Retry_enabled
      value: "false"
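
The environment variable names used in these examples follow from how MicroProfile Config looks up a property in the environment: it tries the exact property name, then the name with every character that is neither alphanumeric nor "_" replaced by "_", and finally that same name in upper case. The following plain-Java sketch reproduces the conversion so you can derive the variable name for any property. The class EnvNameSketch and its methods are hypothetical helpers, not part of any library, and the ASCII-only regular expression is a simplifying assumption.

```java
import java.util.List;
import java.util.Locale;

public class EnvNameSketch {

    // Replace every character that is neither alphanumeric nor '_' with '_'.
    // Assumption: property names contain only ASCII letters and digits.
    static String sanitize(String propertyName) {
        return propertyName.replaceAll("[^A-Za-z0-9_]", "_");
    }

    // The three environment variable names that are tried, in lookup order.
    static List<String> candidates(String propertyName) {
        String sanitized = sanitize(propertyName);
        return List.of(propertyName, sanitized, sanitized.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // prints: [Retry/delay, Retry_delay, RETRY_DELAY]
        System.out.println(candidates("Retry/delay"));
        System.out.println(candidates(
                "org.acme.microprofile.faulttolerance.FakeTeaResource/Retry/enabled"));
    }
}
```

The second candidate is the form used in this article; the upper-case form should be looked up as well under the same rule.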

Conclusion

Retries are a way to improve the reliability of an application. They are not an error handling solution, but a way to reduce the number of errors. It is still important to handle the errors that remain after the retries have been used up.

Retries also should not always be configured globally. Some calls take longer than others, which can make retries happen too frequently or put too much stress on the database, and some calls we do not want to retry at all.

If you want to dive deeper into these concepts, the SmallRye Fault Tolerance documentation has examples for fallbacks, circuit breakers, timeouts etc.

Cleaning up

To clean up the local environment we can delete the Docker image, Kubernetes deployment and namespace:

# Delete the deployment
kubectl delete -f ./deployment.yaml

# List all Docker images; there should be one named "retry"
docker image ls

# Delete the image
docker image rm retry

# Delete the Kubernetes namespace
kubectl delete namespace dev