Supporting disaster recovery with a multi-region EKS deployment

Disaster events are one of the biggest challenges that a software organization can face. Natural disasters like earthquakes or floods, technical failures such as power or network loss, and human actions such as unauthorized attacks can disable an entire fleet of systems, leading to complete failure for a business. Dealing with disaster scenarios requires a proactive approach to preparing for and recovering from failure.


One of the key benefits of running in the cloud is how easy it is to run workloads in multiple regions. This allows you to deploy a resilient architecture that supports disaster recovery, even in cases where an entire region is disabled.

This post shows you how you can deploy Elastic Kubernetes Service in a multi-region architecture and seamlessly shift traffic to a second AWS region in the event of a disaster.

Disaster recovery is not one-size-fits-all: different solutions exist depending on your requirements. This blog post from AWS provides a great graphical representation of the different solutions. As you move further to the right, the solution gets more complex, and typically more expensive.

A spectrum of disaster recovery options

The recovery strategy this blog post targets is a warm standby architecture: two Kubernetes clusters are continuously running and able to accept traffic. As long as your services are horizontally scalable, you can keep the secondary failover region at lower capacity and scale up in the event of a disaster.
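For example, with the eksctl tooling used below, scaling up the standby region could look something like the following sketch. The cluster and node group names match the configuration created later in this post; the target node count is only an example.

# Example only: scale the standby region's node group up during a failover.
# Raise --nodes-max as well if the desired count exceeds the group's maximum.
eksctl scale nodegroup \
  --cluster sookocheff-us-west-2 \
  --region us-west-2 \
  --name main-ng \
  --nodes 3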

The overall architecture we will be implementing in this post is to deploy two EKS clusters in two regions: one in us-east-1 and the other in us-west-2. Each cluster is deployed to a separate VPC. To properly direct traffic to the correct primary cluster, we use AWS Global Accelerator. Global Accelerator provides a pair of globally unique IP addresses that serve as the entry point into our application, and can be configured to direct a percentage of traffic to one cluster at a time.

A multi-region EKS deployment supporting disaster recovery

To simplify the explanation, we can use the eksctl tool to easily provision EKS clusters in two regions. Go ahead and install eksctl using the available instructions. If you are on macOS, you can use Homebrew:

brew tap weaveworks/tap
brew install weaveworks/tap/eksctl

Creating the clusters#

Our multi-region deployment uses two different clusters in two different regions. To keep things simple (and cost-effective), we declare a node group with only one EC2 instance as the node, using the following configuration for eksctl.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${EKS_CLUSTER}
  region: ${AWS_REGION}
managedNodeGroups:
  - name: main-ng
    instanceType: m5.large
    desiredCapacity: 1
    privateNetworking: true

Using the envsubst command, we can substitute the appropriate environment variables into the configuration before creating each cluster. Go ahead and create a cluster in both us-east-1 and us-west-2.

us-east-1 cluster

AWS_REGION=us-east-1 EKS_CLUSTER=sookocheff-us-east-1 envsubst < eksconfig.yml | eksctl create cluster -f -

us-west-2 cluster

AWS_REGION=us-west-2 EKS_CLUSTER=sookocheff-us-west-2 envsubst < eksconfig.yml | eksctl create cluster -f -
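If you want to double-check that both clusters were created before moving on, eksctl can list them per region:

# List the clusters eksctl can see in each region
eksctl get cluster --region us-east-1
eksctl get cluster --region us-west-2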

Enabling IAM Roles for Service Accounts#

In both clusters, we need to enable AWS IAM Roles for Kubernetes Service Accounts. When enabled, EKS uses an admission controller to inject AWS session credentials into pods based on the annotation of the Service Account used by the pod. The credentials are exposed as the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables. Recent versions of the AWS SDK will automatically read these environment variables, so nothing more needs to be done by any application running on a pod that needs access to AWS services.

To set up the OIDC provider necessary for IAM-based service accounts, execute the following command for each cluster we created.

eksctl utils associate-iam-oidc-provider --cluster=<clusterName> --approve
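To verify the association, one way is to compare the cluster's OIDC issuer with the OIDC providers registered in IAM (the cluster name and region placeholders follow the same convention as above):

# The cluster's OIDC issuer URL
aws eks describe-cluster \
  --name <clusterName> \
  --region <region> \
  --query "cluster.identity.oidc.issuer" \
  --output text

# The issuer's host and path should appear among the registered providers
aws iam list-open-id-connect-providers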

Getting traffic to our cluster#

For our example cluster, we will deploy a load balancer outside of Kubernetes to direct traffic destined for Kubernetes pods using the AWS Load Balancer Controller. Using this controller, an AWS Elastic Load Balancer will be deployed in response to events in the Kubernetes control plane. Once traffic reaches our cluster, we direct it to the Nginx Ingress Controller, which acts as a Layer 7 reverse proxy that exposes individual services in our cluster to external traffic. The following diagram from this blog post on exposing Kubernetes applications shows the overall architecture.

Load balancing traffic to Kubernetes services

Setting up IAM Roles for the AWS Load Balancer#

An IAM policy with the required permissions for the AWS Load Balancer Controller is available on GitHub; you can grab that policy directly from the repo:

curl -o iam_policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.4.7/docs/install/iam_policy.json

With the policy file in place, you can create the policy using the following command. Note that since IAM is a global service, this only needs to be done once to create the policy for both regions.


aws iam create-policy \
  --policy-name EksLoadBalancerControllerIAMPolicy \
  --policy-document file://iam_policy.json

We have the policy; now we need to create the Kubernetes Service Account in each cluster that uses the policy. The following command does just that, leveraging eksctl. Substitute EKS_CLUSTER with the clusters created earlier, and ACCOUNT_ID with your AWS account ID. You can grab the account ID easily using aws sts get-caller-identity --query 'Account' --output text

eksctl create iamserviceaccount \
  --cluster=${EKS_CLUSTER} \
  --name=aws-load-balancer-controller \
  --namespace=kube-system \
  --attach-policy-arn=arn:aws:iam::${ACCOUNT_ID}:policy/EksLoadBalancerControllerIAMPolicy \
  --approve
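You can verify the service account and its role annotation before moving on:

# IAM-backed service accounts that eksctl manages in this cluster
eksctl get iamserviceaccount --cluster ${EKS_CLUSTER}

# The service account should carry an eks.amazonaws.com/role-arn annotation
kubectl get serviceaccount aws-load-balancer-controller -n kube-system -o yaml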

Install the AWS Load Balancer Controller#

With the correct service account in place, we can install the AWS Load Balancer Controller using Helm.

First, add the EKS charts repo.

helm repo add eks https://aws.github.io/eks-charts

Then install the load balancer controller, referencing the service account we previously created. Remember to install the controller in each cluster we created.

helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=<cluster-name> \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller
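Before continuing, make sure the controller's Deployment becomes available in each cluster:

# The controller runs as a Deployment in kube-system
kubectl get deployment -n kube-system aws-load-balancer-controller
kubectl rollout status deployment/aws-load-balancer-controller -n kube-system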

Install the Nginx Ingress Controller#

The nginx Ingress Controller needs to be configured with annotations that allow the AWS Load Balancer Controller to route traffic to it. To set these annotations, create the following file called values.yaml with the appropriate annotations set. In this case, we change the Service type to LoadBalancer and define the name for the load balancer that will be used (in this case default-ingress). We make it internet-facing so we can access it, define its target type to be IP, and configure the health check for the NGINX server it will route to.

controller:
  service:
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-name: default-ingress
      service.beta.kubernetes.io/aws-load-balancer-type: external
      service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
      service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
      service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: http
      service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: /healthz
      service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "10254"

With this file in place, we can deploy the nginx Ingress Controller using Helm:

helm upgrade -i ingress-nginx ingress-nginx/ingress-nginx \
  --namespace kube-system \
  --values values.yaml
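If Helm complains that the ingress-nginx chart is not available, add its repository and re-run the command. Once the release is installed, the controller's Service should eventually report the hostname of the load balancer provisioned for it:

# Add the ingress-nginx chart repository (only needed once)
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

# The EXTERNAL-IP column shows the load balancer hostname once provisioning completes
kubectl get service -n kube-system ingress-nginx-controller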

Deploying the service#

To test our setup, we can deploy an application to each region that simply prints the application name alongside the region it resides in. For that, I use a simple Kubernetes Service and Deployment that uses the http-echo container to return the service name and region in an HTTP response.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${SERVICE_NAME}
  labels:
    app.kubernetes.io/name: ${SERVICE_NAME}
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ${SERVICE_NAME}
  replicas: 1
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ${SERVICE_NAME}
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: ${SERVICE_NAME}
          image: hashicorp/http-echo
          imagePullPolicy: IfNotPresent
          args:
            - -listen=:3000
            - -text=${SERVICE_NAME} | ${AWS_REGION}
          ports:
            - name: app-port
              containerPort: 3000
          resources:
            requests:
              cpu: 0.125
              memory: 50Mi
---
apiVersion: v1
kind: Service
metadata:
  name: ${SERVICE_NAME}
  labels:
    app.kubernetes.io/name: ${SERVICE_NAME}
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: ${SERVICE_NAME}
  ports:
    - name: svc-port
      port: 80
      targetPort: app-port
      protocol: TCP

Our resource definition takes two environment variables: the name of the service (SERVICE_NAME) and the region that it is deployed to (AWS_REGION).


Switch the Kubernetes context to the first region in us-east-1, and deploy the app.


SERVICE_NAME=first AWS_REGION=us-east-1 envsubst < service.yaml | kubectl apply -f -

Deploy the same app again using the us-west-2 Kubernetes context.


SERVICE_NAME=first AWS_REGION=us-west-2 envsubst < service.yaml | kubectl apply -f -

At this point, we have the same application deployed to both clusters.
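You can confirm this by listing the resources in each cluster's context, using the labels from the manifest above:

# Run against each cluster context in turn
kubectl get deployment,pods,service -l app.kubernetes.io/name=first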

Deploying an Ingress#

Allowing traffic from outside the cluster to reach our services requires deploying an ingress. Our ingress is fairly simple: it routes any request matching a prefix to the service we just deployed.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: default-ingress
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /first
            pathType: Prefix
            backend:
              service:
                name: first
                port:
                  name: svc-port

Deploy the ingress using this file. You may need to wait for up to ten minutes for the Elastic Load Balancer to be created before traffic is routable to your service.

kubectl apply -f ingress.yaml

You can test whether your ingress is working by retrieving the load balancer URL and making queries against it. The following command retrieves the load balancer URL from the nginx ingress controller's Service.

export NLB_URL=$(kubectl get -n kube-system service/ingress-nginx-controller \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

With the URL in hand, you can query your service. Repeat the process in both regions to see your ingress working for both EKS clusters.

curl ${NLB_URL}/first

We now have a service running in two clusters, each accessible from its own unique load balancer. Next, we tie these two clusters together using AWS Global Accelerator.

Deploying Global Accelerator#

AWS Global Accelerator is a networking service that sends your user's traffic through AWS's global network infrastructure. In some cases, this can improve the user experience by offering an earlier access point onto Amazon's networking infrastructure.

For our purposes, it also makes it easier to operate multi-regional deployments. Global Accelerator provides two static IP addresses that are anycast from AWS's globally distributed edge locations. This gives you a single entry point into an application no matter how many regions it is deployed to.


Global Accelerator provides an improved experience over using DNS to fail over traffic. DNS results are often cached by clients for unknown periods of time. With Global Accelerator, the DNS entry remains the same while allowing you to switch traffic between different endpoints. This eliminates any delays caused by DNS propagation or client-side caching of DNS results.

Creating a Global Accelerator#

A Global Accelerator consists of a few main components: listeners, endpoint groups, and endpoints.

A listener processes inbound connections from clients to Global Accelerator, based on the port (or port range) and protocol (or protocols) that you configure. Listeners support the TCP and UDP protocols. Each listener has one or more endpoint groups associated with it, and traffic is forwarded to endpoints in one of the groups.

Each endpoint group is associated with a specific AWS Region. Endpoint groups include one or more endpoints in the Region.

An endpoint is the resource within a region that Global Accelerator directs traffic to. Endpoints can be Network Load Balancers, Application Load Balancers, EC2 instances, or Elastic IP addresses.


AWS Global Accelerator can be created and configured from the command line. First, create the accelerator and take note of the ARN.


ACCELERATOR_ARN=$(aws globalaccelerator create-accelerator \
  --name multi-region \
  --query "Accelerator.AcceleratorArn" \
  --region us-west-2 \
  --output text)

Next, add a listener on port 80 for the TCP protocol. This configures Global Accelerator to listen for HTTP traffic.

LISTENER_ARN=$(aws globalaccelerator create-listener \
  --accelerator-arn $ACCELERATOR_ARN \
  --region us-west-2 \
  --protocol TCP \
  --port-ranges FromPort=80,ToPort=80 \
  --query "Listener.ListenerArn" \
  --output text)

The load balancers we created as part of our default ingress need to be registered as endpoints for the listener, so we can direct traffic to the correct location. Traffic is routed to one or more registered endpoints in the endpoint group. Using the command line, you can register both an endpoint group and endpoints within the group using the same command.

For both regions, repeat the following steps:


First, query AWS for the ARN of the load balancer matching the ingress we previously created.

export INGRESS_ARN=$(aws elbv2 describe-load-balancers \
  --region us-east-1 \
  --query "LoadBalancers[?contains(DNSName, 'default-ingress')].LoadBalancerArn" \
  --output text)

Next, create the endpoint group with an endpoint configured as the ingress load balancer we created.

ENDPOINT_GROUP_ARN=$(aws globalaccelerator create-endpoint-group \
  --region us-west-2 \
  --listener-arn $LISTENER_ARN \
  --endpoint-group-region us-east-1 \
  --query "EndpointGroup.EndpointGroupArn" \
  --output text \
  --endpoint-configurations EndpointId=$INGRESS_ARN,Weight=128,ClientIPPreservationEnabled=True)

When repeating these steps for the secondary region, set the traffic dial to route 0% of traffic to the us-west-2 endpoint group.

ENDPOINT_GROUP_ARN=$(aws globalaccelerator create-endpoint-group \
  --region us-west-2 \
  --listener-arn $LISTENER_ARN \
  --endpoint-group-region us-west-2 \
  --query "EndpointGroup.EndpointGroupArn" \
  --traffic-dial-percentage 0 \
  --output text \
  --endpoint-configurations EndpointId=$INGRESS_ARN,Weight=128,ClientIPPreservationEnabled=True)

With this configuration, we have a single global accelerator configured to send 100% of traffic to our cluster in us-east-1 and 0% of traffic to our cluster in us-west-2.
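You can review the endpoint groups and their traffic dial percentages at any time. This assumes LISTENER_ARN is still set from the earlier step:

# The Global Accelerator control-plane API is homed in us-west-2
aws globalaccelerator list-endpoint-groups \
  --listener-arn $LISTENER_ARN \
  --region us-west-2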

Testing it out#

To test out our configuration, we can send traffic to the DNS entry of the global accelerator.


GA_DNS=$(aws globalaccelerator describe-accelerator \
  --accelerator-arn $ACCELERATOR_ARN \
  --query "Accelerator.DnsName" \
  --output text)

We configured Global Accelerator to route 100% of traffic to the us-east-1 region. When we make a request to the accelerator, it responds with the region where the application was deployed.

❯ curl $GA_DNS/first
first | us-east-1

In the AWS console, if you set the accelerator's traffic dial to route 100% of traffic to us-west-2 and 0% of traffic to us-east-1, after a few moments you will see traffic directed to us-west-2 through the same entry point.
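The same switch can be made from the command line. This sketch assumes ENDPOINT_GROUP_ARN still holds the ARN of the us-west-2 endpoint group created above; substitute the us-east-1 endpoint group's ARN where indicated.

# Dial the us-west-2 endpoint group up to 100%
aws globalaccelerator update-endpoint-group \
  --endpoint-group-arn $ENDPOINT_GROUP_ARN \
  --traffic-dial-percentage 100 \
  --region us-west-2

# Dial the us-east-1 endpoint group down to 0%
aws globalaccelerator update-endpoint-group \
  --endpoint-group-arn <us-east-1-endpoint-group-arn> \
  --traffic-dial-percentage 0 \
  --region us-west-2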

❯ curl $GA_DNS/first
first | us-west-2

To simulate a disaster recovery scenario, you can delete the ingress rule and load balancer in the primary region. Global Accelerator will report the endpoint's health status as unhealthy and begin routing users to the healthy region.
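A rough sketch of that simulation, run against the primary (us-east-1) cluster context; substitute the us-east-1 endpoint group's ARN where indicated:

# Remove the ingress, then uninstall the nginx ingress controller,
# which tears down the load balancer that fronts it
kubectl delete -f ingress.yaml
helm uninstall ingress-nginx --namespace kube-system

# Watch the us-east-1 endpoint group report its endpoint as unhealthy
aws globalaccelerator describe-endpoint-group \
  --endpoint-group-arn <us-east-1-endpoint-group-arn> \
  --region us-west-2

# Requests through the accelerator should now be served by us-west-2
curl $GA_DNS/first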