Canary deployment with Argo
Once again, we are going to talk about GitOps… In our last article, we briefly talked about Argo and its adoption by the CNCF. As a reminder, Argo is a suite of Continuous Delivery tools:
- ArgoCD
- Argo Workflow
- Argo Rollout
- Argo Event
I would advise you to read (or reread) that article.
Now that introductions have been made, let’s practice with ArgoCD and Argo Rollout by deploying a full application with a simple git commit. We will see how Argo Rollout lets us perform a Canary deployment, watch and control the rolling update, and roll back if something goes wrong.
Our application and its releases
I will use a simple application that provides a web service with a JSON output like this:
{
  "color": "red",
  "status": "ok"
}
Very simple. We will watch color to track application upgrades, and status as a health metric.
Here are the three releases of our application:
- 1.0 with color set to red
- 2.0 with color set to blue
- 3.0 with color set to black and status set to nok
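We don’t have the source of the particule/simplecolorapi image, so here is a hypothetical stand-in written with the Python standard library, only to make the JSON contract concrete. The RELEASES mapping simply transcribes the list above; the handler, the payload helper and the choice of port 5000 (the containerPort used in the manifest that follows) are assumptions, not the real application.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical: responses of the three releases listed above.
RELEASES = {
    "1.0": {"color": "red", "status": "ok"},
    "2.0": {"color": "blue", "status": "ok"},
    "3.0": {"color": "black", "status": "nok"},
}


def payload(version: str) -> bytes:
    """JSON body returned by the given release."""
    return json.dumps(RELEASES[version]).encode()


def make_handler(version: str):
    """Build a request handler that always answers with that release's body."""
    body = payload(version)

    class ColorHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return ColorHandler


def serve(version: str = "1.0", port: int = 5000) -> None:
    """Serve one release forever on the given port."""
    HTTPServer(("", port), make_handler(version)).serve_forever()
```

Calling serve("3.0") would mimic the broken release: same color API shape, but status answers "nok".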
Here’s the manifest deploying release 1.0:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: colorapi
  labels:
    app: colorapi
spec:
  replicas: 5
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: colorapi
  template:
    metadata:
      labels:
        app: colorapi
    spec:
      containers:
      - name: colorapi
        image: particule/simplecolorapi:1.0
        imagePullPolicy: Always
        ports:
        - name: web
          containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: colorapi
spec:
  ports:
  - port: 80
    targetPort: web  # forward to the container's named port (5000)
  selector:
    app: colorapi
  type: LoadBalancer
ArgoCD
As hinted by its name, ArgoCD handles the Continuous Delivery part: it allows us to deploy specific versions and keep our actual deployments in sync with the desired state stored in a git repository. ArgoCD supports Helm, Ksonnet, Jsonnet, Kustomize and, of course, plain Kubernetes manifests. We will use the latter for this article.
I’ll let you walk through the official “Getting Started” guide to deploy ArgoCD on your Kubernetes cluster. You should also install the ArgoCD CLI. Then we can deploy our first application.
$ argocd app create colorapi --repo https://github.com/particuleio/demo-concourse-flux.git --path deploy --dest-server https://kubernetes.default.svc --dest-namespace default
$ argocd app sync colorapi
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
colorapi-7ccb9d965b-8pr54 1/1 Running 0 22s
colorapi-7ccb9d965b-9vtmp 1/1 Running 0 22s
colorapi-7ccb9d965b-c7jkf 1/1 Running 0 22s
colorapi-7ccb9d965b-klt7q 1/1 Running 0 22s
colorapi-7ccb9d965b-s2ftc 1/1 Running 0 22s
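The argocd app create command above can also be expressed declaratively as an Application resource, which is itself easy to keep in git. The sketch below builds such a manifest as a Python dict and prints it as JSON. It assumes ArgoCD runs in the argocd namespace (the getting-started default), uses the default project, and tracks HEAD since the command sets no revision; application_manifest is a hypothetical helper, and automated_prune mirrors the "Automated (Prune)" sync policy that shows up in the sync output later on.

```python
import json


def application_manifest(name, repo_url, path, dest_namespace,
                         revision="HEAD",
                         server="https://kubernetes.default.svc",
                         automated_prune=False):
    """Hypothetical helper: an Argo CD Application close to `argocd app create`."""
    spec = {
        "project": "default",
        "source": {"repoURL": repo_url, "path": path, "targetRevision": revision},
        "destination": {"server": server, "namespace": dest_namespace},
    }
    if automated_prune:
        # Sync automatically and delete resources removed from git.
        spec["syncPolicy"] = {"automated": {"prune": True}}
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": spec,
    }


app = application_manifest(
    "colorapi",
    "https://github.com/particuleio/demo-concourse-flux.git",
    "deploy",
    "default",
)
print(json.dumps(app, indent=2))
```

Since kubectl accepts JSON, the output could be applied with python app.py | kubectl apply -f - (the file name is only an example).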
Our first test is to change the replica count. We are going to modify that parameter in our Deployment, commit and push our code, and check that ArgoCD updates the actual Deployment. ArgoCD can track a tag, a commit or a branch. If you track a git tag, don’t forget to move it to the latest commit.
Tracking strategies are detailed here: https://argoproj.github.io/argo-cd/user-guide/tracking_strategies/
ArgoCD polls the repository every 3 minutes, so it shouldn’t take long before our new replicas are scheduled.
This is pretty simple; in fact, we already described this behaviour when we introduced you to Flux CD, which does the same thing. However, Flux can also track a Docker image and update your Deployment manifest accordingly, which ArgoCD can’t do at the moment.
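Conceptually, each poll compares the desired state stored in git with the live state in the cluster and applies git when they differ. The sketch below is a toy model of that loop, not ArgoCD’s implementation: read_git, read_cluster and apply are hypothetical callbacks, and only fields declared in git are compared, so defaults the API server adds to live objects are not reported as drift.

```python
import time
from typing import Any, Callable, Dict, List

POLL_INTERVAL_SECONDS = 180  # the 3-minute poll described above


def out_of_sync_fields(desired: Dict[str, Any], live: Dict[str, Any],
                       prefix: str = "") -> List[str]:
    """Dotted paths of fields declared in git whose live value differs."""
    diffs: List[str] = []
    for key, want in desired.items():
        path = prefix + key
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            diffs.extend(out_of_sync_fields(want, have, path + "."))
        elif want != have:
            diffs.append(path)
    return diffs


def reconcile_forever(read_git: Callable[[], Dict[str, Any]],
                      read_cluster: Callable[[], Dict[str, Any]],
                      apply: Callable[[Dict[str, Any]], None]) -> None:
    """Hypothetical poll loop: apply the git state whenever it drifts."""
    while True:
        desired = read_git()
        if out_of_sync_fields(desired, read_cluster()):
            apply(desired)
        time.sleep(POLL_INTERVAL_SECONDS)
```

For example, with replicas changed to 3 in git while 5 are running, out_of_sync_fields({"spec": {"replicas": 3}}, {"spec": {"replicas": 5, "paused": False}}) reports ["spec.replicas"] and nothing else.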
Let’s do a real application update.
Argo Rollout
Next component of the suite: Argo Rollout. It improves on the rolling update strategy provided by Kubernetes and adds Canary and Blue/Green deployments. Canary is the one we will demonstrate. There is no strict definition of a Canary deployment. The idea is to redirect a small part of the application traffic to the new version of the application. After that, it’s up to you: you can quickly scale the new version up and replace the old one, or do it steadily and slowly. The common point is to run tests and diagnostics to make sure that the new version doesn’t introduce any regression.
Let’s deploy Argo Rollout:
$ kubectl create namespace argo-rollouts
$ kubectl apply -n argo-rollouts -f https://raw.githubusercontent.com/argoproj/argo-rollouts/stable/manifests/install.yaml
Argo Rollout brings a CustomResourceDefinition, Rollout, which is a superset of the Deployment resource. Turning a Deployment into a Rollout is easy: you only have to change two fields:
apiVersion: argoproj.io/v1alpha1 # Changed from apps/v1
kind: Rollout # Changed from Deployment
Commit and push your Rollout. ArgoCD syncs it in place of the Deployment, but nothing changes for the application itself: the pod template is exactly the same.
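The two-field change can be sketched as a small transformation on a parsed manifest. deployment_to_rollout is a hypothetical helper; it assumes the Deployment has no spec.strategy block (true for the manifest above), because a Rollout strategy only accepts canary or blueGreen, so a RollingUpdate or Recreate block would have to be rewritten by hand.

```python
import copy


def deployment_to_rollout(deployment: dict) -> dict:
    """Hypothetical helper: turn a parsed Deployment manifest into a Rollout."""
    if "strategy" in deployment.get("spec", {}):
        # Assumption: Deployment strategies have no direct Rollout equivalent,
        # so refuse instead of guessing a canary or blueGreen configuration.
        raise ValueError("remove spec.strategy before converting")
    rollout = copy.deepcopy(deployment)  # leave the input untouched
    rollout["apiVersion"] = "argoproj.io/v1alpha1"  # was apps/v1
    rollout["kind"] = "Rollout"  # was Deployment
    return rollout
```

Everything else (metadata, selector, pod template) is copied as-is, which is what makes the Rollout a drop-in replacement.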
Rollout strategy
The Rollout resource extends the strategy field of the Deployment spec. This is where we define our Canary configuration:
strategy:
  canary:
    steps:
    - setWeight: 20
    - pause:
        duration: "30s"
    - setWeight: 50
    - pause:
        duration: "30s"
There will be 4 steps:
- We redirect 20% of the traffic to our new version
- We wait 30 seconds
- We redirect 50% of the traffic to our new version
- We wait 30 seconds
The last step, which is implicit, redirects 100% of the traffic to the new version, ending the rolling update.
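To make the step semantics concrete, here is a sketch that walks the steps (copied from the Rollout above as Python data) and returns when each weight takes effect. It only adds up pause durations and ignores the time pods need to become ready; parse_seconds is a hypothetical helper that handles only the "<n>s" form used here, and a pause without a duration (manual promotion) is not modelled.

```python
from typing import List, Tuple

# The canary steps from the Rollout above.
STEPS = [
    {"setWeight": 20},
    {"pause": {"duration": "30s"}},
    {"setWeight": 50},
    {"pause": {"duration": "30s"}},
]


def parse_seconds(duration: str) -> int:
    """Parse the '<n>s' durations used in this article."""
    if not duration.endswith("s"):
        raise ValueError(f"unsupported duration: {duration}")
    return int(duration[:-1])


def timeline(steps) -> List[Tuple[int, int]]:
    """(elapsed_seconds, canary_weight) each time the weight changes."""
    elapsed, events = 0, []
    for step in steps:
        if "setWeight" in step:
            events.append((elapsed, step["setWeight"]))
        elif "pause" in step:
            elapsed += parse_seconds(step["pause"]["duration"])
    events.append((elapsed, 100))  # implicit last step: full promotion
    return events


print(timeline(STEPS))  # [(0, 20), (30, 50), (60, 100)]
```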
Splitting traffic precisely between versions requires something in front of the pods, such as an Ingress controller or a service mesh. We only have one Service, so the “redirection” actually comes from the proportion of pods run by each ReplicaSet at a given time, with the Service spreading requests across all of them. Argo Rollout can, however, work with real traffic management: Istio, NGINX and ALB are supported.
Let’s update our Rollout with a new image tag:
$ sed -i 's/1.0/2.0/g' deploy/helloworld.yml
$ git commit -am "bump: 2.0"
$ git push
$ argocd app sync colorapi
TIMESTAMP GROUP KIND NAMESPACE NAME STATUS HEALTH HOOK MESSAGE
2020-04-25T11:40:32+02:00 argoproj.io Rollout default colorapi Synced Healthy
2020-04-25T11:40:32+02:00 Service default colorapi Synced Healthy
2020-04-25T11:40:32+02:00 argoproj.io AnalysisTemplate default webcheck OutOfSync
Name: colorapi
Project: default
Server: https://kubernetes.default.svc
Namespace: default
URL: https://argo_url.tld/applications/colorapi
Repo: https://github.com/particuleio/demo-concourse-flux
Target: argocd
Path: deploy
SyncWindow: Sync Allowed
Sync Policy: Automated (Prune)
Sync Status: OutOfSync from argocd (95662aa)
Health Status: Healthy
Phase: Running
Start: 2020-04-25 11:40:34 +0200 CEST
Finished: <nil>
Duration: 1s
GROUP KIND NAMESPACE NAME STATUS HEALTH HOOK MESSAGE
Service default colorapi Synced Healthy
argoproj.io AnalysisTemplate default webcheck OutOfSync
argoproj.io Rollout default colorapi Synced Healthy
Here’s what should happen:
- A new ReplicaSet (RS) is created with 2 desired pods: there are 2 canary pods out of 8 (25%, roughly the requested 20%)
- 30 sec wait
- A new canary pod is created and three baseline pods are terminating: there are 3 canary pods out of 6 (50%)
- 30 sec wait
- The last 3 baseline pods are terminating and three new canary pods appear, ending the rolling update
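Since the split comes from pod counts, the expected share of canary responses is simply canary pods divided by all ready pods behind the Service. The sketch below applies that to the counts in the list above (2 of 8, then 3 of 6, then only canary pods); canary_share is a hypothetical helper and assumes requests are spread evenly over pods.

```python
def canary_share(canary_pods: int, stable_pods: int) -> float:
    """Expected fraction of requests hitting the canary, assuming the
    Service spreads requests evenly over all ready pods."""
    total = canary_pods + stable_pods
    if total == 0:
        raise ValueError("no pods behind the Service")
    return canary_pods / total


# Pod counts taken from the list above.
for label, canary, stable in [("20% step", 2, 6), ("50% step", 3, 3), ("done", 6, 0)]:
    print(f"{label}: {canary_share(canary, stable):.0%}")
# prints 25%, 50% and 100%
```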
If we curl our application in a loop ($monapp holds the address of the colorapi Service), we can see this behaviour:
$ while true; do curl $monapp | jq .color; sleep 0.5; done
"red"
"red"
"red"
"red"
"red"
"red"
"red"
# start of the 20% step
"blue"
"red"
"red"
"red"
"red"
"red"
"blue"
"red"
"red"
"red"
"blue"
"blue"
"red"
"red"
"red"
"red"
"red"
"red"
"red"
# start of the 50% step
"blue"
"red"
"blue"
"red"
"red"
"blue"
"blue"
"blue"
"red"
"blue"
"red"
"blue"
"blue"
"red"
"red"
"blue"
"blue"
"red"
"red"
# end of the rolling update
"blue"
"blue"
"blue"
"blue"
"blue"
"blue"
"blue"
This rolling update can also be started from your terminal with the Argo Rollout plugin for kubectl:
$ kubectl argo rollouts set image colorapi "*=particule/simplecolorapi:2.0"
rollout "colorapi" image updated
The plugin also offers a good-looking interface to follow your rolling update:
$ kubectl argo rollouts get rollout colorapi -w
Name: colorapi
Namespace: default
Status: ✔ Healthy
Strategy: Canary
Step: 4/4
SetWeight: 100
ActualWeight: 100
Images: particule/simplecolorapi:1.0 (stable)
Replicas:
Desired: 3
Current: 3
Updated: 3
Ready: 3
Available: 3
NAME KIND STATUS AGE INFO
⟳ colorapi Rollout ✔ Healthy 5h52m
├──# revision:32
│ ├──⧉ colorapi-66f9756599 ReplicaSet ✔ Healthy 10m stable
│ │ ├──□ colorapi-66f9756599-p4kfl Pod ✔ Running 4m6s ready:1/1
│ │ ├──□ colorapi-66f9756599-lqnsn Pod ✔ Running 3m32s ready:1/1
│ │ └──□ colorapi-66f9756599-554wc Pod ✔ Running 2m58s ready:1/1
│ └──α colorapi-66f9756599-32 AnalysisRun ✔ Successful 4m6s ✔ 30,⚠ 12
├──# revision:31
│ ├──⧉ colorapi-7d458c8cd8 ReplicaSet • ScaledDown 5m36s
│ └──α colorapi-7d458c8cd8-31 AnalysisRun ✔ Successful 5m36s ✔ 30
├──# revision:30
├──⧉ colorapi-59b5ddb84f ReplicaSet • ScaledDown 7m1s
└──α colorapi-59b5ddb84f-30 AnalysisRun ✔ Successful 7m ✔ 30
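The curl loop used earlier in this section can be reproduced in Python to count colors instead of eyeballing them. In the sketch below, color_of plays the role of jq .color and watch is a hypothetical polling helper (the URL is whatever $monapp points to). The two strings transcribe the 20% and 50% windows of the output shown earlier, one letter per response (b for "blue", r for "red"), so the observed shares can be compared with the pod ratios.

```python
import json
import time
import urllib.request
from collections import Counter


def color_of(body: bytes) -> str:
    """Equivalent of `jq .color` on one response body."""
    return json.loads(body)["color"]


def watch(url: str, count: int, delay: float = 0.5) -> Counter:
    """Poll the Service `count` times and tally the colors seen."""
    seen = Counter()
    for _ in range(count):
        with urllib.request.urlopen(url, timeout=2) as resp:
            seen[color_of(resp.read())] += 1
        time.sleep(delay)
    return seen


# The two windows printed earlier, transcribed (b = "blue", r = "red").
WINDOW_20 = "brrrrrbrrrbbrrrrrrr"
WINDOW_50 = "brbrrbbbrbrbbrrbbrr"
for name, window in [("20% step", WINDOW_20), ("50% step", WINDOW_50)]:
    print(f"{name}: {window.count('b')}/{len(window)} blue")
# prints 4/19 and 10/19
```

About 21% and 53% on 19 requests each: close to the pod ratios, with the noise you would expect from such small samples.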
Analysis Run
But what’s most interesting with Argo Rollout (and any canary deployment tool) is the ability to watch your rolling update and take actions based on metrics. If those metrics don’t match what was expected, an automatic rollback occurs.
We use new resources for that: an AnalysisTemplate, from which Argo Rollout creates an AnalysisRun during the rolling update. Let’s describe them with the pieces of code you need to add to your Rollout manifest.
strategy:
  canary:
    analysis:
      templates:
      - templateName: webcheck
      args:
      - name: host
        value: colorapi
    steps:
    - setWeight: 20
    - pause:
        duration: "30s"
    - setWeight: 50
    - pause:
        duration: "30s"
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: webcheck
spec:
  args:
  - name: host
  metrics:
  - name: webcheck
    failureLimit: 1
    interval: 5
    successCondition: result == "ok"
    provider:
      web:
        # placeholders are resolved when an AnalysisRun is created
        url: "http://{{args.host}}/"
        jsonPath: "{$.status}"
A few explanations. Let’s start with the changes in our Rollout. We introduced the analysis parameter, which specifies the AnalysisTemplate to use. This analysis runs in the background for the whole rolling update: it only stops when it fails or when the rolling update completes. We also pass an argument, host, which the template uses later.
Now, the AnalysisTemplate. There are many providers, and we chose web because it makes it easy to base the outcome of the analysis on our application’s JSON output. Our test is simple: we check the value of status; if it’s “ok”, the measurement succeeds, otherwise it is marked as failed. The test runs every 5 seconds and we tolerate one failure before a rollback is initiated.
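The decision logic can be modelled in a few lines. This is a simplified sketch, not Argo Rollout’s code: measure mimics the jsonPath {$.status} lookup followed by successCondition result == "ok", and run_analysis stops with a failure as soon as failed measurements exceed failureLimit. Because the URL points at the colorapi Service, measurements hit stable and canary pods alike, which is why a run can mix successes and failures. The real controller handles more cases (errors, for instance) that this sketch ignores.

```python
import json
from typing import Iterable, Tuple

FAILURE_LIMIT = 1  # failureLimit in the AnalysisTemplate above


def measure(body: bytes) -> bool:
    """One measurement: read $.status, then evaluate result == "ok"."""
    result = json.loads(body)["status"]
    return result == "ok"


def run_analysis(bodies: Iterable[bytes],
                 failure_limit: int = FAILURE_LIMIT) -> Tuple[str, int, int]:
    """Feed one response per interval; return (phase, successes, failures)."""
    successes = failures = 0
    for body in bodies:
        if measure(body):
            successes += 1
        else:
            failures += 1
            if failures > failure_limit:
                return "Failed", successes, failures  # triggers the rollback
    # Responses ran out first: the rolling update finished within the limit.
    return "Successful", successes, failures
```

With the 3.0 image, an ok answer from a 1.0 pod followed by two nok answers from canary pods gives ("Failed", 1, 2), the same counts as the AnalysisRun in the output further down.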
Let’s demonstrate this with the non-functional image:
$ kubectl argo rollouts set image colorapi "*=particule/simplecolorapi:3.0"
rollout "colorapi" image updated
$ while true; do curl $monapp | jq .color; sleep 0.5; done
# Beginning of the test
"red"
"red"
"red"
"red"
"red"
"red"
"red"
# 20%, trouble's coming
"black"
"red"
"red"
"black"
"red"
"black"
"red"
"red"
"red"
"red"
"red"
"black"
"black"
# More than 1 error happened, rollback initiated
"red"
"red"
"red"
"red"
"red"
"red"
"red"
"red"
The result of our rollback looks like this:
$ kubectl argo rollouts get rollout colorapi
Name: colorapi
Namespace: default
Status: ✖ Degraded
Strategy: Canary
Step: 0/2
SetWeight: 0
ActualWeight: 0
Images: particule/simplecolorapi:1.0 (stable)
Replicas:
Desired: 3
Current: 3
Updated: 0
Ready: 3
Available: 3
NAME KIND STATUS AGE INFO
⟳ colorapi Rollout ✖ Degraded 30h
├──# revision:34
│ ├──⧉ colorapi-7d458c8cd8 ReplicaSet • ScaledDown 28h canary
│ └──α colorapi-7d458c8cd8-34 AnalysisRun ✖ Failed 35m ✔ 1,✖ 2
Our state is Degraded because we asked for the 3.0 image but it failed. We can get our Healthy state back by requesting the 1.0 image again.
What matters is that our rolling update was stopped in time: we prevented a non-functional image from being deployed in production!
Conclusion
We have just scratched the surface of what Argo can offer. Our metric didn’t make much sense, and neither did our application, but this example was enough to understand the importance of GitOps concepts.