How do you develop an application running akka-cluster on Kubernetes?
There is a lot to like about akka-cluster, and deepening your knowledge of it will only benefit your programming!
Want to know more? Check out this article from SoftwareMill Developer, Grzegorz Kocur.
In this article I won’t go into akka-cluster details and why it’s great; you can find a lot of excellent articles about it, and a good starting point is the very good akka documentation. Since kubernetes is becoming the standard way of running containerized applications on multiple servers, I want to describe how to run an akka-cluster on kubernetes and how to develop such an application.
Clusters — the challenges
Running a clustered application can be challenging. Fortunately, most of the complicated stuff (leader election, convergence, failure detection, etc.) is already built into akka itself, so you don’t need to worry about it. I recommend reading the akka documentation, which describes how all of this works.
You, as an operator, are responsible for forming the cluster, adding new nodes, removing nodes from the cluster, and handling failures like split-brain scenarios or node crashes.
Generally, an akka-cluster is formed like this:
- The first node starts and joins itself. It’s called the seed node, since other nodes will use its address as the contact point to join the cluster. It becomes the cluster leader.
- The next node starts, connects to the seed node(s) and tries to join the cluster.
- When all cluster nodes “see” the new member, the cluster leader sets its state to 'up'.
- If a node leaves the cluster gracefully, it changes its state to 'leaving' and the leader removes it from the cluster.
- The leaving node’s state changes to 'removed'.
Looks pretty easy, but there are some problems with this flow. First (and probably most important): it works only on “stable” clusters. When one or more nodes are unreachable (for example after a hardware failure), the leader can’t perform its duties, so you have to start the failed node again or mark it as down (manually or using some scripts). The second problem: when the seed node goes down, there is no way to add new nodes to the cluster.
In a “static” system (where all cluster nodes run on predefined IPs and ports) it’s possible to avoid those problems, for example by running scripts which down unreachable nodes, or by configuring multiple seed nodes (which is the best practice). But things become really complicated when the cluster runs on an orchestration system like AWS ECS or kubernetes. As you probably know, pods on k8s are ephemeral, which means they can go down and come back up at any time. The IPs and ports are also assigned dynamically.
This is where the new way of forming an akka-cluster shines: cluster bootstrap and service discovery.
Cluster bootstrap
Instead of relying on a static list of seed nodes, cluster bootstrap uses akka discovery to locate other instances of the application (on kubernetes: the pods, queried through the kubernetes API), probes them over the HTTP endpoint exposed by akka management and, if no cluster exists yet, lets the node with the lowest address form one. The remaining nodes then join it.
Gracefully downing cluster nodes
'CoordinatedShutdown' is an akka extension introduced in akka 2.5. It can stop actors and services in a specific order when the application is shut down. The extension is fully configurable and you can define your own shutdown procedure by configuring the shutdown phases.
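Normally the defaults are enough, but you can also hook your own tasks into the phases. A minimal sketch, assuming nothing beyond plain akka 2.5 (the task itself is just an illustration, not something the example app needs):

import akka.Done
import akka.actor.{ActorSystem, CoordinatedShutdown}
import scala.concurrent.Future

object ShutdownHookExample extends App {
  implicit val system: ActorSystem = ActorSystem("akka-simple-cluster")
  import system.dispatcher

  // Runs before HTTP endpoints are unbound; the phase name is one of the
  // predefined phases in akka's reference configuration.
  CoordinatedShutdown(system).addTask(
    CoordinatedShutdown.PhaseBeforeServiceUnbind, "log-shutdown") { () =>
    Future {
      system.log.info("Node is shutting down, its cluster state will become 'leaving'")
      Done
    }
  }
}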
When clustering is enabled, the default configuration is exactly what we expect: when 'SIGTERM' is sent to the jvm process, the application will set its own cluster state to 'leaving' and eventually shut itself down gracefully. When the application runs in a container and the container is stopped (for example using the 'docker stop' command), 'SIGTERM' is sent to 'PID 1' and things run exactly as we want.
Something bad happened…
It can happen. Well, actually, it will happen, sooner or later. The container will be killed by the OS because of OOM. The host will break because of a hardware failure. A network partition will occur because of a faulty network switch…
Of course coordinated shutdown will not help in such cases: for example, when the process is killed with 'SIGKILL' it stops immediately, and the other akka cluster nodes will see that node as 'unreachable'.
We need a process that will heal our cluster: the Split Brain Resolver (SBR).
There is a great SBR implementation sold by Lightbend as part of the commercial subscription. There are also several open source solutions, so you have to decide for yourself. In the example application described below I use a very interesting one: akka-cluster-custom-downing.
Ok, let’s write some code and see all those concepts in action. You can download the example application from GitHub and try it yourself on your own kubernetes cluster.
Simple akka cluster
First things first: we have to start cluster bootstrap. Since it relies on the akka management extension, we have to start that as well:
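A minimal sketch of what this looks like (in recent akka-management versions 'AkkaManagement' lives in 'akka.management.scaladsl'; older 0.x releases used slightly different packages):

import akka.actor.ActorSystem
import akka.management.scaladsl.AkkaManagement
import akka.management.cluster.bootstrap.ClusterBootstrap

object Main extends App {
  // The name must match the actor system name used in application.conf.
  implicit val system: ActorSystem = ActorSystem("akka-simple-cluster")

  // Akka management exposes the HTTP endpoints (port 8558 by default)
  // that cluster bootstrap uses to probe contact points.
  AkkaManagement(system).start()

  // Cluster bootstrap looks up peers via akka discovery and forms the cluster.
  ClusterBootstrap(system).start()
}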
The most important part is the configuration.
To run cluster bootstrap and service discovery we must not set the seed nodes statically. In the example application they come from optional environment variables, so a node can still be started the classic way by providing them explicitly:
docker run -ti -p 8558:8558 \
  -e SEED_NODES.0="akka.tcp://akka-simple-cluster@172.17.0.2:2551" \
  softwaremill/akka-simple-cluster-k8s:0.0.1-SNAPSHOT
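When the 'SEED_NODES.*' variables are absent, the seed-nodes list stays empty and cluster bootstrap takes over. A sketch of how that part of the configuration can be shaped (the exact file is linked below):

akka {
  actor.provider = "cluster"

  cluster {
    # Optional HOCON substitutions: if the environment variables are not set,
    # the entries are simply omitted and the list stays empty.
    seed-nodes = [
      ${?SEED_NODES.0},
      ${?SEED_NODES.1}
    ]
  }
}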
There are several downing strategies we can use with an SBR; in this case I will use 'MajorityLeaderAutoDowning'. This strategy keeps the partition with more reachable nodes. If the number of nodes is the same in both partitions (which can happen when running an even number of nodes), the partition containing the node with the lowest address is kept. Generally, it’s always a good idea to run an odd number of nodes in a distributed system.
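With akka-cluster-custom-downing this is a matter of configuration. A sketch, with key names taken from that library's README (the 'stable-after' value is an assumption; tune it to your failure detector settings):

akka.cluster.downing-provider-class = "tanukki.akka.cluster.autodown.MajorityLeaderAutoDowning"

custom-downing {
  # How long a node must stay unreachable before the strategy acts.
  stable-after = 20s

  majority-leader-auto-downing {
    majority-member-role = ""
    down-if-in-minority = true
    shutdown-actor-system-on-resolution = true
  }
}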
Let’s configure cluster management:
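A sketch of the relevant settings ('POD_IP' is an assumption; on kubernetes it would typically be injected into the container via the downward API):

akka.management {
  http {
    # Advertise the pod's own IP so other nodes can reach this endpoint...
    hostname = ${?POD_IP}
    port = 8558
    # ...but bind to all interfaces inside the container.
    bind-hostname = "0.0.0.0"
    bind-port = 8558
  }
}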
And the discovery itself:
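Again a sketch; the service name and the label selector are assumptions that have to match the deployment shown later:

akka.discovery {
  method = kubernetes-api

  kubernetes-api {
    # Contact points are found by listing pods with this label
    # through the kubernetes API.
    pod-label-selector = "app=akka-simple-cluster"
  }
}

akka.management.cluster.bootstrap {
  contact-point-discovery {
    service-name = "akka-simple-cluster"
    discovery-method = kubernetes-api
  }
}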
The whole 'application.conf' file is on GitHub.
Let’s deploy our brand new application on minikube.
We have to start minikube:
minikube start
and set some environment variables so that the docker client talks to minikube’s docker daemon instead of our local one:
eval $(minikube docker-env) # on bash, or
eval (minikube docker-env) # on fish, my favourite shell
Let’s build the docker container:
gkocur@gk ~/D/w/s/b/a/akka-simple-cluster-k8s (master)> sbt docker:publishLocal
[info] Loading settings from idea.sbt ...
[info] Loading global plugins from /Users/gkocur/.sbt/1.0/plugins
[info] Loading settings from plugins.sbt ...
[info] Loading project definition from /Users/gkocur/Desktop/work/sml/blog/akka-k8s/akka-simple-cluster-k8s/project
[info] Loading settings from build.sbt ...
...
[info] Successfully built c98c54dc9e52
[info] Successfully tagged softwaremill/akka-simple-cluster-k8s:0.0.1-SNAPSHOT
[info] Built image softwaremill/akka-simple-cluster-k8s:0.0.1-SNAPSHOT
[success] Total time: 8 s, completed May 31, 2018 8:33:05 PM
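The 'docker:publishLocal' task comes from sbt-native-packager. A sketch of the build fragment that produces the image above (the settings in the actual repo may differ):

// build.sbt (fragment); needs sbt-native-packager in project/plugins.sbt
enablePlugins(JavaAppPackaging, DockerPlugin)

name := "akka-simple-cluster-k8s"
version := "0.0.1-SNAPSHOT"

// Prefixes the image name, giving softwaremill/akka-simple-cluster-k8s:0.0.1-SNAPSHOT.
dockerUsername := Some("softwaremill")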
We can now apply our deployment:
kubectl apply -f k8s/simple-cluster-deployment.yml
and the service:
kubectl apply -f k8s/simple-cluster-service.yml
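The real manifests live in the repo’s 'k8s/' directory. A sketch of the parts the rest of this walkthrough relies on: five replicas, the 'app' label the discovery selector matches, and a NodePort service with a port named 'management'. (On RBAC-enabled clusters the pods’ service account also needs permission to list pods.)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: akka-simple-cluster
spec:
  replicas: 5
  selector:
    matchLabels:
      app: akka-simple-cluster
  template:
    metadata:
      labels:
        app: akka-simple-cluster
    spec:
      containers:
        - name: akka-simple-cluster
          image: softwaremill/akka-simple-cluster-k8s:0.0.1-SNAPSHOT
          ports:
            - name: management
              containerPort: 8558
            - name: remoting
              containerPort: 2551
---
# In the repo this service lives in the second file applied above.
apiVersion: v1
kind: Service
metadata:
  name: akka-simple-cluster
spec:
  type: NodePort
  selector:
    app: akka-simple-cluster
  ports:
    - name: management
      port: 8558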
The cluster needs a few seconds to bootstrap. Let’s check the status:
KUBE_IP=$(minikube ip)
MANAGEMENT_PORT=$(kubectl get svc akka-simple-cluster -ojsonpath="{.spec.ports[?(@.name==\"management\")].nodePort}")
curl http://$KUBE_IP:$MANAGEMENT_PORT/cluster/members | jq
{
  "selfNode": "akka.tcp://akka-simple-cluster@172.17.0.11:2551",
  "leader": "akka.tcp://akka-simple-cluster@172.17.0.10:2551",
  "oldest": "akka.tcp://akka-simple-cluster@172.17.0.10:2551",
  "unreachable": [],
  "members": [
    {
      "node": "akka.tcp://akka-simple-cluster@172.17.0.10:2551",
      "nodeUid": "-1472917702",
      "status": "Up",
      "roles": [
        "dc-default"
      ]
    },
    {
      "node": "akka.tcp://akka-simple-cluster@172.17.0.8:2551",
      "nodeUid": "-375470270",
      "status": "Up",
      "roles": [
        "dc-default"
      ]
    },
    {
      "node": "akka.tcp://akka-simple-cluster@172.17.0.11:2551",
      "nodeUid": "1000843329",
      "status": "Up",
      "roles": [
        "dc-default"
      ]
    },
    {
      "node": "akka.tcp://akka-simple-cluster@172.17.0.9:2551",
      "nodeUid": "1793471128",
      "status": "Up",
      "roles": [
        "dc-default"
      ]
    },
    {
      "node": "akka.tcp://akka-simple-cluster@172.17.0.7:2551",
      "nodeUid": "83031023",
      "status": "Up",
      "roles": [
        "dc-default"
      ]
    }
  ]
}
Summary