Functional DevOps with Scala and Kubernetes by Joan Goyeau

Today's read is a great article written by Senior Scala Software Engineer Joan Goyeau called 'Functional DevOps with Scala and Kubernetes'. We hope you enjoy this read!

 

'Context

As a functional backend developer I’ve always been surprised by how DevOps has stayed away from the functional programming paradigm. Clearly the backend is leading in that domain, and we’ve seen successful attempts to bring it to the frontend with Scala.js, so why not DevOps?

Generally DevOps is about using tools that are configured via declarative configuration (YAML, XML…) and if there is a gap in the tooling we bridge it with some imperative scripting (Bash, Python…).

But what if we do everything with Scala? We could use libraries in place of tools, configured in a functional programming style (declarative), and the gaps between libraries can also be filled in Scala (and later extracted as libraries if generic enough). Does this sound familiar? Indeed, this is exactly what we do all the time in development: we code our solutions and use libraries to solve common problems, to avoid reinventing the wheel over and over.

So let’s try something new here and deploy our services with the same functional programming approach we have in our applications.

 

Use case

For the sake of simplicity, in this story we will boil down the architecture of the system we want to deploy to the minimum.

So our system will be composed simply of a Backend API and a Database:

A couple of things to note here:

  • The Backend API will be a stateless service accessible from the web.
  • The Database will be stateful and in this example let’s say that it’s Elasticsearch.

It is not important to know about the business use case here, except that this system will have to be delivered in quick iterations to production.

I guess the first question is: where are we going to run our applications?

 

Kubernetes

 
Kubernetes is an open source system for managing containerized applications across multiple hosts; providing basic mechanisms for deployment, maintenance, and scaling of applications.
Source: https://github.com/kubernetes/kubernetes
 
This sounds like a good solution to run our applications without the need to care about the underlying hardware. Kubernetes gives us this abstraction where developers can just build containers of their applications and give them to a Kubernetes cluster which will take care of running them wherever there is compute power available.
 

We are not going to dive deep into how Kubernetes works here, but it’s important as a developer to understand some high-level concepts, which are well documented on https://kubernetes.io/docs/concepts/.

OK, now you might wonder how we are going to deploy on Kubernetes.

 

Orkestra

 
Orkestra is an Open Source Continuous Integration / Continuous Deployment server as a library running on Kubernetes.
It leverages Kubernetes concepts such as Jobs or Secrets, and configuration as code in Scala to take the most of compile time type safety and compatibility with Scala or Java libraries.
Source: https://orkestra.tech
 
This sounds like what we’d like for our use case. The key features we are interested in being:
  • We are functional developers convinced of the advantages of functional programming over unsafe YAML, Python or Groovy code. So it’s perfect here since the configuration is in Scala.
  • It’s extendable with Scala/Java libraries.
  • It runs on Kubernetes, so we can reduce operational management by using a managed Kubernetes cluster such as GKE or EKS. Plus we can reuse the compute power left on a cluster for DevOps tasks.
  • It’s highly available. Not that we are looking for 100% uptime for an internal tool, but it makes operational maintenance possible without impacting the usage of the tool during working hours.
  • It’s fully scalable, so we can run many jobs in parallel if the underlying Kubernetes cluster is large enough.
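To make the type-safety argument a bit more concrete, here is a minimal sketch in plain Scala. It does not use Orkestra's API: `Environment`, `Deployment`, `replicasFor` and the replica counts are hypothetical names and values chosen only for illustration.

```scala
object TypedConfig {
  // Hypothetical model: the set of environments is closed, so a typo such as
  // Stagin does not compile, whereas a typo in a YAML string is only
  // discovered when the deployment runs.
  sealed abstract class Environment(val namespace: String)
  case object Staging extends Environment("staging")
  case object Production extends Environment("prod")

  // Hypothetical deployment description: replicas can only ever be an Int.
  final case class Deployment(service: String, environment: Environment, replicas: Int)

  // Because Environment is sealed, the compiler warns if a case is missing here.
  def replicasFor(environment: Environment): Int = environment match {
    case Staging    => 1
    case Production => 3
  }

  def backend(environment: Environment): Deployment =
    Deployment("backend", environment, replicasFor(environment))
}
```

Calling `TypedConfig.backend(TypedConfig.Production)` yields an ordinary immutable value that can be inspected, compared or passed to other functions like any other Scala data.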
 

Continuous Integration

Continuous Integration is fairly easy and there are tons of tools out there to do it, like Travis, CircleCI, GitLabCI, Jenkins and many more.

All we probably want is, for each pull request, to run a full compilation, the tests, some linters (like Scalafix, Scalafmt) and maybe check that the Contributor License Agreement has been signed. In any case it should be a matter of a few commands.

Of course Orkestra can run those checks on pull requests. Let’s have a look at the code. Remember that at any time you can get more documentation about Orkestra on https://orkestra.tech.

We first need to define a job that executes the checks:

object PullRequestChecks {
  lazy val board = JobBoard[GitRef => Unit](JobId("pullRequestChecks"), "Pull Request Checks")(
    Input[GitRef]("Git ref")
  )

  lazy val job = Job(board) { implicit workDir => gitRef =>
    Github.statusUpdated(Repository("myOrganisation/myRepo"), gitRef) { implicit workDir =>
      sh("./sbt test")
    }
  }
}

This is the first time we define a job so let’s take a moment to understand what’s going on here step by step:

We define a board that will represent the UI. There are different types of boards but we will be using a JobBoard here.
• In the JobBoard we define the signature of the function this job will be executing, here GitRef => Unit.
• We need to give a unique ID and a nice name for the display.
• Lastly we give the UI elements for the parameters form. Here only an Input[GitRef] with a nice name for the display.

Then we define a job that will be responsible for executing the workload.
• A Job needs to reference the JobBoard so we pass it.
• Now the interesting stuff starts! We pass the function that will be executed when a PR is updated.
• Because this is DevOps we will often be dealing with files and directories. To do so in a referentially transparent way we have a context object of type Directory that knows which directory we are in, and this is what workDir is about. Of course we don’t want to pass this parameter everywhere, so we make it implicit.
• gitRef is the parameter value given by the user through the parameter form in the board.
• Github.statusUpdated is a helper function that lets GitHub know when the tests have started and whether they passed or failed. It also takes care of checking out the Git ref for you and moving into the Git directory.
• Lastly we execute a shell command where we run the tests.
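The implicit working directory can be modelled in a few lines of plain Scala. The sketch below is not Orkestra's implementation: `Directory`, `dir` and `describe` are hypothetical stand-ins that only show how a "current directory" can be an immutable value passed implicitly instead of global mutable state.

```scala
import java.nio.file.{Path, Paths}

object WorkDirSketch {
  // Hypothetical stand-in for the Directory context described above.
  final case class Directory(path: Path)

  // Runs `body` with a Directory pointing at a sub-directory.
  // Nothing is mutated: the "current directory" is just a value in scope.
  def dir[A](name: String)(body: Directory => A)(implicit workDir: Directory): A =
    body(Directory(workDir.path.resolve(name)))

  // A helper that depends on the current directory without taking it explicitly.
  def describe(command: String)(implicit workDir: Directory): String =
    s"run '$command' in ${workDir.path}"

  def example(): List[String] = {
    implicit val workDir: Directory = Directory(Paths.get("work"))
    val atRoot = describe("git clone")
    // The lambda parameter has the same name as the outer implicit, so inside
    // the block it shadows it, just like the nested `implicit workDir =>` in the job.
    val inRepo = dir("backend") { implicit workDir => describe("./sbt test") }
    List(atRoot, inRepo)
  }
}
```

`WorkDirSketch.example()` returns one description for the root directory and one for the `backend` sub-directory. Giving both implicits the same name is deliberate: with two differently named implicit `Directory` values in scope, the compiler would report an ambiguity.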

Now that we have the job defined we need to create the Orkestra server, register the job to it and configure the GitHub Pull Request hooks:

object Orkestra extends OrkestraServer with GithubHooks {
  lazy val board = Folder("Orkestra")(PullRequestChecks.board)
  lazy val jobs = Set(PullRequestChecks.job)
  
  lazy val githubTriggers = Set(
    PullRequestTrigger(Repository("myOrganisation/myRepo"), PullRequestChecks.job)()
  )
}
  • We create the Orkestra server by just extending OrkestraServer, which forces us to implement board (the UI) and jobs (the triggerable jobs).
  • board will be a Folder of just the PullRequestChecks board for now.
  • jobs will contain only the PullRequestChecks job.
  • Then we need to configure the GitHub pull request hooks.

Alright, let’s have a look at the UI and see how beautiful it is:

 

Continuous Integration was pretty simple, but now that we have such well-checked code, we have to run it somewhere!

This raises some questions, such as:

  • What will we be deploying? (our applications, databases? …)
  • Where? (Kubernetes? AWS managed service? GCP managed service?…)
  • How? (Docker? Yum?…)
  • Configurations? (auto scaling?, secrets? environments?…)
  • Extra tasks? (running migrations, copy data between environments, switching load balancers…)

Here I don’t think that Travis, CircleCI, GitLabCI, Jenkins… are very good for anything outside CI, since they only run a series of commands on Git pushes. That is fine for simple tasks, but as soon as you have to do a little more than that, it becomes very hard.

The usual solution to this issue is to combine them with other technologies like Ansible, Chef, Terraform… to handle hardware/software provisioning and the deployment logic, which we would run from one of the CI tools mentioned above.

I don’t much like this solution since it introduces many technologies, therefore increasing complexity. Plus these technologies are very far from the functional programming we, functional developers, are used to.

Let’s answer some of the above questions for our example:

  • What? We know that we will have to deploy our backend and an Elasticsearch database.
  • Where? We said we’ll use Kubernetes to run all our services.
  • How? Since we will be using Kubernetes, it will be with Docker images.
  • Configuration? We might want multiple environments, for example to be able to test a new version without impacting the production users.
 

Environments

Before doing any deployment we need an environment in which to deploy our services and, as we just saw, we actually need several:

  • Staging: where we will continuously deploy the master branch.
  • Production: where we will deploy the versions that we have verified on staging. Our end users will use this environment, so it needs to be up at all times!

On each of these environments the minimum requirements to deploy are:

  • A Kubernetes namespace
  • A database

So let’s write a job that creates an environment:

object CreateEnvironment {
  lazy val board = JobBoard[String => Unit](JobId("createEnvironment"), "Create Environment")(
    Input[String]("Environment name")
  )

  lazy val job = Job(board) { implicit workDir => environmentName =>
    Await.result(for {
      _ <- Kubernetes.client.namespaces.createOrUpdate(Namespace(environmentName))
      _ <- Elasticsearch.deploy(environmentName)
    } yield (), 1.minute)
  }
}
  • Note that now the function we run is String => Unit, therefore we use an Input[String].
  • The core of the job now runs two Futures, one to create the namespace and the other to deploy our Elasticsearch database, and it awaits the results.
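One detail worth spelling out: a for comprehension over Futures runs its steps one after the other, because it desugars to flatMap and map. The sketch below shows this with the standard library only. `createNamespace` and `deployDatabase` are hypothetical stand-ins for the Kubernetes and Elasticsearch calls, and the log exists only to record the order of execution.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SequentialSteps {
  // Hypothetical stand-ins for the namespace and Elasticsearch calls.
  def createNamespace(name: String, log: StringBuffer): Future[Unit] =
    Future { log.append(s"namespace:$name;"); () }

  def deployDatabase(name: String, log: StringBuffer): Future[Unit] =
    Future { log.append(s"database:$name;"); () }

  // The second Future is only created once the first one has completed,
  // so the database is never deployed before its namespace exists.
  def createEnvironment(name: String): String = {
    val log = new StringBuffer // thread-safe, used here only to record the order
    Await.result(for {
      _ <- createNamespace(name, log)
      _ <- deployDatabase(name, log)
    } yield (), 1.minute)
    log.toString
  }
}
```

If the two steps were independent, they could be started before the for comprehension to run concurrently. Here the ordering is what we want, since the namespace must exist first.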

We need to register this new job. Here is the updated version of the Orkestra object that we created earlier:

object Orkestra extends OrkestraServer with GithubHooks {
  lazy val board = Folder("Orkestra")(
    PullRequestChecks.board,
    CreateEnvironment.board
  )

  lazy val jobs = Set(PullRequestChecks.job, CreateEnvironment.job)
  
  lazy val githubTriggers = Set(
    PullRequestTrigger(Repository("myOrganisation/myRepo"), PullRequestChecks.job)()
  )
}

 

Continuous Deployment

We created our environments, now we need to deploy our backend artifacts on these environments. Let’s write this job:
object DeployBackend {
  lazy val board = JobBoard[(String, String) => Unit](JobId("deployBackend"), "Deploy Backend")(
    Input[String]("Version"),
    Input[String]("Environment name")
  )

  lazy val job = Job(board) { implicit workDir => (version, environmentName) =>
    Await.result(DeployBackend(version, environmentName), 1.minute)
  }
}

We probably also need a job that publishes the artifact:


object PublishBackend {
  lazy val board = JobBoard[(GitRef, Boolean) => String](JobId("publishBackend"), "Publish Backend")(
    Input[GitRef]("Git ref"),
    Checkbox("Run checks")
  )

  lazy val job = Job(board) { implicit workDir => (gitRef, runChecks) =>
    Await.result(PublishBackend(gitRef, runChecks), 1.hour)
  }
}

Let’s also create a job that publishes and deploys straight away, so that if we want to deploy from source code we just have to do one action:

object PublishAndDeploy {
  lazy val board =
    JobBoard[(GitRef, Boolean, String) => Unit](
      JobId("publishAndDeployBackend"),
      "Publish and Deploy Backend"
    )(
      Input[GitRef]("Git ref"),
      Checkbox("Run checks", checked = true),
      Input[String]("Environment name")
    )

  lazy val job = Job(board) { implicit workDir => (gitRef, runChecks, environmentName) =>
    Await.result(for {
      version <- PublishBackend.job.run(gitRef, runChecks)
      _ <- DeployBackend.job.run(version, environmentName)
    } yield (), 1.hour)
  }
}
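Chaining the two jobs this way has a useful property: the version produced by the publish step feeds the deploy step, and if publishing fails the deployment is never attempted. Below is a minimal standard-library sketch of that behaviour. `publish` and `deploy` are hypothetical functions, not Orkestra jobs, and the "broken" ref and version format are invented for the example.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object PublishThenDeploy {
  // Hypothetical publish step: fails when checks are requested on a "broken" ref.
  def publish(gitRef: String, runChecks: Boolean): Future[String] =
    if (runChecks && gitRef == "broken") Future.failed(new RuntimeException("checks failed"))
    else Future.successful(s"1.0.0-$gitRef")

  // Hypothetical deploy step: returns a description of what was deployed.
  def deploy(version: String, environment: String): Future[String] =
    Future.successful(s"$version deployed to $environment")

  // If publish fails, flatMap short-circuits and deploy is never called.
  def publishAndDeploy(gitRef: String, runChecks: Boolean, environment: String): Try[String] =
    Try(Await.result(for {
      version  <- publish(gitRef, runChecks)
      deployed <- deploy(version, environment)
    } yield deployed, 1.minute))
}
```

With these definitions, `publishAndDeploy("master", true, "staging")` succeeds, while `publishAndDeploy("broken", true, "staging")` is a `Failure` carrying the publish error.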

Again we need to update the Orkestra object to register our new jobs. At the same time we can add a GitHub hook for the automated deployment of the master branch to staging:

object Orkestra extends OrkestraServer with GithubHooks {
  lazy val board = Folder("Orkestra")(
    PullRequestChecks.board,
    CreateEnvironment.board,
    PublishBackend.board,
    DeployBackend.board,
    PublishAndDeploy.board
  )

  lazy val jobs = Set(
    PullRequestChecks.job,
    CreateEnvironment.job,
    PublishBackend.job,
    DeployBackend.job,
    PublishAndDeploy.job
  )
  
  lazy val githubTriggers = Set(
    PullRequestTrigger(Repository("myOrganisation/myRepo"), PullRequestChecks.job)(),
    BranchTrigger(Repository("myOrganisation/myRepo"), "master", PublishAndDeploy.job)("staging")
  )
}
  • See that we are using a BranchTrigger to trigger the PublishAndDeploy job as soon as the master branch is updated.

Great, we now have a minimum viable CI/CD!

 

Extra jobs

This architecture has already created some repetitive tasks that we will have to perform from time to time.

One of them is copying a subset of data from the production environment to the staging environment. We probably want to overwrite all staging data with fresh real-world data coming from production regularly, for example every week.

Let’s write this copy data job:

object CopyData {
  lazy val board = JobBoard[(String, String) => Unit](JobId("copyData"), "Copy Data")(
    Input[String]("Source environment"),
    Input[String]("Destination environment")
  )

  lazy val job = Job(board) { implicit workDir => (source, destination) =>
    Await.result(copyData(source, destination), 1.minute)
  }
}

The good part of writing this job in Scala is that if your backend is written in a JVM language (Java, Scala, Groovy…) you can publish a jar of it, make the Orkestra project depend on it and call the functions directly. In this example we’ll assume that copyData() is a function defined in the backend code.

Again, don’t forget to register the job and the board. We will also add a Cron trigger so that this job is run every Monday at 5am (outside of developers’ working hours), so that we start the week with fresh data:

object Orkestra extends OrkestraServer with GithubHooks with CronTriggers {
  lazy val board = Folder("Orkestra")(
    PullRequestChecks.board,
    CreateEnvironment.board,
    PublishBackend.board,
    DeployBackend.board,
    PublishAndDeploy.board,
    CopyData.board
  )

  lazy val jobs = Set(
    PullRequestChecks.job,
    CreateEnvironment.job,
    PublishBackend.job,
    DeployBackend.job,
    PublishAndDeploy.job,
    CopyData.job
  )

  lazy val githubTriggers = Set(
    PullRequestTrigger(Repository("myOrganisation/myRepo"), PullRequestChecks.job)(),
    BranchTrigger(Repository("myOrganisation/myRepo"), "master", PublishAndDeploy.job)("staging")
  )

  lazy val cronTriggers = Set(
    CronTrigger("0 5 * * 1", CopyData.job)("prod", "staging")
  )
}
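For readers less familiar with cron syntax, the sketch below decodes the expression used above, assuming the standard five-field format (minute, hour, day of month, month, day of week). `CronSketch` is a hypothetical helper written for this explanation. It supports only plain numbers and `*`, and it is not how Orkestra evaluates schedules.

```scala
import java.time.LocalDateTime

object CronSketch {
  // Supports only the subset used in the article: a number or "*" per field.
  private def fieldMatches(field: String, value: Int): Boolean =
    field == "*" || field.toInt == value

  // Standard five-field order: minute, hour, day of month, month, day of week.
  def matches(expression: String, at: LocalDateTime): Boolean =
    expression.trim.split("\\s+") match {
      case Array(minute, hour, dayOfMonth, month, dayOfWeek) =>
        fieldMatches(minute, at.getMinute) &&
        fieldMatches(hour, at.getHour) &&
        fieldMatches(dayOfMonth, at.getDayOfMonth) &&
        fieldMatches(month, at.getMonthValue) &&
        // java.time counts Monday as 1 and Sunday as 7; cron uses 0 (or 7) for Sunday.
        (dayOfWeek == "*" || dayOfWeek.toInt % 7 == at.getDayOfWeek.getValue % 7)
      case _ => false // anything other than five fields is rejected
    }
}
```

For example, `CronSketch.matches("0 5 * * 1", LocalDateTime.of(2018, 7, 9, 5, 0))` is true because 9 July 2018 was a Monday, while the same time on 10 July 2018 (a Tuesday) does not match.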

 

Slack integration

You might now be thinking: this thing is cool (or not?), but I’m used to having plenty of plugins on other CI tools. To name only one: a Slack integration that sends a message when a deployment to staging or prod has been done, so that everyone on the team is aware.

Remember that we are writing vanilla Scala here, therefore we can depend on any library. More specifically we have access to all of Maven Central. Do you know how many libraries there are in Maven Central? There must be someone who wrote a Slack client in Scala!

Bingo: https://github.com/gilbertw1/slack-scala-client
I will not go through the usage of this library here but I think it’s well documented on the repo.

 

Conclusion

This story showed, with a reduced example, how we can use functional programming for DevOps purposes, with all the benefits it brings.

We put in place a CI to check our pull requests, created jobs to create and deploy our architecture, and even created a job for miscellaneous tasks. All of that was done with only one technology, Scala/Orkestra. Nevertheless, the possibilities are endless since we can use libraries from the Scala and Java ecosystems.

Don’t forget to check out the full code on GitHub, comment, fork and improve. Also have a look at the Orkestra documentation, and feel free to follow me on Twitter.

I’d also like to give special thanks to DriveTribe, who started this whole project and are now fully using Orkestra.'

 

This article was written by Joan Goyeau and posted originally on Medium.com