Why We Do Scala in Zalando

Leveraging the full power of a functional programming language

 

In Zalando Dublin, you will find that most engineering teams are writing their applications using Scala. We will try to explain why that is the case and the reasons we love Scala.

This post draws both on my own experience and on that of the team I work with, building the new Zalando Customer Data Platform.

 

How I came to use Scala

 

I have been working with the JVM for the last 18 years. A lot of good work has gone into making the Java Virtual Machine very efficient and very fast, and into making it utilize the underlying infrastructure well.

I feel comfortable debugging complex issues, such as identifying those caused by garbage collection, and improving our code to alleviate the pauses (see Martin Thompson’s blog post or Aleksey Shipilёv’s JVM Anatomy Park).

I liked Java. I didn’t mind the boilerplate code too much if it didn’t get in the way of expressing the intent of the code. However, what bugged me was the amount of code required to encourage immutability and not having lambdas to transform collections.

At the end of 2012, I had to design a service whose only mission was to back up files from customer mobile devices (think a cloud backup service). It was a simple enough service, accepting bytes from the customer device (using a REST API) and writing them to disk. We were using the Servlet API, and the system was working well. However, as the devices were mobile phones and the upload bandwidth wasn’t very high, the machines were mostly idle, waiting for the buffers to fill up. Unfortunately, we couldn't scale up. When the system reached a few hundred workers, it would start to quickly degrade due to excessive context switching.

We were using Netty in other components, but the programming model of callbacks wasn’t something I wanted to introduce, as it becomes very complex very quickly to compose the callbacks.

 

Introducing Scala

 

I had been looking at Scala for some years and started to look into async HTTP frameworks. I liked spray because it allowed us to use it at a high level or low level depending on our requirements. It also wasn’t a framework that forced us into adapting everything to it. I created a quick proof of concept and was amazed at the conciseness of the code and how efficient it was (spray is optimised for throughput, not latency), being able to handle thousands of concurrent uploads with a single core. Previously we were bound by CPU because of all the context switching, but with spray, we managed to overcome this and become limited mostly by IO.

From that point on I decided I wanted to learn Scala and Functional Programming. I finished the Coursera Functional Programming in Scala course, started writing my side projects in Scala and tried to find a position in a company that worked with Scala.

I evaluated other FP languages like Clojure, but I prefer statically typed languages, as my experience is that systems written in them are easier to maintain in the long term. I also looked at Haskell, but I felt more confident with a language that compiles to the JVM and can use all the existing Java libraries.

The first thing I fell in love with was Monad composition: defining the program (or subprogram) as a series of stages composed in a for-comprehension. It is a very convenient way to model asynchronous computations using Future as a Monad (I know that Future is not strictly a Monad, but from our code's point of view we can assume it is; see https://stackoverflow.com/questions/27454798/is-future-in-scala-a-monad).

We will see fuller examples below; here is a snippet:

for {
  customer <- findCustomer(address)
  segment <- getCustomerSegment(customer.id)
  email = promotionEmail(customer, segment)
  result <- sendEmail(address, email)
} yield result

 

It becomes natural and straightforward to define the different stages of computation as a pipeline in a for-comprehension, layering your program in different components, each responsible for their steps inside a for-comprehension.

(Stages that do not depend on each other could also be run in parallel, using an Applicative instead of a Monad.)
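
As a rough illustration of the parallel case using only the standard library: a Future starts running as soon as it is created, so two independent lookups can overlap if both are created before they are combined. Here zip plays the role of the applicative product. The functions loadProfile and loadOrders are hypothetical and exist only for this sketch.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical, independent lookups: neither needs the other's result.
def loadProfile(id: Int): Future[String] = Future(s"profile-$id")
def loadOrders(id: Int): Future[List[String]] = Future(List(s"order-$id-a", s"order-$id-b"))

// Both Futures are created (and therefore started) before zip combines them,
// so the two lookups can overlap instead of running one after the other.
def loadDashboard(id: Int): Future[(String, List[String])] =
  loadProfile(id).zip(loadOrders(id))

// Blocking is only for the demo; production code would keep composing Futures.
val dashboard = Await.result(loadDashboard(7), 5.seconds)
```

In a for-comprehension, by contrast, the second Future is only created once the first has completed, which is what makes the stages sequential.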

 

Now in Zalando

 

Zalando is a big company, currently with over 1,900 engineers. As we have mentioned in previous blog posts, we are empowered to choose the technologies we use to build our systems, so each team picks its own language, libraries, components and tools. As you can see on our public Tech Radar, Scala is one of our core languages, along with several Scala libraries such as Akka and the Play! framework.

So I cannot speak for how every Zalando team works with Scala. Some teams are deep into the Functional Programming side, while others use the language mostly as a “better Java”, adopting lambdas, case classes and pattern matching to make the code more concise and understandable.

But I can talk about how people use the language in the Dublin office, where the services and data pipelines are written mostly in Scala: how our team, which is developing the Customer Data Platform, uses Scala, what libraries we use and what we like about the language.

 

Things we love about Scala

 

Types

We love types. Types help us understand what we are dealing with. A bare String or Int is often meaningless; we don’t want to mix up a Customer Password with a Customer Name, an Email Address, etc. We want to know what a given value is.

For this we are currently using two different approaches:

  • Tagged types: Using shapeless @@ we decorate the primitive type with the tag we want to attach.
  • Value classes: Using a single-attribute case class that extends AnyVal, so that the compiler tries to remove the boxing/unboxing whenever it can. This is useful when we want to override toString; for example, we may want to redact sensitive customer data when it goes into logs.

Here is a simple example of both (full code here):

 

import java.util.UUID
import shapeless.tag, tag.@@
import cats.data.Validated, Validated._

object model {
  // Value class: wraps a String and redacts it whenever it is printed or logged.
  final case class Password private(value: String) extends AnyVal {
    override def toString = "***"
  }

  object Password {
    // Smart constructor: validates the candidate before wrapping it.
    def create(s: String): Validated[String, Password] = s match {
      case candidate if candidate.isEmpty || candidate.length < 8 => Invalid("Minimum password length has to be 8")
      case valid => Valid(Password(valid))
    }
  }

  // Tagged type: still a UUID at runtime, but a distinct type for the compiler.
  sealed trait UserIdTag
  type UserId = UUID @@ UserIdTag
  object UserId {
    def apply(v: UUID) = tag[UserIdTag](v)
  }

  sealed trait EmailAddressTag
  type EmailAddress = String @@ EmailAddressTag

  object EmailAddress {
    // Smart constructor: only strings containing "@" are tagged as EmailAddress.
    def apply(s: String): Validated[String, EmailAddress] = s match {
      case invalid if !invalid.contains("@") => Invalid(s"$invalid is not a valid email address")
      case valid =>
        val tagged = tag[EmailAddressTag](valid)
        Valid(tagged)
    }
  }
}

 

Function Composition

 

Monads/applicatives

One of my favourite features of Scala is how easy and elegant it is to compose functions to create more complex ones. The most common way of doing this is using Monads inside a for-comprehension.

This way we can run several operations sequentially and obtain a result. As soon as any of the operations fails, the comprehension exits with that failure.

 

def promotionEmail(customer: Customer, segment: CustomerSegment): Email = ???

def sendEmail(address: EmailAddress, message: Email): Future[Unit] = ???

def findCustomer(address: EmailAddress): Future[Customer] = ???

def getCustomerSegment(id: CustomerId): Future[CustomerSegment] = ???

def sendPromotionalEmail(address: EmailAddress)(implicit ec: ExecutionContext): Future[Unit] = {
  for {
    customer <- findCustomer(address)
    segment <- getCustomerSegment(customer.id)
    email = promotionEmail(customer, segment)
    result <- sendEmail(address, email)
  } yield result
}

 

For a full example see here.

If what you want is to evaluate several independent functions and collect either all the errors or all the successful results, you can use an Applicative Functor. This is very common when validating a complex entity, where we can present all the detected errors to the client in one go.

 

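import cats.Apply
import cats.data.{NonEmptyList, Validated}

type ValidatedNel[A] = Validated[NonEmptyList[String], A]

final case class Customer(name: Name, email: EmailAddress, password: Password)

object Customer {
  // Name(...), EmailAddress(...) and Password(...) are assumed here to be smart
  // constructors that each return a ValidatedNel of the corresponding type.
  def apply(name: String, email: String, password: String): ValidatedNel[Customer] = {
    Apply[ValidatedNel]
      .map3(Name(name), EmailAddress(email), Password(password))(Customer.apply)
  }
}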

 

For a full example see here.
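
To make the accumulating behaviour concrete without any library, here is a minimal hand-rolled sketch using plain Either values. It is not how cats implements Validated; it only shows the idea that every validator runs and every failure is kept. The validators and their rules are hypothetical.

```scala
// Simplified stand-in for the Customer above, using plain Strings.
final case class Customer(name: String, email: String, password: String)

// Hypothetical validators returning plain Either values.
def validateName(s: String): Either[String, String] =
  if (s.trim.nonEmpty) Right(s.trim) else Left("Name must not be empty")

def validateEmail(s: String): Either[String, String] =
  if (s.contains("@")) Right(s) else Left(s"$s is not a valid email address")

def validatePassword(s: String): Either[String, String] =
  if (s.length >= 8) Right(s) else Left("Minimum password length has to be 8")

// Every validator runs regardless of the others, and all failures are kept.
def validateCustomer(name: String, email: String, password: String): Either[List[String], Customer] = {
  val results = (validateName(name), validateEmail(email), validatePassword(password))
  results match {
    case (Right(n), Right(e), Right(p)) => Right(Customer(n, e, p))
    case (n, e, p)                      => Left(List(n, e, p).collect { case Left(error) => error })
  }
}
```

A for-comprehension over the same three validators would stop at the first Left, so the client would only ever see one error at a time.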

Another combination is a simple function composition using compose or andThen, or if the arguments don’t match completely, using anonymous functions to combine them.

 

final case class Customer(id: CustomerId, address: EmailAddress, name: Name)
val findCustomer: EmailAddress => Customer
val sendEmail: Customer => Either[String, Unit]
val sendCustomerEmail: EmailAddress => Either[String, Unit] = findCustomer andThen sendEmail

 

For a full example see here.
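
For the case where the arguments don't line up, a small sketch may help. The types and functions below are simplified, hypothetical stand-ins: sendEmail takes two arguments, so an anonymous function is used to adapt it before chaining with andThen.

```scala
// Hypothetical domain type, simplified to keep the sketch self-contained.
final case class Customer(id: Int, address: String, name: String)

val findCustomer: String => Customer =
  address => Customer(1, address, "John Doe")

// Takes two arguments, so it cannot be chained with andThen directly.
val sendEmail: (Customer, String) => Either[String, Unit] =
  (customer, body) =>
    if (body.isEmpty) Left(s"Empty message for ${customer.address}") else Right(())

val greeting: Customer => String = customer => s"Hello ${customer.name}"

// The anonymous function adapts the shapes: it passes the customer to both
// greeting and sendEmail.
val sendGreeting: String => Either[String, Unit] =
  findCustomer andThen (customer => sendEmail(customer, greeting(customer)))

// compose is andThen with the order reversed: findCustomer runs first.
val greetByAddress: String => String = greeting compose findCustomer
```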

 

Referential Transparency

 

We like being able to reason about a computation using the substitution model: in a referentially transparent computation, you can always replace a function call with the result of evaluating that function with the same arguments.

This greatly simplifies understanding a complex system, because you only need to understand the components (functions) that together compose it.

 

def sq(x: Int): Int = x * x
assert(sq(5) == 5 * 5)

 

The previous example might be too basic, but I hope it suffices to make the point. You can always replace a call to the sq function with its result, and there is no difference between the two. This is also very helpful when testing your program. For more detail, you can look at the Wikipedia article here.

One of the most common things that breaks referential transparency, apart from global mutable state, is throwing exceptions up the call stack. Throwing exceptions across the call stack can be seen as very powerful. However, it makes it difficult to reason about your program when composing functions.
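
A minimal sketch of the alternative, using only the standard library: the failure is returned as a value instead of being thrown, so a call can still be replaced by its result. The functions parseAgeUnsafe, parseAge and canVote are hypothetical.

```scala
import scala.util.Try

// Throws for bad input: the signature hides the failure, and a call cannot
// be replaced by a value when it throws instead of returning.
def parseAgeUnsafe(s: String): Int = s.toInt

// Returns the failure as a value, so every call can be replaced by its result.
def parseAge(s: String): Either[String, Int] =
  Try(s.trim.toInt).toEither.left.map(_ => s"'$s' is not a number")

// Composes with map; no try/catch is needed at the call site.
def canVote(s: String): Either[String, Boolean] = parseAge(s).map(_ >= 18)
```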

 

Monad Transformers

 

One of the caveats of using effects (effects are orthogonal to the type you get; for instance, Future is the effect of asynchrony, Option is the effect of optionality, Iterable of repeatability…) is that they become extremely cumbersome when composing more than two operations, or when composing and nesting.

One of the most popular solutions is using a Monad Transformer that allows us to stack two Monads in one and use them as if they were a standard Monad. You can visit this blog post for more detail.

Another option, which we are not going to explore in this post, is to use extensible effects; you can visit eff for more detail.

The reader can try writing the following example without a Monad Transformer to see how complex it becomes, even when composing only two functions.

 

import scala.concurrent.{ExecutionContext, Future}
import cats.data._
import cats.instances.future._

// CustomerId and EmailAddress are assumed to be plain String aliases here,
// so that the snippet stands on its own.
type CustomerId = String
type EmailAddress = String

final case class Customer(id: CustomerId, email: EmailAddress, fullName: String)

def findCustomer(email: EmailAddress): EitherT[Future, Throwable, Customer] =
  EitherT[Future, Throwable, Customer](Future.successful(Right(Customer("id", "me@example.com", "John Doe"))))

def sendEmail(recipient: EmailAddress,
              subject: String,
              content: String): EitherT[Future, Throwable, Unit] =
  EitherT[Future, Throwable, Unit](Future.successful {
    println(s"Sending promotional email to $recipient, subject: '$subject', content: '$content'")
    Right(())
  })

def promotionSubject(fullName: String): String = s"Amazing promotion $fullName, only for you"

def promotionContent(fullName: String): String = s"Click this link for your personalised promotion $fullName..."

// EitherT lets us treat Future[Either[Throwable, A]] as a single Monad.
def sendPromotionEmailToCustomer(email: EmailAddress)(implicit ec: ExecutionContext): EitherT[Future, Throwable, Unit] = {
  for {
    customer <- findCustomer(email)
    subject = promotionSubject(customer.fullName)
    content = promotionContent(customer.fullName)
    result <- sendEmail(customer.email, subject, content)
  } yield result
}

import ExecutionContext.Implicits.global
sendPromotionEmailToCustomer("me@example.com")

 

For a full example see here.
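
For comparison, here is a rough sketch of the same kind of pipeline written against Future[Either[...]] directly, with no Monad Transformer and only the standard library. The types and functions are simplified, hypothetical stand-ins for the ones above.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Simplified, hypothetical stand-ins for the functions above.
final case class Customer(email: String, fullName: String)

def findCustomer(email: String): Future[Either[Throwable, Customer]] =
  Future.successful(Right(Customer(email, "John Doe")))

def sendEmail(recipient: String, subject: String): Future[Either[Throwable, String]] =
  Future.successful(Right(s"sent '$subject' to $recipient"))

// Without EitherT, the Either has to be unwrapped by hand inside the Future,
// and the failure branch has to be re-wrapped at every step.
def sendPromotion(email: String)(implicit ec: ExecutionContext): Future[Either[Throwable, String]] =
  findCustomer(email).flatMap {
    case Left(error)     => Future.successful(Left(error))
    case Right(customer) => sendEmail(customer.email, s"Amazing promotion ${customer.fullName}")
  }

import ExecutionContext.Implicits.global
// Blocking is only for the demo.
val outcome = Await.result(sendPromotion("me@example.com"), 5.seconds)
```

With only two steps this is still readable, but each additional step adds another level of pattern matching, which is the boilerplate the transformer removes.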

 

Typeclasses

 

We like decoupling structure from behaviour. Some structures might encompass some business functionality, but we should not try, as we did in OO, to define everything we think a class can do as methods.

For this, we adopt the Typeclass pattern where we define laws for a given behaviour and then implement the particular functionality for a given class. Classic examples are Semigroup, Monoid, Applicative, Functor, Monad, etc.

As a simple example, we wanted to be able to serialise/deserialise data types on the wire. For this we defined two simple Typeclasses:

 

trait BytesEncoder[A] {
  def apply(a: A): Array[Byte]
}
trait BytesDecoder[A] {
  def apply(arr: Array[Byte]): Either[Throwable, A]
}

 

All we need to do, for every type we want to be able to serialise, is implement those interfaces. When a component needs to serialise an A, it requires an implicit BytesEncoder[A], and we have to provide that implicit when we instantiate the component.

For example, if we have a CustomerEntity and we want to write it to the wire using protobuf, we provide:

 

implicit def customerEntityProtoEncoder: BytesEncoder[CustomerEntity] = new BytesEncoder[CustomerEntity] {
  override def apply(ce: CustomerEntity): Array[Byte] = ProtoMapper.toProto(ce).toByteArray
}
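
To show the consuming side as well, here is a small self-contained sketch. The String instance and the frame function are hypothetical; they stand in for a protobuf-backed instance and a real component that writes to the wire.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Same shape as the encoder typeclass above.
trait BytesEncoder[A] {
  def apply(a: A): Array[Byte]
}

// Hypothetical instance for String, standing in for a protobuf-backed one.
implicit val stringEncoder: BytesEncoder[String] = new BytesEncoder[String] {
  override def apply(s: String): Array[Byte] = s.getBytes(UTF_8)
}

// Hypothetical consumer: it never names a concrete encoder; the compiler
// supplies whichever instance is in implicit scope for A.
def frame[A](payload: A)(implicit encoder: BytesEncoder[A]): Array[Byte] = {
  val body = encoder(payload)
  // Prefix the body with its length as a single byte (enough for short demo payloads).
  body.length.toByte +: body
}

val framed = frame("hello")
```

Calling frame with a type that has no BytesEncoder in scope fails at compile time, which is the main practical benefit over runtime reflection-based serialisation.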

 

Folds/merges

 

We use Either extensively, so we need to fold both cases to get a uniform response from them. We can use a fold to merge both cases into a common response to the client, for instance, an HttpResponse.

We also want to consolidate all the errors from a Validated[NonEmptyList[String], T] into a common response by folding the NonEmptyList into a String or another adequate type.
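
A minimal sketch of both folds, using only the standard library. The HttpResponse case class here is a hypothetical stand-in for a real HTTP library's response type, the status codes are illustrative, and a plain List takes the place of cats' NonEmptyList.

```scala
// Hypothetical response type standing in for a real HTTP library's response.
final case class HttpResponse(status: Int, body: String)

// fold takes one function per case and merges both into a single type.
def toResponse(result: Either[String, Int]): HttpResponse =
  result.fold(
    error => HttpResponse(400, error),
    count => HttpResponse(200, s"Updated $count records")
  )

// Accumulated validation errors are collapsed into one response body.
def validationResponse(result: Either[List[String], String]): HttpResponse =
  result.fold(
    errors => HttpResponse(422, errors.mkString("; ")),
    name => HttpResponse(201, s"Created $name")
  )
```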

 

Conclusion

 

As you can see, we are very happy using Scala and are finding our way into more advanced topics on the functional programming side of it. But there is nothing stopping you from adopting all these useful features step by step, to get up to speed and experience the benefits of using an FP language.