38 lines of code towards better data validation in Scala by Jakub Dzikowski

How can you achieve better code maintainability?

Senior Fullstack Developer Jakub Dzikowski needed an answer to this question, so he turned to a simple solution: a 38-line micro-library. How did this help Jakub? Find out in his article!


'Complex domains require complex data validation. In this article I will show the problem with validation over nested monads, where the code explodes with boilerplate and it is difficult to separate different levels of abstraction. The simple solution, a 38-line micro-library, is also a good example of using monad transformers, type classes and tagless final in practice.



The problem

Let’s assume we have a standard architecture with repositories and services. We want to save a 'User':

The implementation is quite simple. For the sake of brevity I use 'Future' from the Scala default library. There are obviously better monads to do it, but 'Future' is probably known by everyone who develops Scala applications.
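A minimal sketch of such a starting point, using only the standard library. The fields of 'User', the 'UserRepository' trait and its in-memory implementation are assumptions made for illustration, not the original listing.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.Future

object SaveUserExample {
  // Hypothetical domain model; the field names are assumptions.
  final case class User(name: String, age: Int, country: String)

  trait UserRepository {
    def findUser(name: String): Future[Option[User]]
    def putUser(user: User): Future[User]
  }

  // In-memory stand-in for a real database-backed repository.
  final class InMemoryUserRepository extends UserRepository {
    private val users = TrieMap.empty[String, User]

    def findUser(name: String): Future[Option[User]] =
      Future.successful(users.get(name))

    def putUser(user: User): Future[User] = {
      users.put(user.name, user)
      Future.successful(user)
    }
  }

  final class UserService(repository: UserRepository) {
    // No validation yet: the service only delegates to the repository.
    def saveUser(user: User): Future[User] = repository.putUser(user)
  }
}
```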

Suddenly, a new requirement appears. We should be able to save a user only when there is no user with the same name. We need to introduce some validation and validation errors. Our code evolves:

As you probably know, exceptions should be reserved for exceptional situations. That’s why we use the 'Either' or 'Try' monad (I prefer the former). Besides, we have a 'sealed trait ValidationError' and appropriate case classes that extend it to represent validation errors.
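A sketch of what this step can look like. The 'UserAlreadyExists' case class and the repository method names are assumptions; only 'ValidationError' and the use of 'Either' come from the text.

```scala
import scala.concurrent.{ExecutionContext, Future}

object UniqueNameExample {
  final case class User(name: String, age: Int, country: String)

  // Validation errors as a closed hierarchy; the case class name is an assumption.
  sealed trait ValidationError
  final case class UserAlreadyExists(name: String) extends ValidationError

  trait UserRepository {
    def findUser(name: String): Future[Option[User]]
    def putUser(user: User): Future[User]
  }

  final class UserService(repository: UserRepository)(implicit ec: ExecutionContext) {
    // Left carries the validation error, Right carries the saved user.
    def saveUser(user: User): Future[Either[ValidationError, User]] =
      repository.findUser(user.name).flatMap {
        case Some(_) => Future.successful(Left(UserAlreadyExists(user.name)))
        case None    => repository.putUser(user).map(Right(_))
      }
  }
}
```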

However, the needs of our client are still not fulfilled. It seems that the application has some requirements regarding the users’ age. Our client has an external service that determines whether the user is old enough in a given country. Oh, and the name should have at least 2 characters.

Below is the result of the naïve approach:
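A sketch of how such a naive version tends to look, with all three rules inlined into one method. The error case classes and the 'AgeService' trait are hypothetical names chosen for this example.

```scala
import scala.concurrent.{ExecutionContext, Future}

object NaiveValidationExample {
  final case class User(name: String, age: Int, country: String)

  sealed trait ValidationError
  final case class NameTooShort(name: String) extends ValidationError
  final case class UserAlreadyExists(name: String) extends ValidationError
  final case class UserTooYoung(age: Int, country: String) extends ValidationError

  trait UserRepository {
    def findUser(name: String): Future[Option[User]]
    def putUser(user: User): Future[User]
  }

  // Hypothetical client for the external age-check service.
  trait AgeService {
    def isOldEnough(age: Int, country: String): Future[Boolean]
  }

  final class UserService(repository: UserRepository, ageService: AgeService)(
      implicit ec: ExecutionContext) {

    // Rules, database access and the external call are all tangled together.
    def saveUser(user: User): Future[Either[ValidationError, User]] =
      if (user.name.length < 2) {
        Future.successful(Left(NameTooShort(user.name)))
      } else {
        repository.findUser(user.name).flatMap {
          case Some(_) =>
            Future.successful(Left(UserAlreadyExists(user.name)))
          case None =>
            ageService.isOldEnough(user.age, user.country).flatMap {
              case false => Future.successful(Left(UserTooYoung(user.age, user.country)))
              case true  => repository.putUser(user).map(Right(_))
            }
        }
      }
  }
}
```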

This kind of code is very difficult to reason about, to extend and to test. Everything is mixed up: validation rules, database and external service calls. No responsibility separation.

We can try some refactoring: implement helper methods that hide parts of the low-level logic. But even if the code looks cleaner, it won’t be simpler. We will not avoid nesting the validation logic.

  • We need to validate name length first. There is no need to call the database if the name is invalid. Thus, we cannot get rid of the enclosing 'if'.
  • There is no need to call 'ageService' if the user already exists. Validation will fail anyway.
When another requirement comes up, a lot of time will be spent on the analysis of the current logic and the resulting code will be even more complicated. We need to somehow untangle this code and make it simple. It is possible.

Nested monads

As you may have already noticed, the validation in our example is sequential. There is only one path:

  1. If the name is valid, check if the user does not exist.
  2. If the user does not exist, check the age.
  3. If the age is fine, save the user to the database.

Can we make use of 'Either''s monadic nature to perform the validation as a sequence of steps? 'Either' is right-biased, so every '.flatMap' will transform only the 'Right' value.

Well, we cannot do it. We need to handle another level of monad-like types and cope with nested monad types: 'Future[Either[F, S]]' or alternatively 'Either[Future[F], Future[S]]'. Side note: obviously 'Future' is not a monad — or it is a monad only in some circumstances — but it is convenient in this example.
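For comparison, a minimal sketch of right-biased chaining when every check is a pure function (the error objects and the age threshold of 18 are assumptions for illustration). The difficulty described above starts as soon as one of the steps returns a 'Future[Either[...]]' instead: the steps no longer share a single monad, and the for comprehension stops type-checking.

```scala
object EitherChainExample {
  sealed trait ValidationError
  case object NameTooShort extends ValidationError
  case object TooYoung extends ValidationError

  def validateName(name: String): Either[ValidationError, String] =
    if (name.length >= 2) Right(name) else Left(NameTooShort)

  def validateAge(age: Int): Either[ValidationError, Int] =
    if (age >= 18) Right(age) else Left(TooYoung)

  // Each step runs only if the previous one returned a Right.
  def validate(name: String, age: Int): Either[ValidationError, (String, Int)] =
    for {
      validName <- validateName(name)
      validAge  <- validateAge(age)
    } yield (validName, validAge)
}
```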

Using nested monads for each validation is a step in the right direction; however, in this case you need to unpack the monads to perform some operations on the data.
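A minimal sketch of that unpacking, using only the standard library. The 'Result' alias, the two checks and the string errors are assumptions chosen to keep the example short.

```scala
import scala.concurrent.{ExecutionContext, Future}

object NestedMonadsExample {
  type Result[A] = Future[Either[String, A]]

  // Hypothetical asynchronous checks; errors are plain strings for brevity.
  def checkName(name: String): Result[String] =
    Future.successful(if (name.length >= 2) Right(name) else Left("name too short"))

  def checkNotTaken(name: String, taken: Set[String]): Result[String] =
    Future.successful(if (taken(name)) Left("user already exists") else Right(name))

  // Every step opens the Future, matches on the Either and re-wraps the Left by hand.
  def validate(name: String, taken: Set[String])(implicit ec: ExecutionContext): Result[String] =
    checkName(name).flatMap {
      case Left(error)  => Future.successful(Left(error))
      case Right(valid) => checkNotTaken(valid, taken)
    }
}
```

With two steps this is still bearable; each additional step adds another level of matching and re-wrapping.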

Not so easy. Lots of boilerplate or unintuitive helper functions. You can choose one. The best solution I was able to produce is in the repository with sources for this blogpost ('UserServiceBetterLegacy.scala' file). It is far from being optimal.

This is a good time to expose the problem we have. We need to handle two monadic types simultaneously and it is difficult. This is quite a generic problem, so:

  1. It is a good idea to extract an additional abstraction to solve it (helper function, object, lib or whatever).
  2. Probably, the Scala community has already taken care of it and there is a ready-to-use solution. Well, obviously there is, but let’s pretend there isn’t (just for a while).

Spoiler: Nested monads bound together should behave like a single monad.

The solution

Before I present the 38 lines of code mentioned in the title, I will show you part of the 'UserService' after applying them.

Here it is:
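A sketch of how such a service can look, assuming 'ValidationResult' is an alias for Cats’ 'EitherT' over 'Future' and 'onSuccess' is a small extension method. The helpers 'ensure' and 'ensureF', the error case classes and the 'AgeService' trait are hypothetical, and the author’s actual listing may differ.

```scala
import cats.data.EitherT
import cats.instances.future._
import scala.concurrent.{ExecutionContext, Future}

object ValidatedServiceExample {
  final case class User(name: String, age: Int, country: String)

  sealed trait ValidationError
  final case class NameTooShort(name: String) extends ValidationError
  final case class UserAlreadyExists(name: String) extends ValidationError
  final case class UserTooYoung(age: Int, country: String) extends ValidationError

  trait UserRepository {
    def findUser(name: String): Future[Option[User]]
    def putUser(user: User): Future[User]
  }

  trait AgeService {
    def isOldEnough(age: Int, country: String): Future[Boolean]
  }

  // Assumed stand-in for the micro-library: an alias over EitherT plus two helpers.
  type ValidationResult[F, S] = EitherT[Future, F, S]

  def ensure[F](condition: Boolean, onFailure: => F)(
      implicit ec: ExecutionContext): ValidationResult[F, Unit] =
    EitherT.cond[Future](condition, (), onFailure)

  def ensureF[F](condition: Future[Boolean], onFailure: => F)(
      implicit ec: ExecutionContext): ValidationResult[F, Unit] =
    EitherT[Future, F, Unit](condition.map(ok => Either.cond(ok, (), onFailure)))

  implicit class ValidationResultOps[F, S](vr: ValidationResult[F, S]) {
    // Runs the action only when the validation succeeded.
    def onSuccess[S2](action: => Future[S2])(
        implicit ec: ExecutionContext): Future[Either[F, S2]] =
      vr.semiflatMap(_ => action).value
  }

  final class UserService(repository: UserRepository, ageService: AgeService)(
      implicit ec: ExecutionContext) {

    def saveUser(user: User): Future[Either[ValidationError, User]] = {
      val validation = for {
        _ <- validateName(user.name)
        _ <- validateUserDoesNotExist(user.name)
        _ <- validateAge(user.age, user.country)
      } yield ()
      validation.onSuccess(repository.putUser(user))
    }

    private def validateName(name: String): ValidationResult[ValidationError, Unit] =
      ensure[ValidationError](name.length >= 2, NameTooShort(name))

    private def validateUserDoesNotExist(name: String): ValidationResult[ValidationError, Unit] =
      ensureF[ValidationError](repository.findUser(name).map(_.isEmpty), UserAlreadyExists(name))

    private def validateAge(age: Int, country: String): ValidationResult[ValidationError, Unit] =
      ensureF[ValidationError](ageService.isOldEnough(age, country), UserTooYoung(age, country))
  }
}
```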

And now:

  • We have a clear separation of validation and action in the 'saveUser' method.
  • Validation is performed as a sequence of steps (it is easy to reason about the business logic).
  • Each validation step has a separate method (single responsibility).

The validation helper methods build 'ValidationResult' objects that encompass two kinds of types: the 'Either' monad (right-biased) and the 'Future'. The 'ValidationResult' type behaves like a monad as well, so we can use it in a for comprehension to build the final validation result. A type that combines two monads and itself behaves like a monad is called a monad transformer.

When the validation result is built, we can call the 'onSuccess' method to perform the final action (in this case saving the user to the repository).


The 38 lines of code

It is a large chunk of code to put in the gist, but I want to show you everything in one place. Below you can find the whole implementation of data validation with monad transformers. It relies on Cats and it is not bound to any special kind of monad (we use it for example over Slick’s 'DBIOAction'). Just have a quick look at the code and after the listing we will go through the details.
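As a rough sketch of what a micro-library of this shape can look like on top of Cats’ 'EitherT': only the names 'ValidationResultLib', 'ValidationResult', 'ValidationResultOps' and 'onSuccess' come from the article; the helpers 'successful', 'failed', 'ensure', 'ensureM' and 'fromOptionM', and all signatures, are assumptions and not the author’s exact code.

```scala
import cats.Monad
import cats.data.EitherT

trait ValidationResultLib[M[_]] {

  // F is the failure type, S is the success type.
  type ValidationResult[F, S] = EitherT[M, F, S]

  object ValidationResult {
    def successful[F, S](s: S)(implicit m: Monad[M]): ValidationResult[F, S] =
      EitherT.rightT[M, F](s)

    def failed[F, S](f: F)(implicit m: Monad[M]): ValidationResult[F, S] =
      EitherT.leftT[M, S](f)

    // Pure check: fails with onFailure when the condition is false.
    def ensure[F](condition: => Boolean, onFailure: => F)(
        implicit m: Monad[M]): ValidationResult[F, Unit] =
      EitherT.cond[M](condition, (), onFailure)

    // Effectful check: the condition is computed inside M.
    def ensureM[F](condition: => M[Boolean], onFailure: => F)(
        implicit m: Monad[M]): ValidationResult[F, Unit] =
      EitherT[M, F, Unit](m.map(condition)(ok => Either.cond(ok, (), onFailure)))

    def fromOptionM[F, S](option: M[Option[S]], ifNone: => F)(
        implicit m: Monad[M]): ValidationResult[F, S] =
      EitherT.fromOptionF(option, ifNone)
  }

  implicit class ValidationResultOps[F, S](vr: ValidationResult[F, S]) {
    // Runs the action in M only when the validation succeeded.
    def onSuccess[S2](action: => M[S2])(implicit m: Monad[M]): M[Either[F, S2]] =
      vr.semiflatMap(_ => action).value
  }
}
```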

Feel free to copy the gist and use it in your project if you like.

The 38 lines of code contain a micro-library for data validation on top of 'Either'. We use it quite heavily in a project with 60k+ lines of Scala code. The only difference is in 'ValidationResult' and 'ValidationResultOps': we have some different signatures and some additional functions. The version above contains only the code that is required in the example presented in the article.

There is no need to publish it as a separate library. If you decide to use it, sooner or later you may want to extend it with custom wrappers or helpers.

In order to use the 'ValidationResultLib', you need to create an object (let’s call it 'Validation') that extends it and provide an implicit instance of 'Monad[M]' for the given type of monad. Then, just import 'Validation._' and you are ready to go.

For the purposes of the code examples I have prepared a 'FuturesValidation' object:
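A sketch of such an object. The 'ValidationResultLib' trait below is a trimmed-down stand-in with a single hypothetical 'ensure' helper, included only so that the block compiles on its own; 'catsStdInstancesForFuture' is the 'Future' instance that Cats ships in 'cats.instances.future'.

```scala
import cats.Monad
import cats.data.EitherT
import scala.concurrent.{ExecutionContext, Future}

// Trimmed-down stand-in for the micro-library (assumption, not the full code).
trait ValidationResultLib[M[_]] {
  type ValidationResult[F, S] = EitherT[M, F, S]

  object ValidationResult {
    def ensure[F](condition: => Boolean, onFailure: => F)(
        implicit m: Monad[M]): ValidationResult[F, Unit] =
      EitherT.cond[M](condition, (), onFailure)
  }
}

object FuturesValidation extends ValidationResultLib[Future] {
  // Created on demand, because the Future instance needs an ExecutionContext.
  implicit def monad(implicit ec: ExecutionContext): Monad[Future] =
    cats.instances.future.catsStdInstancesForFuture
}
```

After 'import FuturesValidation._' and with an implicit 'ExecutionContext' in scope, a call such as 'ValidationResult.ensure(name.length >= 2, "name too short")' yields a 'ValidationResult[String, Unit]' backed by 'Future'.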

As you can see, the 'Monad' type comes from the Cats library. Cats also provides a standard implicit instance of 'Monad[Future]', so the implementation is very simple.

In the project I have recently worked on we had a package object and we performed the validation over Slick 'DBIOAction' monads:

In this case 'DBRead' is an alias for 'DBIOAction[_, NoStream, Read]'. Thus, we are sure (on the type level) that the validation will include only the database reads.

As you can see, we had to implement our own instance of 'Monad[M]', because we used a custom 'M' monad. An implicit 'Monad[M]' in scope is required to perform monadic operations on 'ValidationResult' and 'EitherT'. This is how it works in the Cats library.



Unsurprisingly, the 'ValidationResult' is mainly a wrapper around the 'EitherT' monad transformer. However, there are two main advantages of using it instead of the pure 'EitherT'.
  • The language is more specific — the type name directly tells us that we are handling validation.
  • The monad type is hidden — in case of the 'EitherT' we need to provide the monad type when the compiler’s type inference gets lost. For the 'ValidationResult' we have a specific monad type — provided in the implementation of the 'ValidationResultLib'.

'EitherT' itself provides some useful methods for handling the 'Either' values nested in the other monad (let’s say 'M[Either]'). However, most of the operations on 'EitherT' require some implicit argument — it might be 'Functor[M]', 'Applicative[M]' or 'Monad[M]', depending on the function, and it defines basic operations that might be performed on 'M'.

Basically, each monad is both an applicative and a functor. That’s why in the implementation of 'ValidationResultLib' we need to provide an implicit value (or 'def') of Cats’ 'Monad[M]': it simply covers all the cases.

'Monad[M]' requires only three functions: 'pure', 'flatMap' and 'tailRecM'. The first two functions are sufficient to define a monad. Cats requires 'tailRecM' as well, but this is a design decision unrelated to the definition of a monad. (Theoretically there are other subsets of functions that are sufficient to create a monad, however I don’t want to go into the details here, so I will just recommend chapter 11 from Functional Programming in Scala or maybe this article.)
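To make the three functions concrete, here is a minimal sketch of a Cats 'Monad' instance for a hypothetical single-value wrapper called 'Box' (the type is invented for this example and does not appear in the article).

```scala
import cats.Monad
import scala.annotation.tailrec

object BoxMonadExample {
  // Hypothetical wrapper type, used only to illustrate the instance.
  final case class Box[A](value: A)

  implicit val boxMonad: Monad[Box] = new Monad[Box] {
    def pure[A](a: A): Box[A] = Box(a)

    def flatMap[A, B](fa: Box[A])(f: A => Box[B]): Box[B] = f(fa.value)

    // Stack-safe loop: keep applying f until it returns a Right.
    @tailrec
    def tailRecM[A, B](a: A)(f: A => Box[Either[A, B]]): Box[B] =
      f(a).value match {
        case Left(next)    => tailRecM(next)(f)
        case Right(result) => Box(result)
      }
  }
}
```

Everything else on 'Monad' (such as 'map') is derived by Cats from these three definitions.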

You may not be surprised that this way of using implicit parameters in operations is a known design pattern in functional programming. Say hello to type classes.


Type classes

Type classes are a way of achieving ad hoc polymorphism or, to put it simply, a way of adding behavior that fits some kind of API. They consist of three components (this is a quote from Functional Programming, Simplified):
  1. The type class, which is defined as a trait that takes at least one generic parameter (a generic “type”).
  2. Instances of the type class for types you want to extend.
  3. Interface methods you expose to users of your new API.

Have a look at the code:
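A minimal, standard-library-only sketch of the three components. The 'Describable' type class and its instances are invented for this illustration.

```scala
object TypeClassExample {
  final case class User(name: String, age: Int)

  // 1. The type class: a trait with one generic parameter.
  trait Describable[A] {
    def describe(value: A): String
  }

  // 2. Instances of the type class for the types we want to extend.
  object Describable {
    implicit val intDescribable: Describable[Int] = new Describable[Int] {
      def describe(value: Int): String = s"the number $value"
    }

    implicit val userDescribable: Describable[User] = new Describable[User] {
      def describe(value: User): String = s"${value.name}, aged ${value.age}"
    }
  }

  // 3. The interface method exposed to users of the API.
  def describe[A](value: A)(implicit instance: Describable[A]): String =
    instance.describe(value)
}
```

Neither 'Int' nor 'User' had to be modified; the behavior is attached from the outside through the implicit instances.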

Type classes are used heavily in the Cats library. When the 'Monad[Future]' is defined, Cats can handle operations over 'Future' because it is adapted to 'Monad' trait and has functions that are required for a monad.

The 'ValidationResultLib' uses a similar approach. Inside the micro-library you don’t care what kind of monad is used with 'Either'. You just want to perform monadic operations on it.

This way of abstracting over the monad type is also an instance of another pattern known in functional programming: tagless final.


Tagless final

The main goal of tagless final is parametrizing the monad type we handle (as a generic parameter). This pattern consists of three elements as well (find more details here):

  1. The initial instruction set (a.k.a. algebra/language/DSL) which defines the set of operations we can perform.
  2. The description of the solution, when we use the operations defined in the first point.
  3. The interpreter, which implements the initial instruction set for a particular monad.

Classic examples of tagless final are usually made on repositories and services. For example, we have a 'UserRepository[M[_]]' and we can call 'def findUser(id: String): M[Option[User]]'. The monad type is unknown both for the repository and the service that uses it.
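A sketch of that classic shape, assuming Cats is available. The 'UserService' method 'userName' and the in-memory interpreter over Cats’ 'Id' are hypothetical additions for illustration.

```scala
import cats.{Id, Monad}
import cats.syntax.functor._

object TaglessFinalExample {
  final case class User(id: String, name: String)

  // 1. The algebra: the available operations, for an unknown monad M.
  trait UserRepository[M[_]] {
    def findUser(id: String): M[Option[User]]
  }

  // 2. The description of the solution, written only against the algebra and Monad[M].
  final class UserService[M[_]: Monad](repository: UserRepository[M]) {
    def userName(id: String): M[String] =
      repository.findUser(id).map(_.fold("unknown")(_.name))
  }

  // 3. An interpreter: here the identity monad, which is handy in tests.
  object InMemoryUserRepository extends UserRepository[Id] {
    private val users = Map("1" -> User("1", "alice"))

    def findUser(id: String): Id[Option[User]] = users.get(id)
  }
}
```

The same 'UserService' can be instantiated with a repository over 'Future' or 'DBIOAction' without changing its code, as long as a 'Monad' instance for that type is in scope.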

In the 'ValidationResultLib[M[_]]' we have quite a different situation, however the type parameter suggests that it might be a tagless final pattern as well. Let’s check:

Yes, it seems we have a match. Our initial operation set is just the set of functions that are required to perform monadic operations: 'pure', 'flatMap', and 'tailRecM'. The whole library code inside the 'ValidationResultLib' trait is the description of the solution, where we use the initial operation set to extend our language. Finally, the 'FuturesValidation' object is the interpreter, where we implement our initial operation set for the given monad.

There are two main advantages of using this approach in the 'ValidationResultLib'. I have written about them above, but now we can describe them in the language of tagless final:

  • You can copy the code as-is and implement your own interpreter for a given type of monad.
  • Because the monad type is fixed at trait-level, you don’t need to parametrize methods with the monad type (when the type inference fails).

Using tagless final has some advantages, mainly because of better abstraction isolation. The important note here is that I am using this pattern in the micro-library. I am not using it in the code of services, repositories, etc.


Conclusions and final remarks

In this post I showed you 38 lines of code towards better data validation in Scala. It helped me a lot with the project I was recently working on. A lot of boilerplate and spaghetti-logic with leaky abstraction was gone. However, this kind of approach, even if it leads to better code maintainability, has some drawbacks.

The first one concerns using 'Either' and similar types for validation. This is a fail-fast approach. When the first check fails, the whole validation is terminated with the first error. It is useful when it prevents unnecessary calls to the database or external services. However, there are some situations when you want to know all validation errors, not only the first one. In this case Cats provides the 'Validated' applicative as an alternative to the 'Either' monad.
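A minimal sketch of error accumulation with Cats’ 'Validated', here through the 'ValidatedNec' alias. The two checks, the string errors and the age threshold are assumptions for illustration.

```scala
import cats.data.ValidatedNec
import cats.syntax.all._

object AccumulatingValidationExample {
  final case class User(name: String, age: Int)

  def validateName(name: String): ValidatedNec[String, String] =
    if (name.length >= 2) name.validNec[String] else "name too short".invalidNec[String]

  def validateAge(age: Int): ValidatedNec[String, Int] =
    if (age >= 18) age.validNec[String] else "user too young".invalidNec[Int]

  // mapN evaluates both checks and combines all errors instead of stopping at the first.
  def validateUser(name: String, age: Int): ValidatedNec[String, User] =
    (validateName(name), validateAge(age)).mapN(User.apply)
}
```

Because the checks are combined applicatively, they cannot depend on each other’s results; that independence is the price of collecting every error.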

There is also a problem related to the 'ExecutionContext' that needs to be passed for almost all operations on many monadic types, for example on 'Future' (see this thread on Reddit). That’s why our implicit instance of 'Monad[Future]' is created on the fly: 'implicit def monad(implicit ec: ExecutionContext)'. We don’t have a single value for 'Monad[Future]'; we create new instances, because we need the 'ExecutionContext' which is, well, contextual. The overhead is probably very small and acceptable in most cases, however in some circumstances it may lead to performance issues.

Besides, the performance issues might come (again — in some circumstances) from using monad transformers. This is related to the broader problem, that the JVM itself is not well designed for some patterns in functional programming. You may read more about this topic in the article about monad transformers by John A De Goes.

So, there is a trade-off. Significant improvement of code readability and maintainability vs slight performance issues that might occur in some circumstances. Choose wisely.'


This article was written by Jakub Dzikowski and posted originally on SoftwareMill Blog.