Connecting...

W1siziisimnvbxbpbgvkx3rozw1lx2fzc2v0cy9zawduawz5lxrly2hub2xvz3kvanbnl2jhbm5lci1kzwzhdwx0lmpwzyjdxq

Algebraic Data Types in four languages by Marcin Baraniecki

W1siziisijiwmtgvmtivmtevmtavntkvmtmvmjgyl3blegvscy1wag90by0ymzk4otguanblzyjdlfsiccisinrodw1iiiwiotawedkwmfx1mdazzsjdxq

When using algebraic data types which different programming languages do you use? In this article by Marcin Baraniecki from SoftwareMill, he looks at comparing how the concept of “sum” algebraic data types is supported by four languages he uses, Haskell, Scala, Rust and TypeScript. 

What would be your programming language?

 

'Algebraic Data Types, or ADTs for short, are powerful in representing the data model in a concise and transparent way. In most cases, they are defined as a closed set of possible values under one, common interface. As an example, let’s consider a “pet” type in some pseudocode:

type Pet = Dog or Cat or Hamster

doggo: Pet = Dog
kitty: Pet = Cat

It is clear, that a pet can be of at most one species at a time, although one might have as many house animals as needed. The 'Pet' type can act as an enumeration here, containing a finite set of possible members. This makes algebraic data types a clever way of describing the entities in the application. The “algebra” here is a “sum” (alternative) of the type’s members — either a 'Dog', a 'Cat' or a 'Hamster'.

In this blog post, I’m going to compare how the concept of “sum” algebraic data types is supported by four languages I use, namely: Haskell, Scala, Rust, and TypeScript. We’ll see how concise the syntax of these can be, and how much of the boilerplate code can be eliminated. Let’s go!

 

Haskell

Leaving pets aside, probably the most common example of algebraic data type in Haskell is a 'Tree', which describes elements of a tree-like data structure. A 'Tree' elements might come in one of two forms:

  • a node, which contains a value, together with its left and right “descendants” (or nested tree elements)
  • an empty “leaf”, which stands for a “tip” of the tree at a given “branch”
A careful reader might spot, that this data type is a bit more complex than our previous 'Pet' example — this is because a node might recursively contain elements belonging to the 'Tree' type! Let’s take a look at the code:
1 data Tree a = Empty | Node a (Tree a) (Tree a) deriving (Show)
2
3 main = let
4   tree = Node 42 (Node 0 Empty Empty) Empty
5   in print tree -- prints Node 42 (Node 0 Empty Empty) Empty

Don’t worry if there’s too much going on (compared to the previous pseudocode 'Pet' example) in the code now — we’ll go over every bit now.

The 'Tree' is declared using a 'data' keyword in line 1. An 'a' letter after the 'Tree' stands for generic parameter declaration — literally meaning that our tree structure can hold values of any (fixed) type — whether these are integers, strings, booleans — even pets.

Next come the “members” of the type — an 'Empty' (tip of the tree) and a 'Node', able to hold a value of the generic 'a' type and two descendants. Both “members” are separated with a pipe ('|') character, which should be read as an “or”.

The 'deriving (Show)' clause at the very end means that every instance of the 'Tree' data type can be converted to a string (and printed out to the console). We don’t need to manually implement any methods (like 'toString') — a default behavior is derived automatically. It’s a cool feature of Haskell, that lets us quickly equip our data types with a set of very common operations — we could as well derive an 'Eq' for equality checks between many tree elements.

Now, take a look at the first line of the code again and notice, that Haskell makes a distinction between a type constructor and data constructor(s):

  • a type constructor is declared to the left of the equals sign, namely 'Tree a'
  • data constructors live to the right of the equals sign — 'Empty' and 'Node a (Tree a) (Tree a)' in that case
In the 'main' function, a simple tree instance is constructed. Graphically, it could be represented like this:
 
 

A tree starts at a 'Node' holding 42 as a value, with an 'Empty' right descendant, and a left descendant being a 'Node' holding 0 and two 'Empty' descendants. The empty branches could as well be omitted in the picture, but I intentionally left them here (blank) to make the structure clearly resemble the tree.

We’ll stick with the tree algebraic data type now, which I’ll show how to implement in three more languages. Let’s move on to…

 

Scala

Scala borrows many concepts from Haskell. Surprisingly though, the syntax of defining ADTs is not so obvious! Observe:

1 sealed trait Tree[+A]
2 case object Empty extends Tree[Nothing]
3 final case class Node[A](value: A, left: Tree[A], right: Tree[A]) extends Tree[A]
4
5 val tree = Node(42, Node(0, Empty, Empty), Empty)
6 print(tree) // prints Node(42,Node(0,Empty,Empty),Empty)

We start with a 'trait', which is close to what you might know as an “interface” from other languages. Once again, there’s a generic type parameter (in square brackets), but by the language convention, it is uppercased. There’s also a plus sign next to the 'A' parameter, but its’ purpose is beyond the scope of this article, so just trust me that it must be there. Because we defined our type constructor using 'trait' it needs to be 'sealed', so that no code from the “outside” can extend our algebra.

Next comes a 'case object Empty'. This is Scala’s way of representing “singleton” members of algebraic data types — they don’t receive any parameters from the outside — every instance of such member type will always be exactly the same. Because it doesn’t make sense to parameterize 'Empty', a reference to generic 'A' would also be pointless here (but we still need to declare that it belongs to a 'Tree' type), hence 'extends Tree[Nothing]'.

The 'Node', as we already know, is capable of holding a value. We can’t use a 'case object' here, hence a 'case class'! Ah, case classes. I wish I had them in every language — not only they are a clear & concise way of describing the ADT members, but they also come with a handful of boilerplate methods implemented automatically — 'toString', 'equals' and 'copy', to name a few.

Contrary to Haskell, both 'Node' and 'Empty' are “standalone” types in Scala. It is a perfectly valid case to declare eg. a 'List[Node[Int]' (a list of nodes containing integers), while in Haskell this would not be possible — one could only declare a '[Tree Int]' (list of trees of integers) instead.

 

Rust

Rust is this relatively new kid on the block. It is surprisingly straightforward when it comes to defining data types and structures. There are only three constructs at our disposal: 'trait' (resembling Haskell’s typeclasses), 'struct' and 'enum'. And with that last one, we’ll create our well-known 'Tree':

1 #[derive(Debug)]
2 enum Tree<T> {
3     Empty,
4     Node(T, Box<Tree<T>>, Box<Tree<T>>)
5 }
6
7 fn main() {
8     let tree = Tree::Node(
9         42,
10         Box::new(Tree::Node(
11             0,
12             Box::new(Tree::Empty),
13             Box::new(Tree::Empty)
14         )),
15         Box::new(Tree::Empty));
16   
17     println!("{:?}", tree); // prints Node(42, Node(0, Empty, Empty), Empty)
18 }

The ADT definition spans from line 1 through 5. The 'derive' clause is very much similar to Haskell’s 'derives' keyword, although it comes with more annotation-like flavor here (it’s called attribute in Rust). 'Debug' stands for the ability to be printed to the console. We could as well derive 'PartialEq', 'Eq' or 'Ord' (for ordering).

As said earlier, algebraic data types kind of enumerate their possible members. That’s why an 'enum' is a perfect fit here! It’s parametric too — by convention, in Rust we usually start with a letter 'T' to denote generic parameter.

A well-known 'Empty' member is defined in line 3. It doesn’t take any type parameters — no surprise here. A 'Node' is defined similarly to what we just saw in Haskell’s and Scala’s examples, although the use of 'Box' wrapper might be confusing. Rust is a very special language when it comes to the memory management model — it doesn’t use a garbage collector, but the compiler makes sure that there will be no memory issues, making the language memory-safe and free of GC pauses at the same time. How great is that? But with great power comes great responsibility, so a 'Box' here must be explicitly used to ask for a heap allocation — we won’t go into details here, but trust me that without the 'Box' this code would not compile, with a compiler complaining about possible memory leaks.

The 'enum''s syntax here is so clear, that it seems almost like it’s there for ADTs. I personally like it better than Scala’s way, mostly because it’s almost as simple as in Haskell. It’s only a matter of getting used to memory-safety rules of Rust — although this would not be a problem if only our algebra wasn’t recursive ('Pet' would never have to use 'Box'es!).

 

TypeScript

TypeScript, a statically typed superset of very popular JavaScript is gaining in popularity every day. In fact, most of the new libraries and frameworks are already created with static typing in mind, and I would even risk saying that it’s foolish to start a new project without TypeScript these days.

The type system is quite advanced. The algebraic data types can be created using so-called discriminated union types:

1 type Tree<T> = Empty | Node<T>;
2
3 class Node<T> {
4     constructor(
5         public value: T,
6         public left: Tree<T>,
7         public right: Tree<T>
8     ) {}
9
10     public toString() {
11         return `Node(${this.value}, ${this.left.toString()}, ${this.right.toString()})`;
12     }
13 }
14
15 class Empty {
16     public toString() {
17         return 'Empty';
18     }
19 }
20
21 const tree = new Node(
22     42,
23     new Node(
24         0,
25         new Empty(),
26         new Empty()
27     ),
28     new Empty()
29 )
30
31 console.log(tree.toString()); // prints Node(42, Node(0, Empty, Empty), Empty)

The union is created in the very first line of the gist, using the 'type' keyword, which in other circumstances can also be used to create a type alias. Notice that the 'Tree<T>' here is, again, a generic type constructor. Next, two data constructors are enumerated — well-known 'Empty' and 'Node<T>'.

This is a huge step forward in the JavaScript ecosystem, which could be achieved only with a static type system. A careful reader will spot the difference between that example and the previous three right away — this is the longest piece of code so far! While the pure definition of the ADT is short and concise, we had to explicitly define data constructors in the form of classes (just look how verbose it is, compared to Rust or Haskell). Unfortunately, we can’t get all the goodies of 'case class'es (like Scala’s), nor can we 'derive' the default implementation for common methods automatically. The example above includes a manually implemented 'toString' methods, which gets tedious as our algebras grow in size. I really hope that in the near future TypeScript might enable some kind of 'case class'-like syntax, too.

 

Summary

You might have been using algebraic data types already without knowing it. While the concept of ADTs is quite simple, it gave me a fresh insight at the type definitions from a slightly different perspective. Enumerating possible type “members” in a union leads to much more clear definitions and thinking about the entities in terms of well-formed groups.

If I were to decide on my favorite approach to ADTs, I would pick Haskell. Next — Rust, for its’ approach which is almost as simple as Haskell’s. Scala would be the third, for all the goodies of case classes. Last, but not least — TypeScript. It’s still a bit too verbose for “advanced” types like a 'Tree', but is a huge step towards a concise type system.'