I’ve been working as a software engineer for sixteen years, and in those years I’ve been to countless job interviews, but I’m still dumbfounded by how most companies conduct them. The majority of interviews for engineering positions focus on algorithms and data structures. Candidates must answer questions like how to find the longest subarray of an array with a given sum, what the complexity of their solution is, how it can be improved and so on, usually at a whiteboard. Whiteboarding sessions are meant to reveal computer science proficiency and an understanding of concepts like big-O notation, but also generic problem-solving skills. They are also meant to give interviewers a feel for what it is like to collaborate and work with the candidate.
All that for jobs that then mainly consist of implementing, day after day, API endpoints which do nothing more than execute simple database queries, sort the results, and return them in JSON format, maybe a thousand times a second. The most computationally complex step is probably the sorting, and even that is usually a one-line call to the standard library.
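A minimal sketch of what such an endpoint tends to look like, in Python with invented names (`db`, `orders`, `created_at` are purely illustrative, not from any particular codebase):

```python
# A hypothetical endpoint handler: fetch rows, sort them, hand them
# off for JSON serialization. The sort is the "hardest" step.
def list_orders(db):
    orders = db.query("SELECT * FROM orders")
    orders.sort(key=lambda o: o["created_at"])  # the sorting one-liner
    return orders
```

Everything around that `sort` call is plumbing: moving data from the database to the client.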
And this is understandable. The vast majority of these companies are building some kind of platform that automates simple tasks which were previously done by two people calling each other on the phone to exchange data, or by a human representative taking data from customers. Platforms that automate moving data from one place to another and storing it there. This has been the everyday reality of automation so far. Of course, there are more and more data science and machine learning related positions lately, but an average software developer still has a far greater chance of being hired to implement yet another API, and maybe, at most, to tackle the challenges of distributing API calls across a cluster. Most developers don’t really write much code that actually computes something, only code that wires various components together to move data between them.
Hiring a software developer can be very expensive. So is testing for computer science proficiency really the best use of that money? And the best way to build an efficient developer team for the work described above? Is knowing how to balance a binary search tree really the key engineering skill that an average business’s success depends on? And if not, what is?
If a company hires software engineers, there’s a good chance that critical parts of its business exist in the form of computer code. Therefore the growth of that business is closely linked to the maintainability of that code. Businesses adapt and grow constantly; they perpetually change. New features and new offerings to customers are added every day. Existing features are revisited, strategies are revised, whole business models are pivoted.
To manage all this change, companies tend to focus on processes, practices, project management techniques and development methodologies, organizational considerations and team structures, and also on system design and architectural patterns. While getting all these right is fundamental to smooth growth, none of them deals with the fact that as the business is encoded as computer code, the codebase itself is also in a state of constant flux. The codebase, like the rest of the company, must be in a shape that can absorb constant change without any disruption to the business.
Moreover, there are entire methodologies built on this idea of perpetual incremental improvement. Agile and lean methodologies require codebases that can handle change from day one, from the first line of code. Codebases that are written for change.
A company’s most expensive employees will not spend most of their time implementing clever algorithms to solve tricky problems. Beyond sorting a collection with that one-liner, they will spend it scratching their heads over how to keep the codebase’s resistance to change low. Therefore being good at that, and having the skills that enable writing maintainable code (code that a constantly growing and fluctuating group of people can work on together at once, to grow it and change it, to implement new features and revise existing ones, without disruptions) seems more critical than a good understanding of the Floyd–Warshall algorithm or of distributed systems with Byzantine fault tolerance.
But what exactly are these skills? What exactly does maintainable code, the thing a software developer candidate needs expertise in, mean? How can a codebase’s resistance to change be kept low, so that more and more features can be added without interruptions, more and more developers can be hired to add new features even faster, and fewer and fewer full rewrites are needed every couple of years?
Containment of change
We all know the saying that nine women can’t make a baby in a month, but what exactly prevents doubling delivery velocity by doubling the team size? If the team grows together with the codebase, then the amount of code per developer remains constant, and each developer needs to own, understand and maintain the same number of lines of code. But the sheer volume of code is not what determines its resistance to change. The same way a growing team poses a project management challenge of increasing interdependency of tasks, or a product development challenge of increasing interdependency of features, a growing codebase increases the interdependency of code parts. Interdependency means that some specific part of the code cannot be modified without affecting others too. And that requires more and more coordination as both the team and the codebase grow.
Another way of looking at this is that change spreads and multiplies along these interdependencies, so to contain change, boundaries are needed, just as compartmentalization is a way to contain failure. But compartmentalization means something slightly different for writing code than for running programs. The boundaries are not placed at random spots simply to break up something larger into smaller pieces; they separate abstraction layers. Compartmentalization of code is achieved through modularity and compositionality: organizing the codebase in such a way that the whole is composed of smaller, reusable building blocks (which are further composed of even smaller ones, and so on), and this composition isn’t ad hoc but follows well-defined rules. Therefore, to understand and reason about the whole, one must only understand the building blocks and the rules of composition.
This approach turns codebases from networks of arbitrary interdependencies between specialized code parts into sets of reusable building blocks that can be systematically composed into larger blocks. And by systematically we mean following rules or laws. A set of building blocks with composition laws is a domain language. The key to containing change, and thus to reducing a codebase’s resistance to it, is to write code that is the theory of a problem domain, instead of being a solution to a specific problem. A theory of a problem domain is basically the most generic domain language that is still sound: one that only allows correct combinations of the building blocks. This way, as a business grows and evolves, new problems and new solutions can be described and expressed in the same language, and thus easily incorporated into the codebase.
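As a toy illustration of such a domain language, sketched in Python with invented names: a handful of building blocks (predicates over an order) plus composition rules that combine predicates into predicates. A new business rule is then expressed in the existing language instead of as new ad hoc code:

```python
# Building blocks: predicates over a hypothetical "order" record.
def min_total(n):
    return lambda order: order["total"] >= n

def in_region(r):
    return lambda order: order["region"] == r

# Composition rules: combining predicates yields another predicate,
# so composites can themselves be composed further.
def both(p, q):
    return lambda order: p(order) and q(order)

def either(p, q):
    return lambda order: p(order) or q(order)

# A new business rule, written entirely in the existing language:
free_shipping = either(min_total(100), both(min_total(50), in_region("EU")))
```

When the business pivots, the rule is recomposed from the same blocks; the blocks themselves do not change.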
Here’s what the famous computer scientist Barbara Liskov has to say about this:
Modularity based on abstraction is the way things get done
Or what Structure and Interpretation of Computer Programs, the book by Harold Abelson and Gerald Jay Sussman, says:
The importance of this decomposition strategy is not simply that one is dividing the program into parts. […] Rather, it is crucial that each procedure accomplishes an identifiable task that can be used as a module in defining other procedures. […] [N]ot quite a procedure but rather an abstraction of a procedure, a so-called procedural abstraction. […] The users of the procedure may not have written the procedure themselves, but may have obtained it from another programmer as a black box. A user should not need to know how the procedure is implemented in order to use it.
Or what Rúnar Bjarnason says in his Functional Programming is Terrible talk at the Northeast Scala Symposium:
Compositionality is the property that your software can be understood as a whole by understanding its parts and the rules governing their composition.
Modularity is the property that your software can be separated into its constituent parts, and those parts can be reused independently of the whole, in novel ways that you did not anticipate when you created them.
This is what software developer candidates need to be good at: writing modular, compositional code, understanding various means of abstraction, and thus being able to collaborate on creating theories of domains. These are the skills that companies should test for, besides algorithms and data structures, because these are the ones required for the challenges that define the workdays of an average developer. In most cases, selecting for these skills is what guarantees the most efficient teams, the ones that provide the best value for the business.
But how can skills related to writing modular, compositional code, be tested? And why does functional programming matter, after all?
Imperative programming is like building assembly lines, which take some initial global state as raw material and apply various specific transformations and mutations to it as this material is pushed through the line; at the end, the end product comes off: the final global state that represents the result of the computation. Each step needs to change, rotate and massage the workpiece in precisely one specific way, so that it is prepared for the subsequent steps downstream. Every step downstream depends on every previous step, and their order is therefore fixed and rigid. Because of these dependencies, an individual computational step has little use or meaning in itself, only in the context of all the others, and to understand it, one must understand how the whole line works. Global mutable state creates interdependencies between all parts of the code, and programming languages provide no means of expressing these explicitly. Assembly lines produce exactly one product, and if a new product is to be produced, a new assembly line is needed. They represent a specific solution to a particular problem.
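The assembly line can be caricatured in a few lines of deliberately bad Python (the state fields and step names are invented): every step mutates the same global state, so no step means anything on its own, and the call order cannot be changed:

```python
# Global mutable state: the "workpiece" every step operates on.
state = {"raw": [3, 1, -2], "cleaned": None, "result": None}

def step_clean():
    # Depends on "raw" having been loaded by an earlier step.
    state["cleaned"] = [x for x in state["raw"] if x > 0]

def step_compute():
    # Depends on step_clean() having already run; crashes otherwise.
    state["result"] = sorted(state["cleaned"])

# The line: order is fixed and rigid, swapping the calls breaks it.
step_clean()
step_compute()
```

Neither step can be reused elsewhere or tested in isolation, because each one only makes sense halfway down this particular line.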
Object-oriented programming is a poor attempt at modularizing the assembly line. Not only because, in doing so, it introduces somewhat arbitrary-feeling means of composition, like members and inheritance, but mainly because all that encapsulation achieves is breaking the global mutable state into smaller pieces, which then get hidden inside modules called classes. It doesn’t really change the fundamental paradigm of the assembly line, or address the underlying problem of interdependencies. It covers up (encapsulates) larger sections of the assembly line, but those sections can rarely be rearranged in unanticipated ways. And even when they can, there is no systematic and proven way of doing it (unless we consider the GoF design patterns as such).
The key to real modularization isn’t giving up global state but giving up mutable state: giving up effectful procedures and trading them for pure functions as the basic building blocks of computation. This seemingly restrictive compromise brings functions in programming closer to the notion of functions in mathematics, and away from procedures. If we use pure functions as the basic units of abstraction, and these functions compose as mathematical functions do: f ∘ g, and this composition obeys laws: f ∘ (g ∘ h) = (f ∘ g) ∘ h and f ∘ id = f = id ∘ f, then we gain a systematic and proven approach to composing larger parts out of smaller ones, one that works on every level, at every scale. Smaller functions can be composed into larger functions indefinitely, and this composition feels more natural and rigorous than inheritance and class members. This unlocks an arsenal of mathematical abstractions, ready to use.
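Those laws can be checked directly; here is a sketch in Python, where a `compose` helper plays the role of ∘ and the sample functions are arbitrary:

```python
def compose(f, g):
    """f ∘ g: apply g first, then f, so compose(f, g)(x) == f(g(x))."""
    return lambda x: f(g(x))

identity = lambda x: x

f = lambda x: x + 1
g = lambda x: x * 2
h = lambda x: x - 3

# Associativity: f ∘ (g ∘ h) = (f ∘ g) ∘ h, checked pointwise.
assoc_left = compose(f, compose(g, h))
assoc_right = compose(compose(f, g), h)

# Identity: f ∘ id = f = id ∘ f, checked pointwise.
left_id = compose(f, identity)
right_id = compose(identity, f)
```

Because the laws hold for any pure functions, larger pipelines built with `compose` can be regrouped and refactored freely without changing their meaning.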
Pure functions are not restricted procedures but higher-level abstractions that translate to impure procedures. A programmer needs to be concerned with the latter only as much as a cinematographer is concerned with how a camera works. Functional programming abstracts over the assembly line and turns computer programs into large mathematical formulas, which we already know how to reason about. It elevates describing, reasoning about and solving problems to the level of the very science of abstraction: mathematics, the theory of theories.
So that’s why functional programming matters: it is simply a superior method of creating theories of problem domains. The difference between the imperative and functional approaches mirrors the difference between code with high and with low resistance to change: a convoluted network of ad hoc interdependent code parts versus a system composed hierarchically of building blocks, following simple rules of composition on every level. Therefore the easiest way to test candidates’ skills related to writing modular and compositional code is to test their functional programming skills. An understanding of key functional programming concepts, like pure functions, function composition, currying and partial application, recursive data structures, algebraic data types, monadic composition, type classes, lenses, monad transformers and so on, is a good indicator that a candidate can write and collaborate on code that does not resist the growth and evolution of the business.
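As a small taste of two items on that list, currying and partial application can be sketched even in Python (the function names are invented for illustration; `functools.partial` is the standard library’s tool for the latter):

```python
from functools import partial

# Currying: a two-argument function written as a chain of
# one-argument functions.
def add(x):
    return lambda y: x + y

increment = add(1)  # partial application falls out of currying

# The same idea applied to an ordinary function via functools.partial:
def scale(factor, value):
    return factor * value

double = partial(scale, 2)
```

Both `increment` and `double` are new reusable building blocks obtained without writing any new logic, which is exactly the kind of cheap recombination the essay argues for.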