Problem solving is a key element of programming. Functional programmer and data science engineer Anatolii Kmetiuk started to practice a new approach; in this article he tells us all about it.
'In this article, I'd like to describe an approach in problem-solving I started to practice about a year ago. An approach I call Strategic problem-solving.
Position, Technology, System
To describe strategic problem-solving, we first need to introduce three terms.
Position - a capability to do something. In the context of programming, a "position" may be an ability to perform an HTTP request with one method call. Or have the compiler catch certain kinds of errors for you before they reach the runtime.
An important catch about positions is that they are precisely capabilities, not tools. E.g. take the Circe library for Scala. It provides you with a position to parse JSON into a case class with a single line of code. However, the same position could have been provided by some other library. Think interfaces.
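To make the "think interfaces" remark concrete, here is a minimal sketch. The names User, UserDecoder, RegexUserDecoder and FixedUserDecoder are hypothetical and are not Circe's API; the point is only that the calling code depends on the capability, while the tool behind it can be swapped.

```scala
// A position: the capability "decode a JSON string into a User".
// All names are illustrative assumptions, not part of any real library.
final case class User(name: String)

trait UserDecoder {
  def decode(json: String): Either[String, User]
}

// One tool that fills the position: a naive regex-based decoder.
object RegexUserDecoder extends UserDecoder {
  private val Pattern = """\{\s*"name"\s*:\s*"([^"]*)"\s*\}""".r

  def decode(json: String): Either[String, User] = json.trim match {
    case Pattern(name) => Right(User(name))
    case other         => Left("Cannot decode: " + other)
  }
}

// Another tool for the same position: always returns a fixed user.
final class FixedUserDecoder(user: User) extends UserDecoder {
  def decode(json: String): Either[String, User] = Right(user)
}

object PositionDemo {
  // This code relies on the position only, never on a specific tool.
  def greet(decoder: UserDecoder, json: String): String =
    decoder.decode(json).fold(err => "Error: " + err, u => "Hello, " + u.name + "!")
}
```

Either decoder can be passed to PositionDemo.greet without changing it, which is what makes the position independent of the tool.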
Technology - a set of positions, connected one with another in a way that solves a particular problem.
E.g. building a JSON HTTP API will most likely require you to have the positions for:
- A database
- Ability to receive and respond to HTTP requests easily
- Ability to work with JSON
- Ability to interact with the DB from the server
- Way to test your endpoints
- Orchestration of the above: ability to start/stop/test the server with the database with a single command.
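And the defined ways for the above to communicate.
System - a technology in which every position is filled by a concrete tool. This is what turns a blueprint into something you can actually run.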
The above definitions give rise to strategic problem-solving.
The workflow of strategic problem-solving is simple:
- Identify what positions you need to solve a particular problem (see the example of a JSON API above).
- Define a technology over these positions: Given a database, a server and an ability to receive and send requests in JSON, I can trivially define most of simple JSON APIs.
- Find the tools to plug into the positions of your technology.
The approach above allows you to first think about what you need to solve the problem and only then look for implementations of what you need, not vice versa.
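The three steps of the workflow can be sketched in plain Scala. This is an illustration under my own assumptions: Database, Json, GreetingApi, InMemoryDatabase and NaiveJson are hypothetical names, and a real JSON API would also need the HTTP, testing and orchestration positions from the list above.

```scala
// Step 1: positions, stated as capabilities (hypothetical names).
trait Database { def find(id: Int): Option[String] }
trait Json     { def render(fields: Map[String, String]): String }

// Step 2: a technology, i.e. positions connected to solve one problem.
// It knows nothing about the tools that will later fill the positions.
final class GreetingApi(db: Database, json: Json) {
  def get(id: Int): String = db.find(id) match {
    case Some(name) => json.render(Map("greeting" -> ("Hello, " + name)))
    case None       => json.render(Map("error" -> "not found"))
  }
}

// Step 3: tools substituted for the positions.
object InMemoryDatabase extends Database {
  private val rows = Map(1 -> "Ada")
  def find(id: Int): Option[String] = rows.get(id)
}

object NaiveJson extends Json {
  private def quote(s: String): String = "\"" + s + "\""

  // No escaping is done here; good enough for a sketch only.
  def render(fields: Map[String, String]): String =
    fields.map { case (k, v) => quote(k) + ":" + quote(v) }.mkString("{", ",", "}")
}

object StrategicDemo {
  val api = new GreetingApi(InMemoryDatabase, NaiveJson)

  def main(args: Array[String]): Unit = println(api.get(1))
}
```

Swapping InMemoryDatabase for a real database client would change only the last step; GreetingApi stays untouched.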
Example in Scala
A good way to get a feel for this approach is via the Frees.IO library for Scala. In general, when you work with this library, it forces you into the following workflow:
- Define the methods you are going to use in your program, without implementing them. They are defined in specific traits the authors call Algebras.
- Write your programs in terms of these abstract methods.
- Implement the methods. You will only be able to run your programs when all the methods you rely on are implemented.
An important catch: you will not be able to use anything but the methods you defined at the first step. In standard programming, you can usually import absolutely any 3rd party library, call its methods in the middle of your program, and it will work. This trick will not work with Frees.IO. Why that is so is another story, and out of scope of this article. Try it for yourself and see!
Interpretation
Here is the interpretation of the above three steps of the Frees.IO workflow in terms of strategic problem-solving.
- Positions. Defining methods (Algebras) in Frees.IO is equivalent to defining the positions you need for your problem to be solved. E.g. def readFile(str: String): F[String]. It is a method without implementation, but it is clear by name that it provides you with a capability - a strategic position - to read a text file to a String. F here is the reason you will not be able to call a 3rd party method in the middle of your program, by the way.
- Technology. Next, once you have some abstract methods, you can compose them into programs. E.g., given methods readFile, removeEmptyLinesFromString and writeStringToFile, you are already able to compose them sequentially in a small program that removes empty lines from a file.
- System. Finally, in order to run your program, you first need to implement your abstract methods. This corresponds to the third step of strategic problem-solving - plugging concrete tools into the places of abstract positions. This step turns a technology (a blueprint of a process) into a concrete system.
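The three steps above can be sketched without any library at all. The code below is hand-rolled and does not use the Frees.IO API: the Monad trait, the FileAlgebra trait and the in-memory interpreter are my own assumptions, chosen only to show the shape of "algebra, program, implementation" using the method names from the text.

```scala
import scala.collection.mutable
import scala.util.Try

// Minimal abstraction for sequencing effects in F (an assumption of this sketch).
trait Monad[F[_]] {
  def pure[A](a: A): F[A]
  def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
}

// Positions: abstract methods without implementation.
trait FileAlgebra[F[_]] {
  def readFile(path: String): F[String]
  def removeEmptyLinesFromString(text: String): F[String]
  def writeStringToFile(path: String, text: String): F[Unit]
}

// Technology: a program written only in terms of the abstract methods.
object Programs {
  def removeEmptyLines[F[_]](in: String, out: String)(
      implicit M: Monad[F], files: FileAlgebra[F]): F[Unit] =
    M.flatMap(files.readFile(in)) { raw =>
      M.flatMap(files.removeEmptyLinesFromString(raw)) { cleaned =>
        files.writeStringToFile(out, cleaned)
      }
    }
}

// System: one possible implementation, backed by an in-memory "disk".
object InMemorySystem {
  implicit val tryMonad: Monad[Try] = new Monad[Try] {
    def pure[A](a: A): Try[A] = Try(a)
    def flatMap[A, B](fa: Try[A])(f: A => Try[B]): Try[B] = fa.flatMap(f)
  }

  val disk: mutable.Map[String, String] =
    mutable.Map("in.txt" -> "first\n\nsecond\n\n\nthird")

  implicit val files: FileAlgebra[Try] = new FileAlgebra[Try] {
    def readFile(path: String): Try[String] = Try(disk(path))
    def removeEmptyLinesFromString(text: String): Try[String] =
      Try(text.split("\n").filter(_.trim.nonEmpty).mkString("\n"))
    def writeStringToFile(path: String, text: String): Try[Unit] =
      Try { disk.update(path, text) }
  }

  // Runs the abstract program with the concrete tools plugged in.
  def run(): Try[Unit] = Programs.removeEmptyLines[Try]("in.txt", "out.txt")
}
```

Because Programs.removeEmptyLines returns an F[Unit] for an arbitrary F, the only operations available inside it are the ones the algebra and the Monad provide. This is my reading of the remark about F above, not a description of the Frees.IO internals.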
Strategic Positions in Programming
The Scala community currently puts a strong focus on tools. E.g. if you follow the community media long enough, you will notice a certain hype around things like Category Theory, Cats, Monads, Type Theory etc.
In the framework of strategic problem-solving, tools define the positions you have. Therefore, this focus covers only the first aspect of strategic problem-solving. Discussions on unifying these into technologies and systems (systems, as in complete applications or websites, from zero to hero) are not as frequent.
Certain positions (especially when held together) give rise to a position that opens up such a great number of capabilities that you can't help but differentiate it from other positions. These we will call Strategic positions.
goto is evil
What does goto have to do with Scala collections?
goto is a programming technique where you label lines of your code and then jump to these lines from anywhere in your code via a goto <label> instruction (or similar). Some form of it is present in many programming languages.
goto gives you lots of power. You can describe control statements like if, while, for etc in terms of goto. But with great power comes great responsibility. goto is error-prone. It is easy to get bugs when trying to express complex logic with goto.
However, if you program with goto for long enough, you will start to notice patterns in your goto-related code. With time, these patterns got language-level support in most programming languages as your usual control structures: if, while, for etc.
Moreover, it turns out that with a certain set of control structures, you can express everything you wanted to express with a lower-level goto in a declarative way, with less chance of a bug. Hence nowadays goto is considered bad practice in programming.
Can you see where this is headed yet?
Java collections are like goto. Imperative, low-level primitives you can use to solve virtually any collection problem you will encounter. Only the simplest and most necessary stuff is supported (like adding, getting and removing elements from a list). Powerful, but with great power comes great responsibility. Like with goto, you will encounter a ton of bugs trying to solve a simple collections problem. If you are a lucky Scala programmer who has not touched Java collections in years and you'd like to cause your mind some suffering, I recommend trying the poker hands problem in Java collections to see what I mean :).
Similarly to if and other control structures, Scala Collections give you about 40-50 solutions to problems commonly encountered in collections programming. Again, declarative style with minimal possibility of a bug.
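As a small illustration of the contrast, here is the same task (counting word frequencies, a task I picked for this sketch) written twice in Scala: once in the low-level imperative style with manual iteration and mutation, and once with collection methods. The object and method names are hypothetical.

```scala
import scala.collection.mutable

object WordCount {
  // Imperative style: manual iteration and mutation of a map.
  def imperative(words: List[String]): Map[String, Int] = {
    val counts = mutable.Map.empty[String, Int]
    val it = words.iterator
    while (it.hasNext) {
      val w = it.next().toLowerCase
      if (counts.contains(w)) counts(w) = counts(w) + 1
      else counts(w) = 1
    }
    counts.toMap
  }

  // Declarative style: describe the result in terms of collection methods.
  def declarative(words: List[String]): Map[String, Int] =
    words
      .map(_.toLowerCase)
      .groupBy(identity)
      .map { case (w, occurrences) => (w, occurrences.size) }
}
```

Both versions return the same map; the second one simply leaves fewer places where an off-by-one or a forgotten update can hide.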
Why are Scala Collections a strategic position for the language and its programmers?
- Collections are ubiquitous. From the simplest mobile app to a distributed machine learning environment, you will deal with collections on every step.
- Collections are hard, and error-prone. If you don't believe me, try that poker hands problem above, in Java.
- It is hard to determine patterns with collections programming due to lots of use cases. Look at the number of methods (patterns) Scala's List defines. Now imagine that these are not the only problems you will encounter - you will also encounter ones that are solved by combining several different methods.
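To give a feel for combining several methods, here is a deliberately simplified take on classifying a poker hand. It is a sketch under my own assumptions: cards are reduced to their ranks (2 to 14), suits are ignored, and flushes and straights are therefore not detected. PokerSketch and classify are hypothetical names.

```scala
object PokerSketch {
  // Classifies a hand by how many cards share a rank.
  // Suits are ignored, so flushes and straights are out of scope here.
  def classify(ranks: List[Int]): String = {
    // Sizes of the groups of equal ranks, largest first,
    // e.g. List(3, 2) for a full house.
    val groups = ranks.groupBy(identity).values.map(_.size).toList.sorted.reverse

    groups match {
      case 4 :: _      => "four of a kind"
      case 3 :: 2 :: _ => "full house"
      case 3 :: _      => "three of a kind"
      case 2 :: 2 :: _ => "two pairs"
      case 2 :: _      => "pair"
      case _           => "high card"
    }
  }
}
```

Here groupBy, map, sorted and pattern matching each solve a small part, and the solution emerges from chaining them.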
So, getting collections right is hard due to lots of use cases. And if you don't get it right, it will hurt you in almost every project you take on.
Scala Collections solve this pain for you, however. And by doing so, they open the door to simple development of a wide range of projects. If previously the amount of collections work involved in a project could be seen as a boost in its complexity, with Scala Collections - no more (of course, this is true only to a certain point - measure is treasure).
One thing to notice here: both in the case of goto and of Scala Collections, a strategic position was achieved by providing a declarative solution to a problem. That is, you no longer need to explain to the computer how to do something - you just say what you want to get done.
The Whole is greater than the Sum
Another thing to notice is that every individual control structure (if, while, for) is a position of its own, in the sense defined above. The same can be said about the Scala Collections methods: map, filter etc. However, only as an integral whole can they provide you a strategic advantage and eliminate the need for goto or Java Collections.
If even a single one is missing, you can end up in a situation where a certain problem is a "blind spot" for your current positions.
This is actually a particularly nasty situation: a project is extremely good at solving some problem, but at a certain point you realize it lacks something very small and seemingly insignificant that you need at the moment. And you need to go deep into the internals of the project to get the thing done. Small things matter.
In my opinion, such a project can't be regarded as a strategic position, because it does not unlock an entire class of problems, and it will likely cost you a good deal of time.
Difference from a Technology
The situation above, where you have 40-50 positions (capabilities to solve common collections problems), cannot be called a technology. This is so because you do not connect these positions in any way to solve a particular task. Instead, you are flexible: these positions are valuable precisely because they are easily connected in any way you like, depending on the problem you are facing.
A situation where you have some 40-50 positions pre-connected for you is called a framework. And we all hate frameworks.
Category Theory Programming
The main idea of functional programming is to eliminate side effects from your functions. This way, it becomes easier to reason about them.
However, attempts to program in a purely functional style gave rise to a number of patterns - problems that repeat over and over (remember goto and Scala Collections). The solution comes in terms of Cats and the Typelevel family of libraries. They are to functional programming what control structures are to imperative programming or Scala Collections are to collections problems. That is, they provide solutions to common problems. How exactly this is done is out of scope of this article - in fact, I have written an entire book and a course on the subject.
This way, these libraries become a strategic position, because they enable you to program anything in an entirely new style.
Where R beats Scala
One thing Scala is bad at is Data Science.
For machine learning, there are Spark, DeepLearning4J and even some functional programming solutions like Spire and libraries that stem from it. For data exploration, there are bindings for Bokeh and D3.
However, for Data Science proper (i.e. quickly experimenting with models and exploring the data visually - not to be confused with Data Engineering, where you care about performance and scale), you will have a hard time with these libraries.
One example - plotting. Plotting places you in position of visual exploration of the datasets. If you ask me, the ability to quickly and easily see visually what you have, how things relate, and play with them in general, is a pretty strategic position.
With Scala, you can't do plotting easily. The best thing you have is e.g. Bokeh - but there, you need a few dozen lines to do even a simple plot. This is because you need to define lots of model classes, map the data to graphical representations, define the widgets your view will have etc.
With R, many plots can be done with a single statement (spanning on average 5 lines).
Now imagine that you need to do plotting frequently (which is the case for data exploration and visualization tasks). To get an insight, you need to run lots of experiments with your data, and plot them. If every experiment takes you several dozen lines to conduct, one of two things is likely to happen:
- You will be limiting yourself in terms of the number of experiments you run (because they are perceived as costly). Often this will happen on a subconscious level; you won't even think much of it. Or...
- You will diligently build as many plots as the task demands - but you will waste lots of time.
This is so because Scala does not provide you with a (strategic) position to do plotting. This won't happen in R. Unless...
R and its libraries are poorly documented. Remember the remark above about partial solutions? If R is not your main language, the position of having R and its libraries alone will not be enough for you, because you won't know what to do with them!
What you need is an extra position - a position of easy access to information on how to use the relevant features of R. This position can be implemented in terms of R Cheat Sheets. Still worse than systematic documentation, but better than nothing. And for my needs, together with R and its libraries, they give a strategic position with respect to plotting.
Web Applications and Docker
Arguably, a position to build applications exposed over the web is a strategic one: Lots of problems are solved in terms of JSON web APIs, or Software as a Service.
Here is a set of positions that became a strategic position for me with respect to web applications:
- Backend (with Finch/http4s)
- JSON handling (Circe)
- Database interaction (Doobie)
- Front end (optional, ScalaJS)
- Database (Postgres)
- Orchestration (Docker)
For a while, I did not pay enough attention to Orchestration. Try working with the stack above but without Docker or a similar solution, and then try using Docker. You will see what I mean by the problem of partial solutions.'