
Evolutionary Algorithms on the JVM via Scala — a minimal introduction by Mikołaj Koziarkiewicz


Dig deeper into the world of machine learning and AI with Senior Software Engineer Mikołaj Koziarkiewicz, who makes the case for looking beyond the current hype at other machine learning approaches.

You will also be given a basic overview of evolutionary algorithms, via Scala.

 

Unless you’ve just woken up from a several-year cryostasis, you’re probably aware of the recent resurgence of machine learning and AI. This is yet another cycle of enthusiasm (historically interspersed with so-called Winters), and this one is fueled mostly by interest in recommendation systems and the advances — in algorithmics and supporting hardware — of neural networks for machine vision and other purposes.

It is therefore worthwhile to also consider other machine learning approaches, not as significantly blessed by the current hype. So, let’s talk about evolution.

 

Darwin in the machine

The generic proper term for any sort of heuristic approach that is inspired by and/or mimics the process of evolution is Evolutionary Algorithms. There are quite a few categories of them, depending on what you choose to mimic and how you go about doing that.

Our primary area of interest for this blog entry will be genetic algorithms. These are the ones most people who have any idea about EAs are probably aware of, with the possible exception of genetic programming.

First of all, let’s say you have an optimization problem 'X'.

When setting up a solution using genetic algorithms, you first encode your problem 'X' into a set of genes, which are nothing more than simple values (integers, strings, etc.), possibly with some constraints. This genetic encoding will be used to randomly generate a population of individuals, each possessing their own combination of gene values — that is, their genotypes.

In other words, each genotype represents a solution to the problem 'X'. That solution is a phenotype of the individual. For a biological comparison, your genotype is whatever DNA and other auxiliary genetic structures float about your body, and your phenotype comprises your body as a whole.

Other than a genotype and phenotype, you also need a fitness function. This function accepts a phenotype and returns a numerical score of how well that phenotype solves problem 'X'. The score is used to select the “best” genotypes, which become eligible for inclusion in the following generation of the population.
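To make these notions concrete, here is a minimal, library-free sketch in plain Scala (the names and the example problem are made up for illustration; this is not helisa's actual API):

import scala.util.Random

type Gene     = Int          // a gene is just a simple value...
type Genotype = Vector[Gene] // ...and a genotype is a combination of gene values

// A candidate solution to some hypothetical two-parameter problem.
final case class Phenotype(x: Int, y: Int)

// Decoding a genotype into the solution it represents.
def decode(g: Genotype): Phenotype = Phenotype(g(0), g(1))

// Fitness: a numerical score of how well a phenotype solves the problem.
// Here: the negated distance from a made-up target point, so closer scores higher.
def fitness(p: Phenotype): Double =
  -math.hypot(p.x - 3.0, p.y - 7.0)

// Randomly generating the initial population from the genetic encoding.
def randomPopulation(size: Int, rng: Random): Vector[Genotype] =
  Vector.fill(size)(Vector(rng.nextInt(101), rng.nextInt(101)))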

The next generation is then created. In the simplest approach, the selected individuals are carried over and the remainder of the population quota is filled up randomly. Usually, however, some genetic operators are applied to the selected genotypes instead; they mostly boil down to these categories (both sketched after the list):

  • recombination operators — “mix up” selected genotypes to create new ones, emulating sexual reproduction,
  • mutation operators — randomly change one or more values of a genotype; the primary benefit here is escaping local optima of the solution space.
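Purely as an illustration (plain Scala again, not any library's API), the two categories could look as follows for genotypes that are simple vectors of ints:

import scala.util.Random

type Genotype = Vector[Int]

// Recombination: one-point crossover, splicing two parent genotypes at a random cut.
def crossover(a: Genotype, b: Genotype, rng: Random): Genotype = {
  val cut = rng.nextInt(a.length + 1)
  a.take(cut) ++ b.drop(cut)
}

// Mutation: each gene is independently replaced with a fresh random value,
// with a small probability.
def mutate(g: Genotype, rate: Double, maxGene: Int, rng: Random): Genotype =
  g.map(gene => if (rng.nextDouble() < rate) rng.nextInt(maxGene + 1) else gene)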

The evolutionary run then repeats itself until a termination condition is met, such as:

  • reaching some value threshold of the fitness function,
  • going through a set number of generations,
  • subsequent generations not bringing enough improvement over the previous ones,
  • and so on.

You end up with one or more “optimal” genotypes, expressing phenotypes that are the best solutions to your problem that the genetic algorithm can find.
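Putting the pieces together, the whole loop can be sketched like this (still illustrative plain Scala, with truncation selection, the operators from the previous sketch inlined, and a fixed generation count as the termination condition):

import scala.util.Random

def evolve(initial: Vector[Vector[Int]],
           fitness: Vector[Int] => Double,
           generations: Int,
           rng: Random = new Random()): Vector[Int] = {

  def nextGeneration(pop: Vector[Vector[Int]]): Vector[Vector[Int]] = {
    // Selection: keep the better-scoring half of the population.
    val parents = pop.sortBy(fitness)(Ordering[Double].reverse).take(pop.size / 2)
    // Fill the quota back up with recombined and mutated children.
    val children = Vector.fill(pop.size - parents.size) {
      val a   = parents(rng.nextInt(parents.size))
      val b   = parents(rng.nextInt(parents.size))
      val cut = rng.nextInt(a.length + 1)
      val recombined = a.take(cut) ++ b.drop(cut) // crossover
      recombined.map(g => if (rng.nextDouble() < 0.05) rng.nextInt(101) else g) // mutation
    }
    parents ++ children
  }

  // Termination: a set number of generations; return the best genotype found.
  (1 to generations).foldLeft(initial)((pop, _) => nextGeneration(pop)).maxBy(fitness)
}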

 

Digression: a little bit of context

(This section is completely optional, and only provides some flavor to the article.)

For many years, when running evolutionary algorithms on the JVM, the JGAP project was the way to go. In fact, the original prototype of 'helisa' used JGAP as a backend while being developed for an investigation for the subsequent series of blog posts about traffic analysis (which are eventually coming, I swear!).

However, JGAP currently seems to be abandoned; even its documentation page has been missing for over a year. Moreover, it is a very… venerable project, to the point of using “raw” Lists instead of generic ones, and so on.

Fortunately, a new contender has appeared in the meantime. It's called jenetics, and it is already quite featureful and growing in popularity. Therefore, the current incarnation of 'helisa' uses jenetics as a backend.

 

Using GAs in Scala

For this blog, we are going to be using the jenetics library, with the slight caveat that it does not have a (stable) Scala API. However, Yours Truly has written a frontend called 'helisa' that allows for exactly that.

 

Encoding our problem

Let’s say we need to guess a number:

  • between 0 and 100,
  • that we know is even.

Our genotype, phenotype, and fitness function could then look like this:

import com.softwaremill.helisa._ // <1>

val genotype =
  () => genotypes.uniform(chromosomes.int(0, 100)) // <2>

case class Guess(num: Int) // <3>

def fitness(toGuess: Int) =
  (guess: Guess) => 1.0 / (guess.num - toGuess).abs // <4>
  1. Importing all the magic in the API.
  2. A producer of genotypes; each genotype carries a single gene (the possible value of the “guess”).
  3. The representation of a solution to the problem (the phenotype).
  4. The fitness function — the closer the guess is to the target number, the higher the score (an exact guess yields 1.0/0, i.e. Double.PositiveInfinity, the maximum possible fitness).
 

Running the process

Now that we have all the basic elements of our problem encoding, we can have a go at an evolutionary run!

We need some sort of manager for our process — in 'helisa' parlance, this is called an 'Evolver'. To create it, we need:

  • the genotype,
  • the fitness function,
  • a required population size (more = more diverse population, but slower iteration steps),
  • optionally, a validator function for any restrictions on our phenotype or genotype.

In practice, we would write:

val Number = 72

val evolver =
  Evolver(fitness(Number), genotype) // <1>
    .populationSize(10)
    .phenotypeValidator(_.num % 2 == 0) // <2>
    .build()
  1. We need the fitness function and the genotype at the very start, since their types determine the output of the 'Evolver'.
  2. We know the number is even, so we’re restricting possible solutions to only those.

To manage iterating the evolutionary process, we currently have a couple of options:

  • 'scala.collection.Iterator'
  • 'scala.collection.Stream'
  • Akka Stream 'Source'
  • FS2 Stream for any 'Async'
  • Reactive Streams 'Publisher'

Let’s keep it simple and use a vanilla Scala standard library Iterator:

val run = evolver.iterator() // <1>

val resultAfter2 = run
  .drop(2)
  .next()

println(resultAfter2.bestPhenotype)

val resultAfter100 = run
  .drop(97)
  .next()

println(resultAfter100.bestPhenotype)
  1. All initialization methods are defined on the 'Evolver' instance, and each call returns a fresh run with a new, random population.

This will print something like:

Some(Guess(80))
Some(Guess(72))

As is apparent, given enough iterations, the algorithm eventually arrives at a good answer.

 

What can it be used for?

For a problem such as our example, we don’t need genetic algorithms at all; simple hill climbing would suffice, as the sketch below shows.
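For instance, a minimal hill climber (using the same fitness shape as before, and stepping by 2 so every candidate stays even) walks straight to the target:

// Hill climbing: repeatedly move to the best-scoring neighbor, and stop when no
// neighbor improves. Assumes an even starting point between 0 and 100.
def hillClimb(toGuess: Int, start: Int): Int = {
  def score(n: Int): Double = 1.0 / (n - toGuess).abs // same shape as the GA fitness
  @annotation.tailrec
  def loop(current: Int): Int = {
    val best = Seq(current - 2, current, current + 2)
      .filter(n => n >= 0 && n <= 100)
      .maxBy(score)
    if (best == current) current else loop(best)
  }
  loop(start)
}

hillClimb(toGuess = 72, start = 40) // 72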

However, genetic algorithms are much more useful for harder problems. In general, GAs help in situations where:

  • you have multiple parameters whose relationships with each other are hard to determine,
  • you have a search space that has multiple local optima, some better, some worse (and you want to reach the better ones).

Finally, the “killer app” for GAs is metaoptimization, i.e. finding the right parameters to solve some other problem. This applies both to improving other machine learning algorithms and to things like tuning your VM configuration.

Just remember one thing: the fitness function runs once per individual in every generation — it can’t take too much time, otherwise it’ll grind your evolutionary run to a halt!
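If a fitness function is deterministic but unavoidably expensive, one possible mitigation is to memoize it, as in the sketch below; this only pays off if identical phenotypes actually recur across generations:

import scala.collection.concurrent.TrieMap

// Wraps any deterministic function with a concurrent cache, so each distinct
// phenotype is only ever scored once.
def memoized[A, B](f: A => B): A => B = {
  val cache = TrieMap.empty[A, B]
  a => cache.getOrElseUpdate(a, f(a))
}

// Hypothetical usage with the guessing example from earlier:
// Evolver(memoized(fitness(Number)), genotype)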

 

Summary

This short article provides a very basic overview of genetic algorithms. There’s lots more about GAs to tell, and I hope to do so here in the future. In the meantime, you can continue exploring the subject by:

  • checking out the 'helisa' project,
  • also taking a look at the Java “backend” of 'helisa', i.e. jenetics, which is, by the way, much better documented and contains more information about genetic algorithms and genetic programming.

Finally, if you’re familiar with JGAP, you can also try out the original, JGAP-backed version of 'helisa'.

Happy programming, and stay tuned!

 

 

This article was written by Mikołaj Koziarkiewicz and posted originally on SoftwareMill Blog.