This guest post was written by Maarten Koopmans for Signify Technology. Maarten has over 20 years of experience in IT in various roles, and a decade with Scala.
This post is design-like and architectural in nature. It will be a 100 feet overview of when and how to use Akka, and when to use queues – specifically RabbitMQ with Akka. We will learn how to transform Akka’s at-most-once delivery to guaranteed delivery for remote components.
So what is Akka? Akka is one of the most used libraries in Scala to use in your Scala project. So what is an actor exactly? I will explain briefly, because the Akka docs (not just API) are extraordinary well written - see https://akka.io/docs/
An actor is a light-weight process (unlike a Thread) that completely encapsulates its state. You can send an actor a message which will end up in its mailbox, and the actor will process these messages in the order they arrive by matching the message against a set of partial functions (case statements) that you have defined inside your actor. You’ll get a reference to the sender, but messages are in principle one-way. If a case statement matches, its logic is executed. If no statement matches, the actor calls the whenUnhandled() function for possible error logic. You often use case objects or case classes to send a message.
Actors exists in a tree (supervision tree), which most often gets created automatically when an actor creates another actor – the newly created actor then becomes the child of the creating actor. Now if somewhere on the fifth level of your tree an actor crashes (for whatever reason), the supervisor strategy determines what happens, e.g. does the whole branch die, does the actor get restarted, do all actors that “siblings” get restarted…. Note that when an actor dies, all its children will also die.
To summarize some basic properties (and go read Akka documentation for more!):
- Actors are lightweight processes with little overhead that encapsulate behavior
- Actors exist in a tree
- You send messages to actors to have them do something
- Messages to actors can have no type parameters
- Akka has at-most-once semantics: a message will be sent at most once, and if it fails (e.g. network failure), it’s lost.
- You can create hundreds of thousands of actors on a single JVM!
- As design principle: every actor should do the smallest amount of work possible, so you get deep trees, but failures are precise.
- Akka’s actor model was heavily inspired by Erlang – in case you want to read up.
Akka has two other very cool features: you can transparently call a remote actor in another JVM, and Akka can be clustered. And this brings us to RabbitMQ.
RabbitMQ is a queueing system, and roughly works as follows: a producer publishes a message to an exchange. The message has a “routing key”, and this allows RabbitMQ to send the message to a queue that is bound to the exchange for that routing key. You can also have virtual hosts, and much more nice stuff, but RabbitMQ also has excellent documentation (and lots of client libraries), so go check out http://www.rabbitmq.com/
RabbitMQ implements the AMQP protocol (use version 0.9.1) with an important extension called publisher confirms. This is important to enable, so you will know whether your message actually has been published to the exchange (network faults do happen – especially in a cloud environment). RabbitMQ has at-least-once semantics, meaning that a message will be delivered, well, at least once to the queue provided the publish (on the other side) was confirmed.
Now when do you use RabbitMQ with Akka? When you need to have the guarantee that your message is delivered to the remote actor. Note that as RabbitMQ uses an at-least-once delivery, your receiving actor must be idempotent or have a way to handle duplicates. But your message never gets lost.
I’ve seen (and used) quite a few designs that work roughly as follows:
- A client JVM serializes the message to be sent to base64 encoded, compressed JSON (akka uses by default Java serialization, which would break between class changes because of the serial version uuid)
- The client JVM publishes to the queue and deals with the ack/nack of the publisher confirm.
- RabbitMQ delivers at least once to the receiving queue.
- The actor on the receiving end deserializes to the expected type, and upon success, matches the message.
- Somewhere in the logic the computation succeeds or fails and the consumer acks or nacks . The message will be taken of the queue (or not, if you nack with republish)
See what happened here? We have eliminated the cost of losing a message sent to a remote actor, at the cost of deduplication or of the remote actor being idempotent! And with RabbitMQs routing keys you can set up many other interesting things, especially if you have multiple remote actors or clusters.
I have seen the above usage pattern in several projects when data loss was not an option. I hope it helps you when you encounter that situation. Enjoy!
If you are interested in using Maarten's experience in your company or project, contact Signify Technology for more information. Also, if you would like to feature your own guest blog post, please email billie.graham@signifytechnology.com.