This is a great series of blog posts from Marko Švaljek regarding Stream Processing With Spring, Kafka, Spark and Cassandra. Stay tuned for the rest of the series throughout the week! This is Part 1 and Part 2.
Part 1 - Overview

Using Spring Boot
We're basically just prototyping here, but to keep everything flexible and in the spirit of newer architectural paradigms like microservices, the post will be split into 5 parts. The software will be split up accordingly, so we won't use any specific container for our applications; we'll just go with Spring Boot. In the posts we won't go much over the basics; you can always look them up in the official documentation.
Apache Kafka
This is the reason why I'm doing this in the first place. It's this new super cool messaging system that all the big players are using and I want to learn how to put it to everyday use.
Spark Streaming
For some time now I've been doing a lot of work with Apache Spark, but somehow I never got a chance to take a closer look at streaming.
Cassandra
Why not?
What is this series about?
It's a year where everybody is talking about voting ... literally everywhere :) so let's make a voting app. In essence it will be a basic word count on a stream, but let's give it some context while we're at it. We won't do anything complicated or particularly useful; the end result will simply be a total count of token occurrences in the stream. We'll also break a lot of best practices in data modeling etc. in this series.
This series is for people oriented toward learning something new. I guess experienced and battle-proven readers will find a ton of flaws in the concept, but again, most of them are deliberate. One thing I sometimes avoid in my posts is including full source code; in my opinion a lot more sticks, and learners feel much more comfortable when faced with problems in practice, if they work things out themselves. So I'll just copy-paste the crucial code parts. One more assumption on my side is that readers will be using IntelliJ IDEA. Let's go to Part 2 and see how to set up Kafka.
Part 2 - Setting up Kafka
In this section we'll set up two Kafka brokers. We'll also need ZooKeeper. If you are reading this, my guess is that you don't have one set up already, so we'll use the one bundled with Kafka. We won't cover everything here; do read the official documentation for a more in-depth understanding.
Downloading
Download the latest Apache Kafka. In this tutorial we'll use the binary distribution. Pay attention to the Scala version if you intend to use Kafka with a specific version of Scala; in this tutorial we'll concentrate more on Java, but this will matter more in the parts to come. In this section we'll use the tools that ship with the Kafka distribution to test everything out. Once again, download and extract the Apache Kafka distribution from the official pages.
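For example, fetching and unpacking a release from the Apache archive might look roughly like this (the version number and Scala build here are only placeholders; substitute the current ones from the download page):
$ wget https://archive.apache.org/dist/kafka/0.9.0.0/kafka_2.11-0.9.0.0.tgz
$ tar -xzf kafka_2.11-0.9.0.0.tgz
$ cd kafka_2.11-0.9.0.0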
Configuring brokers
Go into the directory where you downloaded and extracted your Kafka installation. It ships with a properties file template, and we are going to use properties files to start the brokers. Make two copies of the file:
$ cd your_kafka_installation_dir
$ cp config/server.properties config/server0.properties
$ cp config/server.properties config/server1.properties
Now use your favorite editor to make changes to the broker configuration files. I'll just use vi; after all, it has been around for 40 years :)
$ vi config/server0.properties
Now make changes to (or check the values of) the following properties:
# unique id of this broker within the cluster
broker.id=0
# protocol, interface and port the broker listens on
listeners=PLAINTEXT://:9092
# default number of partitions for newly created topics
num.partitions=2
# directory where the broker keeps its log segments
log.dirs=/var/tmp/kafka-logs-0
Make the changes for the second node too:
$ vi config/server1.properties
broker.id=1
listeners=PLAINTEXT://:9093
num.partitions=2
log.dirs=/var/tmp/kafka-logs-1
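As a quick sanity check (just a convenience; any diff tool works too), you can confirm that the two files differ only where intended:
$ grep -E 'broker\.id|listeners|log\.dirs' config/server0.properties config/server1.properties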
Starting everything up
First you need to start ZooKeeper; it will be used to store the offsets for topics. There are more advanced setups where you don't need it, but for someone just starting out it's much easier to use the ZooKeeper bundled with the downloaded Kafka. I recommend opening one shell tab where you can keep all of the running processes. We didn't make any changes to the ZooKeeper properties; they are just fine for our example:
$ bin/zookeeper-server-start.sh config/zookeeper.properties &
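For reference, the bundled config/zookeeper.properties contains little more than the following defaults (exact contents may vary slightly between Kafka versions):
dataDir=/tmp/zookeeper
clientPort=2181
maxClientCnxns=0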
From the output you'll notice that it started ZooKeeper on the default port 2181. You can try telnetting to this port on localhost just to check that everything is running fine, as shown below.
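One way to do that, assuming netcat is available, is to send ZooKeeper the ruok four-letter command; a healthy server answers with imok:
$ echo ruok | nc localhost 2181
imok
Now we'll start the two Kafka brokers: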
$ bin/kafka-server-start.sh config/server0.properties &
$ bin/kafka-server-start.sh config/server1.properties &
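If you want to double check that both brokers registered themselves, you can ask ZooKeeper for the list of broker ids; zookeeper-shell.sh ships with Kafka, so this is just an optional sanity check:
$ bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids
You should see both ids, 0 and 1, in the output.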
Creating a topic
Before producing and consuming messages we need to create a topic; for now you can think of it as a queue name. We need to give it a reference to ZooKeeper. We'll name the topic "votes"; it will have 2 partitions and a replication factor of 2. Please read the official documentation for further explanation. You'll see additional output coming from the broker logs because we are running the examples in the background.
$ bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic votes --partitions 2 --replication-factor 2
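To verify that the topic came out with the expected layout, you can describe it with the same bundled tool:
$ bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic votes
The output lists each partition together with its leader, replicas and in-sync replicas; with both brokers up you should see broker ids 0 and 1 spread across the two partitions.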
Sending and receiving messages with bundled command line tools
Open two additional shell tabs and position yourself in the directory where you installed Kafka. We'll use one tab to produce messages. The second tab will consume the topic and simply print out the stuff that we type into the first tab. Now this might seem a bit silly, but imagine you are actually using Kafka already!
In the tab for producing messages run:
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic votes
In the tab for consuming messages run:
$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic votes
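As a quick illustration, type a few arbitrary votes into the producer tab, one per line:
john
paul
john
Each line is sent to the votes topic as a separate message, and the consumer tab should print the same lines back a moment later.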
Stay tuned for Part 3.