Fast Data with Apache Spark: we have lots of questions!
Processing Fast Data with Apache Spark: The Tale of Two Streaming APIs
Fast Data architectures answer the growing need for enterprises to process and analyze continuous streams of data, helping them accelerate decision making and respond faster to changing conditions in their markets. Apache Spark is a popular framework for data analytics. Its streaming capabilities are exposed through two APIs: the low-level Spark Streaming and the more declarative Structured Streaming, which builds on recent advances in Spark SQL's query optimization and code generation.
After a quick introduction to both APIs, we will discuss their virtues, capabilities and key differences:
- How to get started: ease of development
- How to deal with time: at both the processing-time and event-time level
- How to deal with state: local and distributed state, and its relation to time
- How to migrate: functional coding strategies
- How to integrate: Fast Data and microservices
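To make the processing-time versus event-time distinction concrete, here is a minimal sketch in plain Python. It is not Spark code. It assumes hypothetical `Event` records that carry both the time the event happened (`event_time`) and the time the engine received it (`arrival_time`), and it counts them in 10-second tumbling windows keyed by either timestamp. The delayed `sensor-a` reading lands in a different window depending on which clock is used. Broadly speaking, Spark Streaming's DStream batches are formed by processing time, while Structured Streaming can window on an event-time column.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    key: str
    event_time: int    # seconds: when the event happened at the source (hypothetical field)
    arrival_time: int  # seconds: when the engine received it (hypothetical field)


def tumbling_counts(events, window_size, time_of):
    """Count events per (window_start, key), bucketing each event by time_of(event)."""
    counts = defaultdict(int)
    for e in events:
        # Align the chosen timestamp down to the start of its tumbling window.
        window_start = (time_of(e) // window_size) * window_size
        counts[(window_start, e.key)] += 1
    return dict(counts)


events = [
    Event("sensor-a", event_time=3, arrival_time=4),
    Event("sensor-a", event_time=8, arrival_time=9),
    Event("sensor-a", event_time=9, arrival_time=12),  # delayed in transit
    Event("sensor-b", event_time=11, arrival_time=13),
]

by_processing_time = tumbling_counts(events, 10, lambda e: e.arrival_time)
by_event_time = tumbling_counts(events, 10, lambda e: e.event_time)

print(by_processing_time)  # → {(0, 'sensor-a'): 2, (10, 'sensor-a'): 1, (10, 'sensor-b'): 1}
print(by_event_time)       # → {(0, 'sensor-a'): 3, (10, 'sensor-b'): 1}
```

Grouping by event time attributes the late reading to the window in which it actually occurred. The price is that the engine must keep per-window state open until it decides no more late data will arrive, which is where time and state become intertwined.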
Using a practical approach supported by live demonstrations, we will offer insight into the sweet spot of each API and guidance on how to choose between them, or even combine both, to implement functional and resilient streaming pipelines.