Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming And Spark Streaming
Apache Spark is a very popular framework for data analytics. It combines various libraries, but it can be difficult to know which part to use for which job. In this talk, Gerard Maas, Senior Software Engineer at Lightbend, focuses on choosing the right Fast Data stream processing features of Apache Spark.
Have a watch and next time you'll know exactly how to use Apache Spark!
A Tale Of Two Streaming APIs
Fast Data architectures have emerged as the answer for enterprises that need to process and analyze continuous streams of data. Apache Spark has matured into a very popular framework for data analytics that, when combined with other technologies found in the Lightbend Fast Data Platform such as Akka Streams, Kafka, and Mesos, helps businesses accelerate decision making and become reactive to the particular characteristics of their market.
Spark combines various libraries, including SQL-based analytics, stream processing, graph analytics, and a rich set of built-in machine learning algorithms, to address a wide range of requirements for large-scale data analytics. But how can you know which part to use for which job?
In this talk by Gerard Maas, O’Reilly author and Senior Software Engineer at Lightbend, we focus on choosing the right Fast Data stream processing features of Apache Spark, taking a practical, code-driven look at the two APIs available for this: the mature Spark Streaming and its younger sibling, Structured Streaming. Specifically, we will review:
- The capabilities of Spark's APIs for streaming and their key differences
- Advice on choosing the right API for an application, and how to architect and develop streaming pipelines that use one or both APIs to fulfill the application's requirements
- A code-based demo on getting started with both the Spark Streaming and Structured Streaming APIs
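To give a flavor of how the two APIs differ, here is a minimal word-count sketch in each style. This is not the demo from the talk; it is an illustrative example that assumes Spark 2.x on the classpath and text socket sources (e.g. started with `nc -lk 9999` and `nc -lk 9998`), and all names in it are the author's own.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TwoStreamingAPIs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[2]")
      .appName("two-streaming-apis")
      .getOrCreate()

    // 1) Spark Streaming (DStream API): micro-batches of RDDs,
    //    with functional transformations applied per batch.
    val ssc = new StreamingContext(spark.sparkContext, Seconds(5))
    ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .print()

    // 2) Structured Streaming: an unbounded DataFrame queried with
    //    the same SQL-style operations used for batch data.
    import spark.implicits._
    val words = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9998)
      .load()
      .as[String]
      .flatMap(_.split("\\s+"))

    val query = words.groupBy("value").count()
      .writeStream
      .outputMode("complete")
      .format("console")
      .start()

    ssc.start()
    query.awaitTermination()
  }
}
```

The contrast is the point: the DStream version thinks in terms of batches and RDD operations, while the Structured Streaming version expresses the same computation as a continuous query over a streaming DataFrame, leaving scheduling and incrementalization to the engine.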
This article was posted by Oliver J. White and originally published on Lightbend Blog.