Published on Feb 27, 2017
Organizations increasingly demand faster tools to process and analyze data in real time. Apache Spark and Apache Flink have emerged as popular open-source frameworks that address these requirements. In this tech talk, we provide an overview of both technologies and the differences between them. We show how you can deploy Apache Spark and Flink on AWS to address common big data use cases such as batch and real-time data processing, interactive data science, and predictive analytics. We also cover common architectures for running these frameworks on Amazon EMR, including tips for connecting to Amazon Kinesis (a platform of managed services that makes it easy to work with real-time streaming data in the AWS Cloud) and Apache Kafka, a popular open-source platform for streaming data.
Learning Objectives:
• Understand common use cases and differences between Apache Spark and Apache Flink
• Explain deployment modes and best practices for running Spark and Flink on Amazon EMR
• Identify ways to connect to Kinesis and Kafka for streaming ingest
• Describe how to architect streaming jobs for durability and availability