This week, we discussed big data processing systems. We looked into the origins of MapReduce and Hadoop and saw how they were inspired by functional programming. We discussed how these systems use two primitives, mapping and aggregation, to separate the parallelizable portion of the user code from the portion that has high data dependencies. We also found out how these systems achieve resilience, which is important because they were designed to run on commodity clusters, where failures have to be considered the norm rather than the exception, especially for long-running jobs.

Hadoop was highly popular, but over time new applications emerged with different requirements and richer patterns of interaction. So we looked into Spark: how in-memory computation supports a more iterative and interactive style of application, and how the rise of data science and machine learning fueled this development. We recognized that its programming model is much closer to functional programming, but, as we will see, there are also other ways to interact with the system, such as through Spark SQL. We saw how the system is built around immutable data and the abstraction of RDDs (Resilient Distributed Datasets), but also how the alignment with data science made alternative formats like DataFrames more popular.

Your final task for this week is to set up an environment in which you can work with Spark, because next week it will be all about programming the system and crunching some data. To make the ideas from this week concrete, the short sketches below show what they look like in code.
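As a first illustration of how mapping and aggregation split the work, here is a minimal word count in PySpark. This is only a sketch: the input path "input.txt" is a placeholder for any plain-text file you have at hand.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
lines = spark.sparkContext.textFile("input.txt")

# Map side: purely per-record work, which the system can parallelize freely.
pairs = lines.flatMap(lambda line: line.split()).map(lambda word: (word, 1))

# Reduce side: aggregation by key, where the data dependencies live,
# so matching keys must be shuffled to the same worker.
counts = pairs.reduceByKey(lambda a, b: a + b)

print(counts.take(10))
spark.stop()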
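And here is a glimpse of the other ways to interact with the system that we mentioned: the same small query expressed once through the DataFrame API and once through Spark SQL. The column names and rows are made up purely for illustration.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dataframe-sketch").getOrCreate()

# A tiny, made-up dataset in the tabular DataFrame format.
df = spark.createDataFrame(
    [("alice", 34), ("bob", 29), ("carol", 41)],
    ["name", "age"],
)

# The query through the declarative DataFrame API...
df.filter(df.age > 30).select("name").show()

# ...and the equivalent Spark SQL query over a temporary view.
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()

spark.stop()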
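Finally, for the setup task, a quick sanity check you can run once your environment is ready. This assumes a local installation, for example via "pip install pyspark"; a downloaded Spark distribution or a course-provided image works just as well.

from pyspark.sql import SparkSession

# local[*] runs Spark on all local cores, with no cluster required.
spark = SparkSession.builder.master("local[*]").appName("setup-check").getOrCreate()

# If this prints a version number and rows containing the numbers 0
# through 4, you are ready for next week.
print("Spark version:", spark.version)
print(spark.range(5).collect())
spark.stop()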