In this video we start on the contents of chapter 8. Chapter 8 is all about understanding memory in detail. That matters if you want to become a data scientist, because eventually you want to work with big data. And what is the definition of big data? Well, there are many, but one of them is that the data are so big that they don't fit into your computer's working memory, the RAM. So chapter 8 lays the foundation for dealing with situations where we cannot put everything into memory at once. Before we do so, to start out the chapter, we introduce the so-called map-filter-reduce paradigm. That's a way of thinking about how to process heavy loads of numerical data. In this first video we are not going to optimize memory yet, but we are going to see a couple of implications our program has for memory. Then in the next video we make it memory efficient.
So let's go ahead and create a new file and call it MFR, for map-filter-reduce. Let's make an example. First of all, I give you some numerical data. We use the same numbers list as in chapter 1: 7, 11, 8, 5, and 3, then 12, 2, and 6, then 9, 10, 1, and 4. So these are the 12 numbers between 1 and 12 in no particular order.
The first task is to transform all the numbers according to a mathematical rule, or a mapping. That's where the term map comes from, but bear with me. We transform any x into a y according to the rule y = x^2 + 1, so x to the power of 2, plus 1. I wrote that in LaTeX here so that it looks a little bit fancy.
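As a quick sketch of the setup: the list below holds the twelve numbers just read out. The helper name rule is only an illustration for the mapping y = x^2 + 1; the notebook in the video does not define such a function.

```python
# The raw data: the numbers 1 to 12 in no particular order.
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]


def rule(x):
    """Map x to x squared plus one (hypothetical helper, not from the video)."""
    return x ** 2 + 1


# Sanity check: the list contains each of 1..12 exactly once.
assert sorted(numbers) == list(range(1, 13))

print(rule(numbers[0]))  # 7 -> 50
```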
So the first step is to transform all the numbers in the list according to this rule. The second step is to filter out the odd ones, or in other words, keep the even ones, whichever way you prefer to think about it. And then there is one more task at the end: the third step is to sum up the remaining numbers. So these are the three steps we want to implement. We are going to do that with the numbers list plus several other list objects that hold intermediate results. Then we will discuss what this way of writing the program implies in terms of memory usage.
Let's first write the transformation. I introduce a new list, let's call it transformed. It starts out as an empty list, and it is going to hold all the transformed numbers. Why am I doing that? We could probably transform the numbers in place, in the existing numbers list, but then we would change the input to the problem, so to say, and sometimes we don't want to do that. Let's assume we are just loading in some raw data. We want to do some analysis and some calculations with the raw data, but we don't want to change it. Therefore I create a new list called transformed.
So how do we get the numbers into the transformed list? We write a for loop: for number in numbers. Inside the loop we transform the number. We could call the result transformed number, but let's call it number two for short, and set it to number ** 2 + 1, so number to the power of two, plus one. And now what do we do with number two?
Well, number two gets appended to the end of the transformed list, so we write transformed.append with number two as the argument. To make this a little easier to read, we can skip the temporary variable and put the expression number ** 2 + 1 directly into the append as the argument. After the for loop, I simply evaluate the variable transformed so that we see the transformed numbers. Let's run the cell, and we see that the numbers are transformed.
Okay, let's continue with step two: filter out the odd ones, or in other words, keep the even ones. We write evens and set it to an empty list again. We are going to fill this evens list with all the even numbers in the transformed list. So we say for number in transformed, and then we write an if statement, just like in the very first Python example in this course: if number % 2 == 0. So if division by two leaves no remainder, we know it's an even number, and then we say evens.append(number). At the end of the cell, I give you a brief view of the evens list. Okay, only even numbers.
And now comes the third step. We simply sum up all the evens. We can use the built-in sum function and write sum(evens), which gives 292.
So first of all, let's introduce a couple of words. Whenever you are given a sequence of numbers, and note that I use the generic, abstract term sequence, it can come as a list, it can come as a table, it doesn't matter. If you want to apply some transformation to every element in the sequence, then we call that step a mapping step. So let's introduce a header here and simply call it mapping.
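Put together, the three cells built up so far look roughly like the following sketch. The names transformed and evens follow the ones used in the video; the name total for the final sum is an assumption, since the video only evaluates the sum at the end of the cell.

```python
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]

# Step 1: transform every number and collect the results in a new list,
# leaving the raw data untouched.
transformed = []
for number in numbers:
    transformed.append(number ** 2 + 1)

# Step 2: keep only the even numbers in a second new list.
evens = []
for number in transformed:
    if number % 2 == 0:
        evens.append(number)

# Step 3: sum up what is left (the name total is an assumption).
total = sum(evens)

print(transformed)  # [50, 122, 65, 26, 10, 145, 5, 37, 82, 101, 2, 17]
print(evens)        # [50, 122, 26, 10, 82, 2]
print(total)        # 292
```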
Okay, we map every element to some other element. That is where the term mapping comes from. I called it a transformation earlier, and transformation is probably also a good term to use, but the more formal word for this step is map or mapping. We map every element in the original list to some other number according to some transformation rule, which happens to be x squared plus one.
The next step takes the transformed numbers, the mapped numbers, and filters some of them out. Therefore we call this step a filtering step.
And the last step takes a sequence of many numbers, which here is of course shorter than the original numbers list, but that doesn't matter, it's still a sequence of many numbers. It reduces them into a single statistic by simply taking the sum. We could also calculate the average, which we did in the very first video of this course. We could do whatever we want, but the idea is that we take a bunch of numbers and reduce them into a single one, a single statistic. So we call this step the reduction step, or sometimes the reducing step.
Okay, why is map-filter-reduce, so mapping, filtering, and reduction, so important? Well, there are many reasons. First of all, there is a lot of theory on how to break down big computations into smaller ones that can be distributed onto several servers that solve a problem simultaneously. Let's say you are Google and you are working with really big loads of data that would never fit on one machine, no matter how big the machine is. For such settings this paradigm has many advantages. For example, there is a sheer load of academic papers and algorithms that solve various kinds of problems and calculate various kinds of statistics, and that all follow this paradigm.
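To make the idea of splitting the work a bit more tangible, here is a minimal single-machine sketch. It cuts the numbers list into chunks, runs map, filter, and reduce on each chunk separately, and then reduces the partial sums once more. The chunk size of 4 and the helper name partial_sum are assumptions for illustration, and the list comprehensions are just shorthand for the for loops used so far. On a real cluster each chunk would live on a different server, which this sketch does not attempt to show.

```python
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]


def partial_sum(chunk):
    """Map, filter, and reduce a single chunk (hypothetical helper)."""
    transformed = [number ** 2 + 1 for number in chunk]           # mapping
    evens = [number for number in transformed if number % 2 == 0]  # filtering
    return sum(evens)                                              # reduction


chunk_size = 4  # arbitrary choice for this illustration
chunks = [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]

# Each chunk could be handled by a different worker. Here we simply loop.
partial_sums = [partial_sum(chunk) for chunk in chunks]

# Final reduction: combine the partial results into one number.
print(partial_sums)       # [198, 10, 84]
print(sum(partial_sums))  # 292, the same as for the whole list at once
```

The split works because the sum of the partial sums equals the sum over the whole list, no matter how the chunks are cut.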
So whenever you are given some numbers and some task to do with those numbers, it is always worthwhile, at least in your head, to break down the steps you need to solve the problem into mapping, filtering, and reduction steps. If you can do all the mappings in one step, all the filtering in one or maybe several filtering steps, and then a reduction at the end, or something like that, this has theoretical advantages. Namely, if the problem grows in size, let's say your input data quadruples or grows by a factor of 1,000 and one machine is not enough anymore, then a computer scientist who knows the theory behind this can most likely solve your problem in parallel, using many servers that together find the solution. I don't want to go into too much theory here, but breaking your problem down into mapping, filtering, and reducing steps is a really good idea.
Okay, before we end this video, we use Python Tutor to see the implications of the code we wrote. We can already guess them without Python Tutor, but Python Tutor really helps us visualize a lot of stuff. So let's copy over the first for loop, and then the second for loop. We don't need the lines that display the variables, because those were only for JupyterLab. In the last row we simply sum them up. And if you want, you could also put all the lists up front, so that we have a little more order here.
Let's run the code. At first we simply create the numbers list in the global scope, as we see. Then we create two initially empty lists that we are going to fill in.
In terms of memory usage, we can regard the empty lists as using no memory. There is a little overhead, but for now we can treat them as if they were not there. So we have this one list with the original data. Now let's see what happens as the program runs and does its job. It is going to loop, so we get a global variable called number. We can also disregard that in terms of size, because a single variable is not worth talking about. But at some point the transformed list gets filled in, and after the mapping step is done, we have a second list that is the same size as the first one. So what has happened to our memory usage? It has doubled. In other words, if the original list of data occupies, let's say, 60% of your computer's working memory, then doing this mapping is not possible, because the two lists together would need roughly 120%. Your computer will run out of memory. It may crash, and you may have to hard reset it. Okay, so that is not a good thing.
Let's go on. That was only the mapping step. Now comes the filtering step, which filters out all the odd numbers and keeps only the even ones. At some point the filtering step is also done. Probably two more steps: the 2 is going in, the 17 is not going in. Before we go ahead and calculate the resulting sum, what we see here is a third list, which happens to be half as long as the other two. However, let's assume that the raw data contain only numbers that are mapped to even numbers. What would happen to our third list? Well, in the worst-case scenario, it would have the same length as the first and the second list. So in the worst case, our memory consumption would actually triple. It doesn't here, obviously, but in the worst case it would.
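The worst case is easy to provoke in a small sketch: every odd x maps to an even x^2 + 1, so raw data made up only of odd numbers lets nothing get filtered out. The list odd_numbers below is made up for illustration and is not part of the video's example.

```python
# Hypothetical raw data consisting only of odd numbers.
odd_numbers = [7, 11, 5, 3, 9, 1]

transformed = []
for number in odd_numbers:
    transformed.append(number ** 2 + 1)

evens = []
for number in transformed:
    if number % 2 == 0:
        evens.append(number)

# All three lists end up with the same length, so three full-size lists
# exist in memory at the same time.
print(len(odd_numbers), len(transformed), len(evens))  # 6 6 6
```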
In other words, if we want this program to run through these calculations, we have to make sure that the original list occupies at most one third of the computer's memory, probably a little less because there is some overhead, but certainly no more than a third. That is already a limitation we may not want.
In the last step, if I click next, the sum function is executed, and we don't see a result here, mainly because I forgot to assign it to a variable. Let's call it result, run it one more time, and jump right to the end. Now we see a result variable that we did not see previously. The result doesn't really matter in terms of memory, though. It's only one number. Everything has been reduced into one number, so we don't really care about that. What we do care about is the fact that we now have three lists, and in the worst case all of them are as long as the very first list.
So there must be a better way to do that, and that is what chapter 8 is all about: avoiding temporary list objects. We want to do the calculation in a different way. So let's use this diagram and think about how we could calculate the same result without the two intermediate lists. Well, when we look at the first number in the raw data, the 7, we could do the transformation, which gives 50. But instead of saving the 50 in a list object, we could check right away whether to keep it, and because it's even, we keep it and add it to a running total. Remember the very first example of this course. Maybe now you can guess why I made up that example back then: we are now building on top of it. So the number 50 is added on top of an initial total of zero.
Then we take the next number, the 11. The 11 is transformed into 122, which is also an even number, so we add the 122 on top of the 50 in the running total, and so on. The number 8 is mapped to 65. 65 is an odd number, therefore it is not added to the running total. So you can already guess that there is a smarter way to get to the same result without any intermediate storage.
Now, we could go back to chapter 1, review what we programmed back then, and come up with our own running-total logic. We could actually solve this problem in a memory-efficient way using only things we have learned so far in this course. However, Python comes with several concrete data types that we have not yet seen and that can do this for us, and we want to learn about them. In abstract terms, this kind of data type is called an iterator. In more concrete terms, it is often a generator. But no matter what the fancy word is, these are data types built into Python that allow us to do these one-by-one calculations without intermediate lists.
That is what we are going to see in the next video. We look at the same example again and recalculate the same result. Remember, the result is 292, so you can check in the next video whether you get the same number. We will get there in a different way, using different syntax and different data types, and that will be highly memory efficient. So I will see you in the next video.
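The running-total idea walked through above can be sketched with nothing but a for loop and an if statement. This is only one way to write it by hand and not the iterator-based approach announced for the next video; the variable names total and mapped are assumptions.

```python
numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]

total = 0  # initial running total
for number in numbers:
    mapped = number ** 2 + 1   # mapping: 7 -> 50, 11 -> 122, 8 -> 65, ...
    if mapped % 2 == 0:        # filtering: keep only the even results
        total += mapped        # reduction: add to the running total

print(total)  # 292
```

Apart from the numbers list itself, only two integers, mapped and total, are held at any time, so no intermediate list is ever built.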