Welcome to chapter 8 of the Introduction to Python Programming course. This chapter is on mappings and sets. So, what are mappings? For the purpose of this chapter, I want you to think of mappings as something you all know. Here I have a coordinate system with an x-axis and a y-axis. You are used to drawing functions as a continuous line, for example, a parabola like this. Now, computers cannot be continuous, as we will see in a moment. But where is the mapping here? Well, a function is nothing but a mapping from x's to y's. So, f is a function that maps the x's to the y's, and usually we write that as y = f(x). We know that from high school.
As I said, this function is continuous and computers cannot be continuous. Why not? As we learned in chapter 5 on numbers, we only have a finite number of bits available in our computer's memory to model numbers. So, let's for a moment just take the whole numbers: a negative two, a negative one, the zero, of course, a plus one, and a plus two. If I only look at the discrete values of this mapping, I get a mapping that looks somewhat like a table. We have all the x's, negative two, negative one, zero, plus one, plus two, and they are mapped to the y's. I made the values up. Okay, maybe I should not have put a one at the edge, so let's make this a 2 and this a 1.5. Then negative two is mapped to 2, negative one is mapped to 1.5, zero is mapped to 1, plus one is mapped to 1.5, and plus two is mapped to 2. The exact numbers don't really matter.
The only thing that matters is that we see that a function is just a mapping from x's to y's. And this is conceptually what we want to model in a programming language as well. What do we see from here? We see that the x's are unique. That's an important fact. And the y's? They may not be unique. This is just how functions are defined in math. There are certainly other types of mappings, but this is a mapping in the way we are taught in math, and it is exactly this kind of mapping that we want to mimic in a computer.
So how do we do that in Python? Mainly, we do that with the so-called dict type. dict stands for dictionary, and as we can guess, a dictionary is something where we can look something up. This is exactly what we will do. In the mapping, I look up the x's, and when I look up an x, that means I read the y corresponding to that x. That's what the lookup means.
So let's see how we can model a dictionary. Throughout this chapter, I will have two main examples that are simple, and one example that is a bit more involved, but also quite simple. The first simple example is the dictionary called to_words. What does it do? It maps the integers 0, 1, 2 to the strings "zero", "one", "two". Syntactically, we write it with curly braces. As we learned before, the whitespace in between opening and closing delimiters is basically ignored in Python. So we could also write this entirely on one line, but because we are mapping x's to y's, so to say, it makes sense to write it in a tabular fashion so that the source code is easier to read. The curly braces are what constitutes the literal notation for dictionaries. Now let's look at each of the lines. We map the integer 0 to the string "zero". So we write on the left-hand side the number 0, then we write a colon.
So the colon belongs to the syntax. Then we write the right-hand side to which we map. And then we write a comma because then the entry is over, so to say. This way we specify all the mappings in the to_words dictionary.
Okay, regarding some terminology. This is a dictionary, as we have learned. The objects that are mapped to some other objects, the x's, are called the keys. And on the right-hand side we have the objects that mimic the y's; they are called the values.
A note, just to reiterate a point. Take any object; for illustration purposes, let's take the float 10.0. Whenever I create an object, we said that it has three properties. The first one is the identity, so that would be the address here. The second property of any object is the type. And what was the third property I was always talking about? We refer to this as the value. What did we mean? We meant the semantic value of the object: whatever is inside the object in memory, and whatever that means to us humans, that is what we mean by the value of an object. This is one meaning of the word value, going back to chapter one.
Now, as I just mentioned in the terminology for dictionaries, we say that a dictionary is a mapping from keys to values. Note how I reuse the word value here. Value now has a double meaning, and we have to be aware of that. First, it means the semantic value every object has. Secondly, it means the role an object can play within a dictionary. The strings "zero", "one", "two" play the roles of values within this dictionary, and the integers 0, 1, 2 play the roles of the keys. I say this right away so that you are not confused when you are listening to me. Whenever I say value, you always have to decide: what am I talking about?
Do I mean the value of an object or do I mean the values inside a dictionary? These are two different things.
Okay, in the same way I created this dictionary, let's create another dictionary. I call it from_words, and it just does the opposite. I know that this is not a mathematical function here, but if it were, what would from_words and to_words be? They would be inverses of each other, right? In to_words we map from the integers 0, 1, 2 to the strings "zero", "one", "two", and here we map the strings "zero", "one", "two" to the integers 0, 1, 2. So, an inverse function. In order for a function to be invertible in math, we remember that the y's must also be unique; otherwise we couldn't invert the function. Here in Python, we can actually deal with that, and we will look into this. We are not really looking at functions here; I just give you the idea of a function as a reference to something that you know from high school for how you should think about dictionaries.
So what else can we tell about dictionaries? Of course, every dictionary is an object, like anything in Python, so it has an ID. It also has a type, and the type is, of course, dict. And then it has a value. Note the first meaning of the word value here: the semantic value of the dictionary is just the mapping itself. Here I am given the value of the object as a literal. A literal means I can copy and paste this back into a code cell and I will get a new object with the same value. That's what happens whenever I get back the value of an object as a literal. In other words, we speak of this as the literal notation of the object. Now, as we have seen in previous chapters, most of the built-in data types in Python have a literal notation with which we can create them.
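The literal notation described so far can be sketched as follows. The names to_words and from_words come from the lecture; the exact layout of the code cells is an assumption, since only the spoken description is available.

```python
# Literal notation: curly braces, key: value pairs, separated by commas.
to_words = {
    0: "zero",
    1: "one",
    2: "two",
}

# The "inverse" mapping from strings back to integers.
from_words = {
    "zero": 0,
    "one": 1,
    "two": 2,
}

# Every dictionary is an object with an identity, a type, and a value.
print(id(to_words))    # some memory address, differs from run to run
print(type(to_words))  # the dict type
print(to_words)        # the value, shown in literal notation

# Looking up a key reads the value it is mapped to.
print(to_words[1])
print(from_words["two"])
```

Copying the printed literal back into a code cell creates a new dictionary object with the same value.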
However, they also have a constructor with which we can create them, and so does the dictionary type. So we have a dict constructor, and it can be used in different ways. One way is to pass it an iterable that contains, or produces, 2-tuples. The first element inside each tuple becomes the key, and the second element takes the role of the value. Here I have the brackets, so this constitutes a list. I am passing only one list to the dict constructor, and the list is, of course, iterable. The requirement is that the iterable consists of tuples whose first elements become the keys and whose second elements become the values. If I execute this, I get back basically the same dictionary as seen before, just from a different notation.
I said that the dict constructor is quite versatile, so we can use it in a different way. We can treat it like a function and provide keyword arguments. The keywords here, zero, one, two, are not strings, actually. They look kind of like variables, but they will become strings due to how Python is built. So the keywords become the keys, and the arguments that we pass in by keyword become the values. In other words, this also gives us a dictionary which looks the same. So we have different ways of constructing dictionaries.
Okay, these were the easy examples. Let's now look at a more realistic example of a dictionary that models what we will call nested data. What you are about to see is a format of data which is quite common if you ask for data from a web API, for example.
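The two ways of calling the dict constructor described above can be sketched like this. The variable names from_pairs and from_keywords are made up for illustration, and the concrete keys and values are assumed to match the from_words example.

```python
# One argument: an iterable (here a list) of 2-tuples.
# The first element of each tuple becomes the key, the second the value.
from_pairs = dict([("zero", 0), ("one", 1), ("two", 2)])

# Keyword arguments: the keywords are written without quotes,
# but they end up as string keys in the new dictionary.
from_keywords = dict(zero=0, one=1, two=2)

print(from_pairs)
print(from_keywords)
print(from_pairs == from_keywords)
```

Note that the keyword form only works for keys that are valid Python identifiers, so it cannot build a dictionary with integer keys such as to_words.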
So oftentimes, when you download a third-party package to, for example, make a request to the Google Maps API or the Yelp API or the Uber API or whatever, the data that comes back will be represented in a nested structure, an example of which I provide here next. Let's try to analyze this. First, this is one statement. We see that the statement begins in the first line. It's an assignment statement, and we know that the right-hand side of an assignment statement is evaluated first. That means the entire code that is marked here will be evaluated first. Only after that, when the object that this evaluates to has been created in memory, will the people variable be assigned to this object; in other words, a reference from the people variable will be made to the new object.
The new object will be a dictionary, of course. This dictionary will have three keys called mathematicians, physicists, and programmers. That's it. And what are these strings mapped to? They are mapped to lists. The string mathematicians is mapped to a list that contains, as we will see, two more dictionaries. Each dictionary within this list models a person; that's what we can tell already. So there are two mathematicians on the list, so to say. Then physicists is mapped to an empty list. That means we currently have no physicists in our people dictionary. And then we have a third list, the programmers list. Since this is a list as well, the data structure is homogeneous in that way. Within this list, there is only one dictionary, and we end it here with a comma. It's a good convention to put the commas at the end. Some of them would not be necessary; I put them there anyway to make it look uniform. It also has some other advantages that are not so important. But again, this is also one person.
And then we see that the three innermost dictionaries that model people are the persons that go by the name Gilbert Strang, who is a professor of math at MIT; then Leonhard Euler, one of the most famous and most influential mathematicians of all time, I guess; and, in the programmers list, of course, Guido van Rossum, who invented Python. So let's look at the inner dictionaries. What can we tell about them? Every person has two attributes, so to say. Every person has a name, and here we just summarize first name and last name into one field. And every person has an attribute called emails, plural, which again is mapped to a list, and the list contains strings, like, for example, the email address Gilbert at MIT. I'm pretty sure this email address does not work. I hope it doesn't, because I didn't ask Gilbert Strang if I can put his email address here as an example. Guido van Rossum has two email addresses. Why? He has, of course, his python.org email address. And until the end of last year, he worked for Dropbox, which is why I made up his second email address, Guido at dropbox.com.
So the question is now, what am I modeling here? I am modeling a nested data structure. We see that this has some similarity to the sequences we talked about in chapter seven, in that it consists of elements, and the elements are the dictionaries. Structure-wise, the elements look alike: they have the same fields. The emails attribute is always mapped to a list; even if the list is empty, it's still a list that we map to. So this is data in a format which is similar to how data is stored in a normalized SQL database.
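A sketch of what such a nested people dictionary might look like follows. The keys and names are taken from the description above; the exact email strings are assumptions reconstructed from what was said (they are made-up addresses), and the empty emails list for Euler is likewise an assumption.

```python
from pprint import pprint

people = {
    "mathematicians": [
        {
            "name": "Gilbert Strang",
            "emails": ["gilbert@mit.edu"],  # assumed, made-up address
        },
        {
            "name": "Leonhard Euler",
            "emails": [],  # assumed: still a list, just an empty one
        },
    ],
    "physicists": [],  # no physicists yet
    "programmers": [
        {
            "name": "Guido van Rossum",
            "emails": ["guido@python.org", "guido@dropbox.com"],  # assumed
        },
    ],
}

# Nested lookups chain keys and indices from the outside in.
print(people["programmers"][0]["name"])
print(len(people["mathematicians"]))

# The standard library's pprint function prints nested data with line breaks.
pprint(people, indent=1, width=60)
```

Because every person dictionary has the same two fields, code that processes one entry works for all of them.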
And again, if you make an API request to many of the web APIs out there, you will oftentimes get data that looks like this. If this were, let's say, modeling Google Maps, then this would maybe be a list of places in a certain area where we did a search via the API, or something like that. Okay, so this is how we can model data in a program. There is one problem set on the exercises list that makes you work with such data, so you will look into this in detail, and we will also use this example throughout this chapter.
Now, people is kind of hard to read. Oh, I did not execute the code cells; now people is defined. Why is it hard to read? Because the way Python prints out a nested data structure by default has no line breaks in it. However, the standard library offers a pprint module which has a pprint function; pprint stands for pretty print. I can pretty print with some settings, and I get back the nested data structure in a readable way, I would say. What we have here at the end of the day is like a directory, right? A directory of people and ways to contact them.
So the question is now, because we care about this in this course: how are dictionaries modeled in memory? How do they work in memory? Let's first look at some observations we make, and then I will tell you how they work. Before, we have seen dictionaries that map integers to strings and strings to integers, and in the last example with people, we mapped strings to lists and also strings to other strings. So now let's try to map a list of two strings to a list of two numbers. This would make sense, right? If I have the dictionaries to_words and from_words that map individual words and numbers, why would we not be able to map a pair of words to a pair of numbers?
So let's try what happens. We get a TypeError, and it says unhashable type: 'list'. There are two lists in this expression, one on the left-hand side for the key and one on the right-hand side for the value within the dictionary. So we are not yet sure which of the two lists causes the problem. The answer will be the left-hand side, as we shall see.
Let's look at another property that we have to understand. I create a dictionary which maps "zero" to 0, "one" to 1, "two" to 2, and then, just for illustration purposes, maps the string "zero" again, this time to the number 3. This is down here. So basically we are mapping "zero" to two different values. In the analogy of the math function, that would be the same as if I mapped one x to two different y values, right? And in math we know that this is not allowed. So let's see what happens in Python. It does not cause an error; however, "zero" now maps to 3. The order seems to be consistent: "zero", "one", "two" are still in order, but the value to which the key "zero" is mapped is now 3. So the rule seems to be that the first occurrence of a key is kept and the last value belonging to that key is mapped. We will see if that is the case.
What this slide shows is that there are some issues that may occur, and in order to understand what we see here, we have to understand what's going on in memory. So here is a brief overview of what a so-called hash table is; hash tables are the underlying data structure with which dictionaries are modeled in Python. I will go ahead and draw the dictionaries in our memory diagrams in the same way as always. I will show you how one of the two simple examples is modeled. Let's take the dictionary to_words that maps the three integers to the three strings. How is that modeled in memory?
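Before turning to the memory model, the two observations just made can be reproduced with a short sketch. The concrete keys and values are assumptions based on the spoken description, and the names pairs and numbers are made up for illustration.

```python
# Observation 1: a list on the left-hand side (as the key) raises a TypeError.
try:
    pairs = {["zero", "one"]: [0, 1]}
except TypeError as error:
    message = str(error)
    print("TypeError:", message)

# Observation 2: a repeated key does not raise an error.
# The key keeps its original position, but the last value wins.
numbers = {"zero": 0, "one": 1, "two": 2, "zero": 3}
print(numbers)  # {'zero': 3, 'one': 1, 'two': 2}
```

The exact wording of the error message can differ slightly between Python versions, but it mentions that the list type is unhashable.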
When we create a dictionary, Python creates a box that technically is also an array, just like the list that we've seen many times before. I have to draw this a little bit bigger to make some points. Whenever we draw such an array, it will be bigger than the number of elements that we want to put in. We already know that Python must make some assumptions about how many elements go in, because we don't want to deal with the memory explicitly. That's why Python is a high-level language. So these are slots. Another word for slot would be bucket; this is why you see the word bucket on the presentation. But the proper word would still be slot, I guess. Then Python draws another line of separation here, so we now have a kind of grid. The slots are labeled. Usually I wouldn't do that, but let's put indices here: 0, 1, 2, 3, 4, 5, 6, 7, and 8. How many there are is not the issue here. It could be fewer, it could be more. This is an implementation detail and we don't care about it. What we care about is the following. On the upper side, we will write the keys, and on the lower side, we will put the values. And because Python always keeps track of what the type of an object is, we of course have the type dict here, because this is going to be a dictionary.
Now, I told you that I want to model the to_words dictionary. So we will have six objects in the dict: three keys and three values. The first key is this one, which is 0, and it's going to be an int. I will just do it now as if I were the Python interpreter, and later on we will look at how exactly what I'm going to show you works.
So Python does some calculation with the key, which we will call hashing. It calls the built-in hash function and gets back a number that helps it to map the key into one of these buckets. The idea of the hashing algorithm is that all the buckets are equally likely. Let's say, for example, that Python decides to put this int into bucket number four. So it puts a reference here, just like any ordinary reference in a list, and now the 0 is in here. If that worked, Python goes ahead and creates the string "zero", of type str, and because this is the value corresponding to this key, Python puts a reference here.
Now we do that two more times. The next object it creates is the object 1, which is of course an int. Via the hashing algorithm that I'm going to show you later, Python figures out which of these slots to put it in. As I said, any slot is equally likely. So there is a chance that Python, for whatever reason, wants to put the reference going to the 1 in the same slot as the 0. Whenever this happens, we have a name for that: we call it a hash collision. But let's assume Python wants to put it into slot number one, maybe, and we get this reference here. Next, Python creates the string "one", notes its type here, and makes a reference go here. And then one last time: Python creates the 2 object, which is of course an int, goes through the same hashing algorithm again, and in this example maybe chooses slot number six to put it in. For the last object that it needs to model this, it creates a new object, puts the word "two" in it, puts str here, and then of course puts the reference here.
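The drawing can be imitated with a toy hash table. This is only a sketch under stated assumptions, not how CPython actually implements dict: the function names build_table and look_up are made up, nine slots are assumed to match the drawing, and a collision is resolved by simply trying the next slot (linear probing), which is a deliberate simplification. The slot numbers it produces differ from the ones chosen by hand in the drawing.

```python
NUM_SLOTS = 9  # assumption: nine slots (indices 0 to 8), as in the drawing


def build_table(pairs, num_slots=NUM_SLOTS):
    """Place (key, value) pairs into a fixed number of slots.

    Assumes there are fewer pairs than slots.
    """
    slots = [None] * num_slots  # every slot holds a (key, value) pair or None
    for key, value in pairs:
        index = hash(key) % num_slots  # hashing picks the starting slot
        # Simplification: on a collision, move on to the next slot.
        while slots[index] is not None and slots[index][0] != key:
            index = (index + 1) % num_slots
        slots[index] = (key, value)
    return slots


def look_up(slots, key):
    """Find the value for a key by repeating the same slot calculation."""
    index = hash(key) % len(slots)
    while slots[index] is not None:
        if slots[index][0] == key:
            return slots[index][1]
        index = (index + 1) % len(slots)
    raise KeyError(key)


table = build_table([(0, "zero"), (1, "one"), (2, "two")])
print(table)
print(look_up(table, 1))
```

The point of the sketch is that a lookup does not search the whole array: it recomputes the slot from the key's hash value and goes there directly.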
Now the dictionary has been created. Since everything on the right-hand side of the assignment statement has been evaluated, Python goes ahead and creates a name, in this case to_words, and makes the name in the global namespace reference the dictionary object. So this is what happens when we create a dictionary.
This thing here we call a hash table. What is a hash table? At the end of the day, Python does exactly the same as for lists and tuples, but for every slot it reserves twice the space. Why exactly twice? As we have already learned, Python keeps only references in lists and tuples; the objects are never inside the array. Because of that, Python needs exactly twice the space as for a list here, right? It keeps two references per slot: one reference to the key and the corresponding reference to the value that is mapped to it. That is it.
And again, which slot do we take? This happens by random chance, in a way. Actually, it's not random chance; it happens by a deterministic algorithm called the hash algorithm, but it appears as if it were happening by random chance. The important thing is that the hash algorithm has to sort the references in a uniform way, so every slot has to be equally likely; otherwise, the algorithm would not be good. But we don't want to go into too much detail that you would study in the field of computer science. All we want is a rough understanding of what's going on in the computer's memory, and this is what happens when we create a dictionary.
Now I want to show you the hash function. Python's built-in hash function does the following: we give it an argument like "zero", the string, and we get back some number. We have seen such long numbers before, and those were usually the kind of numbers returned by the id function.
However, this is not an ID, and it is not a memory address. It is just a number that somehow gets calculated deterministically from whatever we give the hash function. So if I execute this function again, hopefully, I get the same number again. Yes, my computer is not broken, luckily, so I get the same hash value back.
Then what does Python do? Python looks at this number and at the number of slots it has reserved. Again, Python decides for us how big the dictionary is, so maybe Python decides that this now has nine slots or eight slots or whatever. We don't care how many slots; Python just decides how wide the array is. Then it figures out a way to map this number into exactly one of those slots. How does it do that? Without going into too much detail, it basically does modulo division. Modulo division is division that gives you the remainder that is left after the division. If you modulo divide this big number here by the number of slots that we have, you get a number between zero and one less than the number of slots, because the remainder can never be as large as the number we divide by. I told you in chapter one that modulo division is a very powerful concept that we use all over the place in programming, and modulo division is, conceptually at least, what's going on when we map this long integer into one of the slots.
Now the question is, of course, how we get from the string value "zero" to this number. This is done by a hashing algorithm, and in order to understand that we would really have to study computer science or do some reading on the web.
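The step from hash value to slot can be sketched in a few lines. The number of slots is an assumption chosen for illustration; the real table size is an implementation detail that Python decides for us.

```python
number_of_slots = 8  # assumption for illustration only

hash_value = hash("zero")
same_again = hash("zero")  # hashing the same value again gives the same number

# Modulo division keeps only the remainder, which always lies
# between 0 and number_of_slots - 1.
slot = hash_value % number_of_slots

print(hash_value)
print(hash_value == same_again)
print(slot)
```

One caveat: for strings, the hash value is only stable within one run of the interpreter. By default, Python randomizes string hashes at startup, so restarting the program usually prints a different long number.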
The hashing algorithm is not so hard, but it is really not important for us as pragmatic programmers, because we only want to prepare ourselves for a practical data science career and not go into too much detail on hashing algorithms. But it's quite interesting. Let's see what the hash function does with numbers. Some numbers seem to give you back an intuitive result: whenever we pass an integer to the hash function, we basically get back the number itself. However, if I pass in, let's say, 0.1, I get back some long number. That means a small difference in the value, going from 0 to 0.1, leads to a big difference in the hash value, and that's on purpose. That's the whole idea of a hash algorithm. The only thing that determines in which slot this 0, this 1, and this 2 go is their value, and if you have different values, you want a very high chance that the slots in which they go are different. You don't want hash collisions; you don't want Python to put in a reference where there already is one, with two objects that are supposed to go inside the same slot.
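The behavior of the hash function on numbers is easy to try out. The specific inputs below are the ones mentioned above; the variable names are made up for illustration.

```python
# Small integers hash to themselves.
small_int_hashes = [hash(0), hash(1), hash(2)]
print(small_int_hashes)  # [0, 1, 2]

# A float close to 0 gets a very different, much larger hash value.
float_hash = hash(0.1)
print(float_hash)

# The hash value depends only on the value, so it is reproducible.
print(hash(0.1) == float_hash)  # True
```

Spreading nearby values far apart is what makes it likely that different keys land in different slots.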
Then, if Python, for example, decided that the 1 should go in here where the 0 is, and there is already a reference in it, what Python does conceptually (it's not the whole truth) is run the hash algorithm again with a slightly different initial value so as to get a different slot back. In other words, Python puts the first reference into the array, and when the second one goes in, most likely it will not collide. But in the rare case that the second reference collides with the first one that is already somewhere in the array, another slot will be chosen by the same hash algorithm.
What we see already is that the more references we have inside the array, the higher the chance that we get a hash collision. In other words, the more crowded this array gets, the higher the chance of a hash collision, and the higher that chance is, the more often the hash function needs to be run to find another slot. In that regard, putting something into the dictionary becomes slower. Do you see that? If you have to check several slots for whether they are empty before you put in the reference, it of course takes more time. That means that if this array gets too crowded, inserting into the dictionary becomes slower. This is why there is a rule of thumb: if the hash table gets two thirds full, Python creates a new hash table that is twice as big somewhere else in memory and copies over all the references. When the references go from the small hash table to the bigger one, they go through the same hashing algorithm again, so they get a new bucket. So if the 0 is in the fourth bucket here and, for whatever reason, this dictionary gets too crowded and we get a new hash table where we put in all the references, then most likely the 0 will not be in the fourth spot
in the new hash table; it will be in some other spot. Now, if you study computer science, you will have to learn a lot about hash tables and hashing, but again, this is abstracted away from us. Still, two slides before, we saw some weird behavior in Python. First, something couldn't be put inside the dictionary because it was unhashable. And second, if we put in a key twice, then one gets lost. So we must be careful, and I want to explain to you why these two things happen.
Let's first look at the second observation: if we put in a 0 twice, then the second 0 will overwrite the first 0. So let's try something. What happens, for example, if I calculate the hash value of the float 0.0? Interestingly, the hash value is also 0. Let's do one more thing and create 0 + 0j. That creates a complex number that is also equal to 0. What is the hash value of this? Oh, it's also 0. So there is a rule: if two objects compare equal, even if they have different types, that is, when the semantic meaning of their values is the same, then they get the same hash value. This has to do with the fact that if we put in a 0 as the key inside the dictionary, we want to be able to look it up not only with the integer 0. Should we ever do that? Probably not. But the fact is, when we later work with dictionaries, we really only want to insert stuff and look up stuff by the semantic value; here I mean value in terms of the value of the object. That's for technical reasons. Now we know: if two objects compare equal, they have the same hash value, and if they have the same hash value, they end up in the same bucket. And then if, for example, I put in the 0 here, where is it?
The 0 is in this one here, and the corresponding value is the object with the string "zero". If I then put in another 0 as the key, and it maps to some other string as the value, then whatever is put in last overwrites what was put in first. This reference gets overwritten, and that means we lose the first value in this case. What does that mean effectively? It means that the dictionary behaves like a mathematical function, and just as we wrote here earlier, the x's have to be unique. This is why: it has to do with the way hash tables are implemented and work.
Okay, now the explanation for the other observation follows from what I just said. I said that if two objects compare equal, they have to have the same hash value. In other words, the hash value seems to depend on the semantic value of an object. Because that is the case, if an object were able to change its semantic value throughout its lifetime, it would get a different hash value. And we know objects that can change their semantic value. What did we call these objects before? We called them mutable objects, like the list type, for example. So whenever an object allows us to change the ones and zeros inside it, its hash value is not unique, and if an object does not have a unique hash value, then the bucket into which it goes is not unique either. In other words, for a mutable object we don't know in which slot to put it. This also has to do with the fact that whatever we put inside the hash table, we want to be able to look up later. As we shall see, the whole idea of a hash table is to look things up in a very fast way; that's really the only reason to use a hash table. And if we want to look up something that changed its value in the meantime, how do we look it up? Well, we can't, because if it changes its value in the meantime, it also changes its hash value in the meantime, and if it has changed its
hash value in the meantime, it also got a different bucket. That means we cannot find it anymore. The hash table would work as if it were a black hole: we would put something in and couldn't find it anymore. That is why we cannot put in mutable objects. So here's the example. I try to calculate the hash value of a list, and I get back a TypeError: the hash function complains that the object we put in is of a mutable type. That's the problem here. Unfortunately, the error message is not so nice. It basically says the type we gave it is unhashable. It's actually quite funny: we call the hash function, we get an error, and it says the object is unhashable. So what do we learn from this? Nothing. The error message would be a little bit nicer if it said: TypeError, the object is mutable and it must be immutable. But now we have learned this, so now you know that you can only put immutable objects as a key inside the hash table, or inside the dictionary. So how do we make the list immutable? We learned that in the first part of chapter 7: instead of a list with the brackets, we just use a tuple with parentheses. So up here we have the brackets, and down here we have the parentheses, and that's now a tuple. And we know that a tuple is nothing but a list that is immutable. In other words, it's somewhat similar to the dictionary, just with only the upper half of the array, and once a reference is inside the tuple, it cannot be changed. Because of that, a tuple has a hash value, and because of that, I can use a tuple as a key in the dictionary. Let's execute the next line. As the key I have the tuple, so an immutable object, and on the right-hand side this is mapped to a list. So the value within a dictionary does not have to be immutable, and that's okay, because when we want to look something up in the dictionary, we always look it up by the key. We never look up the value by value. The value can be mutable or immutable,
we don't care. So the learning here is: whenever we want to use, let's call it, a composite key, a key in a dictionary that is composed of more than one object, then we must use a tuple to collect them together, and all the objects within the tuple must also be immutable. So whatever is on the left-hand side and used as a key must be immutable. Okay, so now I think we have analysed all of what we need to understand about hash tables, so that we know how the dictionary type works. Now let's look a little bit at how dictionaries behave. In chapter 7 we talked about the four properties of a sequence in abstract terms, and back then I said that when three of those properties come together in a type, then the type is called a collection, and only if the fourth property also applies is the type called a sequence. So let's go over the four properties again: first the three that constitute a collection, and then the fourth one that gets us to the sequence we already know. The first property is that it has to be a container. That makes sense if we look at the name: a collection is a collection of some other things. So a collection is always a container. A list, for example, is a container. And a dictionary is also a container. Why? Because it obviously holds other objects. So that's the first property. The second property was that it has a finite length, or a finite number of elements. We formally called this sized. And we see, of course, that the dictionary is also finite. The hash table is finite. Of course, if we put in more stuff, we will, as I said, get a new hash table that is twice as big, but even the twice-as-big hash table is finite. So dictionaries are finite. And then what is the third property?
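Before we get to the third property, here is a minimal sketch that recaps the hashing rules and the first two properties discussed so far. The names `numbers` and `grid` are made up for illustration and are not taken from the lecture's notebook.

```python
# Objects that compare equal share a hash value, even across types.
print(0 == 0.0 == 0 + 0j)                  # True
print(hash(0), hash(0.0), hash(0 + 0j))    # 0 0 0

# Equal keys end up in the same slot, so the value put in last wins.
numbers = {0: "zero", 0.0: "null"}
print(numbers)                             # {0: 'null'}

# A list is mutable, so asking for its hash value raises a TypeError.
try:
    hash([1, 2])
except TypeError as error:
    print("TypeError:", error)

# A tuple of immutable objects works as a composite key.
# The value on the right-hand side may still be mutable.
grid = {(1, 2): ["some", "list"]}
print(grid[(1, 2)])                        # ['some', 'list']

# Container and sized: the first two properties of a collection.
print((1, 2) in grid)                      # True
print(len(grid))                           # 1
```

Note that `numbers` keeps the key that was inserted first, the integer 0, and only its value is replaced. With that recap done, back to the question of the third property.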
Well, it's called iterable, so we can loop over it. Now, when we said that we want to be able to loop over something, we also said that looping does not necessarily have to occur in an order that we can predict. I mean, if we loop over something, naturally there is a looping order, but we cannot predict the order. For sequences, however, we could. So note the fourth property, which we formally called reversible, or reversibility. This means that we can predict an order, because if something is reversible, there is a forward order. Dictionaries, however, don't have a predictable order. So let me summarize. We have four properties, three of which make up a collection as a concept: the container property, the sized property, and the iterable property. And then the fourth property, which is reversible, only holds true for sequence types. And again, the term collection is to be used just as we use the terms sequence and mapping. Mappings, as we saw, mappings from, let's say, x's to y's, are like sequences without an order. So they basically have all the behavior that a sequence has; just anything that relies on a predictable order is not given. So let's look at the concrete examples here. I can, for example, ask the dictionary: hey, how many elements do you have? And I get three. So what did we put into from_words? Let's look at from_words again. We have the word zero mapped to 0, the word one mapped to 1, and the word two mapped to 2. So obviously it's the number of mappings that is relevant here, not the number of things in it, because there are six objects inside from_words, but when we call len(), we get three. And now let me introduce another word. I said that the things that correspond to the x's, we call them the keys, and the things to which they are mapped, we call them the values. If we look at them together, so if we look at a key-value pair, then we formally call that an item. Unfortunately, some people in the Python world don't use the word item as strictly as I do, so
oftentimes the word item is confused with the term element. The term element is actually what we use for a sequence: the members of a sequence, so to say, or the objects inside a sequence, we call them elements, and the key-value pairs inside a mapping, that's what we call items. But these kinds of words are not so important. Just to summarize: the number of things inside a dictionary is equal to the number of items, which is equal to the number of key-value pairs. We can loop over dictionaries, and that's interesting, because I just said dictionaries don't have an order. So let's loop over one. What do we loop over? I say for word in from_words, and from_words is the dictionary that maps the English words to the integers. Obviously, looping over from_words, I seem to be getting out the keys, right? These are the keys inside the dictionary, and that's the default behavior: when I loop over a dictionary, I loop over its keys and totally disregard the values. In other words, I go over the upper half of the array. Now, what is the order? I just told you that there is no order. That was not the full truth. There is of course an order, because as we said, in a computer nothing is random; everything is deterministic. So what is the order here? Well, I tell you the order: the first object we put in gets the index 0, the second object we put in gets the index 1, and the last object we put in gets the index 2. And this is only true starting with, I think, Python 3.6, but it's definitely true for Python 3.7, which is the one that we are using currently: dictionaries remember an order. And what order do they remember? They remember the insertion order. In other words, when we loop over a dictionary, we get the keys in insertion order. So it's not the index of the slot that counts here, but the index, remembered somehow magically, at which we inserted the objects, and this
is only true, as I said, starting with version 3.6. There is a nice talk that I have in the further resources section. There is a guy who is a Python core developer, and he's also a very entertaining guy. He has many, many good talks that you can listen to in order to study Python in detail, and he actually has a 60-minute talk where he only talks about how dictionaries work internally, which basically starts where I just left off here by explaining to you how the hash table works. In a way, hash tables are quite old. I think they were invented sometime in the 60s or so. So for a long time everyone thought that we as mankind know everything about hash tables, but then a couple of years ago some very smart Python core developers figured out that, hey, we can also kind of store an order here. They suggested that, they implemented that, and now dictionaries remember the insertion order. It's a good talk, but it goes very much into detail, and I think that's a level of detail that we as data science practitioners would not need. But it's still interesting to know. So let's go on. Can I loop over a dictionary in reverse order? Let's try. And I can't, at least not in the Python 3.7 we are using. That's the fourth property, which sequences have and which collections don't have. Or rather, collection is the more generic term: a collection may be reversible, but if a collection is reversible, it's a sequence, and a collection that is not reversible is not a sequence. And the dict type is a collection that is not ordered, so we cannot reverse it. By now you understand why, in all the previous chapters, whenever I talked about these four properties, I always used the reversed() built-in to loop in reverse over the text strings, over, I don't know, we basically reversed all the types that we introduced, and for all the types it worked. But now we see the first data type for which it doesn't work. And now you may wonder: why would I ever need a data type, or would I actually use a data
type at all if it's not ordered? Because, I mean, if I can use the list or the tuple, I have an order for free, basically. But the thing is, if you want to model order in a data type, this does not come for free, as we shall see. And then there is still the first property, the container property. We can check if a key is in the dictionary, and the answer is True. And if I check, for example, for the German word eins, this is of course False, and I don't get an error. So it's the in operator at work here, which checks if the operand on the left-hand side is included in the dictionary. And all that means for dictionaries is that Python goes ahead and checks if somewhere in the upper half, in the key part of the hash table, we find a reference that goes to this object. And let me take the chance to explain how the key lookup works. So how does the in operator work? Well, the in operator is very smart, because there are two ways in which we can do searching. We are looking things up here in the keys area. One way would be to start at the left and walk through it, and whenever we hit a reference, we follow the reference and check: is the thing we are looking for equal to this? Now, this would be very stupid. If we go from left to right, the search would take quite some time. That's actually a pattern that we called a linear search; I first introduced this word in chapter 4. Linear search means we go from beginning to end and check if what we are looking for is equal to the value of the object at the position we are at right now. So in the worst-case scenario, we go from beginning to end without finding it. In a dictionary, however, the in operator works differently. In a dictionary, the in operator is so smart that it takes the operand on the left-hand side, eins, puts it into the hash function that we saw, calculates the hash value, and then knows immediately in which bucket it has to be, and then it just goes to the bucket and checks: is it
in there? And in this case, let me go back to the original example of one. So one is of course in the dictionary. What Python does here is calculate the hash value of one, and the hash value of one happens to be a number that would be sorted into the first slot. Then Python goes directly to slot number one and sees: oh, there is a reference. So it follows the reference, because there can only be one place, and compares for equality, and if it's equal, it says True. So membership testing, which is what using the in operator is also called, works super fast, because there is only one slot in which the word can actually be stored. And then of course, if I take, let's say, the German word eins, what happens is that the German word eins goes through the hash function, we get the hash value back, and with the hash value we determine the slot into which it should go. Then there are two possibilities. First, let's say eins should be stored in bucket zero. Python goes there and sees there is no reference, and then it immediately returns False, because it knows: okay, there is nothing in it, and the German word eins can only be in this slot; it cannot be somewhere else. The second possibility is that it gets back, let's say, the number six, so it looks up a slot where there is something in it. All Python has to do now is follow this one reference and compare this value with the value eins. This will be different, and because it will be different, it will then of course also say False. So in other words, looking up something with the in operator in a dictionary is always fast. It will cost us at most one reference following: we always determine the bucket and follow the reference if it's there, and then we know if the object is in the dictionary or not. And also just note that in the code example I'm looking at the from_words dictionary, and here I drew the to_words dictionary, so
it's just vice versa, but don't be confused here; you get the idea. Okay, and this is something that we don't have with lists. Searching in lists is slow: searching in a list means we have to do a linear search, and searching in a dictionary means we can do the hash table lookup, and the hash table lookup is super fast. I actually have a section on exactly that topic, membership testing in lists versus dictionaries. So let's do a little simulation, just to illustrate how much faster it is. I imported the random module, and let's create a needle and a haystack. The needle is of course 42. And how do we create the haystack? Well, the haystack has to be big, so I use a list comprehension and loop over range() with a very large number, 10 million, and then 10 million times I draw a random integer between 99 and 9999. Once I have created this list, I append the needle to it. Okay, let's do that. This will take a little bit, because this is a huge list with 10 million numbers in it. Hopefully my computer won't die. It's still creating the list, and now the list is created. The list is of course not shuffled yet: we know exactly where the needle is, because I appended the needle. So now I use the shuffle function in the random module and pass it the haystack. The shuffle function has no return value; we see that I don't even try to capture anything. What it does is take any mutable sequence and reorder the elements in place. If you read the documentation on the shuffle function, you will see that it does not require a list, for example. It only requires a sequence, so it only requires the four behaviors that we always talk about, plus the ability to reassign the elements. So now, if I call random.shuffle() with the haystack, the haystack gets mixed, and the needle is somewhere else in the haystack, and we have to look for it. So let's just see, let's look at the first
10 numbers. The first 10 numbers are just random integers. And let's also confirm that the needle is not at the end, so let's look at the last 10 numbers in the haystack. The needle is not at the end either, so the needle is now somewhere in the haystack. So how can we search for it, now that the haystack is modeled as a list? What I just marked is the important part: I only use the in operator, so I write needle in haystack. And what will needle in haystack do? Let me, just to compare this, put an ordinary list up here, and this would be the haystack, with a lot of stuff inside it. Python will of course go to the first slot in the list and initiate the linear search. It will compare the first element to 42, and probably the answer will be no. Then it will go on and compare the second one to 42, and probably it will have to go on again. And then somewhere, probably somewhere in the middle of the haystack, and the haystack is quite big, it will find the 42, it will find the needle, and then Python will short-circuit out of the search. That takes some time, because a linear search has to walk through the list element by element. In order to see the timing differences, I do that 10 times. So I run a linear search 10 times, just to make the timing a bit more apparent. So now this cell runs, and I have the timeit magic there, and it took almost 5 seconds. Because I ran the search 10 times, that means searching for the needle in the haystack took on average half a second. And now let's do the following. I use the dict constructor and create a dictionary from the haystack. And note how what we learned in the last chapter fits in seamlessly here: this is a generator expression, so I don't create an unnecessary list that would get thrown away anyway. So I just write
a generator expression, and the generator expression loops over the haystack, and for every x in the haystack I have to return a tuple. The tuple has the x as its first element, and the second element will be None. This is a dummy value, because the dictionary of course needs to have a key and a value, but we only model numbers, so we don't need the values. So I put a dummy value in here, and the dummy value is None. And then I store the dictionary under the variable magic_haystack. And why is it magic? Well, because it's a dictionary. Now this is created, and now I do the same thing again: I am looking for the needle in the magic haystack. However, note one thing: I am running this search 10 million times. So 10 million times I initiate a search, and let's see how fast the cell is as a whole. It's 600 milliseconds. Wow, that's just about half a second. But what did we do in this half a second? In half a second we did one linear search; that was the upper cell. In the same half a second we did 10 million searches with the magic haystack. In other words, searching for something in this dictionary is roughly 10 million times faster than searching for it in the list. Okay, so what do we learn from this? We learn that whenever we need to model an order, we cannot use the dictionary. In these situations we have to use, let's say, a list or maybe a tuple. However, whenever we have to use a list or a tuple, we lose something; we have to pay a price. And what's the price we pay?
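Before answering that, the comparison just described can be reproduced on a smaller scale. The following is a minimal sketch, not the lecture's notebook code: it assumes a haystack of 100,000 numbers instead of 10 million, uses the standard library's `timeit` module instead of the notebook magic, and fixes a random seed so that repeated runs behave the same. The actual timings depend on the machine, so none are shown here.

```python
import random
import timeit

random.seed(0)  # assumption: fixed seed, only to make the run repeatable

needle = 42

# Scaled down from the 10 million numbers used in the lecture.
haystack = [random.randint(99, 9999) for _ in range(100_000)]
haystack.append(needle)
random.shuffle(haystack)  # reorders in place and returns None

# Same numbers as keys, each mapped to the dummy value None.
# The generator expression avoids building a temporary list of tuples.
magic_haystack = dict((x, None) for x in haystack)

# Run the same membership test against both data structures.
list_time = timeit.timeit(lambda: needle in haystack, number=10)
dict_time = timeit.timeit(lambda: needle in magic_haystack, number=10)

print(f"list: {list_time:.6f} seconds for 10 searches")
print(f"dict: {dict_time:.6f} seconds for 10 searches")
```

One side effect worth noting: because keys are unique, `magic_haystack` ends up with far fewer items than the list has elements, since every duplicate number collapses into a single key. With that, back to the price we pay.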
The price we pay is that we don't have a hash table, and not having a hash table means that looking up keys takes forever. In other words, we have a trade-off here: a trade-off between modeling a predictable order, which sometimes is necessary, and looking something up in a very fast way. We cannot do anything about this; we have to choose one or the other. And as a data scientist, depending on what problem you are working on and what data you are given, you will choose either a list or a dictionary or something else. Of course there are other types as well that we learn in this course. But what we see here is that it really pays off to study the various data types in Python in detail, because once we understand them in detail, we can make better choices when modeling a problem in code. So that's the trade-off. And how did we get this speed? What is the price we pay for the speed? As of now at least, and we will see towards the end of the chapter that I can also mitigate this, the price is that I need twice the memory, because I need keys and values. And also note that we have lots and lots of empty slots in a dictionary. So a dictionary is always wasteful, and it is wasteful on purpose, because if the dictionary were not wasteful with memory, the hashing wouldn't work, and if the hashing doesn't work, then we don't get the speed. So we lose space in two directions: the first one is that we have a second dimension, so to say, and the second one is all the emptiness. Let me introduce another technical term here. Data structures that don't have empty slots in them, like the list here, we call dense, and data structures that have lots of empty slots in them, we call sparse. So those are two technical words for analyzing the memory in a computer. And again, here we trade off memory against runtime: by using more memory we get
better runtime, and these days, because memory has become very cheap, that's oftentimes a good trade-off, because just buying more RAM is not so costly anymore. As for the second problem, the problem that we have keys and values, we will see that there is something called the set data type towards the end of this chapter, with which we get rid of the values, so we can mitigate this problem. So let's see what we can do with dictionaries. Namely, we can do indexing, but we don't call it indexing; we call it dict lookup or key lookup. The syntax here is what we know from lists, so this is the indexing operator, the brackets, but now in the from_words dictionary I want to look up two. So let's look it up. Not really rocket science here. The only thing that you need to understand is that we use the indexing operator as before, but instead of putting in an integer, we oftentimes put in something else, for example a text string. Of course, if we map integers to words, just like we did in the to_words example, then we would also be able to write a zero here. So let me just show you: if I write to_words, we can of course index with 0 and get back the string zero. And now, just by reading this, to_words index zero, we cannot, I repeat, we cannot tell what data type this is. We don't know if this is a list, a tuple, or a dictionary. And that's what we mean when we say that in Python only the behavior matters. All of these data types that I just mentioned have in common that we can index them, and depending on what content they have, we index into them with integers or with something else. Okay, so let's go back here to the old example. What happens if I try to look up something that is not in it? If I try to look up the German word drei for 3, I get a KeyError, because the key is not found. This is analogous to what happens in the case of a list or tuple: there I get an IndexError if the index is out of range, and here I just get a key
error, but a KeyError and an IndexError are really the same idea. Now what can we do with that? Let's go back to our people example. Remember that people is just a dictionary with three lists of people. Let's say I only want to look at the mathematicians. How do we do that? Well, I just write people, then the indexing operator, and inside it the string mathematicians, and then I only look at the two mathematicians, Euler and Gilbert Strang. And now let's say I want to look at an individual mathematician on the list. How do I do that? What did I just get back by indexing into the dictionary? I got back a list, a list of dictionaries. So I can of course index with the zero, and now we get back the first mathematician, which is Gilbert Strang. The list is of course ordered, so zero will always give the same one. So now I'm looking at Gilbert Strang, and what do I get back? I get back a dictionary. I see that because of the curly braces here and the colons, which list the items, so this is obviously a dictionary. And let's say I want to index, so to say, into this dictionary. How can I do that? Well, I can chain another indexing operator. Let's say I want to look up the name of the first mathematician: this would be just the string Gilbert Strang. When you get data from a web API, you will often do this: you will reach deep into the data structure to get out the part that you want to work with. And of course I can also look at the list of emails that Gilbert has, and this is a list that contains one string, which is his email address. Okay, next topic: mutability. Mutability means I can change an object after it was created in memory. Dictionaries are mutable. So let's look at the example to_words, where I have the integers 0, 1, 2 mapped to the strings zero, one, two. Let's say I want to change it; for example, I want to translate it to German. How do I do that? There are several ways, but one way is that I just use the indexing operator on the left
hand side and I assign to it. I assign to a key, just as I would assign to an index, and what I assign is the string I want to have. So let's execute this, and this will overwrite what is in the dictionary, so now 0, 1, 2 are mapped to the German words for zero, one, two. And why does it overwrite? Well, if I try to insert something with the same key that is already inside the dictionary, then Python will look up that key, and the reference to its corresponding value will be deleted and replaced with a reference to the new object that we create. So that's it. And we learned that the x's have to be unique, so that's why we overwrite them here. In the same way, I can do something that does not work with lists: I can add new items to the dictionary by assigning to a key that is not in it yet. So let's add the numbers 3 and 4 in German to the dictionary. So we can not only overwrite items, we can also add new items. And of course we can also get rid of them. How do we do that? We use the del statement. So we write del to_words and then index 0, but it's not the index, it's actually the key 0. I execute this, and now the to_words dictionary has no 0 in it. So let's play a little bit with the people example. Note that if I look at people, I don't have a physicist, so I want to put one in. It's a proud species, the physicists, so let's have one. And how do we do that? Well, if I look at physicists, I get back an empty list. And if I want to change the list in place, how can I do that? I can call a method on it, in this case .append(), and append to it a new dictionary with the name Albert Einstein. Note how I use a dictionary to model a person. So far, all the dictionaries that model people in the people dictionary have had a name and emails, and here I just put in Albert Einstein, who only has a name. But it's okay. Albert Einstein doesn't live anymore, so in a way he can live without an email address. And of course I can change stuff, so for example if I go
into the programmers list inside people, it gives me back a list, and it contains only one dictionary. In order to obtain it, I index into the list with zero. Now I have a dictionary, and I want to look at only the name of this person, and the name is of course Guido. Let's say I want to give Guido his last name as well: I can just overwrite it, and now I have changed it. And one more thing regarding Guido van Rossum, and let's maybe use the upper cell to change that: Guido van Rossum has two email addresses, but last year he retired from Dropbox, so let's just delete his Dropbox email. We can of course also use the del statement to delete something within the nested data structure; that's also possible. And now if I print out people again, I have Albert Einstein in it without an email, and Guido has only one email address and also has his last name. So this is how we wrangle with data in memory. Now, just as we looked at list methods and tuple methods and all kinds of other methods in the last couple of chapters, and string methods we have also seen before of course, we also look at the dictionary methods. So what are some typical dictionary methods? For example, .keys(). What does .keys() allow us to do? Well, it allows us, for example, to loop over the keys. Now you may wonder: do we really need it? Couldn't I just loop over the dictionary, which automatically loops over the keys? That is what it does, but saying .keys() allows us to be a bit more explicit here, and we learned from the Zen of Python that explicit is better than implicit. And then of course, what .keys() returns is a special type, which basically means I get a view inside the keys of the dictionary. This is an optimization in Python 3: I can look at only the keys inside a dictionary without copying them. This is what the .keys() method really does. So .keys() gives me a way to look at the keys inside a dictionary without making a copy and
without changing anything. It's just a way to look at the keys. Now, if I want to loop over the values, I just say .values(), and same as with .keys(), .values() just gives me back a view inside the dictionary where I only look at the values. And of course, to complete the picture, I have a method called .items(), which allows me to loop over the key-value pairs. .items() is of course also a view, on all the items, each as a tuple, in case I want to loop over them. Not so important here, but you get the point. And then let's see what other methods we have. Well, there is to_words, and that's what it looks like: to_words does not have a 0 in it. And we saw what that means: if I try to look up 0, I get a KeyError, because the key is not in to_words anymore. However, if I still want to look up a key and I'm not sure if the key is in there, I can use the .get() method. The .get() method takes as its first argument the key I want to look up, and the second argument is a default value. In the case that the key is not found in the dictionary, the default argument gets returned, so I get back not available here. And of course, if I try to get a key that is inside the dictionary, like for example the 1, then I get back the word for 1, because it's in it. So .get() is kind of like indexing, just without the error messages. And then let's see what else we can do. We have to_words, which has the integers 1 to 4 mapped to German words, and now I have a second dictionary, to_spanish, which maps the numbers 1 through 5 to Spanish words. And now I can say to_words.update() and pass it to_spanish. What happens is that I don't get a return value here. That's important, and we know already what that means: if I call a method on an object and I don't get a return value, most likely this means that something was changed in place, right? So let's look at to_words: all the Spanish words did actually overwrite
the German words, and also the 5, the cinco, is now in the dictionary. However, to_words is still the same dictionary object. The object itself never changed: we only created one object in memory, and we only played with its contents. Now, if we look at from_words, what can we do here? We can use .pop(), and we give .pop() a key like zero, and what it does is pop off the value. What popping off means is that we capture it, so we get the 0, but the zero is now also gone from the dictionary. It's just like .pop() for a list, but the .pop() for a list also works without an argument, and here .pop() really needs an argument, because there is no obvious or predictable order, which means we have to tell .pop() what to pop off the dictionary. In the same way, there is a method called .popitem(), and .popitem() is more like the .pop() that the list type comes with. It just takes a random item and pops it off. Of course nothing is random in a computer, and since Python 3.7 it is actually documented to pop the item that was inserted last, but the point that I want to make here is that we should not rely on a predictable order in a dictionary, so we should not write code that depends on which item will be returned. And of course, if we look at from_words now, the tuple here with two and 2 was indeed removed. So .popitem() means not only getting the item back but also removing it. Okay, and now comes a nice section: we will look one more time at packing and unpacking. We have looked at packing and unpacking before in the section on tuples, and among others there was also list unpacking and so on. In this section we will quickly review packing and unpacking for tuples, add packing and unpacking for dictionaries, and then put it all together into one bigger picture, which allows us to understand how to write functions with arguments or parameters in any way we want. So let's see. Let's create a new to_words dictionary with the English words
again, and then there is a second dictionary called more_words, and this is also English. Note how the integer 2 is mapped to "TWO" in uppercase. This is just to illustrate a point, because the "two" in lowercase is already in the first dictionary, and now we create a second dictionary, more_words, with the value "TWO" in uppercase. Now, and this is often a use case, let's say you are given two dictionaries and you want to create, so to say, the union of them: you want to take the items from both dictionaries and put them into one new dictionary. But you know that there is some overlap, and we know by now that there cannot be two keys with the same value. In other words, something has to happen with the 2 here. So let's see how we can unpack the items of the two dictionaries into a new dictionary. Well, we can use the unpacking operator, but because we work with dictionaries, we use the double asterisk. A single asterisk unpacks a list or a tuple, and we've also seen the single asterisk unpack a generator, for example. But now we use the double asterisk to unpack a dictionary. This kind of makes sense, because when we unpack a dictionary we unpack key-value pairs, so we have to do something for two objects at a time. So what happens if I execute this? I get a new dictionary. Let's look at what items it has. It has the 0 from the first, it has the 1 also from the first, and the 2 it has from the second one; this is the first item from the second dictionary. And then the 3 and the 4 are also there. So this is kind of like the union of all the items, and the one that goes last in this syntax is the one that wins: the key 2 in the more_words dictionary overwrites the 2 in the first dictionary. This unpacking syntax is really nice. You often do that when you pass values around that you actually want
to put inside a function call or something. You will see that often when you read more Python code. And now let's look at what that means in terms of function definitions and how we call functions. This is now the final thing I say about how we can specify functions. So here's an example function called print_args, and all it does is print out all the arguments we give it. We have seen a function called product taking *args before, at the end of the first part of chapter 7. What we see here is that we specify the function by giving it two parameters, args and kwargs. The args are usually the so-called positional arguments, and the kwargs are the so-called keyword arguments. And then we have the asterisk here. What does the asterisk mean here? Let me tell you, it has a different meaning than the double asterisk we have seen before. On the slide before, the asterisks were there for unpacking: we unpacked the items of existing dictionaries into a new dictionary. But what do the star and the double star do here? Well, now they do packing. In other words, whatever I pass into this function when I call it gets packed into one of the two parameters. The rule is: if I call the function with positional arguments, the positional arguments get collected inside args, and if I call the function with keyword arguments, the keyword arguments are collected inside, and you guessed it correctly, a dictionary. Because that's what the keywords are: the keywords are really the names by which we pass in an argument to a function. So when we use keyword arguments, we can think of them as pairs. And we cannot call a function and specify the same keyword several times. That doesn't make sense, because what should the function do? A parameter can only be specified once. And then what do we do inside the function? Well, args is going to be a tuple, and because of that we
enumerate the tuple. The enumerate built-in gives us back indices for free, and then we loop over the args and print out the word position, then the index, and then the individual arg. Then we do the same thing for the keyword arguments. What do we use here for kwargs? Because I told you that kwargs is going to be a dictionary, we call the dictionary's .items() method, loop over the key-value pairs, and print out the word keyword, the key, and the value. So let's see how this function works in action. print_args number one: I can call it without arguments. This works, but it does not do anything. I mean, what should it do? We defined it to print out arguments, but we did not pass it any arguments, so it doesn't print anything out. Next, I call it with three positional arguments, "a", "b", and "c". What does it do? It prints out that at position 0 I passed in "a", at position 1 I passed in "b", and at position 2 I passed in "c". So those are positional arguments. What happens if I call the function with keyword arguments? Well, I pass in first=1, second=2, and third=3. Now the function says the keyword first is mapped to 1, the keyword second is mapped to 2, and the keyword third is mapped to 3. And because a keyword is always mapped to an argument, it makes sense to use a mapping type, and the most common mapping type is the dict type. Now we can of course mix them. In this example, I call the function with two positional arguments and then one keyword argument, and I get back this output. Okay, so in other words, by adding the asterisk and the double asterisk to pack together the positional arguments into a tuple and the keyword arguments into a dictionary, I make the function usable in many, many different ways. Of course you have to know what you're doing, and you shouldn't do that if you don't
need the arguments, but you are extremely flexible in Python in the way you define your functions, and then also in the way the function can be used, because the function is obviously used here in four different ways; it even works without arguments. Now, the second version of this is the final one, and it shows everything about function definitions that you need to know. So what is different here? Well, first I specify a positional parameter, and I don't give it a default value. Then I specify *args, then I define another parameter with a name, and then I define **kwargs. So how does this read? It reads as follows. The first positional argument I have to provide; the function cannot live without it. Why? Because there is no default value. So I must pass in at least one positional argument. What happens if I pass in more than one positional argument? All the additional positional arguments will be collected inside a tuple called args. And we are not done yet. The required key parameter means I have to provide one argument by keyword at the very least, and this argument must go by the required name. In other words, this required key parameter here is also a keyword-only argument. Remember the keyword-only arguments: back in chapter 2, we specified these to be the arguments that come after a single asterisk. That's why I have to pass it in by name. The only thing that was different in chapter 2 is that we used them, for example, for the scalar in the average_evens example, and there we gave this keyword-only argument a default value. Here we don't have a default value, and because of that I have to pass it in. Otherwise it's exactly the same: once I have a single asterisk somewhere in my parameter list, any argument that comes after it must be specified by name explicitly. It's a keyword-only argument; I cannot pass it without a keyword. And then all of the
other, or all of the additional, keywords are collected in a dictionary called kwargs. Then, in the body of the function, I print all four of them out individually. Since args and kwargs can possibly hold more than one item, I have to loop over them, so I put the print inside a loop. So let's check how this function works. Can I still call the function without an argument? Well, the answer is of course no, because the positional argument is missing. Let's call the function with one positional argument and one keyword argument. This is the very minimum I have to provide; otherwise the function won't run. And now I can of course go ahead and put in as many positional arguments as I want, and I can also put in as many keyword arguments as I want. Okay, so we call this the mandatory one, the required one, these are the optional positional arguments, and so on. And remember that once we start with the first keyword argument, there must be no further positional arguments. So once we see the first keyword argument, all the other arguments must be passed in by keyword as well.
Okay, now let's have a very small topic here: dictionary comprehensions. This is very easy; it's kind of like the list comprehension, just for dictionaries. So let's assume I have my to_words example again, the dictionary mapping integers to strings, and now I want to reverse it into the from_words dictionary. How could I do that? Well, here is the first, naive way. I first initialize an empty dictionary, and then I loop over all the items in the to_words dictionary. What I do then is put the number and the word over which I loop into the from_words dictionary in reverse order. Whatever was the key first now becomes the value, and what was the value first becomes the key. I just reverse keys and values. In other words, this is inverting a function; that's what I'm doing here. So let's do that. Of course it works: now I map the strings to the numbers. How can I make this more concise?
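The loop-based inversion just described can be sketched as follows. This is only a minimal sketch: the contents of to_words are an assumption based on the example from the start of the chapter, and the loop variable names are chosen here for illustration.

```python
# Assumed contents of to_words (integers mapped to English words).
to_words = {0: "zero", 1: "one", 2: "two"}

# Naive way: start with an empty dictionary and fill it in a loop.
from_words = {}
for number, word in to_words.items():
    # What was the key becomes the value, and vice versa.
    from_words[word] = number

print(from_words)  # {'zero': 0, 'one': 1, 'two': 2}
```

Inverting like this only works cleanly if the values are unique and hashable; with duplicate values, a later item would overwrite an earlier one.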
By using a dictionary comprehension. How does it work? Just like a list comprehension. I write curly braces, and inside them, at the beginning, I write the expression, the key-value pair that I want to end up with. Then I write a for, because I want to loop, and I could also add an if. Actually, if you read the chapter, you will see that I can add additional functionality; I can also have several for clauses and so on. This is just a simplistic example. So I loop over the to_words items, I loop over them as key-value pairs, and I store them as value-key pairs. That's it. If I evaluate this expression, I also get a new dictionary that is derived out of the existing dictionary, and that's the point. Just as we used a list comprehension to derive lists out of existing lists, we use a dict comprehension to derive new dictionaries out of existing dictionaries. It's a very nice syntax, and it's also a little bit faster, because we know that for-looping is a little bit inefficient. And of course memory-wise: if, let's say, the for loop is quite long, then Python may initialize the hash table too small in the beginning, and then Python has to do all the copying and create bigger hash tables and so on. So it makes sense to use the comprehension here; that's faster and looks nicer.
So now I want to show you one application of dictionaries. What can we use dictionaries for? Remember, in chapter 4 we talked about the Fibonacci numbers and about recursion, and we observed that solving the Fibonacci numbers in a backwards fashion was extremely inefficient. Now I show you how we can solve the Fibonacci numbers in a backwards fashion in an efficient way. So, Fibonacci numbers revisited. That was the problem: in order to calculate the Fibonacci number for, let's say, index 5, I have to calculate the Fibonacci number for index 4, and this requires me to calculate it for index 3 and 2, but in order to calculate Fibonacci of 5 I
also need to calculate Fibonacci of 3. So we have repeating function calls, and this is an example of exponential growth. I don't want to talk too much about this here; if you want to look it up, go back to chapter 4, we covered this in detail. The problem was that we have the same function calls over and over and over again. So what is the solution? The solution we had in chapter 4 was that we calculated the Fibonacci number with a for loop, from beginning to end. But what we do here is still calculate the Fibonacci number backwards, from end to start, and we do it in a way that is still efficient. How can we do that? Well, let's do this: we have a variable called memo, in which the 0 and the 1 are mapped to 0 and 1. What are these numbers? These are the first two Fibonacci numbers. So maybe real quick: the first two Fibonacci numbers are defined as 0 and 1, and the next Fibonacci number is just the sum, so this would be 1, the next one is 2, the next would be 1 plus 2 is 3, the next would be 2 plus 3 is 5, and so on. But the first two hold by definition, and because of that we put them into the memo. And what's the memo? A memo is a thing where we can look up things real quick. So what do we want to look up? On the left-hand side, as the key, we have the index for which we want to calculate the Fibonacci function, and on the right-hand side, as the value, we have the solution, the Fibonacci number for the index on the left side. In this case, with 0 and 1, it wouldn't make a difference if we mixed it up, but we have to know what we map from where to where. And also note that it is absolutely essential here that the memo allows us to look up keys in what is called constant time. How does that work? Well, just as with any dictionary that is based on a hash table: whenever I want to look up a key, all that is done is that Python takes the key, puts it into the hash function, gets a hash value, and
then, after getting the hash value, it calculates a bucket, and then it just has to look into one bucket. If nothing is in the bucket, if the bucket is empty, then we know that we don't have this key in the hash table; it cannot be anywhere else, it can only be in one position. And this is essential: if we didn't have this property, the speedy lookup in a hash table, then everything I'm about to show you wouldn't work. But now let's create this memo. That's where the technical term memoization comes from. Memoization basically means we calculate something, we store it away, and whenever we need it, we just look it up. So let's look at the fibonacci function one more time. fibonacci takes an index i as the argument. The debug on the right-hand side I just built in for the intermediate output, and you can look at this in detail when you read the chapter. Then we have some type checking going on: we don't check if i is an int, but we just check if it behaves like an Integral. This is called goose typing; at the end of chapter 5 we talked about that. After the type checking, we do the input validation: a Fibonacci index cannot be negative. And then our new fibonacci function does the following. Before it does any calculations, it looks up whether we have, some time before, already calculated fibonacci for i. If we have, we have stored it in memo; that's the assumption. So with the membership testing operator, in, we check if the index i is in the memo as a key, and if it is, we just return its value. Remember that the value to which we map the index is just the result of the fibonacci function, which is a Fibonacci number. Okay, so that means before we do anything in this function, we check if we have already calculated fibonacci for this particular argument before. If so, we don't calculate it, we just return it. We can look it up basically for free in the memo. That's important. Then the
debug here: just ignore it, it just prints something out. I mean, let's not ignore it, maybe. So what does it do? If our function is in debug mode, then whenever we go past the memo checking here, the looking up in the memo, we just print out that we are currently calculating the Fibonacci number for a given index. By the way, return memo[i] is also an example of the early exit pattern, meaning that if we return the value from memo here, then the rest will not be executed. And then what is down here? Well, down here is basically the same logic that we have seen in the recursive formulation of the fibonacci function in chapter 4. The fibonacci function calls itself with i decremented by 1, and whatever we eventually get back from there will be added to the return value of fibonacci of i-2, whenever that returns. The problem back in chapter 4 was that it took forever until these two function calls returned; that's what the exponential growth was. Then we put the sum in a variable called recurse, and after calculating the value, since we calculated it, we put it in the dictionary, in the memo, with the index i as the key and recurse as the solution. And then we return the recurse value. In other words, this function does the following: either we have calculated a certain Fibonacci number before, in which case we just look it up, or, if we haven't done that before, we actually calculate the number. However, we will calculate every Fibonacci number at most once, because once we have calculated it, it's in the memo, and once it's in the memo, we never calculate it again. That's the solution to the exponential growth problem from chapter 4. And again, this only works because the memo allows us a constant-time lookup of the keys via the hash function. So let's call fibonacci of 12 and set debug to True. Now we get back 144, which is the Fibonacci number with index 12, that is, the 13th one. And now the interesting thing is, if I call
if I make this call again a second time, I get back 144, but where is the debug info? The debug info is gone. What happens if I call it again? Well, I just get back 144. And what does that mean? Let's look at the function again. Whenever nothing is printed, we know we must have returned here. In other words, if I don't see an output here, we didn't calculate anything; we just looked up the number. And that's of course true, because memo is in the global namespace, and if I look at memo, it now contains all the Fibonacci numbers for the indices 0 to 12. That means if I call fibonacci again now, it doesn't really run; it just looks up the number here. However, if I now calculate, let's say, fibonacci with the index 15, then three more Fibonacci numbers are calculated, and if I look at memo again, three more numbers are in the memo. And if I run it again, just to illustrate the point, nothing is calculated anymore. So this is how memoization works. It's like cheating: we calculate the solution once, store it away, and whenever we need it, we just look it up. There is one reason why I don't like this function. Whenever a function is called and changes a global object, or something that is outside its function scope, we kind of don't like it, because that means calling the fibonacci function here has the side effect of changing the global memo. The side effect is of course intended here; however, we don't like it, it's not good coding practice to do that. It's not here in this presentation, but what you can read in the chapter is yet another implementation of the fibonacci function where we also use memoization but don't have a global memo. If you are curious to know how that works, you have to read up on it in the chapter. And now just one last bit of information regarding the Fibonacci numbers. We said that the recursive formulation of the fibonacci function is of exponential growth, which means it's super slow, and that means we couldn't calculate
the Fibonacci number for a large index. The largest index for which we calculated it back in chapter 4 was actually index 36, and that already took several seconds. Now I want to calculate the 100th Fibonacci number, and again it works super fast. To calculate an even larger Fibonacci number, like the 1000th one: super, super fast, microseconds. And again, we get this speed from looking up the values in the dictionary. So where does the speed come from? This would be a very good exam question. The speed comes only from the hash function, which takes the object, the key, and transforms it into a hash value, which then determines in which bucket we put the object, or the reference to the object.
Okay, so this is the first part of chapter 8. There is a second part, which is quite fast to go over, so we will just do that. Chapter 8 is called mappings and sets, and so far we have talked about mappings, and the main example of a mapping was the dict type. In the second part of chapter 8 you will find many specialized mappings. I will not show them here in this presentation, but you should quickly read over them, because Python comes with the so-called collections module in the standard library, which contains data types that are special cases or special subtypes of the dict type and model different things that the dict type doesn't. For example, one of them models a predictable order. There is one called Counter, which you can use whenever you want to count things, and it is a lot more efficient for that purpose. So there are many specialized data types, and if you know what data type to use for a given problem, then oftentimes you can just solve the problem in a faster way. But now we are not looking into more mappings. A mapping is of course also an abstract concept, and the dict is just an example of it. Now we look at something else: we look at sets. Sets are really nothing new; we have actually learned everything we wanted to
know already. We just want to see a little bit more syntax of how we use them, but the theory regarding sets we already know. So Python has a set type, and the set type models what in math we would also call a set. For example, how do you create a set? Well, you put the curly braces there. And you will now say: wait a minute, curly braces, haven't we just used them for the dict type? And I will say: yes, we have, but the syntax for the dict type was a little bit different. In the dict type we would also have colons to map a key to a value; here we don't have colons. Also note that these are the same 12 numbers that we have seen, as always, since the first chapter; it's the same example. However, see that within the curly braces I have two 7s, and at the end I have two 4s. So if this were a list or a tuple, it would not have 12 elements; it would have 14 elements, because the 7 and the 4 would be in there twice. However, this is a set type. So let's create the set and let's look at numbers first. numbers is of course an object on its own: it has a reference, it has a type, which is set, and what's the value? Well, the value is really the numbers from 1 to 12. And now this is interesting. What you see here is, first, that the 7 is only in there once and the 4 is only in there once, so duplicate entries don't exist. And also, somehow the values in numbers are ordered, and when I created numbers up in the first cell, it was definitely not ordered. So the question is: what happened, and is the set type ordered? Let's look at a different way to create sets. The first way, with the curly braces, is called the literal notation again. We can also use the set constructor. In order to create an empty set, I call the set constructor with no arguments, and now I have an empty set. Now, why can I not use the curly braces to create an empty set? Quite easy: if you put just the curly braces without anything in between, you will of course
create an empty dictionary. Python doesn't know that you want to get a set and not a dictionary. So because of the double meaning of the curly braces, in order to create an empty set you have to use the set constructor. And then, how does the set constructor work? Well, if you read the documentation, the set constructor takes any iterable, and of course it has to be finite, so any finite iterable. So I give it a text string here, and what set does is give me back only the unique letters in the word. That is what sets are good for. They work just like sets in math, and that means any object that exists can only be contained in a set once. And the important thing, what we really mean here, is that any object, with regard to its semantic value, can be in a set only once. So if we were to put the integer 1 into a set and then tried to also add the float 1.0 to the set, then at the end of the day we could not put the 1.0 into the set, because the integer 1 and the float 1.0 evaluate equal. They have the same semantic meaning to us, and because of that only one of them can survive in the set. So there are some subtleties again here. How do sets work? As if they were dictionaries, just without values, and that's the whole idea. When I told you what a hash table is, I said, okay, one downside is that a hash table uses a lot of space unnecessarily, but we get speed because of it. But now, what is a set? Well, basically a set is a hash table without the values. Maybe in other words: I'm not going to draw it, but if I were to draw it, a set would actually look like the list. The only thing that's different would be that in a list we would always start to put in the references from the beginning, in an ordered fashion. If we wanted to put the references inside a set, it would be the same array type, but then we would also do the hashing for the objects that go into the set. And the idea of how we put something into the hash table here for a dict is exactly the same as how we put things into a
set type, into the hash table that is underlying the set. So really, all the theory from the dictionary also applies here, just without the value row. So let's see what that means. That of course means I cannot put anything that is mutable inside the set; I will get an "unhashable type" TypeError here. And then, what can we say with regard to the behavior? Well, sets are of course finite, they are sized, and we can loop over them. And somehow I seem to get an order here. The only reason why I see an order is that, as we saw, if we hash an integer, we get back the same number as its hash value, and the sorting inside a set is basically somehow related to the hash value. That's the technical reason why we see an order here. But there is no order, trust me. Just like there is no order for dicts, there is no order for sets. And there is even one thing that is technically different between sets and dictionaries. We remember that in a dictionary we don't have a predictable order, but the dictionary is still able to remember the insertion order. A set, however, is even more optimized: a set does not remember the insertion order. The set really is just something that we use to make sure that we only have unique things in it; that's what the set is good for. And also, of course, the membership testing, looking up things, is what a set is very, very good for, because the key lookup for sets works in exactly the same way as for dictionaries, and that's what we want to use sets for. So in other words, it seems like we have an order here, but we don't have an order; that's the entire point. However, we can still loop over it, so a set is still iterable. Now, to see that a set has no order, we try to reverse it and want to see if it's reversible, but then we see a TypeError, and it says the set is not reversible. But the container property is still there. So 0 is not in numbers, and let's see if the
integer 1 is in numbers, and let's see, just to illustrate a point, if 1.0, the float, is in numbers. The answer is True as well. So even though the float 1.0 is of course not in the set, because the set obviously only contains integers, the lookup works by value, by semantic value, and the semantic value of the 1.0 float and the integer 1 is the same. In other words, the 1.0 float and the integer 1 have the same hash value, which means they end up in the same bucket, and there they evaluate equal. And again, this operator, the in operator, which is called the membership testing operator, is super, super fast for sets. It was already super fast for dicts, as we saw in the haystack example, but it's also super fast for sets, and that's what we use sets for. Now, the problem is that because sets don't have an order, there is no way to look stuff up by index. So let's try to look up the first element, so to say, in the set, and of course I get an error message: TypeError, 'set' object is not subscriptable. And the reason why? Well, because there is no order, and without order the indexes don't mean anything. Sets are mutable, so let's look at some set methods. What is numbers? Well, it's the numbers from 1 to 12. So let's add the number 99, and now the 99 is in there. The .add() method is basically like .append() for lists. However, append means we put something at the end of a list, and because there is no order, there is also no end; that is why the method is called add. So even though the 99 is shown at the end, it's not really at the end, because there is no order. And then .update() is the corresponding method to the list method .extend(). .extend() takes an iterable and inserts all the elements of the iterable, and here, for the set type, .update() also takes an iterable. So range(5) will give me back the numbers 0, 1, 2, 3 and 4, and they are now added to the set. However, the numbers 1, 2, 3 and 4 are already inside, and because of that only the 0 gets added. I can of course also
remove numbers with the .remove() method. So let's say numbers.remove(99): now the number 99 has been removed. And what happens if I call the method a second time? I get a KeyError. Now, the KeyError may sound totally weird, I mean, where are the keys? We don't really talk about keys if we talk about sets, but because of the same underlying implementation as for dicts, we see this error message that there is a KeyError. Now there is another method called .discard(), and the difference between .remove() and .discard() is the following: if I discard, let's say, the number 0, now the number 0 is gone, it's not in the set anymore. And if I discard it again, I don't get an error message. So in other words, the .remove() method fails loudly and the .discard() method fails silently. And as we see, numbers only has the numbers 1 to 12.
And now let's look at some set operations. Sets overload some of the so-called bitwise operators that we saw in chapter 5 on numbers, for the integer type in particular. And why do they overload them? As we shall see, there is something called set theory in the field of math, and maybe I can draw this to make it a little bit more intuitive. So let's go ahead, starting with an example. We create a new set called numbers, which is the numbers from 1 to 12, we create another set called zero, which only contains the 0, and then we create the set called evens, which contains all the even numbers from 2 through 12. So let's do the following: let's draw a box and put the numbers in there, the 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12. Let me get another color; we will take black as well, I need three colors, so pardon me. Now what I do is color all the numbers according to the sets that they are in. First, the numbers set has the numbers 1 to 12, so the 1 would be in there, and the 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12. So that would be numbers. Then there is the set called zero, let's do that here, and this only contains
the 0, the number 0. And then lastly we have evens, and this contains the numbers, let's do this, 2, 4, 6, 8, 10 and 12. So this is how the sets relate to each other. Now let's see what we can do. A typical operation in math, in set theory, is to take, for example, the union. How can we do that? We can do that with the bitwise OR operator, which is the pipe character. Taking the union of zero and numbers means: let's take all the numbers that are in red or in black or both. So this will be a set of all numbers from 0 to 12. Let's see, it works. And just to illustrate a point: we can use the operator, but we could also call the .union() method on one of the two objects and get the same result. For every operator in the set case there is also a corresponding method. I personally prefer operators; I think the operators look a bit nicer. We can also extend this, so we can take the union of all sets, and this will of course be the same set here. Why? Because the green set is a proper subset of the red set, so we don't get any new elements. Then we can look at the intersection. The intersection of the black set and the red set is empty, because there is no overlap. So what should be the answer? The answer is of course the empty set. And what character do we have? This is the bitwise AND operator, the ampersand, which is also called the "and" sign; that's basically what it looks like. Then we can also take the intersection of, let's say, numbers and evens, and this will of course be only the evens. How do we see that?
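The union and intersection operations discussed so far can be sketched as follows. The names zero, numbers, and evens follow the spoken example; how the sets are constructed here (with range) is an assumption made for brevity.

```python
numbers = set(range(1, 13))   # the integers 1 through 12
zero = {0}
evens = set(range(2, 13, 2))  # 2, 4, 6, 8, 10, 12

# Union: the | operator and the .union() method give the same result.
all_numbers = zero | numbers
print(all_numbers == zero.union(numbers))  # True

# The union of all three sets adds nothing new.
print((zero | numbers | evens) == all_numbers)  # True

# Intersection: zero and numbers share no element.
print(zero & numbers)           # set()
print(sorted(numbers & evens))  # [2, 4, 6, 8, 10, 12]
```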
The evens are a proper subset of numbers. So what is the set that contains the numbers that are in both sets? It's of course only the numbers that are both red and green here. That's nice. We can also subtract, so this is now an arithmetic operator. We can take numbers minus evens, and this will give us back a set that contains all the elements that are in numbers but not in evens. And these are of course the odd numbers, right? These are all the red numbers that are not crossed out in green. And we can turn that around: we can take evens minus numbers, and this will of course result in the empty set. So in other words, taking a difference is not a symmetric operation but an asymmetric one, so it depends on what we subtract from what. But of course, if there is an asymmetric difference, there is also a symmetric difference. The symmetric difference uses this sign here, the caret, which is also a bitwise operator. And the symmetric difference in set theory means all elements that are in exactly one of the two sets, so in one but not in the other. It's an exclusive either-or. And this is how the operator works. And that's basically it. And then we only have, I think, one more slide. Yes, that's the last slide.
Finally, set comprehensions. They work exactly like dictionary comprehensions, just that we don't have the colon in here. And I think the set comprehension looks very, very close to math, because that's usually what mathematicians say: some set is derived from some other set by some rule. So this is really as close as it gets to math.
OK, so this is now the end of chapter 8. Again, it was quite a long chapter, but what did we learn? We learned that there are indeed collection types that are not sequences. Why were they not sequences? Because they didn't have an order, a predictable order. And mostly we talked about dictionaries. They behave like functions: they are mappings from x's, from unique
x's to possibly non-unique y's. And you remember now, and this is an important idea, that dictionaries are implemented via a hash table. A hash table means that we waste, on purpose, lots of space in memory, but we get very good speed when we look up keys. So when we want to search for something in the hash table, that is super fast, and we call that a key lookup. That's one of the primary reasons to use it. We've also seen an example of what a nested data structure looks like. This is what you will often see when you get data from the internet. Other than that, I think we have just seen a lot of things we can do with mappings. And also, concept-wise, programming-wise, we have learned about the idea of memoization, which finally allows us to model the recursions from chapter 4, which were rather inefficient back then, in an efficient way. And when we continue the study of memoization, we get into a field called dynamic programming. This is, I think, super important, because dynamic programming helps us to solve many, many problems in the field of supply chain management. Anything in logistics that has to do with optimization can oftentimes be solved with a dynamic program, so knowing memoization is a good trick. And sets, of course, may be helpful if we want to ensure that we only have unique elements. So imagine you are given some data and you want to find out what the unique things in this data set are. Then sets are the way to go. So you have seen many, many data types. Chapter 9, the next chapter, will be on arrays and data frames. Arrays and data frames will, in a way, put together what we have learned about sequential data and about mappings. So the next chapter will put together the previous two. And yeah, I will see you soon.
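As a companion to this recap, here is a minimal sketch of the remaining set features from this chapter: remove versus discard, the two kinds of difference, a set comprehension, and uniqueness. The sets are rebuilt with range(), and the data list is made up for illustration; neither is taken from the slides.

```python
numbers = set(range(1, 13))                  # assumed: the numbers 1 through 12

# Set comprehension: like a dict comprehension, but without the colon.
evens = {n for n in numbers if n % 2 == 0}

# discard fails silently: 99 is not in the set, and nothing happens.
numbers.discard(99)

# remove fails loudly: a missing element raises a KeyError.
try:
    numbers.remove(99)
except KeyError:
    failed_loudly = True
else:
    failed_loudly = False

# The difference is asymmetric: the order of the operands matters.
odds = numbers - evens                       # the odd numbers 1, 3, ..., 11
nothing = evens - numbers                    # the empty set

# The symmetric difference gives the same result in both directions.
assert numbers ^ evens == evens ^ numbers == odds

# Sets keep only unique elements (hypothetical example data).
data = ["a", "b", "a", "c", "b"]
unique = set(data)
```

After running this, failed_loudly is True, nothing is the empty set, and unique holds the three distinct strings from data.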