So I hope you enjoyed the keynote. Welcome to the first talk after the keynote. Gisela Rossi is gonna be talking to you today about dictionaries and what is behind the scenes. She's a software engineer and co-organizer of PyLadies London, which is a really nice meet-up that I advise you to check out. I have been going there for like three months and they do a really good job. So please give a round of applause for Gisela Rossi.
Thank you all for being here. It's really nice to see you. As was said, my name is Gisela, and today we're gonna be talking a little bit about dictionaries and what's so beautiful about them. So this is a quote from a book that I was reading. That's the beauty of things changing so much: a lot of that book is already obsolete, but dictionaries are everywhere. It's not only in the code we write as users, but also behind the scenes. The classes we define have dictionaries behind them for the attributes, and so do the namespaces for modules. They are literally omnipresent, and because they are omnipresent, they are constantly changing, constantly evolving.
So this talk will have three parts. The first part will talk about the main engine behind dictionaries, namely hashing, and what you can tweak about it if you need to customize it. The second part will talk about some security stuff with dictionaries. It's a topic I'm very interested in. I have been working on security for the last year and I think you will like it. Honestly, I think it has been quite an interesting evolution. The third part will talk about some miscellaneous optimizations dictionaries have had. The selection is completely personal. I think they are very interesting; some are quite old, from like nine years ago, and some were added quite recently, in Python 3.6.
So let's start with hashing. Let me start from the beginning, just in case someone is not familiar with this. I don't want to lose anyone right now.
So what is a hash function? A hash function is a function that can be used to map data of arbitrary size to an integer. And it must satisfy that if two objects are equal, then their hashes are equal. But we will not settle there if we talk about a good hash function, because a good hash function should also be efficient and it has to minimize collisions. We'll talk more about collisions in a second.
So what do we do with hash functions? We build one of the most beloved data structures in the world, the hash table. Hash tables are amazing and you have probably run into them many times. You may even have run into them in coding interviews, where they are an all-time favorite, aren't they? And the beauty of them is that they are very efficient. Most of the things you want to do with them are doable in constant time. And this is why Python dictionaries, and also Python sets, are implemented using hash tables.
And here we have an example of something that is quite puzzling when you just start coding. Okay, I create my list, I try to add it as a key to a dictionary, and it says TypeError: unhashable type: 'list'. And this is because we have this data structure behind the scenes as the main engine for dictionaries. So we have TypeError, unhashable. Okay, fair enough. Give me the definition of what hashable is, and I'm gonna read it with you. An object is hashable if it has a hash value which never changes during its lifetime, and it can be compared to other objects. This is the official definition. Hashable objects which compare equal must have the same hash value, as we set as a condition for a hash function. And hashability makes an object usable as a dictionary key or a set member. This is exactly the TypeError we ran into.
And now let's digest this definition a little bit more, right? So it says it needs a hash value. So behind the scenes this needs a __hash__ method. And let's digest it a little bit more. It never changes during its lifetime.
This means that it cannot depend on mutable attributes. And this will be important if you need to customize this behavior for your own classes. And it can be compared to other objects. And this basically means that it needs an __eq__ method.
So are user-defined objects hashable? If I define a class, is it hashable? Well, by default, yes, and the hash is based on the identity. But this is something you can customize, and it is something where you have to proceed with caution. Hopefully this will serve as kind of a roadmap of the things you don't want to stumble upon and the things you do want to do.
So you can override them. You already saw in the definition that you need an __eq__ and a __hash__. And what happens if you override one and not the other? Or what happens if you override both of them? So if you override __eq__ but you don't override __hash__, Python automatically says this is unhashable, I'm not going to consider this hashable, and it sets __hash__ to None. OK. What happens if you don't want to override __eq__ but you still want unhashability? Well, you can do this yourself. You can set __hash__ to None and you're done. OK, fantastic. But what if I do want hashability? What can I do? Well, you can define these methods yourself, but you have to make sure the attributes that you use for the definition never change. And this is part of the definition we saw for hashable, which is fine.
And then here I put an example of how you can do it. You don't need to become a cryptographer to define __hash__. You can use the built-in. So basically imagine I'm defining a class Person and I want to say that if two persons have the same name, I'm going to consider them the same person, because I have this weird system. Then what is very common is to just hash a tuple of those immutable attributes. And I'm done. This is the way I'm going to go about it.
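The slide itself is not part of this transcript, so what follows is only a minimal sketch of the pattern described above, not the speaker's actual code. The names Person, EqOnly and registry are hypothetical, and the read-only name property is one assumed way of keeping the hashed attribute from changing.

```python
class Person:
    """Hypothetical class: two persons are equal if they share a name."""

    def __init__(self, name, age):
        self._name = name  # treated as immutable, so safe to hash
        self.age = age     # mutable, so deliberately left out of the hash

    @property
    def name(self):
        # Read-only access discourages changing the hashed attribute.
        return self._name

    def __eq__(self, other):
        if not isinstance(other, Person):
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        # Reuse the built-in hash on a tuple of the immutable attributes.
        return hash((self.name,))


class EqOnly:
    """Overrides __eq__ only, so Python sets __hash__ to None."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, EqOnly) and self.value == other.value


ada_1 = Person("Ada", 36)
ada_2 = Person("Ada", 37)

registry = {ada_1: "first entry"}
registry[ada_2] = "second entry"  # equal to ada_1, so it reuses that key

try:
    {EqOnly(1): "x"}
    eq_only_hashable = True
except TypeError:
    # Same kind of error as using a list as a dictionary key.
    eq_only_hashable = False
```

Because equal objects must have equal hashes, __eq__ and __hash__ here look at exactly the same attribute. Adding age to only one of them would break that contract.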
Let's talk about security now. There has been quite an interesting evolution in dictionary security. The first version of our dictionaries used a non-cryptographic hash algorithm, kind of a variant of FNV. This was very efficient, but it was exploitable through a denial-of-service vulnerability. What does that mean? If an attacker sent you keys designed to collide, they could trigger the hash table's worst-case performance, which is quadratic. Nobody wants quadratic performance in something that is everywhere.
So then we came up, well, not we, I wasn't involved, but as a community we came up with an evolution for this. We said, okay, we will keep the core of our algorithm, we will keep FNV, but we will try to add some randomization to it. This was actually configurable with PYTHONHASHSEED, so you could turn it off by setting PYTHONHASHSEED to zero, or you could play with it if you knew what you were doing. And although this was never abused, it was rightly pointed out that it didn't quite solve the problem at the bottom of it. It was still theoretically possible to abuse this. And it also created some performance issues, because it was considerably slower than the first version.
So we got to the one we have today. This was introduced in PEP 456. Basically, this PEP introduced a family of options, cryptographic functions that we could use to replace FNV. The beauty of it is that it analyzes them, compares them, and says, okay, SipHash is gonna be the one we will be using now. And this solves the problem at the bottom of it, and also improves on the performance we had in version two.
So what's the state of the supported Python versions today? If you're using a supported Python version, where are you? There are a couple of 2.7 ones that still use the original and the randomized, so version one and two, but all currently supported Python 3 versions use SipHash.
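For anyone who wants to check this on their own interpreter, here is a small sketch that is not from the talk. It uses sys.hash_info to report the string hash algorithm, and starts child interpreters with PYTHONHASHSEED set to show that a fixed seed makes string hashes repeatable across runs. The helper name string_hash_in_child and the sample string are made up for illustration.

```python
import os
import subprocess
import sys


def string_hash_in_child(seed):
    """Return hash('dictionary') as computed by a fresh interpreter.

    Hypothetical helper: PYTHONHASHSEED only takes effect at interpreter
    start-up, so a child process is needed to observe it.
    """
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('dictionary'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Name of the algorithm used for str and bytes hashing in this build.
algorithm = sys.hash_info.algorithm

# A seed of "0" disables randomization, so two runs should agree.
fixed_a = string_hash_in_child("0")
fixed_b = string_hash_in_child("0")

print(algorithm, fixed_a == fixed_b)
```

With the default setting, or with PYTHONHASHSEED=random, the two child runs would normally print different values, which is exactly what makes precomputing colliding keys harder.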
And this is at least something, and until something better comes out, this is what we will be using.
Okay, so, optimizations. As we said, Python dictionaries are everywhere. And because they are everywhere, they are highly optimized. This means we not only base them on a structure that is efficient in itself, but we then tweak and tweak and tweak until we get to the best we can do.
This is an optimization that was proposed by Hettinger, and it's actually a space optimization. On the left we have how our hash tables looked until Python 3.6, I think. You see that there is always a lot of empty space, and if you start inserting elements, at some point it will just resize, because it has to guarantee that a certain percentage of the space is free at all times. But what does this mean? It means that basically you are wasting a lot of space, and you are guaranteeing that you will waste a lot of space all the time. To be exact, for every empty row, you are wasting 24 bytes. And now we have the one on the right. Basically we keep a dense table with all the entries, and then we keep just an array of indices that point to the entries we have been adding.
This only affected the data layout. It didn't affect the hashing, it didn't affect the algorithms in place. So it was a very good change. It brought significant memory savings, because on the left side, in the old version, you were wasting, as I said, 24 bytes per empty row. And on the right side, in the new version, it depends on the size, but you are saving at least 20 bytes. Iteration became faster, resizing became faster, and it touches less memory. And now dictionaries remember the order in which the items were inserted. This is actually a side effect, I think, but it's something that is quite nice for dictionaries to have at the moment.
This is an old one, key-sharing dictionaries.
And actually this one is related to the one I'm explaining next as well. Behind the class attributes there is a dictionary, and if you were here yesterday for the Python object model talk, you saw exactly how to access that dictionary, and that it is not so weird to do it. Okay, so there was PEP 412, which basically says: if I instantiate my class several times, all those dictionaries are gonna have the same keys, so maybe they can share these keys. I don't need to replicate the names. I mean, the values will be different, but the attribute names will be the same. This led to a memory reduction of up to 20% in some cases. This was a change from 2012, I think, but it is still present today in the dictionaries we use.
And this next one is actually linked to that one. It is also quite an oldie. So basically what it means, how to explain, sorry: dictionaries with string keys are handled differently. What does that mean? At some point the dictionary needs to check whether two things are the same. If they are arbitrary Python objects, then the comparison is quite difficult; it needs to call __eq__ and it needs to do a lot of stuff. But what if they are strings? Comparing strings is something much easier, much simpler. So we can introduce this optimization: if we're talking about just strings, then I'm not gonna call __eq__, I'm just gonna compare them directly. And another good thing about this is that there is no possibility of raising an exception. This is also closely linked to the other one, because class attribute names are also strings, so they go hand in hand.
So that was my talk. I don't know if I spoke too fast. I think I did, it was only 15 minutes, but I tend to do that, sorry. So yeah, thank you. I really hope you enjoyed it. What I wanted you to take away from this is the will to go and research more. This is a very interesting and constantly ongoing process.
I'm also gonna leave this up, and I'm also probably gonna tweet the slides, because they have all the references. And yeah, I hope you enjoyed it. I'm sorry for speaking so fast, I tend to do that.
I personally think that the pace was right, at least I could follow. So unfortunately, we're not gonna have questions. I mean, not right now. However, she's gonna stay here all day and we're gonna have coffee right after this talk, so you can ask her any questions or talk to her about anything you like. So again, thank you, Gisela, for the talk. I really liked it.
Thank you. I appreciate it.