Welcome back, everyone. Hope you all are enjoying the EuroPython conference. I'm Anmol Soseva, and I'll be your host for the three upcoming sessions. There's a small announcement before we begin with our next talk. Sprints are starting tomorrow, and there is a chance to find more developers and even have collaborators for your open source projects and educate people about them. So please consider listing your project on the Sprints registration page. The link for the same has been shared on the Matrix channel.

It's time to welcome our next speaker, Eli Holderson. Holderness, sorry. So Eli is a developer advocate at Anvil and has worked in industries like telecom, biotech, and analog circuit design. Today, we'll be talking about CPython's dark memory magic. Interesting. So over to you, Eli. Good luck.

Hello. Thank you very much for the introduction. As Anmol said, today we will be taking a short tour of CPython's... well, Python in general, and how it handles objects in memory, with a particular focus on CPython's garbage collection as well. All right, slides are up. Good to go.

All right, so today's talk: "Pointers? In my Python? It's more likely than you think." We're going to be talking about three things. The first of which, if my slides will increment for me... wouldn't be a conference without technical issues. There we go. The first of which is what pointers are and where you'll see them in your Python. The second is what the ID of a Python object is and why it matters. And lastly, we'll be looking at how CPython can tell when you're done using an object in memory and what it does next.

So let's kick off with: what is a pointer? A pointer is an object that basically references another. It points to another object. So for example, if you have an object in your Python namespace, its name is a pointer to that object. And it's a concept that comes up a lot in C programming. So here we've got what pointers look like in some C code.
You don't really need to care about this. In C, when you declare a pointer, it actually matters what type of object it points to. So a void pointer is agnostic. You can have something that points to an integer or something that points to a character. But this is Python and it's duck typed. So you don't need to know about any of that when it comes to writing Python code.

Where you'll see this come up really often is in pointer aliasing. This is something that trips people up a lot when they're new to Python, and it's also known as that phenomenon where something changes, but you aren't sure how: you've changed a variable without realizing that you did it. So we'll see an example here. I've got a list and I've called it A. And yeah, just set that up in the namespace there. And what we're gonna do is we're gonna set B equal to A and then we're going to edit that list by accessing it through the name B. So cool, we've got a list B and we're changing what the second word of the list says, and we're changing it from cool to awesome. But you can see that we've also changed our list A by having changed B, because A and B were both pointers to the same object. They're two different names for the same thing.

And if we don't want this to happen, if we want to have a version of A that we can edit without editing the original, we can use the copy method. So instead of just setting it equal to A, we can set it equal to `a.copy()`. And what this does is it creates a new list object and then copies all the contents of that over. So if we want to change the second word of our list C from awesome to amazing, we can do that and we can see that it hasn't actually changed the value of the list A. This is all well and good, but when we create a copy of something and we copy all of its contents over, so all of those strings got copied over, what if they had been lists that had been copied over?
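
A minimal sketch of the aliasing example just described. The slides are not part of this transcript, so the list contents here are hypothetical; only the names `a`, `b` and `c` and the cool/awesome/amazing edits come from the talk.

```python
# Hypothetical list contents; the talk's slide is not reproduced here.
a = ["pointers", "cool"]

b = a                 # b is a second name for the same list object
b[1] = "awesome"      # mutate the list through the name b
print(a)              # the change is visible through a as well

c = a.copy()          # a new list object holding the same elements
c[1] = "amazing"      # only the copy changes
print(a, c)
```

Rebinding a name (for example `b = ["something", "else"]`) would not affect `a`; it is mutation through a shared reference that causes the surprise.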
So our pointer A points to a list, but what if there were pointers inside that list? So here we have a different list, it's a list of lists, and what we're gonna do now is see how copy is not always sufficient. So I've got this list of lists, I've made a copy of it, and I'm accessing the first element of it, which is itself a list, and I'm appending a new element, C, to it with append. And you can see that that has changed the list that lives within A as well. Even though we did a copy, we're still changing the contents of A, because the elements of the list that got copied over when we called that copy method are themselves objects that are pointed to.

And we can get around this, even if it's pointers all the way down, with `deepcopy`. So we've got our list of lists A, and instead of just using copy, we're gonna use deep copy. And now what that does is it not only creates a new list object, but for every element inside that list it's actually creating and copying that whole cloth. It's not just sort of pointing at the same objects that are in A, it's creating new versions of them with the same values. And so now if we try and change some of the elements of our new list, we can see it's been changed in the new list that we've made, C, but if we go back and look at our original list A, we can see that we haven't changed the contents of that. So deep copy is for when you want to create an actual copy that is not pointing to the same objects in memory, but to new objects with the same values, so that you don't end up with pointer aliasing.

Just to recap, equals makes new pointers to the same object. So if you've got a list A and you say B is equal to A, both B and A are pointers to the same object. Copy will make a new list object, but for any pointers that were contained within the original object, you'll just get new pointers to those same objects.
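
The shallow-versus-deep distinction can be sketched as follows. This is an illustration under assumed data (the nested list contents are hypothetical); `copy.deepcopy` is the standard-library function from the `copy` module.

```python
import copy

# Hypothetical list of lists standing in for the one on the slide.
a = [["a", "b"], ["x", "y"]]

shallow = a.copy()          # new outer list, but the inner lists are shared
shallow[0].append("c")      # mutates the inner list that a[0] also points to
print(a[0])                 # the original sees the appended element

deep = copy.deepcopy(a)     # new outer list and new inner lists
deep[0].append("d")         # only the deep copy's inner list changes
print(a[0], deep[0])
```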
And deep copy copies the entire thing, or rather recreates the entire thing, and gives you a completely new object in memory. You wouldn't always want to do this, even though it protects you from pointer aliasing and changing things when you don't intend to, because it's more memory intensive. So depending on what you actually need your code to do, you might choose one of these three methods.

Right, I promised some tuples behaving badly in the abstract. And the idea of pointers and mutating things in place is obviously really closely related to mutability of objects. So tuples are immutable. Once you've created a tuple, all of its elements have to be the same for the lifetime of that tuple, but those elements can be pointers. So let's say that the first element of our tuple is a pointer. What if we change the value that it points to? So let's say that we've got a tuple that contains lists. Now the first element of this tuple is a list, [1, 2, 3], and we can append to it. The value of the first element of that tuple hasn't changed. It's pointing to the same object in memory, but that object in memory has been mutated.

So that's one thing: we can mutate it directly. But what if we want to do something really similar and use the plus-equals operation, which is modifying something in place? It'll throw us an error because we're trying to assign to an element of the tuple. We're trying to set `a[1]` equal to something, which you can't do in a tuple. And we expect this, right? We expect it to throw us an error. You can't assign to objects within a tuple. But if we then print out the value of the second element of our tuple, we'll see that that Z that we tried to add to it is in fact there. And this is because when you do plus-equals, it first mutates the object in place and then tries to do an assignment. And it gets through the first part because, as we saw before, you can append to a list in a tuple, even though the tuple itself is immutable.
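
A short sketch of the tuple behavior described here, with hypothetical contents for the second list. The `try`/`except` is only there so the example keeps running after the expected `TypeError`.

```python
# A tuple whose elements are (pointers to) mutable lists.
t = ([1, 2, 3], ["x", "y"])

t[0].append(4)      # fine: the tuple still points at the same list object

error = None
try:
    t[1] += ["z"]   # extends the list in place, then tries t[1] = <that list>
except TypeError as exc:
    error = exc     # the assignment step fails because tuples are immutable

print(error)
print(t[1])         # the in-place extension already happened
```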
You can't then assign to it. So it gets through part of it successfully, adding the Z to that list, but then it fails, it gives us an error at the assignment part of it. So a bit of unexpected behavior when you're trying to mix mutable and immutable types in that specific situation.

All right, gonna move on to object IDs. When we're thinking about B and A and they're both pointing to the same object, how can you tell when two objects are the same? Well, every object in Python has an ID and you can get it by calling the `id` built-in on that object. An ID is unique: no two objects can have the same ID if they are not... well, no two different objects can have the same ID. If two objects have the same ID, they're the same object. It is constant: the ID of a given object will not change over the lifetime of that object. And most Python implementations use the object's address in memory as its ID. So that's what CPython does, but not all of them do. So for example, Skulpt, which is a Python-to-JavaScript compiler, actually just generates and then caches a random number for every given object when the ID is queried for it. And this is gonna be true of any JavaScript-based Python implementation because of the way that the JavaScript VM works.

So we're gonna focus on CPython. Essentially, all these examples that I'm about to show you are from CPython, because it's the most common implementation. We've got a list again, and we've called the `id` built-in on it and it's given us a number. If we set B equal to A and we check the IDs of those objects, you can see that they are the same number. They're two pointers to the same object, which has the same address in memory. So they've got the same ID, rather. Now, if we make a copy of A, it's creating a new list object in memory and then copying all the contents over. So when we call `id` on them, we can see that it is different. These two objects are genuinely different, whereas B and A were the same. They were just two names for the same thing.
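
As a rough illustration of the `id` comparison above (list contents are hypothetical, and the actual numbers printed will differ from run to run):

```python
a = ["some", "list"]   # hypothetical contents
b = a                  # another pointer to the same object
c = a.copy()           # a distinct object with equal contents

print(id(a), id(b), id(c))

same_object = id(a) == id(b)        # two names, one object
different_object = id(a) != id(c)   # the copy lives elsewhere in memory
```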
So, some of the ways we can tell whether two objects are genuinely the same is with these two, oh, sorry, let me go back. My goodness. Sorry. Let's set up this situation again, where A is a list, B is equal to A, so we've just made another pointer to that list, and C is a copy of A, created by calling the copy method. All right. What happens if we use some of Python's ways of evaluating whether these things are the same? A is equal to B: true. A is B: also true. But when we look at C, we can see that the double equals operator gives us true, but `is` gives us false. A is equal to C, but it's not C.

So why does this happen? The `is` comparator uses the ID of objects in order to determine its truth value. So A is B is exactly equivalent to saying that their IDs are the same, whereas the double equals operator uses the `__eq__` method. So what is that? We're quite familiar with double underscore methods, I think. In Python, you might have seen `__str__`, `__repr__`, or `__init__`, which is obviously a really common one. And these are also called magic methods, or dunder methods, as a shortening of double underscore. And the `__eq__` method defines the behavior of the double equals operator when it's applied to instances of the class that the `__eq__` method is defined upon.

So we can see an example of this. Here I've got a user-defined class and I've defined an `__eq__` method on it. This takes self, so the object that's in question, and whatever it's being compared to, which here I've called other. And all I'm doing in this method is returning self is other, which as we just saw is equivalent to saying that their IDs are the same. This is actually the default behavior for any user-defined class in Python.
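
A sketch of the comparisons described here. The class name `IdentityEq` is hypothetical; its `__eq__` simply spells out the identity comparison that the talk describes as the default for user-defined classes.

```python
a = [1, 2, 3]    # hypothetical contents
b = a
c = a.copy()

results = {
    "a == b": a == b,   # equal values
    "a is b": a is b,   # same object
    "a == c": a == c,   # equal values
    "a is c": a is c,   # different objects
}
print(results)


class IdentityEq:
    """Hypothetical class that compares equal only to itself."""

    def __eq__(self, other):
        return self is other


x = IdentityEq()
y = IdentityEq()
print(x == x, x == y)
```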
By default, two instances of it will only compare equal if they have the same ID. What we're gonna do now is look at a class that I've given a name attribute, just for the clarity of the examples that I'm about to show you. And I've defined its `__eq__` method to always return true no matter what. When you're overriding dunder methods, you have to be pretty careful because sometimes it can result in some unusual or sort of confusing behavior. So we're gonna see an instance of that. I've got two instances of this class that I just defined. One is called A and one is called B. So they have different names. But they still compare equal as class instances because I overrode that `__eq__` method to return true no matter what. Any two instances of this class will compare as equal, which you might not expect. They've got different names, but we've told the equals operator that they're the same thing: always return true.

We can of course do the opposite: always return false. And here we have a unique class that will never compare equal to another instance of this class. Not even itself. So we've instantiated my unique class and asked, is A equal to A? Well, we're using the double equals operator, which calls `__eq__`. And as we just saw, that always returns false even if the instances are actually the same object. So you can see we can get some pretty funky behavior going on there.

We've got object lifetimes as well. So we said that the ID of an object exists and is constant and unique for the lifetime of that object. And what that means is, as long as that object exists in memory. So how do objects stop existing in memory? Well, what happens when an object is about to be removed from memory is that its `__del__` method is called. So that's another one of these dunder methods, magic methods. And the `__del__` method is also called a finalizer.
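
The two `__eq__` overrides described above (always true, always false) might look something like this. The class names are hypothetical, since the slide code is not part of the transcript.

```python
class AlwaysEqual:
    """Hypothetical class: every instance compares equal to anything."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return True


class NeverEqual:
    """Hypothetical class: instances never compare equal, even to themselves."""

    def __eq__(self, other):
        return False


a = AlwaysEqual("a")
b = AlwaysEqual("b")
different_names_equal = (a == b)   # True, despite the different names

u = NeverEqual()
equal_to_itself = (u == u)         # False: == calls __eq__, not an identity check
same_object = (u is u)             # True: identity is unaffected by __eq__
print(different_names_equal, equal_to_itself, same_object)
```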
It's just what's called when Python is about to remove the object from its working memory. So we're gonna use this example throughout the next section. It's a really simple class. It just has a name, as before, for the clarity of examples. And when an object's about to be removed from memory, it just gives us a print statement to say that it's doing it.

So, Python frees memory in one of two ways. Either by keeping track of how many references to that memory there are and freeing the memory when that number reaches zero, or by garbage collection, because reference counting isn't always sufficient and we're gonna see why in a little minute.

So let's start off with reference counting. We have an instance of this class that I just defined, my del class, called Dave, and then we delete him. And you can see that when the reference to Dave, the only reference that there is, is deleted, the object is also deleted, as you can see by the print statement that we put in being called. Now, instantiating a thing and assigning it to a variable name isn't the only way that you can create a reference to it. We can also create multiple references, or pointers, to the same object in memory. Here we've got Alice and we give Alice another name, another variable, another pointer, called also Alice. And then when we write `del alice`, she's not deleted, because there's still a living reference to Alice: also Alice. But if we delete that, finally she is deleted. So, if there are references still around in your namespace to an object, it will not be deleted by reference counting.

What happens if you have two objects that refer to each other but don't have any other references in the namespace? That's called a cyclic reference. And it's a situation where there will still be references to the objects.
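
A minimal sketch of the reference-counting behavior described above, assuming CPython (other implementations may run finalizers later). The class name `Noisy` and the `events` list are hypothetical additions so the order of finalizer calls can be inspected as well as printed.

```python
events = []  # hypothetical log of finalizer calls, in addition to printing


class Noisy:
    """Hypothetical stand-in for the talk's class with a printing finalizer."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        events.append(f"deleting {self.name}")
        print(f"deleting {self.name}")


dave = Noisy("Dave")
del dave                      # last reference gone: finalizer runs right away

alice = Noisy("Alice")
also_alice = alice            # a second pointer to the same object
del alice                     # one reference remains, so nothing is finalized
events_before_last_del = list(events)
del also_alice                # now the count reaches zero
```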
If you've got this sort of situation where you've got two objects and each one refers to the other, those references will be known to the reference counter. Their reference counts can never hit zero, even if those two objects are not actually reachable from the Python namespace that you still have. And this is where garbage collection can come in.

So, we're gonna see an example, without garbage collection, of what happens in this situation. We've got Jane and Bob, and they're arbitrary Python class objects, so we can just set attributes on them. They're friends. Bob has a reference to Jane and Jane has a reference to Bob. When we delete them, we've now removed any way of accessing these two objects from our namespace. We no longer have a pointer to Jane and a pointer to Bob that we can get at by name. But they point to each other. So, the reference counts for each of them will not be zero. They will not be picked up by the reference counting mechanism and deleted. And as you can see, after I've typed in `del jane` and `del bob`, that `__del__` method on each of them has not been called. We're not seeing that print statement saying, deleting Jane and Bob. So, reference counting isn't gonna be sufficient to make sure that our memory is cleared when it actually needs to be.

This is where the garbage collector comes in, and we're gonna look specifically at CPython's garbage collection. So, CPython's garbage collector works as follows. It detects cyclic isolates, which is a term for that situation that we just saw, when you've got two or more objects that reference each other in a cyclic way, but they are isolated from the namespace. There is no pointer, no name in the namespace, that can actually access that memory. So, CPython's garbage collector detects these. It will call the finalizers, so that's the `__del__` method, on each of these.
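
The Jane-and-Bob cycle can be sketched like this, assuming CPython. The class name and the `finalized` list are hypothetical. The collector is temporarily disabled here (an addition not in the talk) so that an automatic collection cannot run in the middle of the demonstration and blur what reference counting alone does.

```python
import gc

finalized = []  # hypothetical log of finalizer calls


class Noisy:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        finalized.append(self.name)
        print(f"deleting {self.name}")


was_enabled = gc.isenabled()
gc.disable()                      # keep the cyclic collector out of the picture
try:
    jane = Noisy("Jane")
    bob = Noisy("Bob")
    jane.friend = bob             # Jane points to Bob
    bob.friend = jane             # Bob points to Jane: a reference cycle
    del jane, bob                 # no names left, but each still has one reference
    finalized_by_refcounting = list(finalized)
finally:
    if was_enabled:
        gc.enable()

print(finalized_by_refcounting)   # nothing was finalized by reference counting
```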
And as long as those finalizers don't do anything that causes the garbage collector to abort, and there is something that they can do, which we'll go into in a minute, it will go and systematically break all of those cyclic references. So, it will stop Jane being Bob's friend, and then Jane can be deleted, and so on and so forth. And so, once it breaks the right cyclic references, the reference count goes to zero and then those objects are deleted.

So, we can have a look at the garbage collector. All you need to do is `import gc`, and the way that the garbage collector works is it tracks objects. Certain objects aren't tracked. So, for example, strings and certain other built-in objects aren't tracked by the garbage collector. They cannot contain pointers and therefore they can never become part of a cyclic isolate. You can look at certain objects and see if they're tracked by the garbage collector with `gc.is_tracked`. Strings are not tracked but lists are. They contain pointers, so they can become part of a cyclic isolate. By default, all instances of user-created classes are tracked by the garbage collector, because you can add arbitrary attributes onto them and so obviously they can become part of cyclic isolates.

So, the way that the garbage collector detects these cyclic isolates is it uses a traversal method to access all the pointers that an object has access to. So, if we've got a list, we can call `gc.get_referents` on it, and what that does is it goes through your object and says, what are all the things that you point at? And this list has pointers to the strings "list" and "a". And this is not guaranteed to be in any particular order, I think, by `gc.get_referents`. And so, what this does is it allows the garbage collector to detect these cyclic isolates, essentially.
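
A small sketch of the two `gc` functions mentioned here, `gc.is_tracked` and `gc.get_referents`, run against hypothetical objects. The results shown in comments reflect CPython behavior.

```python
import gc


class Plain:
    """Hypothetical empty user-defined class."""


some_string = "a string"
some_list = ["a", "list"]
instance = Plain()

print(gc.is_tracked(some_string))   # strings cannot hold pointers to other objects
print(gc.is_tracked(some_list))     # lists can, so the collector tracks them
print(gc.is_tracked(instance))      # instances of user-defined classes are tracked

# Everything the list directly points at; the order is not something to rely on.
referents = gc.get_referents(some_list)
print(referents)
```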
So, let's set up the situation we had before, a cyclic isolate with Jane and Bob, the smallest one you can have, with only two elements. Bob and Jane are friends, and then we delete them from the namespace. So, now, if we import the garbage collector and run its collect method, which is essentially giving it a bit of a kick and saying, go and do your thing, you can see that those finalizers are now called, and the number at the end is the number of objects that have been deleted from the namespace, or deleted from memory rather. So, it's four because you've got Jane and then her name, Jane, and Bob and then his name attribute, Bob. So, four objects have been removed from memory.

All right. So, I mentioned earlier that there is a way that this can, not fail, but a way that the garbage collection process can be aborted. And that is if the finalizer method does something to create a reference to the object in question outside of the cyclic isolate. And we'll see an example of what that looks like. So, I'm gonna define a new bad class called my bad del class. And what this does is, in its `__del__` method, it creates a new global variable called person and then assigns the object that is in the process of being deleted to that person variable. It's creating a new pointer in the global namespace and assigning itself to it. And then it'll print the statement that we had before.

So, again, cyclic isolate, same situation: Bob and Jane are friends, we delete them. And what happens if we call the garbage collection now? It calls their finalizers, and you can see that it has by those print statements. But then it says zero: nothing's been deleted. And this is because now there is a pointer outside the cyclic isolate. It's not isolated anymore. And the way we can see this is if we reference this new global variable person, we can see it's an instance of that class.
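
A sketch of kicking the collector by hand with `gc.collect()`, assuming CPython. The class name and the `finalized` list are hypothetical. The number `gc.collect()` returns is the count of unreachable objects it found; the talk reports four for this setup, but the exact figure can differ between CPython versions, so the sketch only prints it.

```python
import gc

finalized = []  # hypothetical log of finalizer calls


class Noisy:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        finalized.append(self.name)
        print(f"deleting {self.name}")


jane = Noisy("Jane")
bob = Noisy("Bob")
jane.friend = bob
bob.friend = jane
del jane, bob                  # the pair is now a cyclic isolate

unreachable = gc.collect()     # run a full collection right now
print(unreachable)             # version-dependent count of collected objects
```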
And at this point, I'd also like to point out that Bob is the object that had its finalizer called second, because that was the one that was printed last. And if we go and look at this person object, its name is Bob. So, in the process of deleting Bob, which was the second finalizer that was called, it assigned itself to this global variable. And we can see that Jane also still exists, as the friend attribute of this object. But if we try and access Jane by the name, small-j `jane`, that name has been deleted from the namespace. Now we've got into a situation where these two objects still exist, still have their name attributes, but can't be accessed by the names that we originally assigned to them. And they've had their finalizers called already.

So what we can do, of course, is just delete the person variable, delete that pointer to the object. And now if we run the garbage collection again, you can see it says that it's deleted those four objects. It did not run their finalizers again. An object can only ever have its finalizer run once, to prevent a situation like this from being unresolvable. Because if it ran that finalizer every single time, you'd never be able to delete the object, because it would always be creating a reference to itself within the finalizer. So finalizers can only ever be run once, for this exact purpose. And I don't really recommend doing this in code, unless you know that you have a really, really good reason to, for exactly the reason that it creates this kind of unintuitive behavior. But that's been an example of garbage collectors and finalizers behaving badly.

And I think that about wraps us up. Yes. And this has been "Pointers? In my Python?", a whistle-stop tour of some of Python's memory management. I hope you enjoyed it, and I will be available to answer questions.

It was a really nice and well-structured talk, Eli. We can have the questions now. If anybody has any questions, please put them on the Matrix chat.
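
A sketch of the resurrecting finalizer described above, assuming CPython. The talk's finalizer assigns `self` to a global variable called `person`; this sketch stores it under the key `"person"` in a module-level dict named `rescued` instead (a hypothetical substitution, chosen so the example behaves the same wherever it is run). The class name and the `calls` list are also hypothetical.

```python
import gc

calls = []     # hypothetical log of finalizer calls
rescued = {}   # stand-in for the global namespace used in the talk


class Resurrecting:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        rescued["person"] = self      # new pointer from outside the cycle
        calls.append(self.name)
        print(f"deleting {self.name}")


jane = Resurrecting("Jane")
bob = Resurrecting("Bob")
jane.friend = bob
bob.friend = jane
del jane, bob

first = gc.collect()                  # finalizers run, but the pair survives
survivor_name = rescued["person"].name
friend_name = rescued["person"].friend.name
calls_after_first = len(calls)

rescued.clear()                       # drop the outside pointer again
second = gc.collect()                 # the pair is collected this time
calls_after_second = len(calls)       # finalizers are not run a second time
print(first, second, calls)
```

Which of the two objects ends up under `"person"` depends on the order in which the finalizers run, so the sketch does not assume it is Bob.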
Till now we didn't receive any.

Yeah, Anna, I mean, I'll be around in the chat as well afterwards, in the breakout room. But no questions is fine too. That just means I was extra clear.

Yeah, you were really speaking very well. All right, I see. Okay, a few people are typing and they are really liking your talk. I have a question as well, just let me read it out for you. Give me a second. Okay, so the question is: if there are a million objects in memory, then the garbage collector has to go through all of them to find isolates, even if isolates don't exist. So can you just comment on this?

Yeah, so the garbage collection happens at given intervals, I believe. This is all configurable, actually. I'm trying to remember exactly what the behavior is. You can find it, I can link the reference docs. Essentially, there are a certain number of pools that objects go into. And if an object has already survived one round of garbage collection, it gets moved into a different pool. And I think there are by default three of these. So it survived zero times, it survived once, it survived twice. And I think there is different behavior for the objects in each of those different pools, so that the garbage collector will sort of know that some objects have been around longer than others. I mean, if you've got a million objects in memory, you're probably dealing with a pretty hefty processor anyway. But yes, because, I mean, how else is it going to find them? It's going to have to go through all those objects, given that those objects are tracked by the garbage collector, because obviously objects that don't contain pointers aren't tracked by the garbage collector.

All right. I hope that answers the question. And yeah, people are really liking your talk and they are posting "super informative" and comments like that. So if anybody has any questions, please feel free to reach out to Eli on the breakout October room.
They will be there to help you and you can get a chance to interact with them. Thanks Eli. Thank you.
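
As a footnote to the Q&A answer about pools: the `gc` module documentation calls these pools generations, and exposes their settings through `gc.get_threshold()` and `gc.get_count()`. The sketch below only inspects the current values; the numbers, and the details of how generations are scanned, vary between CPython versions, so nothing here should be read as the definitive behavior.

```python
import gc

# Collection thresholds, one per generation (pool) of tracked objects.
thresholds = gc.get_threshold()

# Current counters that are compared against those thresholds.
counts = gc.get_count()

print("thresholds:", thresholds)
print("counts:", counts)
print("automatic collection enabled:", gc.isenabled())
```

The thresholds can be changed with `gc.set_threshold`, which is the configurability the answer refers to.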