Hi. Yes, I'm Eli. I'm a developer advocate at Anvil and I love talking about Python. This talk is about how Python handles objects in memory. We're going to talk about what pointers are and where you'll see them in your code, what a Python object is and why it matters, and how Python, and CPython in particular, can tell when you're done using an object that it has in memory and what it does when that happens.
So what is a pointer? Well, here are some pointers. This is a bit of C code. You can see it saying void and then a pointer, and that is saying that this is a pointer to an object whose type is not specified. Then you've got an int, so that pointer has to be a pointer to an integer, and a char, so that one has to be a pointer to a character. This is because in C every variable is declared with a type. In Python we've got duck typing, so pointers function a little bit differently than they do in C, and you don't need to know any C at all to follow anything that's going on in this talk. The reason I call them pointers is that that's what they're called in C.
So a pointer is basically a name in the namespace that corresponds to an object in memory. When you name a variable something, that name is a pointer. For example, I instantiate an instance of my class and I call it my class instance, and that name is a pointer to that object in memory. Through that name, you can access and manipulate that object in memory.
So where do we see them in our code? A list is actually a pointer. We're going to use a lot of lists in the examples that I'm going to go through, because a list in Python is actually a wrapper around some pointers. If you've said A is some list with some objects in it, A points to the list object in memory, and this object has all of your list methods in it.
So it has append, and dunder getitem, which is what runs when you get an item from the list by indexing it. The list object also has pointers to all of the list contents. So when you have a list, it automatically has pointers within it if it has any contents, and because of that we're going to be using lists in quite a lot of the examples to see how pointers behave.
One behavior that you might often see, and that trips up a lot of beginners when they start using Python, is something called pointer aliasing. This is the phenomenon where you change one variable and another one changes, and you didn't intend that. We're going to see an example of that happening here. I've got my list A and it has some strings in it: my, cool, list. We've defined it and we've got the interpreter to print it back out to us. I want to make a copy of this and make some changes, so I'm going to set B equals A. Now I want B to be my awesome list, not my cool list, so I change the value of the second item in this list and set it to be awesome instead of cool. We get the interpreter to print out our list B and we can see that it's changed. Fabulous. This is what we expect.
But let's see what happens when we print out A. A has also changed. The reason is that when we set B equal to A, we didn't actually make a copy of that list object. We just made a new pointer to it. So when we access that list object through the pointer B, it's the same list object that A points at. We're actually changing the same object, which has just got two different names.
So what if we didn't want to do that? Well, you could use the copy method of a list. I'm going to make a different copy, and instead of just setting it equal to A, I'm going to make it equal to A dot copy. And now I want this one to be my amazing list instead of my awesome list.
And if we do exactly the same thing again, we just change the value of the second object in the list. We can see that this has happened as we expect in our new list, and we can see that it didn't change A. This is the behavior that we wanted. This is how you make a copy of a list and change it without changing the original.
This is all well and good. But the reason this works is that when we changed awesome to amazing, we simply replaced the string. Strings aren't mutable in Python. So what happened here was that within that list object, we got rid of the awesome pointer and replaced it with an amazing pointer.
But what if we have pointers inside our pointers? What if we have a list that has a list inside it? That's exactly what we're going to do here. We've got a list A, and you can see there are double square brackets here, meaning that this is a list with one item in it, and that item is itself a list with two objects, Alex and Beth. We're going to do exactly the same thing again. We're going to say B is equal to A dot copy, which is the behavior where we're not going to change the original list object. But now we're not going to replace that first object. We're going to mutate it, because the first item in our list is itself a list and it's mutable. So we're going to change that object in place and add Charlie to the list inside our list. You can see that this is exactly what we expect when we access B[0], the first object in our copied list. But it has also changed the list within A. So copy alone isn't enough if you've got more than one layer of nested pointers.
So what if it's pointers all the way down? What if I wanted to make a real copy of my list within a list and have it not change the original object? Well, you can do something called a deep copy. There is a copy module in Python and it's got this function called deepcopy.
What this does is copy pointers all the way down. So we're going to try again. We're going to make another list, and we're going to use deepcopy to copy our list within a list. Now we're going to try to append Dan to our list of names, and it works exactly as we expect in our new list. When we check A again, we can see that it didn't append Dan. This is the behavior that you want if you want to completely copy a list that has multiple layers of nested pointers and not mutate the original.
I've got some diagrams to explain exactly what's going on here, because it's a little bit abstract. First, equals. If I set B equals A, here's what's happening. I have A, I have its original object in memory, and I have the strings or lists that were pointed to by that original list. In the case we just looked at, the object being pointed to is another list. If B equals A, you can see B just points at the original object. So when you mutate the object through B, that change will be reflected under the original name as well.
If we do copy, we're making a new list object with its own set of append methods and getitem methods and all of that stuff. But the things it points to underneath are still the same. This is why, when we did A dot copy and mutated that inner list, we were still mutating that same orange object. That means that when you access it through A, those changes will be reflected there, because you've got another layer of pointers. So that's what happens when you make a copy.
And then finally, this is what happens when you do a deep copy. It actually recreates everything all the way down. I've only shown two layers of pointers here, but no matter how many layers you have of things pointing to things pointing to things, deepcopy will create an entirely new instance of all of those things in memory.
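
The three behaviours in those diagrams can be checked directly. What follows is a minimal sketch rather than the code from the slides: the lowercase names a, b, c and d are chosen here for illustration, with a standing in for A from the talk.

```python
import copy

a = [["Alex", "Beth"]]

b = a                  # plain assignment: a second name for the same outer list
c = a.copy()           # shallow copy: new outer list, same inner list
d = copy.deepcopy(a)   # deep copy: new outer list and new inner list

c[0].append("Charlie")  # mutates the inner list shared by a, b and c
d[0].append("Dan")      # mutates only d's private inner list

print(a)  # [['Alex', 'Beth', 'Charlie']]
print(d)  # [['Alex', 'Beth', 'Dan']]

# Identity checks show which objects are shared.
print(b is a)        # True
print(c is a)        # False
print(c[0] is a[0])  # True
print(d[0] is a[0])  # False
```

The identity checks at the end are the quickest way to tell which of the three situations a given pair of names is in.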
And so I can do whatever I like to B now. It will never affect A. So that's the difference between these different ways of making copies of objects in Python. Depending on the situation, you may want one of these behaviors and not the other, or it might not matter. And if you are really memory constrained, while deepcopy gives you a completely independent set of objects, you might not have enough memory to do that. So you might decide that it's better to have them point at the same thing and just be really careful about the changes that you make. So that's pointer aliasing and how to avoid unwanted behaviors with it.
We're also going to see an instance of tuples behaving badly: some unexpected behavior in Python owing to the way that it handles pointers and mutability. A tuple is like a list, but it's immutable, which means that once you instantiate it and give it the objects that are in that tuple, those things cannot change. So let's say we've got a tuple A. Its first element must point at the same object during the lifetime of A. That pointer cannot change. But the thing that it points to could be mutable, like a list.
So what if we had a tuple that had a list inside it? I've made the round brackets orange there just to highlight that this is a tuple, not a list. So we've got this tuple, it's got a list inside it, and when we print out that element, it's a list. Now, if I want to mutate that object in place, I can, because I'm not changing what the tuple points to. I'm changing what the list's own pointers point to. So I can mutate this object in place: I haven't changed where the tuple's pointer points, just the contents of the thing it points at. So that's all fine. But there's another way to append to a list, right? You can use the plus equals operator. But this gives us an error.
And after a little bit of thought, this makes total sense, right? You can't assign to objects in a tuple. You can't do A[0] equals something when A is a tuple, because tuples are immutable: you cannot change them once they have been instantiated. So that's totally fine. Thinking about it, this is what we expect. Except if I now get the interpreter to print out the value of A[0], we can see that the five is actually there. Even though it threw an error, it has still done the thing we wanted it to do. That's a bit weird, right? And here's why. The plus equals operator first mutates in place, just like we did with append, and then assigns that list back to that position in the tuple. Mutation is fine, we can do that no problem. But assignment we can't do if it's a tuple. So the interpreter throws an error even though it has done the thing we wanted, because it did the thing we wanted and then tried to do something that was illegal. So that's a bit of unexpected behavior in Python when you're trying to assign to tuples.
Going back to making copies of things, how can we know when two objects are really the same? We did B equals A and then we did C equals A dot copy. Are these the same? In what ways are they the same? You can find out the ID of an object in Python: there's a built-in function called id. This function will give you a unique value for every object in memory. The ID is unique and it is constant for the lifetime of that object. Once that object doesn't exist anymore, the ID that it used to have is freed up and could potentially be used for another object later. So what does this mean in terms of equality?
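
Before moving on to identity, the tuple surprise described a moment ago can be reproduced with a short sketch. The starting list contents are assumed here for illustration; only the appended five comes from the talk.

```python
a = ([1, 2, 3],)   # a one-element tuple whose only item is a list

a[0].append(4)     # in-place mutation of the list: allowed

error = None
try:
    a[0] += [5]    # extends the list in place, then tries to assign a[0]
except TypeError as exc:
    error = exc    # the assignment step is what fails

print(type(error).__name__)  # TypeError
print(a[0])                  # [1, 2, 3, 4, 5]
```

The exception and the mutation both happen, because the in-place extension runs before the tuple refuses the item assignment.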
Many Python implementations use the object's address in memory as its ID, but not all. For example, CPython uses the memory address: physically, where is this object stored in memory? Well, depending on how many levels of virtualization you want to think about, but it uses the memory address of the object as its unique, constant ID. But Skulpt, which is a Python to JavaScript compiler, actually just generates and caches a random number. So you cannot necessarily rely on the ID being the address in memory. We're mostly going to be focusing on CPython during this talk, and a lot of people primarily use CPython, so it's useful to know that, but it's also useful to bear in mind that it's not universal.
So we have a list again, and I've called the id function on it to get a big long number. If you program in C at all, you might find this more familiar if it were in hex, because C normally prints out the value of its pointers in hex. In C, pointers are the address of the object in memory, so that's the connection between the ID and pointers. We can see that the ID of A is just this big long number. If I say B is equal to A and I print out their IDs, you can see that they're the same. This is because B and A both refer to the same list object in memory. But if we do a copy, you can see that those two IDs are now different. C is actually a different list object than A, even though they point at the same things underneath.
So how can we tell if two objects are actually the same? Well, there are a couple of different ways of defining equality in Python. We're going to have the same setup again, where we've got a list and we set one to be equals and one to be a copy, and we're going to look at some truth conditions about their equality. A double equals B: that's true. And A is B is also true.
A double equals C is also true, but A is C is not true. So what's the difference here? You might expect that these two comparators are doing the same thing, but they're not. Here's what's going on under the hood. Is uses the ID. So saying A is B is directly equivalent to saying that their two IDs are the same. The double equals comparator uses the double underscore eq method. Double underscore methods are also called dunder methods or magic methods. I'm going to be calling them dunder methods for the rest of this talk. So double equals uses dunder eq as its method for deciding whether or not it is true.
So what is this dunder method? Well, you might already be familiar with quite a few dunder methods, such as dunder str, which determines what happens when the str built-in gets called on an object, dunder repr, which does a very similar thing but in different contexts, and of course dunder init. The dunder eq method defines the behavior of the double equals operator when it's applied to instances of the class that the method is defined on.
So we can write our own if we like. I've written a class here and I've defined a dunder eq method on it, and all I've done is say return self is other. So we're just delegating to the other kind of equality, is. This is saying that one is double equals to the other if their IDs are the same. This is actually the default behavior for any user-defined class. If you don't define a dunder eq method yourself, this is what's going on under the hood. But we could also define something completely different. I'm defining a class here that has a name, so we can tell our instances apart, and for its dunder eq method I've just said return True: indiscriminately return True no matter what's going on. And you might think, okay, sure, you can do that.
Let's see some of the behavior that comes up with this class. I've got two instances of this class, one named A and one named B. They've got different attributes on them, because their names are different. But when I compare them using the double equals comparator, it returns True, because it uses that dunder method and we told that dunder method to return True no matter what. Obviously, this isn't a great equality method.
So let's see what happens if we do the opposite. What if we set it to always return False? I've got a class here that's doing just that: whenever this eq method is called, it will indiscriminately return False. And you might think that's great, no instance will ever be equal to a different instance, that's what we want. Except you get the behavior that A is not equal to itself. Even when you're comparing two objects that are literally the same object, you're still calling that dunder eq method under the hood, and we told that to always return False. So you end up with a situation where A is not equal to itself. As with all dunder methods, with great power comes great responsibility to not do silly things with them. Otherwise, you can end up with silly behavior.
We talked about object lifetimes earlier, and the ID is particularly relevant to this because it is only guaranteed to be unique and constant for the lifetime of an object. The lifetime of an object refers to the length of time that the object exists in Python's working memory. So what happens when an object is about to be removed from memory? It has another magic method, another dunder method: dunder del. This is a method that is called when an object is about to be removed from memory. It doesn't actually do the removing, but it is the method that gets called right before. And because it's a dunder method, you can write your own if you like.
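
The two kinds of equality, and the always-true and always-false classes from a moment ago, can be sketched as follows. The class names AlwaysEqual and NeverEqual are hypothetical, chosen here for illustration; the talk does not give the names used on the slides.

```python
a = ["my", "cool", "list"]
b = a          # second name for the same list object
c = a.copy()   # a different list object with the same contents

print(a == b, a is b)                  # True True
print(a == c, a is c)                  # True False
print(id(a) == id(b), id(a) == id(c))  # True False


class AlwaysEqual:
    """Hypothetical class whose __eq__ returns True no matter what."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return True


class NeverEqual:
    """Hypothetical class whose __eq__ returns False no matter what."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return False


x = AlwaysEqual("A")
y = AlwaysEqual("B")
print(x == y)  # True, even though the names differ

z = NeverEqual("A")
print(z == z)  # False: == still calls __eq__ on the very same object
print(z is z)  # True: identity does not consult __eq__
```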
I've got a class here that I've defined, and we've given it a name as well, just like with the other one, so that we can tell apart the different instances that we're going to use in examples. In my del method, all I've done is say that I want it to print out that it's about to be deleted. I want it to tell me that it's about to be wiped from memory. We're going to use this to see some of the behavior of CPython in particular when it is removing objects from memory.
So Python frees memory in two ways. The first is reference counting, and references are pointers. For each object, Python keeps track of how many pointers there are to it. For example, when we had that list and we said A is our list and then we said B equals A, we had two pointers to that same list object. That's two references. Once an object has no more references to it, there's no way to reach it. Nothing can get at it because there are no pointers pointing to that object in memory. Python goes, oh, we don't need this anymore, and gets rid of it.
There is another way to free memory in Python, and that is the garbage collector. Its implementation varies between implementations of Python. We will be looking at CPython's later. Garbage collection handles the situations where reference counting isn't sufficient, and we will see an example of that in just a moment.
So let's see some reference counting in action. I've got Dave. He's an instance of that class that tells you when it's about to be deleted, and I've deleted him. As we can see, that del method was called, because the output is what we told the del method to print. It's told us that Dave is about to be deleted. So we made this object, we had a pointer to him, Dave, and once we removed that pointer from the namespace, the object got wiped from memory because there were no more references to it.
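
A minimal sketch of the Dave example, assuming CPython, where an object is finalized as soon as its reference count reaches zero. The class name MyDelClass and the deleted list are assumptions added here so the result can be inspected; the talk's version only prints.

```python
deleted = []  # hypothetical log, added so the finalizer calls can be checked


class MyDelClass:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        # Called just before the object is removed from memory.
        print(f"Deleting {self.name}")
        deleted.append(self.name)


dave = MyDelClass("Dave")
del dave        # removes the only reference, so CPython finalizes the object

print(deleted)  # ['Dave'] on CPython
```

Other implementations may run the finalizer later, so the immediate printout is a CPython behaviour rather than a language guarantee.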
We can see another instance of this where we have two pointers to the same object. We have Alice, and then we make another pointer to the same object, also Alice. Now if we delete the first Alice pointer, we can see that nothing gets printed out. That object hasn't had its del method called, otherwise we would see that print, deleting Alice. We have to delete the second pointer too. Once we delete the second pointer, also Alice, you can see that the reference count has now dropped to zero and Alice has been wiped from memory.
So what happens if you have objects that all refer to each other? I think it's better if I just show you an example. Okay, hang on. In order to understand what's going to go on here, we need to understand what's going on in our namespace. What pointers do we have in our namespace? The way that we can see this is by calling this built-in function called globals. This gives you everything that you could possibly call from, let's say, the interpreter or your script. When we call it at the moment, we can see that we've got my del class available to us. It's been defined in the namespace. And you can see, is my class in globals? Yes, that is true. The way that we're going to check whether there's a pointer to an object in the namespace is by doing this truth condition: is this thing in our globals? That dot dot dot at the top of the slide is there because a whole bunch of stuff gets printed out when you call globals, which is why I've truncated the output here.
Okay, so cyclic references are when you have objects that all point at each other, and we're going to see an example of this. We've got two instances that I've instantiated of this class that I've defined, which has a name and tells you when it's about to be deleted.
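
The Alice example and the globals check described above can be sketched like this, again assuming CPython and a script run at module level. MyDelClass, the variable names and the deleted list are assumptions made for illustration.

```python
deleted = []  # hypothetical log of finalizer calls


class MyDelClass:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        print(f"Deleting {self.name}")
        deleted.append(self.name)


alice = MyDelClass("Alice")
also_alice = alice             # second pointer to the same object

del alice                      # one reference remains, so nothing is finalized
after_first_del = list(deleted)
print(after_first_del)             # []
print("alice" in globals())        # False
print("also_alice" in globals())   # True

del also_alice                 # last reference gone: the finalizer runs
print(deleted)                 # ['Alice'] on CPython
```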
We've got Jane and Bob. They are arbitrary Python objects, which means we can set arbitrary attributes on them. So I'm going to make them point at each other. Bob has a friend and that's Jane, and Bob is also Jane's friend. If we now look for Jane in globals and Bob in globals, we can see that they both exist. They're both pointers that we have access to in the namespace. Now, if we delete Jane and Bob, we can see that they're not in the globals, they're not in the namespace anymore. But after deleting Jane and Bob, we didn't see that printout saying that they were being deleted. This is because even though the pointers from the outside, Jane and Bob, have gone, they each contain a reference to the other. That means that the reference count for both of those objects has not fallen to zero.
They have now become what is called a cyclic isolate. They form a cycle of objects all pointing at each other: a very small cycle, a cycle of two. And they are isolated because there are no pointers in the namespace that can now access those objects in memory. So reference counting isn't sufficient. If reference counting was all we had, those two objects would stay in memory until you closed the interpreter. And if you had lots of such situations occurring, you could very quickly run out of memory.
This is where garbage collection comes in. We're going to be looking at CPython's garbage collector, which is specifically designed to tackle the problem of cyclic isolates. The way that it functions is that it detects the cyclic isolates, and I'll talk a little bit about how it does that. It will call their finalizers, which is another word for dunder del. And if all goes well at that point, it will break all the cyclic references. What it does is it goes to Bob, and it says Jane is no longer Bob's friend. And now we have no pointers to Jane.
So Jane can be deleted. And once Jane is deleted, there are no pointers to Bob either. In that way, it basically destroys the structure of the cyclic isolate, and then reference counting comes and mops up everything else. And we will see an example of that little asterisk, where not all goes well when the finalizers are called.
So, the garbage collection module in CPython. You can import it. You can look at it. The way that it works is that it keeps track of objects that can have pointers in them. Strings aren't tracked by the garbage collector, because strings cannot point to other things. However, lists are tracked by the garbage collector, because they can point at things. Instances of user-created classes are always tracked by the garbage collector, because they are arbitrary Python objects and can point at whatever you like.
The garbage collector uses a traversal method to access all the pointers of an object. Every object it tracks can list all the things that it points at. You can actually look at this in Python. I went and looked at how it's defined in C, and it's really interesting, but you can call it from Python with the gc.get_referents function. Let's say we have a list, and it has some strings in it. You can see what this list points at by calling this function on it. This is how the garbage collector detects cyclic isolates, because by this method it can find situations where things form a cycle, where the pointers eventually lead back to where they started, and where nothing in the cycle is pointed to by anything outside of it.
So we're going to set up a cyclic isolate again. We're going to do exactly what we did before. We've got Jane and Bob, they're friends, and then we remove the external pointers from the namespace. We're actually going to look at some diagrams to see what's happening here. So here is us instantiating those two objects.
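
Returning briefly to the tracking and traversal step: a small sketch, assuming CPython, of asking the gc module which objects it tracks and what a list points at. The Person class and the list contents are hypothetical stand-ins for whatever was on the slide.

```python
import gc

names = ["Alex", "Beth"]

print(gc.is_tracked("Alex"))  # False: a string cannot point at other objects
print(gc.is_tracked(names))   # True: a list can

# get_referents returns the objects that `names` points at. The order is
# treated as an implementation detail here, so the result is sorted.
referents = gc.get_referents(names)
print(sorted(referents))      # ['Alex', 'Beth']


class Person:
    """Hypothetical user-defined class."""

    def __init__(self, name):
        self.name = name


print(gc.is_tracked(Person("Jane")))  # True
```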
We've got Jane and we've got Bob. Then we make them point at each other. Then we delete both the external pointers. And now you can see it's a cyclic isolate. There's nothing around it: there are no links to it from the namespace, but they do point at each other. So this is quite a nice way of visualizing the cyclic isolate.
What we can do to resolve the situation is force the garbage collector to do its thing. We import the garbage collector and we call its collect method. You can see that it has printed the output from the del methods, which have now been called because the objects are about to be removed from memory. It also gives us a number, and that number is the number of objects that have been removed. You might think it would only be two because we've got Jane and Bob, but their names, the string names for them, are also being deleted when they're deleted. So it's actually four.
Now we're going to look at what happens if that step where all of the finalizers are called actually ruins something in the garbage collection process. It is possible for a finalizer to make a cyclic isolate not isolated anymore, by creating a pointer from outside the isolate into it. Then it's no longer a cyclic isolate, and the garbage collector shouldn't get rid of it, because you might still want it. So we're going to do that. We're going to define a new class called my bad del class. It's still got a name. What the del method does, before printing that it's about to delete the object, is set a new global variable. It creates a new name in the global namespace, which we can inspect with that globals function. We're going to call it person, and we're going to set it equal to the object in question that's about to be deleted.
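
The earlier step, where Jane and Bob form a cyclic isolate and a forced collection cleans them up, might look like the sketch below. It assumes CPython. MyDelClass and the deleted list are illustrative names, and automatic collection is switched off only so that nothing runs before the explicit gc.collect() call.

```python
import gc

gc.disable()   # keep the automatic collector out of the way for this demo
deleted = []   # hypothetical log of finalizer calls


class MyDelClass:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        print(f"Deleting {self.name}")
        deleted.append(self.name)


jane = MyDelClass("Jane")
bob = MyDelClass("Bob")
jane.friend = bob   # arbitrary attributes make the two objects
bob.friend = jane   # point at each other

del jane, bob       # external pointers gone; the cycle keeps both alive
before_collect = list(deleted)
print(before_collect)   # []

collected = gc.collect()  # force a full collection
print(sorted(deleted))    # ['Bob', 'Jane']
print(collected)          # count of unreachable objects found; the exact
                          # number depends on the Python version

gc.enable()
```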
We're going to make that person pointer point at it from outside the cyclic isolate. So what happens when we do the same thing we did before? We set up that cyclic isolate just as on the slide before, and we call the garbage collector. What it does is call those finalizers, but it doesn't actually delete anything. You can see that because it has printed out zero. It didn't delete Jane and Bob and their names. But it did run the finalizers, and it's worth noting here that Bob's is the one that was run second.
Now we're going to look into globals, and we can see that the person object is actually there now. That person object was created by Jane and Bob's finalizers, their dunder del methods, and so now we have it in our real namespace. Let's inspect this person object. We can see that the person's name is Bob. This is because Bob's finalizer was the second one to be called, so person most recently pointed at Bob. And we can see that person has a friend and that friend's name is Jane. So both person, which is Bob, and the person's friend, which is Jane, are still alive in memory. They have been kept alive by this external pointer. Now, what do we do in this situation? We've got this cyclic isolate and we couldn't delete it with the garbage collector. What's going on? So here's what we did.
I'm sorry for the interruption. We only have a couple of minutes left.
Yeah, no, I'm really close to wrapping up.
Sure, thank you.
Yeah, we're on the last three slides. Okay, here's what happened. We have the cyclic isolate, but when the del method of Bob was called, it created a new pointer from outside that cyclic isolate. At this point, the garbage collector realizes this and goes, abort, abort mission. I will not delete these things.
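
A sketch of the resurrecting finalizer, assuming CPython and a script run at module level. MyBadDelClass follows the spoken name, while the finalized list is an assumption added here for inspection. Which finalizer runs second is not relied on, since the talk only reports what happened in that one run.

```python
import gc

finalized = []  # hypothetical log of finalizer calls


class MyBadDelClass:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        global person
        person = self   # new pointer from outside the isolate into it
        print(f"Deleting {self.name}")
        finalized.append(self.name)


jane = MyBadDelClass("Jane")
bob = MyBadDelClass("Bob")
jane.friend = bob
bob.friend = jane
del jane, bob           # only the cycle keeps the two objects alive now

gc.collect()            # runs both finalizers, which resurrect the cycle

print("person" in globals())           # True
print(person.name, person.friend.name) # both objects are still reachable
print(person.friend.friend is person)  # True: the cycle is intact
```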
But there is a way out of this. We're just checking here that Jane and Bob, the pointers that we used to access those objects before, are not in the globals anymore. We've deleted those. What we can do, of course, is delete that external pointer and run the garbage collector's collect method again. The key thing here is that an object's finalizer, its dunder del method, will only ever be called once. So we do not get into a repeating cycle where we have a cyclic isolate that keeps getting new pointers made to it from outside. Once person has been created by one of those finalizers, that person pointer can never be created again. So you see that it prints out four: it deleted the objects, but it didn't run their finalizers again, because we didn't get that deleting Jane, deleting Bob printout. So this is some unusual behavior with garbage collection and how it handles things, and it's why pointers are important for understanding how Python handles its objects in memory. And I really am done now. This has been Pointers in My Python. I've been Eli Holdness. Thank you very much for having me.
Thank you, Eli. We have a couple of questions if you would like to take them.
Yes, of course.
So the first one is: Python garbage collection has been a mystery for me. So many pointers and so much memory cleanup is required. How can we control it?
Well, one thing you can do is just fire up your favorite interpreter, hopefully CPython. I'm not sure how the garbage collection module is implemented in other implementations of Python, but import gc and go to the documentation, have a look at what the functions are, and play around with them. All of the code that I showed you in these slides, and the slides will be available as well, you should be able to just run in your favorite interpreter and see it in action. I am a real nerd about this stuff, so I love going through and reading all the documentation.
I actually looked at a lot of the CPython implementation for PyObject, which is the generic object implementation in CPython. Yeah, the documentation is really, really good for this stuff, and if you're interested, I would really encourage you to check it out. But you can also just import gc, have a play around with it, and see what you can find. And if you find any really weird behaviors, do give a talk about it.
Sure, thank you. The next question is: what are some of the methods which are used apart from reference counting for garbage collection?
Sorry, can you say that again?
What are some of the methods which are used apart from reference counting for garbage collection?
So I don't actually know. The garbage collector is specifically for the situations where reference counting isn't sufficient. And if I remember correctly, I think there is noise about Python, in 3.10 or 3.11, moving to drop reference counting as a method for memory management. But I actually don't know. I know that other languages do other really clever things. And you have some languages where there isn't garbage collection. For example, in C you are responsible for freeing all the memory that you use. Not every language has garbage collection. Yeah, I really only know about Python in this case, I'm afraid. And Python is reference counting and then the garbage collector.