Now we have a talk from Piotr, Piotr Przymus — difficult name, sorry, sorry Piotr: "Everything you always wanted to know about memory in Python but were afraid to ask". Have fun, and I give the word to Piotr. Thank you very much.

Thank you very much, and at the beginning I have to give you some bad news: this talk won't cover all the topics about Python memory, because the subject is too complex and I had to choose something. But I will try to do my best, and I will probably run out of time — so if you have any questions, please catch me during lunch, because after lunch I'm going back home.

Okay, so a few words about me. I'm still a PhD student, and I work as a research assistant at Nicolaus Copernicus University. My main scientific interests are databases and GPGPU computing — I try to combine these two — and I have also done some work with data mining. I have at least eight years of Python experience and have done several projects with Python; I will mention three of them here. I was responsible for preparing parts of a trading platform for an asset management company, mostly backtesting and trading algorithms. I was also responsible for muscle bio-monitoring analysis and data mining software for laboratory use, which we are now thinking about commercializing. And for my PhD thesis I prepared a simulator of a heterogeneous processing environment for evaluating database query scheduling algorithms.

I mention these projects on purpose, because they all had something in common: they were memory intensive, they were long running, and during the computations they tended to grow in memory. At some point I decided that I had to learn more about how Python manages memory, what the sizes of the different types are, and what the strategies for allocating the different containers are. Those topics make up the first two sections of this talk; afterwards I will try to say a few things about memory profiling tools.

So let's start with some basic stuff. I have a C/C++ background and I also teach the language to students, and after — I don't know — the first two months, they already know the sizes of the different types in C or C++. In Python this knowledge isn't required of you: you don't even have to care about sizes until, at some point, your application is large enough and allocates so much memory that you start to wonder what the sizes of the different Python types actually are. I won't go through this whole table; I'll just show you some interesting things.

One interesting thing is that long in Python 2 and int in Python 3 are limited only by your memory — as long as you have enough memory, you can allocate an arbitrarily large number. You should also know that these types are pretty large compared to C or C++, because there is overhead from the garbage collector and other bookkeeping. Note also how strings and unicode are represented in memory: there is a pretty large header, and on top of that we pay per element — for example, two bytes for each character of a unicode string on a narrow Python 2 build. The same goes for tuples, where the header is even larger and we pay four or eight bytes per element, depending on whether we are on a 32-bit or 64-bit architecture.

You can check all this yourself: since Python 2.6 there is sys.getsizeof, you can pass anything to it, and you will get a size in bytes — with some restrictions. All the built-in objects return correct results, but some third-party types might return crazy values, so be aware of that. Under the hood it calls the object's __sizeof__ method and adds the additional garbage collector overhead if the object is managed by the garbage collector.
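To make this concrete, here is a minimal sketch you can try yourself — the exact numbers are CPython-version and platform specific, so treat the comments as indicative only:

```python
import sys

# Every object pays a fixed header price; containers additionally
# pay one pointer (4 or 8 bytes) per element.
print(sys.getsizeof(0))           # a small int: essentially header only
print(sys.getsizeof(2 ** 100))    # longs grow with the magnitude of the value
print(sys.getsizeof(u""))         # empty unicode string: header only
print(sys.getsizeof(u"abc"))      # header plus a few bytes per character
print(sys.getsizeof(()))          # empty tuple
print(sys.getsizeof((1, 2, 3)))   # header plus one pointer per element
```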
So let's do something more interesting. Here is a fun example, because creating lists is fun. We create two lists of the same size: in the first we create numbers with this formula, and in the second we create exactly the same numbers plus something more. Do you think there will be any difference in the memory allocated between listing one and listing two? This is a fun example, so yes, there should be some difference — and as you can see, the size of the first list is actually less than half of the second one.

This is because of object interning. So what's that? In Python there is a general rule for creating objects: when we create an object and assign it to a variable, the object is created and the variable just points to it — variables do not hold the memory themselves. Interning of objects is an exception to this rule, done mainly as a performance optimization, and it is highly implementation specific. All the examples in this presentation are from CPython; the behaviour may change over time, and there has already been at least one change to object interning in CPython's history.

So what is interning of objects? Often-used objects are pre-allocated and shared, so that every time you write, say, a = 0, the zero is not created again — one shared object is reused by all the instances. Here is code that visualizes this: we assign zero to a and b, and a is b evaluates to True (and of course a == b is also True). In the second example we assign a large number, and a is b returns False, while the values are of course still equal. Someone showed me an amusing quiz two days ago with a similar question — but remember, this is highly implementation specific.

Let me say a little more about object interning. First of all, a warning, and I will say this once more: this is CPython implementation dependent, it may change in the future and probably will, and it is not documented in the Python documentation for programmers — if you want a reference for these values, you have to consult the source code. In CPython 2.7 to 3.4 we get object interning for integers in the range [-5, 256]. We also get interning for strings and unicode in Python 2 and Python 3: the empty string is interned, and so are all strings of length one, with the restriction that for unicode only the Latin-1 symbols qualify. The empty tuple is another example of an object that is shared.

Now something a little different, but still interning: string interning. We start with a simple example. We create two strings that are almost the same, then we add the missing letter to the first one and test whether they are the same object — and we get False, of course. But if we use intern (this is the Python 2 spelling) and try again, we get True. So let's use it for something evil: we create a large list of such strings, and as we can see we use 57 megabytes of resident memory. If we do the same with intern, we actually reduce the memory usage.
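Roughly like this minimal sketch — the `is` results are CPython implementation details, not guarantees, and the values are built at run time on purpose so the compiler cannot fold or intern the constants for us:

```python
import sys  # sys.intern on Python 3; on Python 2 intern() is a builtin

a = int("7")
b = int("7")
print(a is b)                    # True on CPython: ints in [-5, 256] are shared

a = int("12345678901")
b = int("12345678901")
print(a is b)                    # False: large ints are separate, equal objects

s1 = "".join(["inter", "ned"])   # built at run time, so not interned for us
s2 = "interned"
print(s1 == s2, s1 is s2)        # equal values, but normally distinct objects
print(sys.intern(s1) is sys.intern(s2))  # True: one shared copy in the table
```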
But what is actually happening when we use intern? String interning — this is almost the Wikipedia definition — is a method of storing only one copy of each distinct string; the interned strings have to be kept in a table. In Python 2 we have the builtin function intern; in Python 3 it was relocated to the sys module, so we have sys.intern. When we call this function, the string is entered into the table of interned strings and we get back a reference to the interned string — which might be the very same string, if it was already interned, or a copy of it.

When can we use it? This also comes from the documentation: we can gain a little performance on dictionary lookups — some names used in Python programs are automatically interned, and the dictionaries that hold module, class and instance attributes have interned keys. And, as the previous example showed, it will reduce the space used when we have a lot of identical strings.

Okay, let's say something more about mutable containers, such as Python lists, dictionaries and sets. Behind the scenes there is a strategy for allocating these containers. A good strategy has to prepare for both growth and shrinkage. To prepare for growth, the container slightly over-allocates memory, so that appending an element to a list does not force a reallocation every time — we leave room for growth. We also have to remember that sometimes the allocated memory of a mutable container has to be shrunk. All this reduces the number of expensive function calls like realloc, memcpy, and so on. And of course the layout is chosen to be optimal for performance reasons.

Let's start with a very simple example: a list. The first time we put an element into the list we get an allocation — not for one element, but for four. After that, appending or changing elements is free in terms of memory operations, so we can put in another element, another, and another. When we put in the fifth element, Python has to realloc: it relocates the array backing our list and again prepares room for four more elements.

How does this work exactly? Lists in Python are represented as a fixed-length array of pointers — the list just points to its objects — and by design it over-allocates. At the beginning the over-allocation is proportionally large, like this, but for a large list it settles at roughly an eighth of the size.

Now some performance considerations, due to the memory operations involved when using lists. Putting something at the end of the list is cheap, but putting something in the middle or at the beginning requires copying or shifting memory to perform the operation. It is also worth noting that for one-, two- and five-element lists we waste a lot of space, so if we have a large number of small lists, we pay for the over-allocated elements many times over. Here is the overhead of the backing array — the price you pay per element on different architectures. Finally, a list shrinks when the number of elements in use falls below half of the allocated space.
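You can watch the over-allocation happen with sys.getsizeof — a small experiment, with the caveat that the exact growth points differ between CPython versions:

```python
import sys

lst = []
last = sys.getsizeof(lst)
print("len=%2d size=%3d bytes" % (len(lst), last))
for i in range(32):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last:  # the size only jumps when the backing array is reallocated
        print("len=%2d size=%3d bytes (reallocated)" % (len(lst), size))
        last = size
```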
Okay, let's talk about allocation for dictionaries and sets — it's pretty similar, but here we over-allocate when the dictionary or set reaches two thirds of its capacity. For small dictionaries and sets the capacity is quadrupled; once the dict or set is big enough, the capacity is only doubled, so as not to exhaust memory. The actually used size for the object then has to be calculated and the memory allocated. A dictionary or set shrinks when a large number of keys is removed from it.

Okay, another example. The same data can be represented in different ways: we can use an old-style class, a new-style class, a class with __slots__, namedtuples, tuples, lists and dictionaries. I recreated an example from — I don't know — a 2010 talk, updated it for current versions of Python, and added more objects and some more fields. You can see how much the memory needed to store the same data differs, just by choosing different types. As you can see, with some caveats — when you put __slots__ into your class you accept a lot of restrictions on that class — you can get a real reduction in the memory used by these objects.

Okay, some notes on the garbage collector and reference counting in Python. As you probably all know, Python has a garbage collector, and it collects objects when their reference count drops to zero. Some operations increment the reference counter and some decrement it. But there is a warning in the official documentation: overloading the __del__ method can cause problems, because while the Python garbage collector can deal with cycles in object references, when the objects in a cycle have __del__ methods it is not possible for Python to guess a correct order in which to call them — so such a cycle won't be deallocated from your memory.

Okay, I have some more time, so I will talk about some tools that you can use for Python memory profiling. Let's start with psutil. It's pretty simple: a cross-platform API for system utilities. To get information about the current process's memory, you can just create a psutil Process for your PID, turn its memory information into a dictionary, and read off the values you need. For most of the examples I used exactly this code, because it is the most reliable for this purpose.

Another tool is memory_profiler. It recommends psutil as a dependency — with psutil installed it works faster — and it can work in three different modes: as a line-by-line profiler, as a memory usage monitor over time, and as a debugger trigger.

Let's start with the line-by-line profiler. You put the @profile decorator on the function you want to profile and then run your script with something like python -m memory_profiler your_script.py — with the name of your script, of course. You then get results line by line: the memory usage and the memory increment for each line. Here we can see that the for loop is the main memory contributor.
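A minimal sketch of that line-by-line mode, following the documented memory_profiler usage (the function and file names here are just placeholders):

```python
# save as example.py and run:  python -m memory_profiler example.py
# the @profile name is injected by memory_profiler at run time, so the
# script is meant to be run through the module, not directly
@profile
def allocate():
    small = [1] * 10 ** 5   # a modest allocation
    big = [2] * 10 ** 7     # the main memory contributor shows up on this line
    del big                 # deallocations appear as negative increments
    return small

if __name__ == "__main__":
    allocate()
```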
The second way to use memory_profiler is as a memory usage monitor over time: we just sample the process's memory usage as it runs. You can use it with any kind of process, not only Python, but if you want to use it with Python you should put the @profile decorator on the functions you want to track and run it with the Python option. Here I ran a simulation, and here is the resulting plot. The connect function is marked as the one doing the work here, so I can see that connect is probably responsible for the growth at this point; the run function is marked here, and as we can see it does not change our memory usage much.

The third option for memory_profiler is to use it as a debugger trigger: we set a threshold of used memory, run our process, and we are dropped into the debugger when the process reaches the memory threshold we set.

Another tool is objgraph. It's a cool tool for visualizing object references in Python, and for small projects it's pretty nice, because you get plots like this one. It is a good tool for finding reference cycles in your code, but if your project is large enough, the generated plot will be huge and it will be hard to track anything in it. Still, with some code manipulation — all of this is in the objgraph tutorial — you can track down object reference cycles pretty easily.

Okay, the next two tools will probably be covered in more depth in the third talk of this session: Heapy and Meliae. They are heap analysis tools. They are pretty similar, with some differences that are well described on the project pages. So let's see what we can do with them. We run some code and take a heap snapshot here, do some more memory-intensive operations, and take another heap snapshot. We can then do arithmetic on those heaps and get results like these, where we see that this one operation allocated a lot of integers and lists.

Another option is the combination of Meliae and RunSnakeRun. You use Meliae to dump all the objects in your code, then open the dump with runsnakemem and get an interactive plot that you can zoom in and out of, to see how the memory is allocated across the different objects.
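The Heapy part of that workflow looks roughly like this — a sketch assuming the guppy package (guppy3 on Python 3); setrelheap makes later snapshots relative to a baseline, and snapshots also support set arithmetic such as subtraction:

```python
from guppy import hpy   # pip install guppy (Python 2) or guppy3 (Python 3)

h = hpy()
h.setrelheap()                   # take the current heap as the baseline

ints = list(range(10 ** 6))      # some memory-intensive operations
pairs = [(i, str(i)) for i in range(10 ** 5)]

print(h.heap())                  # only what was allocated since the baseline,
                                 # grouped by type with counts and byte totals
```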
Okay, and this is almost the end of my talk. You can also use different malloc implementations with Python. It's pretty easy, and you will find many blog entries about using alternative memory allocators. It has its pros: with very little effort you may get, I don't know, better return of freed memory from the process to the system. But it can also work against you — it depends on your application type. If you want to use a different malloc implementation, you have to install the library and then run Python with LD_PRELOAD pointing at the library you want to use, for example LD_PRELOAD=/path/to/libjemalloc.so python my_script.py. And you can get different results. I prepared a small test: my code goes through several steps, and I ran the same code with glibc malloc, jemalloc and tcmalloc, and you can see how the memory changes in each case. As you can see, plain malloc is actually pretty good with Python now — a few years ago there were some problems, but now it works pretty well. With jemalloc you get pretty much the same results, and for a different application you might gain something. But with tcmalloc, in this example, you end up with a little more memory allocated and not returned to the system. Again, this depends on the type of your application.

Some other useful tools. You can always build Python in debug mode. You can use Valgrind with Python — it cooperates with it pretty well. You can use the experimental extension for gdb. And, probably most useful for web developers, you can use one of Dozer or Dowser; Dozer is probably more convenient, because it is a WSGI middleware version of the CherryPy-based memory debugger, so you can just plug it into your WSGI stack and get some memory profiling.

So, the summary. Try to understand the underlying memory model better. Pay attention to hotspots. Use profiling tools. Then comes the hardest part: try to find the root causes and fix the memory leaks — the next talk will be about this. And there are also some quick, sometimes dirty, solutions: you can delegate memory-intensive operations to another process, do the work there, collect the results, and then kill or stop that process; you can restart a process when it builds up too much memory overhead; and you can try tricks like __slots__ or different memory allocators.

Here are some great references that I used when preparing this presentation — give them a try. Some of them are outdated, like this one, but they still give great insight into Python's memory behaviour. So thank you very much.

Okay, I think we have some time for questions, so please come to the microphone now.

Q: I have sometimes experienced that I created many objects in Python, then removed all the references to them and even forced a garbage collection run, but the system memory still wasn't freed. Does it just take some time, or why does Python sometimes not free the memory?

A: It actually depends on the version of your Python interpreter, for one. It's a little more complicated than that, but you can try the different allocation libraries that I showed and see whether they help with your problem.

Q: Thanks. Do you have any hints on how to debug off-heap memory problems? You showed Heapy and so on, which I already knew, but I have sometimes seen, using psycopg2, that the whole process was using 4 gigabytes of memory while Heapy only showed very little heap memory, so I guess it was related to something outside the Python heap.

A: You can try the debug version of Python — compile it in debug mode, and then you can see the objects that were allocated and tracked by the garbage collector — and you can use Valgrind if you want to go low level. We can talk about it in a moment.

Okay, thank you very much.