Okay. The next talk, again about memory. It's a good track: Python memory, from Tomek. We had a small accident with the water, but I think everything is cleaned up now, and we can start. Have fun.

Hello. My name is Tomek, I'm Oynopian on the Internet, and today I want to talk a little bit about Python and its memory. Some disclaimers first. I'll have code examples on my slides, and everything was executed on Lubuntu 12.04, 64-bit, on CPython 2.7. This is a somewhat older setup; it comes straight from my company's production. If you run this code on other systems or other Python implementations, you will get totally different results, and this talk might become invalid. I'm not an expert in CPython. I've done Python for a long time, but I don't like to delve into its guts. At some level, this talk is a bit of "I don't know what I'm doing". Everything I touch here is far more complicated than I have time to get into, so in a sense it's a lie: it's oversimplified. I'm not even sure everything here is true. This talk is a case study, a report from the battlefield, mostly a battle to keep my sanity intact.

I'm a web developer. I work for a company where we have long-lived web processes. It's a normal Django setup, so we want them to keep living for a long time. For some legacy reasons, we have one request, one resource, that when you hit it over HTTP will crunch some numbers, generate a report and show it to very important people. During this request, maybe half a gigabyte or a bit more of memory is allocated just for the report. That's fine; our capacity planning includes it. We know from the business that this request happens only two or three times a day, so it's okay: if one process is churning through this memory, the other processes should be unharmed. But for some reason this half a gigabyte of memory is never released.
This links to a question from the previous talk. For a while this was okay for us; we just restarted the processes periodically. But then we wanted to dig deeper, find the reason, and improve our capacity planning and resource utilization. We wanted to get rid of the problem: the request stays, we just want the memory to be released so the application returns to its baseline memory usage.

After spending many hours fruitlessly trying to find what causes it, we reduced our code to something like this. We have a function that allocates roughly 100,000 strings into a list, each string roughly five kilobytes big. After that, we compute a report over the list, then allocate a small amount of memory to keep the gathered results and display them as a summary later on. The code doesn't need the big list anymore, so it deletes it. In the real code the big variable would just go out of scope; here we delete it explicitly, and report some more afterwards. The report function prints memory usage as resident set size (RSS).

Here's the output of this reduced program. After allocating the big list of strings we have half a gigabyte of memory used, and after the del, which drops the last reference to the list, the memory stays more or less intact. This is not normal Python behavior. I like Python because I don't have to deal with memory: I don't have to think about manually deallocating it, I just leave it to the interpreter and go about my business, and my employer's business of features I want to implement. At first sight I felt maybe there was some hidden cyclic dependency, maybe I just needed to push the garbage collector a little harder. So that's what I did. I've introduced garbage collection into...
A forced garbage collection into this small program. That didn't help at all, actually; if anything, importing the gc module made the program use a little more memory. At this point I started to question my own sanity. I needed to rest and gather my thoughts, and after resting some, I decided I needed a friend, someone to help me debug this problem. I was thinking: where's the memory leak? There has to be a memory leak. Probably somewhere this code silently sends some strings into outer space and never releases the memory.

I tried to find a couple of friends on the internet, mostly in the form of tools for debugging memory usage. Piotr described some of them in the previous talk, and I found those tools mostly unusable for a person in desperate need of knowing what's happening. If you have time to get to know them they're kind of nice, but the documentation is mostly horribly convoluted and the output is really complicated. The only tool I found that worked for me, and that I could understand, is Guppy, and its memory-related part called Heapy. You can find it on SourceForge, or just do pip install guppy, and it works. Documentation: well, they claim to have it. There are some examples, but most importantly it's largely self-explanatory, so you don't actually need that much documentation. It's still better than the others. Although I've since received an email from Victor Stinner, who implemented a very nice module for Python 3 called tracemalloc; I haven't had time to test it on my code, but it looks genuinely helpful, and it's well documented on the Python documentation page.

So how does Guppy work? What does it do? By Guppy I mean the only piece of it I've actually used, Heapy. You import hpy from guppy, and then you request a snapshot of your heap.
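A minimal sketch of the Heapy calls just described, assuming the classic guppy API (the guppy3 package provides the same interface on Python 3). The import is guarded so the sketch degrades gracefully when the package is not installed:

```python
# Sketch of the Guppy/Heapy usage described in the talk.
# Assumption: the guppy (or guppy3) package exposes hpy() as documented.
try:
    from guppy import hpy

    h = hpy()
    snapshot = h.heap()   # overview of all objects the interpreter knows about
    print(snapshot)       # table: count and total size per type
    print(snapshot[0])    # it supports slicing: the biggest consumer alone
    available = True
except ImportError:
    available = False
    print("guppy is not installed (try: pip install guppy3 on Python 3)")
```

When guppy is present, printing the snapshot gives the per-type breakdown shown on the slides.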
The heap snapshot is kind of an overloaded list, so you can also slice it, and if you print it, it displays a nice overview of what's happening in your program. I introduced this into my small case study to see what was happening, and whether I was sane or not. That's the output it gave me. We can see that after the allocation of the big list we have half a gigabyte represented in strings; you can see it in the red bold number, 504 and so on. The count is roughly what we expect, around 100,000, a little more; the rest is just a CPython implementation detail, it keeps a lot of strings in memory. And after the deallocation we can see that roughly 100,000 strings go away and the memory used by strings is only 800 kilobytes. Heapy reports memory as seen from inside the Python interpreter, and if you look at the total, after the deallocation it claims the memory used by objects Python is aware of is only one and a half megabytes.

So where did the memory go? The answer is easy for anyone who recently finished a university course on operating systems or memory management, but for me that was almost 10 years ago, so it wasn't that clear. A little refresher, then: there's a phenomenon called memory fragmentation. What is it? From the point of view of the Python interpreter, memory is supplied as a contiguous address space, and it has the property of growing and shrinking only on one side, let's say the right side. So if your program allocates some memory, it's there; when you request some more, like in my test case, it's added to the right. And then we release the big chunk of memory. It's released, but it cannot be returned to the system, because that would require the address space to shrink from the other side.
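The reduced test case from earlier can be sketched like this in modern Python 3. Assumptions: a Linux system (RSS is read from /proc, which does not exist elsewhere), and the allocation is scaled down from the talk's 100,000 strings to 10,000 so the demo stays quick; scale it back up to reproduce the half-gigabyte peak:

```python
def rss_kb():
    # Current resident set size in kB, read from /proc (Linux-only sketch).
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])

def report(stage):
    print("%-18s RSS: %s kB" % (stage, rss_kb()))

report("baseline")
# Scaled down from the talk's 100,000 x ~5 kB strings.
big = ["x" * 5000 + str(i) for i in range(10000)]
report("after allocation")
summary = {"count": len(big), "total": sum(len(s) for s in big)}
del big                 # drop the only reference; CPython frees the objects...
report("after del")     # ...yet RSS often stays high, as observed in the talk
```

Whether the last figure drops depends on the platform's malloc, which is exactly the talk's point.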
So this gave me a really good hint about where the problem could be, and I thought: I will relentlessly move all the small allocations before the big list allocation, so to speak, prepare the memory, so that the big allocation can be freely released to the system. But that never happened. I mean, I did it, I prepared all of it, but the memory still wasn't released to the system, and I still wasn't sure I was sane.

Now, this part links a little to the previous presentation. At this point I had to go out into the wild internet and the vast plains of undocumented interpreter implementation details, and learn how Python actually uses system calls to grant you the memory you wish to use. The basic lesson learned is that Python doesn't use malloc directly. Strictly speaking, malloc is not a system call but a standard library call, but for our purposes here I'll call it a system call. Python doesn't use it directly because it's too costly for small objects: there's a penalty for calling into the kernel, which lives in an address space separate from your program's, and the kernel has to do all the work required to protect itself from your malicious program. Of course, we know our programs aren't malicious, so that's a lot of overhead, and Python implements a more sophisticated allocator on top of the malloc call. There are a couple of improvements it tries to make.
One of them is called free lists. The Python interpreter runs your code highly dynamically: your classes and instances really are just dictionaries and lists with some special semantics attached, some special way of describing them, but from the perspective of the program running in memory it's just lists and dictionaries. If you were at the previous talk, you've seen there's a fair amount of memory overhead to dictionaries and lists, so Python tries to keep those objects close at hand. It doesn't immediately release your list when it goes away; it keeps a handful of lists and dictionaries ready to be reused, because they will be reused: the next function call will internally be represented as an object called a frame, which holds a list of the variables inside it, and so on. So for a handful of the most common types there is a special cache where objects are kept after you've released them, and from what I've been told this speeds up code execution immensely. But it also gives us the ability to play with this a little and see how we can abuse it.

To check whether what I'd read on the internet bore any relation to the interpreter installed on my system, I devised a kind of torture test which just allocates lists of growing length. The alloc function allocates a list of strings of length i, so each list is a little bigger than the last. Immediately after making each of those growing lists, we take every element of the newly created list, put it into another one, and release the newly created list. As I said before, freed lists are kept in pools grouped by similar memory footprint: shorter lists live together, slightly bigger ones nearby, and so on. So here, after deleting the big variable, the memory usage does not drop, because of this problem.
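The growing-lists torture test just described can be sketched like this (the function and string contents are hypothetical stand-ins for whatever the original program allocated):

```python
# Sketch of the "growing lists" torture test: each alloc() builds a list
# of i smallish strings, the elements are moved into a second list, and
# the freshly built list object is released, exercising CPython's caches
# of freed lists at many different sizes.
def alloc(i):
    return [("payload-%d-%d" % (i, j)) for j in range(i)]

kept = []
for i in range(1, 500):
    fresh = alloc(i)      # each list a little longer than the last
    kept.extend(fresh)    # keep every element alive...
    del fresh             # ...but release the list object itself

print(len(kept))  # sum of 1..499, i.e. 124750
```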
At this point I decided that the whole idea of my company having a request that allocates half a gigabyte of memory was completely ridiculous, and I decided to offload this work to a task queue, or in more direct terms, to a sub-process. That freed me from the immediate danger of going mad trying to debug CPython's internal memory management.

So if you ever run into this kind of problem, I have a couple of recommendations for you. First of all, try to make better use of memory: if you have objects that you know will live longer than others, try to allocate them in reverse longevity order, the longest-living objects first, and then the rest. If you can't do that, try to offload memory-intensive work to a sub-process and let the operating system take care of reclaiming memory and cleaning up after you. This is the lazy man's solution, and I'm lazy, so I used it.

As Piotr said before, there are other implementations of malloc. The one I tried, and found actually helped with my problem, was jemalloc. You can force Python to use it with the LD_PRELOAD environment variable. I tried it with my example, and you can see that it helps, but it has drawbacks. In the upper output you can see what we saw before, just copied here; after switching to jemalloc, you can see that, first of all, the all-time peak memory usage is bigger. jemalloc is much more sophisticated than the default malloc and implements its own allocation algorithm in quite a sophisticated way, so it has bigger overhead, but then it's much easier for Python to release this memory to the system. So this would have worked for my case as well. On the other hand, I didn't want to replace the allocator for everything, because I was thinking that this would require me to
see whether I hadn't broken any other part of the program. You can also see that after this, the GC has released some more memory, so it's even better. jemalloc is used by some big names in our industry; I think Facebook uses it for at least part of their systems, and they are heavily involved in jemalloc's development.

My time is running out, so a couple of conclusions. The first conclusion is that sometimes a memory leak is not what you think it is, and sometimes you have to go back to school to remember where your memory might be hiding. The other is that the malloc from glibc is not the best of breed. A funny story: I tried to share this problem with a friend of mine, so I brought the program home from work, and at home I have a Mac, and the problem was totally solved, I mean, nonexistent. The reason being that the allocator on Mac uses some version of jemalloc, so this is a nice feature to have on your system. As I mentioned, memory-intensive work works best in sub-processes. And as a kind of offbeat mention: if you are using any C extensions, they might not be using Python's memory allocator, which means they can break Python's ability to release memory back to the system, because they allocate it using malloc directly. It's kind of complicated, but we've seen this actually do harm. Okay, that's it. Any questions?
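The sub-process recommendation from the talk can be sketched with the standard library's multiprocessing module. Assumptions: a Unix system (the fork start method is used so the sketch works without an import guard), and build_report is a hypothetical stand-in for the real memory-hungry report:

```python
from multiprocessing import get_context

def build_report(n):
    # Hypothetical stand-in for the memory-hungry report: allocate many
    # strings, return only a small summary so the parent never holds them.
    big = ["x" * 5000 + str(i) for i in range(n)]
    return {"count": len(big), "bytes": sum(len(s) for s in big)}

# fork start method (Unix-only): the child inherits the parent directly.
ctx = get_context("fork")
with ctx.Pool(processes=1) as pool:
    summary = pool.apply(build_report, (10000,))

# The allocation peak lived and died inside the worker process; when it
# exits, the operating system reclaims all of its memory unconditionally.
print(summary["count"])
```

This is what makes it the "lazy man's solution": no fighting with the allocator, the kernel cleans up at process exit.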
So, any questions? Please go to the microphone.

Hi, thanks for the talk. I just wanted to comment on replacing malloc with something else. It's good as a stopgap solution for now, but please remember that malloc is going to improve over time; the Linux guys are going to make it better and better, and all the other tools are not going to take advantage of that ten years from now. So if you use the default malloc, the program will improve even without you touching it.

Well, that is true, and it's not only malloc: memory management in Python 3 is actually improving a lot. As I said, Victor Stinner has written a very good module called tracemalloc that allows you to debug memory, which is really, really nice.

I was just wondering: in your case you would allocate a bunch of memory for a request, and that was happening over and over again, so you were building up what looked like a lot of memory over time?

I haven't done a good job describing the problem. We had a couple of web worker processes, and our capacity plan said they were allowed a baseline memory of, let's say, 50 megabytes of RAM, but we hadn't planned for them to be at half a gigabyte all the time. It was okay for them to go and eat memory once in a while, provided there weren't too many of them doing it concurrently. With this problem, we saw them start eating memory and then keep it at the high peak without releasing it, which caused our capacity planning to fail, and that wasn't pleasant.

Was it increasing over time, though, or was it constant?
It would just peak at half a gigabyte and then stay there; no other request would even touch it. So we could see that this memory was being reused internally in Python, but not returned to the system, and that wasn't good for us.

One last question, please. You said there's memory fragmentation inside the Python memory area; would there maybe be a way to defragment that area, so that Python reallocates the memory regions?

I didn't get the question. Are you asking whether it's possible for Python to defragment memory by moving objects around? No. For CPython that's not possible: once created, objects stay where they are. You can actually see that by running id() on an object; it will just give you a raw pointer value, if I'm correct. I might not be correct, but they don't move around. If you want that, I think Java has this feature, so you could try running your program on Jython, and that would help. Also, PyPy has a much better garbage collector, and although, as far as I checked last time, it doesn't have this ability to move objects around either, it's much saner than what we have in CPython.

Okay, thank you Tomek. Thank you very much.
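The claim in the answer above, that CPython objects never move and that id() reflects this, can be checked directly (in CPython, id() happens to return the object's address, as the speaker notes; this is an implementation detail, not a language guarantee):

```python
# In CPython, an object's id() is stable for its whole lifetime.
x = []
addr = id(x)
x.extend(range(100000))   # forces many reallocations of the internal buffer
print(id(x) == addr)      # True: the list object itself never moved
```

Only the list's internal element buffer is reallocated as it grows; the list object the id() points at stays put.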