We are here for the last two talks before the closing, and I want to say hello and welcome to Itamar. Hi, how are you doing? Good, how are you? I'm fine, thank you. Where are you streaming from, Itamar? I'm in the United States, in Cambridge, right outside Boston. Cool, so it's early for you. I hope it's not too early. Okay, so Itamar is going to be talking about measuring memory: Python memory profilers and when to use them. That's super interesting. And I see you have been using Python for a long, long time, since 1999, and your first EuroPython was in 2002. Stay for the social event if you want, because we are going to have people talking about the first EuroPython and sharing some memories of that. So Itamar has been working in particular on Python performance. He has a blog, right? PythonSpeed. And you were also a maintainer of Twisted for some time. Yeah, I used it a lot, like, I don't know, five years ago or something. Okay, I'm not going to steal more of your time. Let's put your screen live. If people have questions, please write them in Matrix; I'll be showing those questions later. Thank you very much and good luck.

Hi, my name is Itamar Turner-Trauring, and today I'm going to be talking about measuring memory: Python memory profilers and when to use them. Before we get into the actual talk, there's one thing. Before you start any software project, you should be asking yourself: should you be writing this software at all? Today you're going to learn how to measure your software so you can make it more efficient. But if your software is hurting people or hurting the environment, then the more efficient your program is and the better you are at your job, the more damage you're causing. What would be a positive improvement turns into a negative one.
So before you start on a software project, you should be asking yourself whether the software should exist at all. Assuming you've asked that, the bulk of this talk is going to be about what to do when you have high memory usage. In particular, we'll talk about the symptoms of high memory usage, so you can recognize them and know that you have a problem, and then about a bunch of different causes of high memory usage and how you might go about diagnosing each one. This is a large subject, and different operating systems differ; since this is a Python conference, I'm focusing on Python, and to simplify, I'm going to focus on Linux. macOS isn't that different from Linux in practice; the tools are a little different. Windows is somewhat more different, but the fundamentals are all the same, so even if you're on Windows or macOS, most of this will apply. There are a bunch of examples which I've written specifically for this talk, but I've tried to make them at least slightly realistic.

So let's start by talking about symptoms. The first symptom you might notice is a slowdown in your program. It might be a little bit of a slowdown; it might be a very significant slowdown. The issue is that when your running program allocates some memory, it can be stored in two places: in RAM, which is fast, or on disk, which is slow. And RAM is usually much less plentiful than disk. For example, the computer I'm using right now has 16 gigabytes of RAM, but 500 gigabytes or more of disk. So there's a lot more disk available, but it's also much slower, something like 150 times slower, even for fast SSDs. And so when the programs on your computer are using a lot of RAM, the operating system might decide to move some data to disk, to swap it out.
Then when you access that data, the operating system says: oh, you actually need to do something with that data, and moves it back from disk to RAM. Both writing the data out to disk and reading it back in are fairly slow operations, again, something like 150 times slower than reading or writing RAM directly in your program. In many cases this procedure is perfectly fine. For example, imagine you have some data in memory that you use at the beginning of your program and then never use again. Writing that out to disk is fine: it frees up some fast RAM for the rest of your program, you're never going to touch it again, and it happens mostly in the background because you're not interacting with that memory at all. It's actually making everything faster and more efficient. But if your program uses too much memory, the operating system will find itself spending a lot of time moving data to and from disk, which again is 150 times slower. Your program will go from running really fast to running really slowly, and in the extreme, your computer will completely grind to a halt, because doing anything at all involves multiple reads and writes to disk. There are a couple of ways you can diagnose this: there's a tool called vmstat, and you can look at /proc/pressure/memory. I'm going to mention a lot of tools in this talk; these two, like the others, are linked in the slides, so you can follow the links to learn more about them. I'm not going to go into too much detail about all the tools, because there are so many, but I'm giving you references so you can learn more, and that will be true for later tools as well. So that's the first symptom: your program goes from running fast to grinding to a halt when it uses too much memory.
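To make the swap symptom concrete: on Linux, vmstat and /proc/pressure/memory are the canonical tools. As a rough illustration only, here is a small Python sketch that parses the kind of key/value text /proc/meminfo provides and computes swap usage; the sample text is made up, and real diagnosis should use the tools above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style "Key:  value kB" lines into a dict of kB values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key] = int(fields[0])  # values are reported in kB
    return info

# Made-up sample in the format /proc/meminfo uses on Linux.
sample = """\
MemTotal:       16384000 kB
MemAvailable:    1200000 kB
SwapTotal:       8388608 kB
SwapFree:        2097152 kB"""

info = parse_meminfo(sample)
swap_used_kb = info["SwapTotal"] - info["SwapFree"]
print(swap_used_kb)          # lots of swap in use: a sign of heavy swapping
print(info["MemAvailable"])  # low MemAvailable is another warning sign
```

On a real Linux box you would read the text from /proc/meminfo instead of a sample string; a steadily shrinking SwapFree while your program runs is the pattern described above.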
Another symptom of using too much memory is that you fail to allocate some memory. Your program is running, the process says, hey, operating system, can I have some memory? And the operating system says: nope, we're out of memory. This actually doesn't happen very often on Linux, because the other symptoms are more likely to happen first, but it can happen. If, for example, you try to allocate an absurdly large array, hundreds of gigabytes, you'll get a MemoryError. If the failed allocation reaches Python, you'll get a nice exception, and it will print out, so you'll get a sense of what happened. If it's in extension code like C, C++ or Rust, you might get an error message in your logs, or your program might just crash. So if you're lucky, you'll have something in your log saying you got a MemoryError or ran out of memory. Otherwise, if you crash, you should hopefully have a core file, and you can look at it with a debugger and look for failed allocations in the C code. The good news is this particular failure mode doesn't happen very often.

So your program can be slow, and your program can fail to allocate some memory. Or your program might be killed. The third symptom of using too much memory is that the operating system is always watching what's going on, and it says: oh, we're using too much RAM, RAM usage is increasing very quickly, we're going to run out, and when we run out, everything's going to grind to a halt. In fact, on Linux, I think this actually happens when things are already pretty far gone, so things might actually be starting to slow down at that point. The operating system says: if I don't do anything, this computer is just going to stop, it's going to grind to a halt and become unusable, so I'm going to at least try to make some things work. And the way it does that is it just kills a process, and it has some heuristics to figure out which process it should kill.
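Before going on, a quick sketch of the failed-allocation symptom above: when the failure does reach Python, you get a catchable MemoryError. This example is mine, not from the talk; the 2**62-byte request is deliberately absurd so it fails cleanly on any 64-bit machine:

```python
def try_allocate(n_bytes):
    """Attempt one big allocation; report failure instead of crashing."""
    try:
        buf = bytearray(n_bytes)  # zero-filled buffer of n_bytes bytes
        return f"allocated {len(buf)} bytes"
    except MemoryError:
        return "MemoryError: allocation failed"

print(try_allocate(1024))   # small allocation succeeds
print(try_allocate(2**62))  # absurdly large: malloc fails, Python raises MemoryError
```

In real programs the failing allocation is usually not this obvious, which is why the profiling tools later in the talk matter.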
If your program is just sitting in a loop allocating memory, the operating system will notice. What will happen is your process gets killed with SIGKILL, the equivalent of kill -9. It just dies. And this isn't a crash or a segfault like we talked about earlier; it's not that memory got corrupted in some way. The operating system simply killed it. To diagnose this: if you're using Docker, the Docker logs might say the process was killed with signal 9. If you're on Linux, you can run the dmesg command and see the Linux kernel messages; it might say something like: the out-of-memory killer killed this process because it was out of memory. So that's the third symptom. We can have a slowdown, we can have failed allocations, which might give you a log message or a crash, or you might just be killed by the operating system.

The fourth symptom of too-high memory usage is somewhat different, because the first three problems are basically due to not having enough RAM on the computer you're running on. And you can solve that by throwing money at the problem: you can buy more RAM for your computer. If you're buying your own physical computer, it doesn't cost that much; if you're doing cloud computing, you can spend more per hour and get a virtual machine with more RAM. In some cases, that's all you have to do. So if you feel like you have too-high memory usage, the first thing you should do is figure out whether you can just get more RAM, and then the rest of this talk might be irrelevant. But once you start scaling up, spending more money starts becoming a problem. If you need to go from 16 to 32 gigabytes of RAM and you're just buying one computer, that's fine. If you're running thousands of these computers in the cloud, it gets really expensive. And it's a waste of carbon dioxide emissions while you're at it.
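One way to tell "killed by a signal" apart from an ordinary crash is the exit status. A small POSIX sketch, not from the talk: a child process that SIGKILLs itself, the same signal the OOM killer sends, shows up to its parent with a negative return code:

```python
import signal
import subprocess
import sys

# Child process that SIGKILLs itself, the way the OOM killer would kill it.
child_code = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"
result = subprocess.run([sys.executable, "-c", child_code])

# On POSIX, subprocess reports "terminated by signal N" as returncode -N.
print(result.returncode)                     # -9
print(result.returncode == -signal.SIGKILL)  # True
```

This is the same signature you'd see for a process the kernel OOM-killed: no traceback, no core dump from your code's perspective, just termination by signal 9.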
So these are the four symptoms of too-high memory usage: slowdown, crashing, being killed, or just spending too much money. Next we're going to talk about the causes of too-high memory usage, and some tools you can use to help narrow down which part of your code is causing the problem. I'm going to start from the most obvious and direct cause and move to more and more obscure and somewhat more difficult-to-solve causes.

The most obvious reason you might be using a lot of memory is that you're just loading a lot of data. This happens a lot in scientific computing, data science, generating a report. If you're loading a lot of data, then unless you take steps to structure your code correctly, you might also end up using a lot of memory. More data, more memory. This is solvable with a bunch of techniques, but you have to actually apply them, and it's easy to make mistakes. Here I'm using the memory_profiler tool (pip install memory-profiler). It comes with a command-line tool called mprof, which lets me run a program and then plot its memory usage. This gives me a plot of memory used by the program across time. The y-axis is megabytes, going up to about 1200 megabytes, and the x-axis is seconds. This is a typical pattern for data processing applications: they start up, they load some data, they do some processing, they load some more data, they free some data, then they do some more processing and use more memory. All this memory usage is driven by the processing and by the data that got loaded. Lots of data, lots of memory. In these sorts of situations, what you want to do is reduce peak memory, and this is a little different from making your program faster. If you're trying to make your program run faster, it doesn't matter whether you make the first half of your program 10% faster or the second half 10% faster, because the speedup is cumulative.
If you're trying to reduce memory usage, the thing that drives all the symptoms we talked about, from crashes to high costs, is typically the peak memory usage. So imagine we optimized memory over here, and instead of being at 1,100 megabytes, we got it down to 900 megabytes, or 600 megabytes. That's fine, but we haven't changed the peak, and if we haven't changed the peak, it doesn't matter what else we've optimized. So you want to measure memory at the moment in time when it's highest, and then optimize that. To do this, you can use the Fil memory profiler, and it turns out this is a pretty useful tool in a bunch of cases. Fil will tell you which code generated the memory allocations that were present at that peak moment. I should mention that I wrote it, so I'm a little bit biased. You can install it with pip install filprofiler, and then, instead of running your program directly, you run it with fil-profile run, followed by, in this case, your script. This is what the output looks like. This particular visualization is what's known as a flame graph, a common visualization for performance problems. It's showing that at peak, it thinks the program was using about 600 megabytes of memory, and there's a series of frames. The width of a frame is what percentage of the memory at that time that frame is responsible for; the wider it is, the more memory. In this particular variation of the flame graph, the redder and more saturated the color, the more memory is being used. So basically what you do is try to find the reddest parts. If you look at it, what it's actually giving you is a series of stack traces. It's saying line 38 of dataprocessing.py calls the function main; then at line 26, main calls this make_array function; and the make_array function calls what looks like a NumPy function to allocate a large array.
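As an aside, the standard library's tracemalloc module gives you a rough version of this "measure at the peak" idea: it tracks both current and peak traced memory, though without Fil's flame graphs. A minimal sketch of my own, with a made-up workload:

```python
import tracemalloc

def load_data():
    # Stand-in for loading a big dataset.
    return list(range(1_000_000))

def process(data):
    # The temporary list roughly doubles memory while it exists: that's the peak.
    doubled = [x * 2 for x in data]
    return sum(doubled)

tracemalloc.start()
data = load_data()
result = process(data)
current, peak = tracemalloc.get_traced_memory()  # bytes: (now, high-water mark)
tracemalloc.stop()

# Peak is what causes crashes and OOM kills, not the final usage.
print(peak > current)  # True: the temporary list pushed the peak higher
```

tracemalloc can also take snapshots attributing allocations to source lines, which is the same stack-trace idea Fil's flame graphs present visually.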
You can see there are other stack traces here, and I can double-click on one and just read an actual stack trace and the code responsible for this allocation. So basically this shows you a tree of stack traces for all the code that allocated memory. And you can say: oh, in the make_array function at line six, I created this really large array; that's why this program is using so much memory; maybe I can figure out a way to not do that. That's how you would start optimizing a program whose memory use is tied to how much data it loads: find the peak memory, find which lines of code are allocating, and start thinking about how to optimize them.

But a lot of applications don't actually load that much data, and that leads us to our second cause: memory leaks. For example, a web application quite often doesn't load much data at all; it loads only a small amount of data for each request, just enough to render the result. So if you measure memory over time for a web server, this is a little Flask application I wrote, there's a bit of noise as different requests generate different amounts of data, but this application is only ever likely to use up to 45 megabytes of memory. It's pretty stable, because it just isn't loading that much data. This application is actually fine; you don't have to worry about its memory use. But sometimes you can have a memory leak: memory that isn't being freed. Each request, even though the request itself doesn't use a lot of memory, leaks a little, and over time this lost memory accumulates. So here is a different program, another web server, and this one is leaking memory. It starts out using 30 megabytes of memory, and after half a minute it's using 38 megabytes. As requests come in, it just uses more and more and more memory.
Eventually it'll use so much memory that it will slow down, or the operating system will kill it, and then it will restart back down at 30 megabytes and the growth will start over. The problem isn't that you're loading too much data; the problem is that you're leaking memory. And if you're leaking memory, the peak point in time is largely a result of all this leaked memory, so it will show up if you use Fil. The Fil profiler can actually be used to catch the parts of your code that are responsible for memory leaks. So here we look at memory usage, we look at the reddest parts, and about half the memory was just importing Flask; we're only importing it once, so that's not the issue. This part looks like it's just atexit, some sort of shutdown code. And over here, this is the actual code that's handling requests. You scroll down the stack trace and click on it, and it looks like the responses for our web application, these strings, are being allocated and never freed. Fil doesn't tell you why that is, but now we know where to look. If the memory leak is in your Python code, the issue is going to be references being held onto. Python is a garbage-collected language: Python will free memory for you when it notices an object doesn't have any references anymore. No references means you're never going to use it again, so Python just frees it. If you're holding onto a reference, however, that object will not get freed. So memory leaks in Python are usually caused by references being held onto. And in this particular code, whose general area we found using Fil, if you look at it, this function has a decorator called cache, coming from functools. It's going to cache all inputs and outputs, which can save you time if you're doing a slow computation. However, it's going to cache all responses forever.
And if there's an unlimited variety of responses, it's going to use an unlimited amount of memory. So maybe you want to switch to a least-recently-used cache, or a cache that can only hold 10,000 items, or a cache whose entries only last 10 seconds, so you don't just use more memory indefinitely. Sometimes you can't use Fil, because the memory being leaked is small enough that it doesn't show up in your peak memory. In that case you can try Scalene or memory_profiler; they give you a sort of line-by-line report: this line of code generated this much in memory allocations. That's an alternative way to find memory leaks. And once you've found a memory leak and you're trying to debug why there are references to this object, what's holding onto it, there's a tool called muppy, part of Pympler, that can help you find leaks, and a tool called objgraph that generates a visualization of the references to objects. You can use those to figure out why references are being held to your object, keeping it from being freed.

Sometimes the memory leak is not in Python code; it's in extension code. The patterns here: sometimes it can be a similar reference issue, if you're using smart pointers in C++ or Rust. Sometimes, in plain C or C++ code, you allocated some memory and forgot to free it. As an aside, you should be using Rust, not C++; Rust is much more secure. If this is the issue, you still need to track it down, but it's not about your Python code; it's going to be down in the C or C++ code. And there is a lot of C code and so on in the libraries you use: matplotlib, Pandas. You might write some of your own. Python itself is written in C, so it could potentially be responsible. There are just a lot of potential places the leak could be coming from, so you can't just say, oh, it has to be this place. It won't be obvious. Again, you need some tool to narrow it down.
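To sketch the bounded-cache fix mentioned above: functools.lru_cache with a maxsize evicts least-recently-used entries, so the cache can't grow without limit the way an unbounded cache does. The request-rendering function here is a made-up stand-in, not the code from the talk:

```python
from functools import lru_cache

@lru_cache(maxsize=None)   # unbounded: every distinct input is kept forever
def render_unbounded(request_id):
    return f"<p>response {request_id}</p>"

@lru_cache(maxsize=100)    # bounded: least-recently-used entries get evicted
def render_bounded(request_id):
    return f"<p>response {request_id}</p>"

# Simulate 10,000 distinct requests against each version.
for i in range(10_000):
    render_unbounded(i)
    render_bounded(i)

print(render_unbounded.cache_info().currsize)  # 10000 entries retained: a leak
print(render_bounded.cache_info().currsize)    # 100 entries: memory stays bounded
```

For time-based expiry, lru_cache doesn't do it out of the box; that's where third-party caching libraries or a hand-rolled timestamp check come in.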
And again, once the leak is large enough, you can use Fil to see that, oh, this particular stack trace is responsible for 62% of the memory at peak. If the peak isn't showing enough of it, you can switch to Scalene or memory_profiler to try to catch it. Scalene may not work quite as well for C extensions, depending on how they allocate memory, though. But once your leak is large enough, Fil will catch it. Then, once you've narrowed it down, here we see this line of Python code is probably where the memory is coming from, but there might be a lot of C or C++ code behind that Python code, so you want to narrow it down even more. What you want to do is create a minimal reproducer. The tools for finding C memory leaks are very slow, so the smaller the code, the better. Once you have a reproducer: if it's an open source project, say you found a memory leak in Pandas, to pick a random example, you can file a bug with the reproducer and hopefully they'll fix it. If it's your own code, you can compile the code with debug flags, -g for a C program, and there's a tool called Valgrind that you can use to find memory leaks. So here we're running valgrind with --leak-check=yes, and then the minimal reproducer. It's going to put out a lot of output saying: in this particular stack trace, a bunch of memory was lost. If you look at the top here, there's a bunch of Python code, and it's calling into a Cython Buffer __init__, which is what we expected from the Fil output, and that calls malloc. So we should be looking for a call to malloc in the Buffer class, and that tells us where the memory was allocated. It's not going to tell us why it's leaking, but it can help us spot the bug. And here's the Cython code that's leaking the memory. Again, this is just an example.
So if you look at it, we can see there's a malloc call in the __init__ of Buffer, which is what both Valgrind and Fil identified. So it's allocating the memory, which is what we expect. And over here, we have some code that frees it when the object is freed. So whoever wrote this code expected this memory to be freed. But it turns out that in Cython, which, if you're not familiar, is a Python-like language that compiles to C, the special method for cleaning things up is actually __dealloc__, not __del__, unlike normal Python. So this free was never actually being called. If we change this from __del__ to __dealloc__, we fix the bug, and we no longer have a memory leak. This is just an example, but the broader point is: once you've used memory profilers to identify which parts of your code are responsible for allocating the memory that's not being freed, that doesn't tell you why it's leaking; it just tells you where that memory came from. You then have to read through the code to see what's going on. Maybe there's no bug there at all. But using these tools to find the source of the allocation is a good starting point; just a starting point, to fixing the problem.

So we've covered memory use being high because you're loading too much data, leaks in Python, and leaks in C. All of those are arguably problems in your code. There's one final cause of high memory usage I'm going to talk about, and it's not actually your fault. When you allocate memory, you're allocating a chunk of memory, then freeing it, allocating and freeing, and this is kind of like the knapsack problem in computer science: you're trying to fit all these chunks of memory into as little space as possible. And the knapsack problem is extremely difficult to solve optimally. So all the systems that are trying to fit these allocations into a limited amount of space are doing so with some heuristics.
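As an aside on that __del__ versus __dealloc__ bug: we can't run Cython here, but the bug class, cleanup code living in a method that's never actually invoked, can be shown in plain Python, where the direction flips: __del__ is the method CPython calls on destruction, and __dealloc__ is just an ordinary method nobody invokes. The classes below are hypothetical illustrations, not the talk's code:

```python
cleanup_log = []

class LeakyBuffer:
    # Bug: __dealloc__ is a Cython-only special method. Plain Python never
    # calls it, so this "free" never runs. (In Cython, the mistake is the
    # mirror image: putting cleanup in __del__ instead of __dealloc__.)
    def __dealloc__(self):
        cleanup_log.append("LeakyBuffer freed")

class FixedBuffer:
    # __del__ is what CPython actually calls when the object is destroyed.
    def __del__(self):
        cleanup_log.append("FixedBuffer freed")

a = LeakyBuffer()
b = FixedBuffer()
del a  # no cleanup: __dealloc__ is just an ordinary, unused method here
del b  # refcount hits zero, CPython calls __del__ immediately
print(cleanup_log)  # ['FixedBuffer freed']
```

Nothing warns you about the wrongly named method; the cleanup silently never happens, which is exactly why the Valgrind-plus-code-reading workflow above is needed.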
And these heuristics can go wrong. What that means is your program is doing nothing wrong, it's just freeing and allocating memory, but you hit a certain pattern of allocations and frees that ends up with holes in your memory that cannot be returned to the OS. So you end up with memory filled with garbage through no fault of your own; the memory just got fragmented. The symptom will be something like a lot of resident memory, the RSS you see when you run ps for a process, even though you're not actually using that memory; you know you freed everything up. There are a bunch of workarounds. One of them is setting a particular environment variable on Linux. It's an annoying problem, and there are a bunch of references here, links in the slides, which discuss it and various other workarounds. So if you're not loading too much data, and you've fixed things so there are no memory leaks, and you're still seeing high memory usage, try looking into memory fragmentation and trying some of these workarounds.

To recap: the first thing you want to do is identify whether you have high memory usage, via symptoms like slowness, errors, crashing, being killed by the operating system, which is more likely than crashing, or just spending a lot of money. You can typically use Fil to find the responsible code, so long as the memory you care about is a sufficiently high percentage of the peak memory. For memory leaks, you might want to use Scalene or memory_profiler instead. And then there might be other tools you use to, for example, debug C or C++ leaks, or to find the references behind Python memory leaks. The final step, which we basically haven't talked about at all, is to fix the problem. And for that, depending on the cause,
there's a whole bunch of techniques, tools and approaches you can use, which we obviously don't have time for, but if you go to my site, there are a whole bunch of articles, more focused on data science and scientific computing, but more broadly about reducing memory usage in Python. You can find the slides at pythonspeed.com/europython2021, and if you go to pythonspeed.com/memory, you'll find a whole bunch of articles about reducing memory usage and measuring memory usage in Python, covering what I talked about today. And I think we might have time for a few questions.

Okay, thank you. Thank you for your talk. It's 28 past, so let's try to get through one or two questions. One question is: what's the overhead of using mprof? Mprof samples every 0.1 seconds to see what memory usage is, so the overhead is trivial. The downside is that if you have a brief spike in memory, it might miss it, because it's not sampling fast enough. The overhead of Fil, which doesn't have that problem, it catches everything, is much higher: more like 40 to 50% of your runtime. I'm working on a version of Fil that is fast enough to run production workloads; that's still a work in progress. If you're interested, get in touch. Okay, then the last question: does Fil accumulate memory loaded from a specific position when it's in a loop, for example, or does it represent the data individually? It accumulates it. Basically, it unifies allocations based on stack trace, so if it's the same stack trace, it will be reported in the flame graph as one frame. Okay, so we are out of time now, so I'm going to copy the rest of the questions. You can go to the breakout room, the Optiva room, and Itamar will be there. There is also Jitsi: if you go there, there's a widget link to join, and there's a video call you can use to keep asking questions. So thank you. Thank you very much, and I hope we see you again next year.