I'm Manuel Miranda, a software engineer at Skyscanner. I've been working there since November of last year, and before that I worked in research, on algorithm optimization and simulations, so that's where most of my experience comes from. I hope you find the talk interesting, so I don't have to watch people on their phones throwing Pokéballs at me. So let's start. First of all, I marked the talk as basic level because it's my first talk and I wanted to try it. I'll go first through the strategy, what I usually check before deciding whether to optimize a program, key points I've learned through all this experience, and then I'll show some tools I like to use and find really useful. I'll go from operating system tools, which I'll skip through very fast, to resource tools for memory, CPU and so on, and then to some fancier ones that I call advanced because they do more than check a single resource. Don't worry, this is the slide with the most text in the presentation; I promise it will be just this one. So these are the main key points I like to check when I start with a new program. The first one is focus. When I start on a new program, I like to think about exactly what I want to achieve, what degree of improvement would be acceptable. Because sometimes you're so happy about optimizing a new program, "oh yeah, I'm going to speed it up, I'm going to make it faster", and you end up so deep in, sorry for the word, shit, that you start optimizing on Monday, and by the weekend your teammates ask "good weekend, man?" and you realize you got nothing done and you're still struggling with it. The next key point is cost.
Is the optimization worth doing? The company is paying you money, so are you wasting your time? For example, if the speed-up just wins you some minutes per execution and the execution takes hours, maybe it isn't worth it. So cost and focus are closely related. The third one is code knowledge. If you have been through legacy code, sometimes you look at a bunch of code and think: why is this guy writing Python in one big 1,000-line file when there are built-in functions like all or map, comprehensions and that kind of thing? Before you start optimizing code that isn't yours, you should ask why it is that way, because in legacy code, believe me, you change one line, one print, and bad things start to happen. Then there's context awareness. This is about whether you control the whole environment of the thing you're trying to optimize, or whether you depend on random stuff going on around it, like network issues or MySQL queries that take a random amount of time. Before trying to optimize with all that noise, you should isolate yourself from it; optimizing those queries or network performance belongs to another scope. And the last one is local context. Sometimes you say, I'm going to optimize that, you do a git clone and start coding. You spend two days, you get the execution time down to half locally, you move it to production, and it takes even longer. Why? You don't know. Maybe virtual machine stuff, different resources, operating system, kernel version, whatever. So before starting, set up a nice environment and try to reproduce production as closely as you can.
And if that's not possible, don't wait two days before moving your code to production; do it iteratively so you get feedback soon enough. After three years of working on this kind of stuff, I still don't apply all of these points sometimes, but I think they are really important and can save you a lot of time on a big job. So here you can check my design skills. I usually approach this from the outside in. When I start with new code, I first check how long it takes to execute. Once we know how long, we also want to know the resources it consumes: memory, CPU and so on. And once you have these two things, the next step is to understand why. Why is it taking that time, and is it normal for it to consume all those resources? To answer that you need the code knowledge we talked about before, because you have to know what the program is doing to judge whether the consumption is reasonable. Once you understand the code, you can start digging deeper. One thing I want to comment on: with code you don't know, you usually apply all this time and resource measuring to the whole program. But if it's your code, you usually know which part is problematic, so you monitor just that part. Of course, once you've optimized that part, you have to execute the whole thing again to make sure you haven't messed up the overall execution. That's basically the flow I use. So let's start with very basic tools. I know you all know them, but the first thing I do, maybe for the first five minutes, is use time and htop to check how it goes.
What I mostly use time for is checking whether my program is really CPU-bound or IO-bound. How many of you don't know the time tool? Basically, it outputs the real (wall-clock) time of the execution and the time actually spent on the CPU. So if you have a lot of blocking stuff, like network or MySQL queries, it shows you the difference between the two, which gives you an idea of what the program is doing. Then htop: I started using it because they made me start using a Mac, and I don't know if you have checked the output of top on a Mac, but if someone can read anything in it, I'll give you a rare Pokémon, because it's just messy. On Linux it's pretty decent. With htop, at least it orders processes by CPU, and it shows you the individual processors, memory, and so on. But as I said, this is really basic and you already know all that, so let's go to some more interesting stuff. The first one is memory_profiler. This one is quite interesting because it lets you check the whole flow of the program: the memory consumption of the entire program, per function, per line, and so on. There's another interesting feature: it can trigger the debugger once you've exceeded a maximum memory threshold. You can say, while executing, if the memory I'm using goes above one gigabyte, or 100 megabytes, drop me into the debugger console. Then you can check the state of the program, the objects you have initialized, and so on. Here are some examples of the output; let's start with this one, if you can see it from the back.
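As a rough illustration of what time's real-versus-CPU split tells you (this snippet is mine, not from the talk, and the `measure` helper is a made-up name): the standard library exposes the same two clocks, so a blocking call shows up as wall time with almost no CPU time.

```python
import time

def measure(fn):
    """Return (wall-clock seconds, CPU seconds) for a call, like `time` does for a process."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - wall0, time.process_time() - cpu0

# A blocking wait (standing in for a network or MySQL call) burns wall time,
# not CPU time, so a big gap between the two suggests an IO-bound program.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall={wall:.2f}s cpu={cpu:.2f}s")
```

If the two numbers are close instead, the program is mostly CPU-bound and the profilers below become more relevant.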
It shows you, for example, that main took about 17 seconds, and then we have first_costly_function and second_costly_function, and it marks where each function starts. So it gives you an overview of the whole program and of which function or functions you should focus on to really improve the memory consumption, which can sometimes be a problem. As another example, here we have the terminal output for checking the memory consumption per line, which is interesting too. This mode is really slow, about eight times slower to execute, but for a one-off run it's good enough. So let me show you the terminal... does anyone know how to easily move the terminal onto the presentation screen? No? Well. The program I used for that profile, can you see it from the back? I don't think so. This is the program I used as a test. You've already seen the graph, but when I first tried it, I thought the is_prime version would take much more memory than trial division because of the big numbers involved, but it turned out the other way around. So with just this basic tool you can track useful information; it's something worth having in your toolset. The next one is for the other resource, and it's called line_profiler. You know, in Python we get really cool names for our tools, we're really original. This one is an advanced version of cProfile, the built-in profiling tool for Python. I'm not presenting cProfile itself because it's the one everybody always uses. Basically, line_profiler is useful for knowing the CPU time consumption of your programs: it shows you, per line and per function, the number of hits and the average percentage of time spent in each line and function. And it's compatible with cProfile output.
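As a sketch of how memory_profiler is hooked in (the sieve below is my stand-in for the talk's test program, not its actual code): the `@profile` decorator is injected by the tool at run time, so a no-op fallback keeps the script runnable on its own.

```python
try:
    profile  # injected when run as: python -m memory_profiler primes.py
except NameError:
    def profile(f):  # no-op fallback so the script also runs without the tool
        return f

@profile
def eratosthenes(n):
    """Sieve of Eratosthenes: allocates one flag per candidate, so memory grows with n."""
    flags = [True] * (n + 1)
    flags[0] = flags[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            flags[i * i :: i] = [False] * len(flags[i * i :: i])
    return [i for i, ok in enumerate(flags) if ok]

print(len(eratosthenes(100)))
```

Run normally this just prints 25; run under `python -m memory_profiler` it prints the per-line memory report instead, and I believe `--pdb-mmem=100` is the flag for the drop-into-the-debugger-at-100-MB feature mentioned above (check the docs for the exact spelling).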
Something interesting I found out: these profiling tools multiply your execution time, and sometimes you get fed up, say "fuck it", and hit Ctrl-C, and with some tools you lose all the progress because the report is only generated at the end. This one just reports whatever it has calculated up to that point, which is pretty cool. So here's the example output, for the same code. You can see the percentage of execution time, the number of hits for each line, the total time, and the time per hit. For example, here we can spot the problematic function: this one is an order of magnitude above first_costly_function, so if we had to choose which one to optimize, we'd definitely go for it. It also takes a while to run over the full code; if your code takes hours to execute, using these tools takes too long, because you usually have to monitor the whole program. And if you've done this kind of thing, it's a bit awkward, because you have to modify your source code: to monitor a function you add the @profile decorator, then execute, then realize that wasn't the right function, move the decorator, and execute again. It's messy. That's where our super IPython comes to help: line_profiler and memory_profiler are supported as IPython plugins, and that's really cool because it lets you profile any function of your code, or of any other library's source code, interactively.
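A minimal sketch of the same idea without editing the target source (function names are made up to mirror the talk's example; the wrap-by-calling pattern is from line_profiler's documentation): wrapping the function with a LineProfiler instance replaces the decorator dance.

```python
def trial_division(n):
    """Deliberately naive primality test, one candidate divisor at a time."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def second_costly_function(limit=5000):
    """The hot spot: calls trial_division once per candidate number."""
    return sum(1 for n in range(limit) if trial_division(n))

try:
    from line_profiler import LineProfiler  # third-party: pip install line_profiler
    lp = LineProfiler()
    wrapped = lp(second_costly_function)  # wrap instead of editing the source
    print(wrapped())
    lp.print_stats()  # per-line hits, time, per-hit time and %time
except ImportError:
    print(second_costly_function())  # tool not installed: plain run
```

The same wrapping is essentially what the %lprun magic below does for you interactively.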
So you don't have to execute the full program. Let me show you one of the example outputs, and then we'll play a bit with it. At the top of the screen we load the extension with %load_ext line_profiler, then I import my program, and then I use the magic command %lprun, passing the function I want to profile after the -f flag, followed by the statement to run, which here is my module's second_costly_function. So this is the line_profiler report, one level deeper than before, since previously we were only monitoring the outer function. Here we can see why second_costly_function is the more costly one: it's calling this eratosthenes method, which I don't know exactly what it does, I think it's something with prime numbers, but it's the one actually taking most of the time. We also get the hits, the total time, the per-hit time, and the percentage of time consumed, which is what we actually want to work with. Just to show you I'm not lying: we go to IPython, I load the extension with %load_ext line_profiler, then from my program I import second_costly_function, and then from the algorithms library we're going to introspect the eratosthenes function without touching our source code, so you can see how easy it is to check how much time each line of eratosthenes is spending. I already knew the import path for the sieve module, but sometimes you'll have to dig through the source code to find where each function lives. So we import the function, and we run %lprun, profiling the eratosthenes function while executing second_costly_function again. So now we're running second_costly_function while monitoring the time eratosthenes takes.
Now the run takes only about 20 seconds. But what if the function took two hours? Would we really spend two hours just to measure the cost of the eratosthenes function? That's another cool thing: we can just hit Ctrl-C, and there's a report already. Or we can call the eratosthenes function on its own with some arbitrary arguments, and it shows the report as soon as it has computed one iteration, because the original program was calling it something like 800,000 times, which isn't necessary for a basic profile of the function. So we can see again which operations are the most time-consuming. If this were a real problem we wanted to optimize, we would ask: are those loops really needed, are these operations the most optimal ones, and so on. We're not going into that detail, but at least we have now spotted where to do the work. For me, IPython is just really handy for doing this interactively. So, back to the presentation. If some of you have worked with these already, well, if you search for Python profiling, line_profiler, memory_profiler and these kinds of tools are the first ones you find. The second part of the presentation, the ones I call advanced, covers fancier tools that display visual graphs you can work with more interactively, which at least looks cool. The first one I kind of love and hate at the same time, and you'll see why. It's called Plop; it doesn't have many commits and it's not very actively maintained, but it does its job. Its main feature is that it's a really low-overhead profiler. Why is that?
Because it works like strace and ltrace at the operating system level: it basically just samples the stack of the program being executed, so it doesn't interfere with your program the way memory_profiler and the others do by inserting themselves into the middle of it. It does this stack analysis and displays a call graph of all your functions, how they call each other and the time spent in them, and it also displays a flame graph, which, if you don't know it, is something pretty fancy; Netflix, for example, uses them, and if you follow their blog you'll see it's pretty useful. Plop runs a server on Tornado, and with a decent setup you can feed the viewer and update the visualization in real time. So here we have the profile view. This is why I hate the program: the call graph is a bit messy, and if you want to read it you have to play around and drag things until you can see them better, so it's kind of shitty. But here we can see that the eratosthenes function is indeed the one consuming the most time, and that main is the one calling second_costly_function and first_costly_function, so we can see the flow. The width of the arrows represents the time spent calling those functions, which is pretty useful. Then there's the flame graph; maybe you can't see it from there, so let me explain it. It's the same data: at the base we have the main function, which calls second_costly_function and first_costly_function, so you read from bottom to top, and the width of each block tells you how long that function took to execute. Here at the bottom it says main took 99% and this eratosthenes call took 89%, so that's pretty cool.
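For completeness, this is roughly how Plop is driven; I'm reconstructing the commands and the Collector API from memory of its README, so treat every name here as an assumption to verify. The guarded import keeps the sketch runnable even without Plop installed.

```python
# Typical CLI use, as I recall it (verify against Plop's README):
#   python -m plop.collector my_program.py     # samples the stack, writes to ./profiles/
#   python -m plop.viewer --datadir=profiles   # Tornado server with call/flame graphs

def busy(n=200000):
    """Something CPU-bound for the sampler to catch."""
    return sum(i * i for i in range(n))

try:
    from plop.collector import Collector  # third-party: pip install plop
    collector = Collector()               # samples stacks on a timer, low overhead
    collector.start()
    busy()
    collector.stop()
    print(len(collector.stack_counts), "distinct stacks sampled")
except ImportError:
    print(busy())  # Plop not installed: just run the workload
```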
And let me show you why it's called a flame graph: with a real program it looks like this. It's a lot of stuff, but you have to iterate a bit and learn how to isolate things in there. The next one is called pyformance. I'm running out of time, so I'm going to skip the code I had prepared, but you can reach me later. pyformance doesn't have fancy graphs, but it's pretty cool, because I bet you've written the typical pattern many times: from time import time, start = time(), do some stuff, end = time(), then the subtraction, then log it somewhere. Basically, pyformance is a set of tools, based on context managers and the like, that gives you how many times a function has been called, how long a function took to run, and, this one is really interesting, the rate of events over time: how many times your function was called during the last second, the last minute, the last 15 minutes. This is really cool for generating metrics for an API and that kind of thing. So it's not exactly an interactive optimization tool for inspecting a profile; it's more for generating metrics so you can monitor how your application is doing. One thing to take into account is that these timers, the ones measuring the rate of events, are shared variables, so be aware when using threads and such: it uses locks internally, and it can end up a mess if you don't use it properly. So that's what I had prepared, but I'll upload the slides, because I want to present this last one, which is basically my favorite. It's called KCachegrind. It was originally for profiling C and C++; in my research job I was doing C++, and this is the tool I used for optimizing all those pointers and operations. It's a really old tool.
I think it started around 2002 or so. It displays the call graph, the execution time, and a block view of the time spent per function. It also shows the time cost per line, which doesn't work for Python, though it did for C++; I haven't been able to make it work properly with Python, but I intend to. It also displays assembly code, which of course I use every day. It reads cProfile output converted with a tool called pyprof2calltree. So, to check the output, this is the call graph we get for the same program. Instead of showing you an image, let's open the program so you can see how it works. I have the profile already generated, but basically what you do is run your Python program with the cProfile module, which outputs the profile information, and then convert it to the Callgrind format so KCachegrind can understand it. I'm launching it as qcachegrind because on a Mac the KDE version doesn't work, so you have to use QCachegrind. So here we have all the functions that were called in our program. We only want to look at the ones we actually control: we're not going to optimize Python's math.sqrt, or len. We just want main, eratosthenes, and the ones we actually wrote. In this view they are ordered by inclusive time: main has obviously taken 100%, and then they go from most time consumed to least. If we order by self time instead, we see again that eratosthenes is the one consuming the most. So we can go here and select main, which is faster than sorting through everything by hand.
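The pipeline just described, sketched with the standard library profiler (the sieve is my stand-in workload; the pyprof2calltree flags in the comments are the ones I remember from its docs, so double-check them):

```python
import cProfile
import pstats

def eratosthenes(n):
    """Stand-in hot function: sieve of Eratosthenes."""
    flags = [True] * (n + 1)
    flags[0] = flags[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            flags[i * i :: i] = [False] * len(flags[i * i :: i])
    return [i for i, ok in enumerate(flags) if ok]

# 1) Profile with the stdlib and dump the stats to a file.
profiler = cProfile.Profile()
profiler.runcall(eratosthenes, 100_000)
profiler.dump_stats("program.prof")
print(pstats.Stats("program.prof").total_calls)

# 2) Convert for KCachegrind/QCachegrind (third-party, run in a shell):
#      pyprof2calltree -i program.prof -o callgrind.out.program
#      qcachegrind callgrind.out.program
#    or, in one step:  pyprof2calltree -k -i program.prof
```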
In the top right part we have a block view, which lets us easily check which function or functions take the most time to execute; with one look you can say, okay, here is the problem. In the bottom right part we have the call graph of our program flow: main was called first, then main called first_costly_function and second_costly_function; first_costly_function called is_prime, second_costly_function called trial_division, and both of those called eratosthenes. And as in the Plop tool, the size of the arrow tells you how much time was spent calling that function, which is also shown as a number here, and the 1x tells you how many times the function was called. So it's a really interesting tool. Another thing I really liked about it when I was programming C++, which obviously doesn't work now, is the report like the one we saw in line_profiler: for any function of your code it would display a CPU-time figure next to each line, so you knew how long every line took to execute. I'm still working on that, and I'll post an update if I get it working correctly. So, to finish: how are we on time? Is that five minutes with questions or without? Oh, cool. Then let me show you the code I skipped from the pyformance part, which I think is worth seeing. Let's go to the terminal, it looks better there. This is code using pyformance. Basically, I'm importing timer, a tool provided by pyformance, so you can write with timer("test").time(): around a block of code. That's the shared variable I was talking about.
From any other part of your code you can access the test timer and print get_mean, get_max, get_var: the mean time it took to execute that piece of code, the variance of the executions, and also the mean rate, the number of executions per second, plus the one-minute and fifteen-minute rates and so on. It's pretty handy, because imagine you have code that you know normally executes in around 10 seconds; you can use this to trigger an alarm or a log entry saying, oh God, this function took 20 seconds now, something's going wrong. So you can use it for your performance tests and that kind of thing. As a demonstration, this one is telling me there was a slow execution. I don't know if you saw the threshold figure I had here: the threshold is 0.21, and if get_mean is above it, it prints "slow execution, something wrong happened". And why does it trigger? Because I'm doing a sleep of random.uniform(0.1, 0.3), so the mean time should be about 0.2, but because of internal overhead it takes a bit more. So that's handy to have. Now, to finish the presentation: I know I've presented a bunch of tools; some of them won't fit your toolset, some will. One really important thing before starting with this kind of work is building your toolset: what exactly do I need, is this kind of tool enough, or do I need something specific to my framework? If I'm using Django, the Django Debug Toolbar; if I'm using gevent, then GreenletProfiler, because cProfile doesn't work well with greenlets. So do some research before grabbing the first tool you find on Stack Overflow or Google. That's pretty much all. I hope you found it interesting, or at least learned something new.
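If you just want the pattern without the dependency, here is a tiny stdlib stand-in for what pyformance's timer replaces: the start/stop/subtract boilerplate plus the threshold alarm. The class and method names (MiniTimer, get_mean, get_max, get_var) mimic what the talk mentions but are otherwise mine.

```python
import statistics
import time
from contextlib import contextmanager

class MiniTimer:
    """Records the duration of each timed block, pyformance-style."""

    def __init__(self):
        self.durations = []

    @contextmanager
    def time(self):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.durations.append(time.perf_counter() - start)

    def get_mean(self):
        return statistics.mean(self.durations)

    def get_max(self):
        return max(self.durations)

    def get_var(self):
        return statistics.pvariance(self.durations)

# Time a block three times, then raise an alarm if the mean crosses a threshold.
t = MiniTimer()
for _ in range(3):
    with t.time():
        time.sleep(0.05)  # stand-in for the real work

THRESHOLD = 0.21
if t.get_mean() > THRESHOLD:
    print("slow execution, something wrong happened")
```

The real pyformance timers also track moving rates (one-minute, fifteen-minute) and are safe to share across threads, which this sketch is not.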
If you have any questions, or if we run out of time for questions, you can reach me outside; I'll be happy to talk about this or any other random stuff. So, questions?

Q: Well, first of all, thank you, very nice. Just a simple question: which one of these do you use the most?

A: As I said, my favorite is KCachegrind. The execution is really slow, but from just one execution you get the full view of what's going on. The thing is, since I don't get per-line CPU time out of that execution, I usually end up using line_profiler interactively with IPython: once I have the big picture and know which function is bothering me, I call it manually from IPython so I know exactly which lines inside that function are the problematic ones. So those are basically the two. But I like using the others as well, because with just one tool you sometimes get a narrow view of what's going on; with different tools, the flame graph, Plop, memory_profiler, you get more of the big picture. Sadly there is no tool that does everything, so you have to play a bit.

Q: Okay, thank you.

Moderator: We have some minutes for more questions.

Q: Very interesting talk, thank you. Do you have any advice on measuring the performance of an ongoing process, like a long-running process?

A: You mean something like an API?

Q: Yeah, something running in the background, streaming things.

A: There are two kinds of things here. For me, long-running processes can be simulations like the ones we ran in research, which can take a week or who knows how long. For those, logging is really important.
With logging, for example with pyformance as I showed you, you can feed back different metrics from inside your program: how many times this function has been called, and whether that number of calls is what you were expecting. That's one side. On the other side, for API calls and that kind of thing, pyformance is again interesting, because it can tell you how many times an endpoint has been called and so on. And Plop: I'm not using it on live production services myself, but I know people are. They use it to watch how the call graph builds up during execution. You have to set it up yourself, because the commands I showed in the presentation cover the whole process: you start the monitor, it ends, and it creates a file. But it has context managers and such, so you can open the file, register the activity, close the file, then call another function, and the file evolves dynamically; then you can check the visualization from time to time to see how things are going. But for me the most interesting part is logging; for the long-running ones, logging.

Q: Hi, interesting talk, thank you. Do line_profiler and memory_profiler also work with command-line parameters? If my script runs on the command line and expects parameters, do they mix up? I saw that you called it with a minus flag.

A: You can call it that way, and then it will just monitor the decorated functions, which takes less time; but you can also call it with normal Python: you run python -m memory_profiler and then your program with all the arguments your script uses, and it will work anyway. So there are two ways of calling it.
Thank you very much.