Hello, my name is Elizaveta Shashkova, and today I'm going to talk about some interesting new opportunities that appeared in Python 3.6. First, let me introduce myself. I'm a software developer at JetBrains. I work on the PyCharm team, on the debugger in PyCharm, and I've come here from St. Petersburg.

When we write programs, unfortunately, we always introduce bugs into them, and there are many different ways to find these bugs. For example, you can simply add print statements. Some people prefer logging. In fact, it's the same print statements, but with the ability to turn them on and off or add some options. But there is a big separate group of tools named debuggers. Debuggers are much more complicated than logging, because they allow users to pause the program at some place and to execute stepping commands, that is, to monitor the program execution line by line, watch the values of variables, and so on.

But unfortunately, there are many people who prefer print statements and logging to debuggers. Why do such people exist? The answer is quite easy: because the debugger is rather slow. On average, in big real-life programs, it's usually almost 30 times slower to run a program under a debugger than to run it without one. I think this isn't breaking news for you, because everybody knows that debuggers usually slow down program execution. But why does it happen, and what can we do about it? Today in my talk I'm going to answer these questions, and we will learn how to build a Python debugger and how to make it faster.

Let's start with a tracing debugger. It's named tracing because of the tracing function. Python provides a standard way to set this system tracing function. It takes three arguments: frame, event, and arg. The frame object contains the information about the current state of the program. The event is a string representing the event that occurred in the program. And arg is the argument of this event.
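A minimal sketch of such a tracing function, assuming the standard `sys.settrace` hook. The function `foo` and the helper `run_traced` are made up for illustration and are not the code from the slides; line numbers are reported relative to the `def` line of `foo`.

```python
import sys


def foo():
    friends = ["Bob"]
    for name in friends:
        print("Hi", name)
    return len(friends)


def run_traced(func):
    """Run func under a trace function and collect (relative line, event) pairs."""
    events = []
    first_line = func.__code__.co_firstlineno

    def tracer(frame, event, arg):
        # Only record events that belong to func's own frame.
        if frame.f_code is func.__code__:
            events.append((frame.f_lineno - first_line + 1, event))
        # Returning the tracer keeps line-level tracing enabled in this frame.
        return tracer

    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(None)  # always uninstall the hook
    return events


events = run_traced(foo)
for line, event in events:
    print(line, event)
```

The first recorded event is a `call` on the `def` line and the last one is a `return` on the line of the `return` statement, with `line` events in between.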
Let's define a very simple tracing function here. It prints the line number being executed and the event that arrived in our program. Let's see how it works. For example, we have a very simple function, and we set this tracing function. First, we receive the event call on line one, because we called the function foo. After that, we receive the event line on line two, because line two is executed. After that, we receive two more line events, on lines three and four. And after that, the output "Hi Bob" appears in our program. After that, we receive line events again on lines three and four. Then execution goes to line five, and we receive the event return on line five. That means that we are leaving the current function, we are leaving the current frame.

Okay. How can we build a debugger based on this function? A debugger consists of two parts: breakpoints and stepping commands. A breakpoint allows us to stop the program at some particular place. Stepping commands allow us to monitor program execution line by line. And we can implement both parts of the debugger with our tracing function. For breakpoints, inside our tracing function we can check the current line number. If this line number equals a breakpoint's line number, we understand that we need to pause the program at this place. So we call some breakpoint function, which pauses our program. In fact, it just spins in an infinite loop until the user resumes program execution. And for stepping, we can use our tracing function too. We just check the type of the event that arrived in our tracing function and handle the different cases.

Okay. We've built our tracing debugger, and it works. But we tested it only with a very simple program. What if we consider a more complicated program? For example, this function calculate sums the numbers from zero to ten to the seventh power. And let's define a very simple tracing function. In fact, it doesn't even print anything.
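The breakpoint check and the experiment being set up here can be sketched roughly as follows. This is an illustration under assumptions, not the debugger's real code: the `BREAKPOINTS` table, the file name `other.py` and the helper `timed` are hypothetical, and the loop is shorter than the one in the talk so the sketch finishes quickly.

```python
import sys
import time


def calculate(n):
    total = 0
    for i in range(n):
        total += i
    return total


def empty_tracer(frame, event, arg):
    # Does nothing except keep line tracing switched on for this frame.
    return empty_tracer


# Hypothetical breakpoints as (file name, line number); none of them is ever hit.
BREAKPOINTS = [("other.py", 10), ("other.py", 20), ("other.py", 30)]
hits = []


def breakpoint_tracer(frame, event, arg):
    if event == "line":
        # The check that a tracing debugger repeats on every executed line.
        for filename, lineno in BREAKPOINTS:
            if frame.f_code.co_filename == filename and frame.f_lineno == lineno:
                # A real debugger would pause here and wait for the user.
                hits.append((filename, lineno))
    return breakpoint_tracer


def timed(tracer, n):
    """Run calculate(n) under the given tracer (None means no tracing)."""
    sys.settrace(tracer)
    start = time.perf_counter()
    try:
        result = calculate(n)
    finally:
        sys.settrace(None)
    return result, time.perf_counter() - start


N = 10**5  # the talk uses 10**7; smaller here so the sketch runs quickly
for label, tracer in [
    ("no tracing", None),
    ("empty tracer", empty_tracer),
    ("3 breakpoints", breakpoint_tracer),
]:
    result, seconds = timed(tracer, N)
    print(f"{label:>14}: {seconds:.3f}s (result {result})")
```

The absolute timings depend on the machine and the Python version, so only the relative slowdown between the three runs is meaningful.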
It just returns itself to continue tracing in the current frame. And let's run our function calculate. If we run it without a debugger, it takes about one second to execute our program. But if we run it with our tracing function, it already takes almost seven seconds. Our tracing function is very, very simple, but we call it on every line of our program. And if, for example, we have three breakpoints in our tracing debugger, it means that on every call of our tracing function we need to iterate through a loop with three elements. It already takes almost 20 seconds. So it takes much more time: the program becomes almost 25 times slower. And let me remind you that the tracing function was very simple. This small experiment explains why running under a debugger is much slower than running the program without one. The main problem with our tracing debugger is that we call our tracing function on every line of the program. Okay. Let's remember this problem.

And let's consider a small story about Python 3.6. As everybody knows, Python 3.6 was released half a year ago. It has many cool features, and one of them is the new frame evaluation API. It was introduced in PEP 523 (a PEP is a Python Enhancement Proposal). PEP 523 allows us to specify a per-interpreter function pointer to handle the evaluation of frames. It also adds a new field to the code object, to be used by this frame evaluation function. Okay. It sounds a bit tricky, but we will consider an example.

We'll try to write our custom frame evaluation function. The frame evaluation API is in fact a C API, so in order to use it you need to write a C extension. But, for example, you can write a Cython extension, like we do in PyCharm. For better readability I will use Python, but in fact this code is written in Cython. Okay. We're defining our custom frame evaluation function. It takes two arguments: the frame object, which we've already seen in the tracing function, and an exception flag.
We have the frame object, so we can get the name of the current function and the line number of this frame. So let's print this information and call the default frame evaluation function. We don't want to change the program's behavior. We just want to print some interesting information. And we need to set this custom frame evaluation function. Let's see how it works with an example. We have three functions: first, second, and third. The first one calls the second, and the second calls the third. When we run this program, we get this output. It means that our custom frame evaluation function was called on line one when we entered the function first, on line four when we entered the function second, and on line seven when we entered the function third. Okay. It works. That's great. From this example we also learned that our frame evaluation function is executed on entering every new frame. And inside this frame evaluation function we have access to the frame object, and so to the code object as well.

Okay. We know about this cool new Python 3.6 feature. And you remember that with the tracing debugger we had the problem that we called the tracing function on every line of the program. What can we do about that? We can remove the tracing function, but in that case our debugger will stop working. So we can't just remove it, but we can replace the tracing function with our custom frame evaluation function. Let's try to build a frame evaluation debugger: a debugger based on a custom frame evaluation function.

As you remember, every debugger consists of two parts: breakpoints and stepping commands. Let's start with breakpoints. When we had the tracing function, we had a complete mechanism to monitor program execution, because on every line we knew all the information about the event and the line number. But in the case of the frame evaluation function we don't have such a mechanism. We have only the frame object.
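The PEP 523 hook itself can only be installed from C or Cython, so it cannot be shown in pure Python. As a rough stand-in, and explicitly not the frame evaluation API, the sketch below uses a global trace function that returns `None`: it is then called once with a `call` event on entering each new frame and receives no per-line events, which mirrors the "called once per frame, with the frame object in hand" behaviour described above. The name `on_frame_enter` is hypothetical.

```python
import sys


def first():
    second()


def second():
    third()


def third():
    pass


entered = []


def on_frame_enter(frame, event, arg):
    # The global trace function is invoked with "call" for every new frame.
    if event == "call":
        code = frame.f_code
        entered.append(code.co_name)
        print(code.co_name, code.co_firstlineno)
    # Returning None disables line-by-line tracing inside that frame.
    return None


sys.settrace(on_frame_enter)
try:
    first()
finally:
    sys.settrace(None)

print(entered)
```

Unlike a real frame evaluation function, this stand-in cannot replace the code that the frame is about to execute; it only observes the frame on entry.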
And we need to somehow insert breakpoints into this new frame that we are entering. We can do it in another way: we can insert the breakpoint's code right into the frame's code. So, for example, if we have a very simple function maximum, which returns the bigger of its two arguments, how can we insert a breakpoint? If we want to insert a breakpoint on line three, that means we want to insert a call to some breakpoint function right before the return statement. So after our modification the result will look like this: before returning the value a, we call our breakpoint function, suspend the program, and wait for user commands.

Okay. How can we insert one piece of code into another piece of code without changing the source code? We want to modify the bytecode. Let's use the standard module dis, which shows the bytecode in a human-readable representation. For example, for our function maximum the bytecode looks like this. This bytecode is generated for line two, this for line three, and this for line five. As you can see, no bytecode was generated for line four, because the if/else construction was replaced with this POP_JUMP_IF_FALSE operator. Okay. That's our bytecode. In fact, we're not interested in what's going on inside the bytecode, because for us it's just a sequence of operators with or without arguments. Each operation has its offset (these even numbers) and an argument, and some arguments are absolute or relative jumps. For example, here we have an absolute jump from the operator POP_JUMP_IF_FALSE to the operator at offset 12, LOAD_FAST.

Okay. And we want to insert our breakpoint code. We can just take a sequence of bytes and insert it into another sequence of bytes. But we can't do it without changing anything else because, as I've already said, we have jumps, that is, references from one operator to another.
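The bytecode listing discussed above can be reproduced with the standard `dis` module. This is a small sketch that assumes `maximum` is written with an explicit `else` branch, as the talk describes; opcode names and offsets differ between Python versions, so the output will not match the Python 3.6 slide exactly.

```python
import dis


def maximum(a, b):
    if a > b:
        return a
    else:
        return b


# Human-readable listing; opcode names and offsets vary between Python versions.
dis.dis(maximum)

code = maximum.__code__
# Source lines (relative to the def line) that have bytecode of their own.
lines_with_code = sorted({
    line - code.co_firstlineno + 1
    for _, line in dis.findlinestarts(code)
    if line is not None
})
print(lines_with_code)

# Jump instructions: their argument value is the offset of the jump target.
jumps = [
    (ins.offset, ins.opname, ins.argval)
    for ins in dis.get_instructions(maximum)
    if "JUMP" in ins.opname
]
print(jumps)
```

Line 4, the `else:` line, does not show up in `lines_with_code`, and every entry in `jumps` is an instruction whose argument would need adjusting if code were inserted before its target.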
And when we insert our code, we need to update some argument offsets, because all the operators after the breakpoint move down and their offsets increase. So we need to change the references to them from the other operators. Once we do that, our modification is done: the resulting code is the original code, but with an additional call to the breakpoint function. It sounds a bit scary, but in fact it is 200 lines of Python. We just have to write it carefully, and it will work.

Okay. Now we know how to insert a breakpoint, but we need to decide what to insert. That's quite easy to answer, because for our breakpoint we can create a simple wrapper, and its bytecode is shown on the right side of the slide. We're just calling a global function. Before the bytecode modification we add this global function to the frame's global variables dictionary, so we can simply call it. Inside this function we can do anything we want and add any additional debugger functionality, and we don't care about that here, because for us it is just a call to some global function, which is quite simple.

Okay. Our breakpoints are ready, but we still need to implement stepping in our debugger. There are two ways to implement stepping. Of course, we could insert a temporary breakpoint on every line of our program, but in that case we would return to the previous situation, when we called the tracing function on every line and it slowed down our program significantly. So we won't use this option; instead we will use our old tracing function. When the user wants to execute a stepping command, we enable the old tracing function and handle the events in our program. And when the user wants to resume program execution, we just remove this tracing function and continue until the next breakpoint.

Okay. Now our frame evaluation debugger is ready, and we are looking forward to trying it. Remember the slow example with the function calculate.
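The offset fix-up described at the start of this part can be illustrated on a toy model. The sketch below does not touch real CPython bytecode: instructions are plain `(opname, arg)` tuples, every instruction is assumed to occupy two bytes as in Python 3.6, only absolute jumps are handled, and the names `insert_instructions` and `breakpoint_hook` are hypothetical.

```python
INSTRUCTION_SIZE = 2  # bytes per instruction, as in CPython 3.6 "wordcode"
ABSOLUTE_JUMPS = {"POP_JUMP_IF_FALSE", "POP_JUMP_IF_TRUE", "JUMP_ABSOLUTE"}


def insert_instructions(instructions, index, new_instructions):
    """Insert new_instructions before position index and fix absolute jumps."""
    insert_offset = index * INSTRUCTION_SIZE
    shift = len(new_instructions) * INSTRUCTION_SIZE
    patched = []
    for opname, arg in instructions:
        # Targets located after the insertion point move down by `shift` bytes.
        # A jump aimed exactly at the insertion point is left alone, so it now
        # lands on the inserted breakpoint call.
        if opname in ABSOLUTE_JUMPS and arg > insert_offset:
            arg += shift
        patched.append((opname, arg))
    return patched[:index] + list(new_instructions) + patched[index:]


# Toy version of the bytecode of maximum(a, b) from the talk.
original = [
    ("LOAD_FAST", "a"),          # offset 0
    ("LOAD_FAST", "b"),          # offset 2
    ("COMPARE_OP", ">"),         # offset 4
    ("POP_JUMP_IF_FALSE", 12),   # offset 6, jumps to "LOAD_FAST b"
    ("LOAD_FAST", "a"),          # offset 8, start of line 3
    ("RETURN_VALUE", None),      # offset 10
    ("LOAD_FAST", "b"),          # offset 12, start of line 5
    ("RETURN_VALUE", None),      # offset 14
]

# Call a global hook and discard its result: three instructions, six bytes.
breakpoint_call = [
    ("LOAD_GLOBAL", "breakpoint_hook"),
    ("CALL_FUNCTION", 0),
    ("POP_TOP", None),
]

patched = insert_instructions(original, 4, breakpoint_call)
for offset, (opname, arg) in zip(range(0, 2 * len(patched), 2), patched):
    print(offset, opname, arg)
```

In this toy run the jump target moves from 12 to 18, which is where `LOAD_FAST b` ends up after the insertion. Real bytecode adds complications that the sketch ignores, such as relative jumps and arguments that no longer fit in one byte.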
And as you remember, when we ran it with the tracing debugger, it became almost 25 times slower. So what about running it with frame evaluation? Yes: with frame evaluation it runs almost as fast as without a debugger. This happens because we are not calling the tracing function on every line of the program. We just call our breakpoint function once and then continue program execution as if there were no debugger. That's why it works so fast.

But let's consider another example. What if we add a call to some additional function inside the loop? For example, a function foo that doesn't do anything, so that on every step of our loop we call this function. Then the sad news is that our frame evaluation debugger becomes slower, much, much slower, because we return to the previous situation. When we enter every new frame, we call our frame evaluation function, and we do some checks inside it. So we are again in the situation where our program becomes slow.

Maybe PEP 523 can help us again. Yes, it can. As you remember, it consists of two parts. We have already used the first part: we defined our custom frame evaluation function. But there is also a new field that appeared in the code object, co_extra, a scratch space for the code object, and we can store some information there. For our frame evaluation debugger we can use it quite easily: we can mark frames without breakpoints. When we enter a frame and we know that there are no breakpoints in it, we add a special flag to it. Then we know that we don't want to do any additional checks here, and we can return quickly by calling the default frame evaluation function. So we know all the functions without breakpoints, and we can skip them very quickly. And in this case, yes, we were right: after that the debugger becomes faster in all cases, and we no longer depend on situations like the one in example two. And I want to emphasize again how PEP 523 helped us.
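The idea of marking code without breakpoints can be sketched in pure Python. The real `co_extra` field is only reachable through the C API, so a plain set of code objects stands in for it here; the `BREAKPOINTS` table and the function names are hypothetical, and the slow check is reduced to a dictionary lookup for brevity.

```python
# Hypothetical breakpoint table: function name -> set of line numbers.
BREAKPOINTS = {"calculate": {3}}

# Stand-in for co_extra: code objects already known to have no breakpoints.
no_breakpoints = set()
expensive_checks = 0


def has_breakpoints(code):
    """Slow path: look the code object up in the breakpoint table."""
    global expensive_checks
    expensive_checks += 1
    return bool(BREAKPOINTS.get(code.co_name))


def on_frame_enter(code):
    """Return True if the frame needs debugger attention."""
    if code in no_breakpoints:
        return False  # fast path: marked earlier, skip all checks
    if has_breakpoints(code):
        return True
    no_breakpoints.add(code)  # remember the answer for the next frame entry
    return False


def foo():
    pass


def calculate(n):
    total = 0
    for i in range(n):
        foo()
        total += i
    return total


# Entering foo's frame many times runs the slow check only once.
for _ in range(1000):
    on_frame_enter(foo.__code__)
checks_for_foo = expensive_checks

needs_attention = on_frame_enter(calculate.__code__)
print(checks_for_foo, needs_attention)
```

A real debugger would presumably also have to clear such marks when the user adds or removes a breakpoint, which this sketch leaves out.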
The first part helped us to define our custom frame evaluation function. And the second part helped us to mark frames without breakpoints and quickly skip them during debugging. Okay. Our frame evaluation debugger is ready, and it works in different cases.

But what about real results? Real-life results exist. The frame evaluation debugger was implemented in PyCharm 2017.1. There is also PyCharm Community Edition, which is free and open source, so everybody can download it and try how it works. And it works in production. It isn't just an example. It's a real debugger in an integrated development environment. For one of our benchmarks, for example, it took about 20 seconds to run the program under the debugger. Before the frame evaluation API, we had added some Cython speedups: we rewrote some bottlenecks of our program in Cython, and that sped up the debugger, so it took almost six seconds to run the program with that debugger. But frame evaluation improved the debugger's speed significantly, and it became almost 80 times faster. The only thing I can say after that is that frame evaluation rocks, because it gave us the opportunity to dramatically improve the debugger's performance.

Of course, it has some disadvantages and limitations, because such a debugger is a bit more complicated. You need to implement bytecode modification. You need to write a C extension in order to use this API. At the moment it works only with CPython, because this API was implemented only in CPython, and it is available only in Python 3.6. So I can say that the frame evaluation debugger may be yet another reason to move to Python 3.6, because most likely such a debugger or some other tools will appear in many IDEs and developer tools. Also, you might be inspired by my talk and find other use cases for a custom frame evaluation function. Because, as I've already said, we used this frame evaluation function in order to insert the breakpoint's code into the original code.
But in fact, we can insert anything. For example, we can insert some functions for logging. Just imagine: you can enable logging for your whole program without changing the source code. There is no need to write any log statements in your program. You can just define this frame evaluation function, and logging will be enabled in your program. And you can disable the frame evaluation function, and logging will be off. I believe that such use cases exist, because originally this PEP was created by the authors of Microsoft's Pyjion project. Pyjion is a just-in-time compiler for CPython. They use a custom frame evaluation function in order to generate JIT-compiled code, and they use the co_extra field to store this JIT-compiled code. So this PEP wasn't implemented for debuggers. Originally it was implemented for a JIT, but we successfully used it in a debugger. That's why I believe that other use cases exist, and we just need to search for them and try to implement them.

Let's move to Python 3.6. It's a really cool release. And let's find these use cases. If you want to look at the whole codebase of today's example, you can check out my project on GitHub. Also, as I've already said, it's included in PyCharm Community Edition, so you can look at the source code of PyCharm too. And I also moved the bytecode modification to a separate library on PyPI, so you can just install it and use it. There is a small example, and there is a library for bytecode modification, so everything is ready for your experiments. I hope that right after the talk all of you will try to do something interesting with the new frame evaluation API. Now I'm ready to answer your questions. Feel free to follow me on Twitter and ask anything you want about it. Thank you.

We have time for a few short questions, please.

I really appreciate the presentation today and you stepping in the frame. I didn't know about that.
I'm wondering, though: pytest used to rewrite the bytecode at startup to add this kind of modification to the bytecode. What is the reason to do that when we enter the frame, instead of doing it at startup or whenever the user interacts with the debugger and sets a new breakpoint or removes an older one?

We can't do it at startup, because when we start a program we don't have access to the frame object. The main advantage of using this frame evaluation function is that when it is called, the frame object is one of its parameters. You have access to the code object, to the frame object, and to the global and local variables, and you can change them. But at startup you can't get all the functions of your program, for example. Of course, you can change the source code, but that doesn't look like a good idea for building a debugger.

Hi, thanks for the talk. Can you use this to monkey patch C extensions?

Could you repeat this?

Can you use this to monkey patch C extensions? That is normally not so easy. Python code you can monkey patch quite easily, but maybe with this thing you can go deeper. Would it be an opportunity for, I don't know, testing or mocking frameworks to do this at the C extension level?

Maybe it is possible. In fact, I didn't try it, but you can try and tell us about your results. I'm not sure. Unfortunately, I don't know the answer.

Last question. You were inviting people to create new use cases, but what will happen if everybody writes to co_extra?

In the frame evaluation API, I didn't have enough time to mention it, but there is a mechanism for multiple users of this frame evaluation API when you use co_extra. In fact, you use it by index. You can see the number of users, so that you don't clash with other systems that use co_extra. So the data is stored separately.

Thank you very much. Give her a hand.