So our next speaker is Dmitry Trofimov, and his talk is Python Debugger Uncovered. Dmitry is a developer of the PyCharm IDE. Dmitry.

Hi, my name is Dmitry. Today I will speak about the Python debugger. First I'll introduce myself and this talk. I work for JetBrains. For the last four years I have been developing PyCharm, and the debugger is among the things that I do. And often when I tell people that I develop a Python debugger, they say, wow, that's kind of hard or complex. But I'd like to show you in this talk that writing a Python debugger is quite an easy task. There is no rocket science here.

But in the first place, why do we need debuggers at all? As Brian Kernighan wrote in his book, debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are by definition not smart enough to debug it. And what do we normally do? We often, or I think rather always, write our code as cleverly as possible. As a result, we have bugs that spoil our code. Then we need to find those bugs, and we have problems. The only thing that can actually save us is a good debugging tool. And we'll look at how to implement one.

So there are a lot of Python debuggers, but I roughly divide them into two big groups. The first one is Python debuggers that are implemented in Python itself. Those are pdb, the PyCharm debugger and the PyDev debugger. Actually, the PyCharm debugger is a fork of the PyDev debugger. It was forked four years ago and it gained a lot of new features. And now we're in a very strange situation where we develop the PyCharm debugger separately and Fabio, the maintainer of PyDev, develops his separately, and we exchange fixes and backport features. We are now in the process of ending this situation, but I'll tell you about it at the end of my presentation.

So, Python debuggers implemented in Python. The main advantage of such debuggers is that they are platform independent. They can run on CPython, Jython, PyPy, and IronPython.
That is because they are written in pure Python. But the problem with these debuggers is that they can be broken by user code. Because they run in the same Python interpreter, you can just write something like sys.modules.clear() and the debugger will evaporate from memory and won't work anymore.

The second group of debuggers are those that are implemented in C. The main ones are Winpdb and Wing. They work only on CPython, but they don't interfere with user code, so they work better for cases like debugging gevent or Twisted code. But that is actually not a real problem for Python debuggers written in Python, because all those cases can be solved; you just need to do some extra work for that.

So, how do we implement a Python debugger? Many languages provide debugger developers with some kind of API for building a debugger. Python is a bit different in this respect. It provides only one function for building a debugger. This function is called sys.settrace. If we look into the Python documentation, we see that you can set a trace function and it will be called every time you get an event in the Python interpreter. An event can be a call of a function, execution of a line, a return from a function, or an exception; or, if you use some kind of C bindings, it can be a call of a C function, a return from a C function, or a C exception.

So, when we want to implement our debugger and we see this documentation and realize that this is the only function that we can use, we can get a bit scared, because it's very primitive and we feel very constrained. But as we know, constraints breed creativity. So, as I'll show you, because we are constrained so much, we can implement a lot more features than actually exist for normal languages like Java and C++ and so on.

So, the trace function. How can we use it? Here we implement a simple trace function. It is a function with three arguments. The first one is the frame. That is actually the top of our call stack.
This is the context that holds the local variables. Then comes the event that we get, and some arg. The arg depends on the event. And we just print which event we get and on which line we get it. The line number we can access from the frame. So, we import our sys module, set the trace function, and then we have some simple code: we iterate with a for loop over a range and print the result of an arithmetic expression with a division.

Let's look at what we have. After we have set our trace function, the first line will be line number nine. Actually, it is line number 13, but there we have a call of the function that is declared on line nine. So we have a call. Then we have execution of line number 10. Then we have execution of line number 11. And after that, we get the first output. The iteration continues. We are again on line number 10, then on line number 11, and we get output again. And on the third iteration, we have an exception. So, execution terminates. And you see that in this short program we have all four Python events that are possible: call, line, exception, and return. So that shows us that developing a debugger with the help of the trace function is possible.

So, let's develop a simple console debugger. To test it, we use a sample program that is simply the sieve of Eratosthenes. That's a function that gets n as a parameter and prints all the primes between two and n. And here is our command-line debugger. It's actually only 23 lines. And it works. I'll just tell you how it is implemented and then I'll show it at work.

It has two main parts. The first part is the trace function. As we've seen before, we get events. And here, instead of just printing all the events, we have a breakpoint. There's only one breakpoint. And if the currently executed line equals the breakpoint line, we do a simple thing. We just read input from the console. And we handle commands from the console; there are three possible commands. The first command is frame. If we get it, we just print all variables.
The second command is c. It's for continue, inspired by gdb. We continue to the next hit of the breakpoint. And any other string is just treated as an expression that is evaluated.

And the second part of this simple debugger is the main part. Here we get from the command-line arguments the line of the breakpoint and the file to be debugged. Then we set our trace function as the trace function and execute this file.

So let's look at how it works. You see we execute our simple debugger. We set our breakpoint at line number six. And we pass our .py file as our sample program. So the debugger stops somewhere. We type frame. And we see that we have three variables. That's our index and our multiples set, which is empty now. We continue. And we get two. That's our first prime. We print our frame again. And we see that our index is three. And the multiples set is all the even numbers. So we can continue, continue, continue. And if we don't want to read the whole frame, we can, no, it doesn't work. Print. Multiples. We can print just the things that we are interested in.

So voila, we have our simple console debugger implemented in just 23 lines of Python code. But this debugger is not quite full-fledged. It can be used, actually, but it's not very convenient, because we can set only one breakpoint. Normally we run our program and we need to place breakpoints interactively somehow. And actually here you can pass while true as an expression, and it will hang the whole process. And it's not very convenient to read our frames in a console view. We need some kind of tree for that, or some kind of UI.

So, thinking about that, we came to the idea that we need a visual debugger. And if we need a visual debugger, we need some multitasking. And for this, we need some kind of architecture. This architecture can look something like this. On the left you see our debugger interface. And on the right you see the Python process that is being debugged.
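Before going further into the architecture, the console debugger demonstrated above can be sketched roughly as follows. This is not the 23-line program from the slides; it is a minimal reconstruction under assumptions. The names `make_trace` and `debug_file`, and the injectable `read_command` and `out` parameters, are hypothetical and exist only so the sketch can be driven without a real console. Like the demo, it has a single breakpoint line and does not check the file name.

```python
import sys


def make_trace(breakpoint_line, read_command=input, out=print):
    """Build a trace function that stops on a single breakpoint line."""

    def trace(frame, event, arg):
        if event == "line" and frame.f_lineno == breakpoint_line:
            # Suspended: read console commands until the user continues.
            while True:
                command = read_command()
                if command == "frame":
                    out(frame.f_locals)      # print all variables
                elif command == "c":
                    break                    # continue to the next hit
                else:
                    # Anything else is evaluated as an expression.
                    try:
                        out(eval(command, frame.f_globals, frame.f_locals))
                    except Exception as exc:
                        out("error: %r" % (exc,))
        return trace  # keep receiving line events for this frame

    return trace


def debug_file(path, breakpoint_line, read_command=input, out=print):
    """Run the file at `path` under the trace function."""
    with open(path) as source:
        code = compile(source.read(), path, "exec")
    sys.settrace(make_trace(breakpoint_line, read_command, out))
    try:
        exec(code, {"__name__": "__main__", "__file__": path})
    finally:
        sys.settrace(None)  # always switch tracing off again
```

Calling something like `debug_file("sample.py", 6)` would then stop each time line six is about to run and wait for `frame`, `c`, or an expression. Because the expression is passed to `eval` in the debugged process, an expression that never finishes hangs the whole process, which is the limitation mentioned in the talk.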
The debugger interface and the Python process communicate with each other over a socket connection, which allows us to run them on different machines and to do remote debugging, for example. Our debugger interface sends breakpoints and commands to the Python process and gets back events, threads, frames, and evaluated values. So in the Python process, to handle this communication, we need two threads, for reading and writing. We use threads here because we don't actually need performance. The GIL is not a problem for us, but we need cross-platform work. We need this to work on Jython and all versions of Python. So we just use threads and it works well. And here we have the reader thread, the writer thread and the user thread.

So if we talk about communication, we need to define a protocol. The protocol for this communication will be quite simple. Every message is just a line, and messages are separated by line separators. All the data inside a line is separated by tabs. The first field is the command ID. The second one is the command type. And then we have different arguments that depend on the command type. Command types can be set breakpoint, resume, get thread, get frame, evaluate expression. And for example, for a get thread message, we generate some ID, and the response to it will have the same ID. That lets us know that this is the response to this very request. So this is a very simple protocol, but it's actually very powerful. We can get responses to our requests or we can get none. But if we want responses, we can match each response exactly to the request it belongs to. So it's very simple and very powerful.

On the IDE side, we will not go into details. We'll focus on the Python code, but we assume that the IDE creates a server socket for us and launches the script that is being debugged. On the command line, it passes the socket address and passes the sample program as an argument to our debugger. So let's look at what our code will be. It's quite simple also.
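As an illustration of the line-based protocol just described, here is a minimal sketch. The function names and the command type strings such as `SET_BREAKPOINT` are assumptions made for this example, not the identifiers used by the PyCharm or PyDev debugger, and the sketch does not escape tabs or newlines inside arguments.

```python
import itertools

# Hypothetical ID generator: each new request gets the next number.
_next_id = itertools.count(1)


def encode_message(command_type, *args, command_id=None):
    """Build one message: ID, type and arguments joined by tabs, ended by a newline."""
    if command_id is None:
        command_id = next(_next_id)
    fields = [str(command_id), command_type] + [str(a) for a in args]
    return "\t".join(fields) + "\n"


def decode_message(line):
    """Split one message line into (command ID, command type, argument list)."""
    fields = line.rstrip("\n").split("\t")
    return int(fields[0]), fields[1], fields[2:]


def split_lines(buffer):
    """Separate complete messages from a trailing partial one in received text."""
    *complete, rest = buffer.split("\n")
    return complete, rest
```

A response reuses the ID of its request, for example `encode_message("THREADS", "MainThread", command_id=request_id)`, so the receiving side can pair it with the request that caused it. `split_lines` reflects the fact that a socket read may end in the middle of a message, so the unfinished tail has to be kept for the next read.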
The main code looks like this. We initialize our debugger and, first of all, we make a socket connection. The socket connection is very simple. We create a socket and connect to the server socket, which is already open on the IDE side. Next, we initialize our network communication. It's very simple. We just create a writer thread and a reader thread and start them.

What can the reader thread look like? In a loop, until it is killed, it reads data from the socket, and when it finds a line separator, it knows that it has a whole message that can be parsed. Then we parse the message just by splitting it by tabs. We read the first element as the ID of the command and the second as the type of the command, and we put this in our processing queue. The writer is implemented the same way.

So what's next? The next part is a bit more interesting. We run our program. To do that, first we set our trace function and then we wait for a command from the IDE to start, because at this point we need to be sure that all the data from the IDE has arrived and that the breakpoints are set. When we get the command from the IDE to start, we execute our file.

And the most interesting part is our trace function. It's actually also very simple. Here we handle line events. We take the line number and the file name from the frame and we check whether we have breakpoints for this file, and if we do have a breakpoint for this line, we just send a message to our IDE that we need to suspend, and we wait at this point in a loop for a resume message from the IDE. So execution is suspended here. We don't execute commands anymore. We just wait for the message from the IDE to resume. And if we don't have breakpoints for this file, we don't trace this context, because that optimizes tracing a lot.

So I can show you what it looks like. The font is a bit small, but I think it's okay. Here we have our Eratosthenes sample program. And we debug it. And we just stop.
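The breakpoint-checking trace function described above might be sketched like this. The `Debugger` class, its `send` callback (standing in for handing a message to the writer thread) and the use of a `threading.Event` for the resume signal are all assumptions of this sketch; the talk only says that the trace function waits in a loop for a resume message.

```python
import threading


class Debugger:
    def __init__(self, send):
        self.breakpoints = {}                   # file name -> set of line numbers
        self.send = send                        # hypothetical: passes a message to the writer thread
        self.resume_event = threading.Event()   # set when the IDE says "resume"

    def add_breakpoint(self, filename, line):
        self.breakpoints.setdefault(filename, set()).add(line)

    def trace(self, frame, event, arg):
        lines = self.breakpoints.get(frame.f_code.co_filename)
        if not lines:
            # No breakpoints in this file: returning None stops
            # line tracing for this frame, which saves a lot of work.
            return None
        if event == "line" and frame.f_lineno in lines:
            self.resume_event.clear()
            self.send(("suspend", frame.f_code.co_filename, frame.f_lineno))
            self.resume_event.wait()            # execution is suspended here
        return self.trace
```

In a full debugger the reader thread would call `resume_event.set()` when a resume command arrives; the user thread stays blocked inside the trace function until then.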
So that is a, it looks strange on this screen, I think. Okay. So actually, that's it. We just implemented a visual debugger that communicates with the interface. But we now lack very important features. The first one is conditional breakpoints. It's the ability to set, close this. It's a demo effect. I don't know. I'm seeing this for the first time. I can show, wait a second. Okay.

So we need to implement conditional breakpoints, exception breakpoints, step over, step into, smart step into. We need to make it work on Python 2.4 to 3.4. And we would very much like to implement multiprocess debugging. And I'll show you now that it's also very, very, very simple.

So how do we implement our conditional breakpoints? We just enhance our trace function so that it checks whether we have a condition. A condition is just a Python expression that evaluates to true or false. If we have a condition expression, we evaluate it. If it is true, we stop; if it is false, we don't stop on this breakpoint. So voila, we have conditional breakpoints.

Exception breakpoints. To support exception breakpoints, we need to handle the exception event somehow. And we do that very simply. We get the exception type from the arguments. And we check whether we have an exception breakpoint for this exception type. If we do, we suspend. If we don't, we don't suspend.

So, step into, step over, smart step into, run to line. These functions are very simple. Maybe you don't know what smart step into is, so I'll show you. Okay. Now it's okay. So step spy. Okay. So step into is just going inside the function that is executed. Step over is skipping the execution of the function and going to the next line. Smart step into is just the possibility to step into a selected function. And run to line is going to a specific line that you select in your editor. So how is it implemented? It's very simple. It's totally simple. Step into is just resume and stop on the next line.
Step over is just step into, but we stop on the next line in the same frame. We remember which frame it was when we received the step over message. Then execution goes somewhere inside, and when we return to this very stack frame, we stop there. Smart step into is just step into, but we stop on a line of the selected function. And run to line is just a temporary breakpoint which we remove after we reach it. So these four features are implemented in just, I don't know, a couple of lines each.

So what about support for Python 2.4, all versions of Python, and all interpreters? Actually, that's not the best part of the code. Because when you need to support all versions, your code gets a lot of lines like this. You need to handle all the differences in the standard library. But that's okay, actually. You can collect it all in one file and it will not spoil the rest.

So, multiprocess debugging. That is the part that I like the most, because this feature shows how the constraint that we have in Python, only one API function for implementing a debugger, allows us to make something better than in other languages. For example, you cannot easily debug multiple processes in Java. But Python, due to its dynamic nature and due to the fact that we implement everything by hand, allows us to do that.

If we go to the Python standard library documentation again, we see that all new processes in the end use functions of the os module, like execv and spawnv, and there are maybe a dozen of them. First, the fork function is normally executed, and then one of these functions. So what we can do in Python is just monkey patch them. And we do it this way. We take the os module function and replace it with our new exec function. And in our new exec function, we call the original function with patched arguments. And the patching of arguments is very simple.
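The wrapping step just described could be sketched as below. All names here (`patch_args`, `make_patched`, `install`) are hypothetical, the check for a Python executable by looking for "python" in the program name is only an assumed heuristic, and the wrapper signature fits the `os.exec*` functions that take `(path, args, ...)`; the `os.spawn*` functions take a mode first and would need their own wrapper.

```python
import os


def patch_args(args, debugger_script, host, port):
    """Put the debugger script and the IDE address in front of the real arguments."""
    args = list(args)
    # Assumed heuristic: args[0] is the program name by convention.
    if args and "python" in os.path.basename(args[0]).lower():
        return [args[0], debugger_script, host, str(port)] + args[1:]
    return args  # not a Python process: leave the arguments alone


def make_patched(original, debugger_script, host, port):
    """Wrap an exec-style function so it calls the original with patched arguments."""

    def patched(path, args, *rest):
        return original(path, patch_args(args, debugger_script, host, port), *rest)

    return patched


def install(module, names, debugger_script, host, port):
    """Replace the named functions on `module` with their patched versions."""
    for name in names:
        if hasattr(module, name):  # not every platform has every function
            original = getattr(module, name)
            setattr(module, name, make_patched(original, debugger_script, host, port))
```

In a real debugger this might be applied as `install(os, ["execv", "execve"], script, host, port)` once the connection to the IDE is known; the sketch takes the module as a parameter so it can be tried on a stand-in object without touching `os`.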
If it is a Python executable, we leave it and then add our debugger script in front of the real arguments, along with the host and port that we already have. And what happens in practice is that the new process that is about to launch is first launched inside the debugger, the debugger connects to the IDE, and then the debugger code executes this new process. So debugging a new process works just like debugging a new thread, actually.

What have we learned today? We saw that it's very simple to trace Python code and it's very, very simple to make a simple console debugger. And it's also very simple to implement a real visual debugger. But what for? I actually encourage you to contribute. There are a lot of features that can be implemented in this field. And they can be implemented by you or with your help. I'm not saying that we are giving up or that we are going to stop developing the debugger; we actually do a lot of work, but if you help to solve your own daily problems, it will be great.

As for the sources, the best places to look are, first, the link to the debugger in the PyCharm open source repository, and second, the link to the PyDev debugger on GitHub. But there is one more thing that I'd like to tell you about. There is work in progress now on a merged version of the PyCharm and PyDev debuggers. The repository is already created; it's called PyDev.Debugger. It has no code yet because the repository was created just last week, and it has some development branches, but stay tuned. In a short time we will get a merged version of the PyDev and PyCharm debuggers with the union of the features that both IDEs have, and documentation will be there too, so it will be possible to contribute to this project and to learn how it is all implemented. So that's all. If you have any questions, you can ask.

I will start with a simple question. Is there a console client for your debugger agent, or are you aware of anything like that?
Not yet, but I think that after we establish this merged debugger with PyDev, we will make one.

That's great. Now for a harder one. Have you considered data watchpoints, and how hard would those be in a garbage-collected language?

We considered implementing that, but we have not evaluated the performance problems that it can cause. I think it's definitely worth trying, but I cannot say anything about real production implementations of that feature.

Is it possible for the debugger to modify the flow of the program? Can I skip the execution of single lines or suppress exceptions?

Actually, it's not possible in Python. It is partly possible, in that you can hack the bytecode that you get, but it won't work in all cases. As for suppressing, no, I think that right now you cannot suppress an exception that is raised and not caught.

I have a question. It's nice when you run your programs in a development environment and you can run them with your debugger, but when you run programs in production they are usually not instrumented, yet they still fail sometimes and you want to troubleshoot and debug them. So what's the current state of Python debugging for uninstrumented processes?

That's actually the first item on our list. Stay tuned. I hope it will arrive soon. It's not there yet, but it is the first on the list.

Okay. Any more questions from anyone? Okay. So that will be the end of the session then. So thank you, Dmitry. Thank you.