Hello, everyone. My name is Elizaveta Shashkova, and today I want to tell you about a visual debugger for Jupyter Notebooks. First, let me introduce myself. I'm a software developer at JetBrains. I'm working on the PyCharm IDE, and currently I'm focused on the debugger and data science tools. We all write code with bugs, but a productive developer is not a developer who writes code without bugs, but a developer who can quickly find and fix them. And a visual debugger is a tool which can help you do that really efficiently. Visual debuggers for Python files exist in almost every IDE nowadays, but they usually can't work with Jupyter Notebooks, because a Jupyter Notebook doesn't contain only Python source code: it's a sequence of cells with different types of content, including Python source code. And exactly like code in Python files, code in Jupyter Notebooks may contain bugs. The most popular ways to find bugs in Jupyter Notebooks nowadays are either print statements or the command-line debugger ipdb. To be honest, these ways are not very convenient. Print statements require modifying the code inside your cell and rerunning the cell to get additional information. And the ipdb debugger, which is based on the built-in pdb debugger, produces a lot of output during the debug session and requires remembering all the commands needed to evaluate a variable or set a breakpoint. Also, there are some visual wrappers for ipdb, like PixieDebugger, for example. But they all have the same limitations as ipdb. For example, you can't add a breakpoint during program execution: you need to wait for the program to suspend and ask you for the next command. So you can see the whole Jupyter ecosystem lacks a very important tool: a visual debugger. And the good news is that recently a visual debugger for Jupyter Notebooks was implemented in PyCharm Professional, by me. And today, I'll try to explain how it was done.
So the answer to the question from the title is, of course, reality, because otherwise my talk wouldn't exist. As I've already said, usual Python files and Jupyter Notebooks have at least one thing in common: both of them contain Python source code. Debuggers for Python already exist, so let's learn how they work and which parts we can reuse to build our Jupyter debugger. Most Python debuggers are based on the built-in tracing mechanism, which allows you to monitor program execution. You can define your custom tracing function, pass it to the settrace function in the sys module, and it will report all the events happening in your program. As you can see, a tracing function takes three arguments: frame, event, and arg. Frame is an object which contains information about the current place in the program. Event is the event which happened in this place, and arg is an argument of this event. We defined a simple tracing function which prints the line number and the event which happened on that line. And let's check how it works on a simple example: a simple function greet_neighbors, which sends greetings to our neighbors. When we call our function, the "call" event arrives on the first line, because Python called this function greet_neighbors. Then Python executes the second line, so the "line" event arrives on the second line. Then the interpreter executes lines three and four, we receive the corresponding events, and the output "Hi Mars" appears. After that, during the second loop iteration, Python executes lines three and four again, and "Hi Venus" appears in the output. And after that, we are returning from the function, so Python executes line five, and the "line" and "return" events appear on line five. OK. How can we use this tracing function to implement breakpoints in our debugger?
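The tracing setup described above can be sketched in a few lines. This is a minimal example, not PyCharm's actual code; the recorded events are collected into a list instead of printed, so they are easy to inspect afterwards:

```python
import sys

# Record (line number, event) for every event Python reports
# while greet_neighbors runs.
events = []

def trace_function(frame, event, arg):
    events.append((frame.f_lineno, event))
    return trace_function  # keep tracing lines inside this frame

def greet_neighbors():
    for planet in ["Mars", "Venus"]:
        print("Hi " + planet)

sys.settrace(trace_function)  # tracing starts with the next frame entered
greet_neighbors()
sys.settrace(None)            # stop tracing
```

Running this prints "Hi Mars" and "Hi Venus", and `events` starts with a "call" event for entering greet_neighbors, contains "line" events for each executed line (twice for the loop body), and ends with a "return" event.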
On each program event, the tracing function receives a frame object, which contains not only line numbers, like we've seen in our example, but also the filename of the current code, which is stored in the code object. A breakpoint also has a filename and a line where the user put it. So on each call, we can compare the breakpoint's filename with the frame's filename and the breakpoint's line number with the frame's line number, and if these values are equal, we can suspend our program in this place. Cool. So this is how Python debuggers work and how we can use a tracing function to implement breakpoints. But execution of Python code in Jupyter notebooks differs from usual Python files. In the next part, let's learn how Jupyter executes Python code and what we should change in an existing Python debugger to implement breakpoints in Jupyter files. You browse your Jupyter notebook in a frontend. For example, to support Jupyter notebooks in PyCharm, we implemented our custom frontend, which works similarly to the default one. So when you run the first cell in your notebook, it starts an IPython kernel and establishes a connection to it. The IPython kernel is, in fact, a Python process which works similarly to a REPL: it's running in a loop and waits for the next command to execute. So when you execute your cell, the frontend sends its source code to the IPython kernel. The IPython kernel compiles it to a code object, executes it, and sends the result back to the Jupyter notebook. The most interesting part for us here is how the kernel executes this code. For every cell execution, the kernel generates a unique name for the cell and passes this name as the filename for the generated code object. Usually the kernel hides this information from users, but it stores all these generated code objects in its internals. That's why, when you define some function in the first cell and execute it, you can then call this function in another cell: the IPython kernel saved it for you.
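We can mimic both ideas, the generated cell filename and the filename/line breakpoint check, in one short sketch. The generated name and the breakpoint set below are purely illustrative; they are not the names IPython or PyCharm actually use:

```python
import sys

# IPython compiles each cell with a generated pseudo-filename. We can mimic
# that with compile(): frames created from the resulting code object carry
# the generated name, which the tracer compares against breakpoints.
cell_source = "def greet():\n    print('Hi Mars')\n\ngreet()\n"
code = compile(cell_source, "<generated-cell-1>", "exec")  # illustrative name

hits = []
BREAKPOINTS = {("<generated-cell-1>", 2)}  # "breakpoint" on line 2 of the cell

def trace_function(frame, event, arg):
    if event == "line":
        location = (frame.f_code.co_filename, frame.f_lineno)
        if location in BREAKPOINTS:
            hits.append(location)  # a real debugger would suspend here
    return trace_function

sys.settrace(trace_function)
exec(code)
sys.settrace(None)
```

After execution, `hits` contains exactly one entry, the breakpoint location inside the "cell", showing that the (filename, line) comparison works for generated code too.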
To implement breakpoints in usual Python files, we can use the pair (filename, line number) to define a place in the source code, because this pair uniquely identifies a breakpoint location, a source code position. But in Jupyter notebooks, it doesn't work, because each cell is a separate code snippet with its own line numbers inside, and all cells are located in the same file. So we can't reuse the same pair for Jupyter breakpoints. But we already know that the IPython kernel generates all the necessary information during cell execution. An executed cell has a generated filename and its internal line numbers, so we can use this pair to define a unique location in our code. Great. But the problem is that this generated information is available only in the IPython kernel and not in our IDE. And when the debugger sends some message to the IDE, for example about suspension, "I'm suspended in some place", this message contains the generated filename. But the IDE doesn't know which cell is suspended, because it can't find its source code: it was generated on the IPython kernel side. In the IDE, we can introduce cell identifiers, for example, to find cells' locations in the editor. But we still need a source mapping between these two objects: the cell identifier in the editor and the generated filename. I spent a lot of time trying to understand how to implement it, and the solution turned out to be quite simple. There are two things which helped during implementation. Firstly, in the IDE, as I've already said, we have a custom Jupyter frontend. That means we control all cell execution inside our Jupyter notebook: for example, we can track all the cells which were executed during the session, or send some additional commands. The second thing which helped to implement it was silent cell execution, which is supported in Jupyter.
That means that you can execute some code in a silent mode, so it will be executed in the context of the IPython kernel, but it won't be added to the kernel history, and it won't increase the execution counter. How did we use it to support source mapping? Before sending the real cell code, we can send several utility commands in a silent mode. For example, we can send a command which patches the function for name generation and saves the currently generated name to the debugger instance. Also, we can silently send information about the currently executing cell's ID and save this value to the debugger instance as well. That means that when the cell starts to execute, we already know all the necessary information about the mapping, because it's saved in our debugger instance. And inside our Jupyter tracing function, we can do the following: when execution is suspended inside code with some generated name, we can map this generated name to the cell identifier stored in our debugger instance, and then send a message to the IDE side. And now this message doesn't contain a generated name; it contains a cell identifier from the editor. After that, the IDE can quickly find the cell and its source code in the editor and highlight the suspended line. That's how Jupyter breakpoints were implemented. Now we know how the IPython kernel works and how we can define source mapping for Jupyter breakpoints. But we still have two separate entities, the IDE and the IPython kernel, and they should be able to communicate somehow. As I've already said, we have an IDE instance with its custom Jupyter frontend implemented there, and an IPython kernel, which executes our commands. But for a debugger, it isn't enough to send just the source code for execution or some commands in silent mode. We also need to send a lot of utility information, something like "the user added a breakpoint in cell number three, line number two".
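The mapping idea can be reduced to a small toy model. Everything below is illustrative (class and method names are invented for this sketch, not PyCharm's API): before the real cell runs, the frontend silently sends commands that store the editor's cell ID and the kernel-generated filename on a debugger instance, and the tracer later resolves one from the other.

```python
# Toy sketch of the source mapping between editor cells and generated names.
class JupyterDebugger:
    def __init__(self):
        self.name_to_cell = {}       # generated filename -> editor cell id
        self._pending_cell_id = None

    # sent silently by the frontend just before the real cell executes
    def register_cell_id(self, cell_id):
        self._pending_cell_id = cell_id

    # recorded when the kernel generates the pseudo-filename for the cell
    def register_generated_name(self, generated_name):
        self.name_to_cell[generated_name] = self._pending_cell_id

    # used by the tracing function when the program suspends
    def resolve(self, generated_name):
        return self.name_to_cell.get(generated_name)

debugger = JupyterDebugger()
debugger.register_cell_id("cell-3")                     # silent command 1
debugger.register_generated_name("<generated-cell-7>")  # silent command 2
```

Now, when the tracer suspends inside code named `<generated-cell-7>`, `debugger.resolve("<generated-cell-7>")` returns `"cell-3"`, and the message sent to the IDE can refer to the editor's cell instead of the kernel's generated name.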
Or, when the debugger is suspended, it should be able to send a message like "I'm suspended in some place in the cell". That means that the debugger needs some additional communication channel with the IPython kernel. When I started to think about it, I realized that there are two possible solutions for this problem. The first one is to establish an additional connection to the IPython kernel. And the second one is to reuse the existing Jupyter channels. The first one is the simplest, and it's the first thing that came to my mind. But it has some limitations, and the reason for these limitations is in the Jupyter architecture. This is a more detailed scheme of the Jupyter communication model. This is a frontend. When the frontend connects to the IPython kernel, it doesn't connect directly: it connects via a kernel proxy, which connects to the IPython kernel via web sockets. And if we want to bypass the Jupyter messaging architecture, we can only establish a direct socket connection to the IPython kernel. And of course, that isn't always possible. For example, if your kernel is located far, far away in some cloud, you can't connect to it without the proxy. So the solution with an additional connection is currently implemented in PyCharm Professional, but it works only if you can establish a direct connection to the IPython kernel. But Jupyter already has a rich messaging architecture; maybe we can try to reuse it. Yes, the IPython kernel has five sockets. Here I show the three most important of them, which send cells for execution, send output back to the frontend, and request user input. It would seem possible to reuse some of them in our debugger, but there is another serious limitation in Jupyter. The IPython kernel runs a tornado event loop in the main thread, which processes execution requests. Also, there is a second event loop in a separate thread, which processes output. Each of these event loops is single-threaded. That means that if some command has started to execute, the event loop is busy.
And the following commands will be executed only when that execution is finished. So any messages with debug information sent over the same channel would be blocked. But the problem is that debug information should be sent exactly during cell execution; it's useless when execution is already finished. That means that in the current Jupyter architecture, it's impossible to reuse the existing channels for sending debug information. But wait: everybody knows that ipdb works for both local and remote cases, and it doesn't require any additional connection. How does it do it? If you remember the workflow with the ipdb debugger, you can understand how it works. To call the ipdb debugger inside your Jupyter notebook, you need to add a call to the set_trace function inside your cell. After that, the debugger starts, suspends, and asks you for some command. You type a command, the debugger receives it, performs some actions, and asks you for the next command. You type again, the debugger receives it, and so on, and so on. So an ipdb debug session is in fact a sequence of request-reply commands which the kernel sends to the frontend and back. And it works this way because it's based on the built-in input function: it can reuse the existing input channel, because it uses the input function for receiving debug commands. It works, but it has some limitations. For example, if you started to execute some long-running cell under the debugger and realized that you forgot to put a breakpoint in some important place, you have no chance to do it with ipdb. You need to wait for the program to suspend and ask you for the next command, and only after that can you add your breakpoint or execute some stepping command. That's OK for a command-line debugger, but we can't reuse the same technique in our visual debugger, because in our visual debugger we want the ability to put a breakpoint even while the program is running, and to make the program suspend in the place where we added this breakpoint.
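The request-reply nature of such a session is easy to see by scripting pdb, which ipdb builds on: if we hand it prepared input, every command we feed in produces one reply on the output stream. This is a demonstration sketch, not how ipdb wires itself into the Jupyter input channel:

```python
import io
import pdb

# pdb reads its commands from stdin and writes replies to stdout, so a debug
# session is a scripted sequence of request-reply commands. Here we feed it
# "p planet" (print a variable) and then "c" (continue).
commands = io.StringIO("p planet\nc\n")
replies = io.StringIO()
debugger = pdb.Pdb(stdin=commands, stdout=replies, readrc=False)

def greet(planet):
    return "Hi " + planet

# runcall suspends at the first line of greet, then consumes our commands.
result = debugger.runcall(greet, "Mars")
```

After the run, `result` is the normal return value of greet, and the reply to `p planet` (the value `'Mars'`) appears in the captured output, one reply per request.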
That's why, in the current implementation in PyCharm Professional, I decided to establish an additional connection and send all debugger utility commands separately from the Jupyter channels. Well, in this part, we've learned how the Jupyter debugger sends its utility commands and why it was implemented this way. Also, we've learned how ipdb works inside. So it looks like now our visual debugger is ready. Let me remind you how we built it today. Firstly, we defined a tracing function which can work with code generated by the IPython kernel. Secondly, we created a mapping between the editor and the generated code for cells; we used silent cell execution and the features of our custom frontend to implement it. And after that, we established a debugger connection for sending commands from the IDE side to the IPython kernel and back. So today, we've learned how the Jupyter visual debugger is implemented. And that means it's time for the most entertaining part for you and the most horrifying part for me: a live demo in PyCharm. OK, can you see anything? Yeah, you can see everything. Here it is. Well, this is a Jupyter Notebook in PyCharm. You can see the cells are located on the left side and the Jupyter Notebook preview is located on the right side. You can work with the cells as if they were located in one Python file. And it's important to notice that we don't convert the Jupyter Notebook to a Python file. It is still a real Jupyter Notebook with the .ipynb extension, which is located in your project on disk; we just show our custom presentation, so you can work with it as if it were one big Python file. And you can use both the features of the Python editor in PyCharm and the features of Jupyter Notebooks. For example, when you type code, you get the same code completion. Or, for example, here we have some function: you can quickly navigate to any variable declaration, and even if this declaration was in another cell, PyCharm will navigate you to the correct place.
But also, you can use the features of Jupyter. So, for example, you can run a cell, and you can see it was executed, and the output appeared here in the notebook, and it's stored in the Jupyter Notebook. So it works exactly like your default frontend for Jupyter Notebooks. Also, there are many other actions: for example, in PyCharm 2019.2 it will be possible to run all cells in your notebook, or start the kernel, or clear outputs, and do a lot of other things. Well, but we came here to check that our visual debugger works. Let's put a breakpoint. We put it here on the second line and run the debug cell action. The debugger is suspended. As you remember, we defined a tracing function, established the source mapping between the editor and the kernel, and then sent a command to our editor, and it found the place where execution should be suspended. You can see the variable values here; you can expand them and check their values. And, for example, resume the program. Great, simple breakpoints work. Let's look at the next cell. This is the greet_neighbors function, which we've already seen today when we discussed tracing functions. Let's put a breakpoint here and debug this cell as well. During the talk, I had time to discuss only breakpoints, but a very important part of every debugger is stepping commands, and they were implemented in PyCharm too. Let's check that it really works. I can press step into here, and the debugger steps into the function declaration in this cell. Here we can also execute stepping commands; you can see the values as they change. And after that, we can step again and continue our stepping commands in the cell where the function was called. Great. So stepping in the current cell works quite well. Let's consider a more complicated code sample. There is a lot of code, but it's quite simple. We have a list of planets.
And we iterate over these planets, print the name of each planet, search for its neighbors, the left one and the right one, if they exist, call the same greet_neighbors function after that, and sleep for two seconds, because we like sleeping. Well, let's execute our cell under the debugger. OK, execution has started; you can see the output appear. But I forgot to put a breakpoint here. Let's add a breakpoint. And yeah, we added the breakpoint, and the debugger suspended exactly inside our cell. This is the thing that isn't currently possible in ipdb: you can't add a breakpoint during execution. But in PyCharm, with our visual debugger, we can do it. We can also evaluate expressions here. For example, we can check where we stopped: we can select this expression with the current planet, evaluate it, and we're stopped on the planet Jupiter. Great. Also, we can execute step into again, where we call the greet_neighbors function, and we are navigated to the correct place where the function was defined, even though it was defined in another cell. OK, we can resume our execution and remove the breakpoint. Great, we checked the neighbors for Jupiter. But I would like to learn what the neighbors of Uranus are, and I don't want to do a lot of stepping commands and press resume many, many times. For that, I can put a breakpoint and then set a condition for it: suspend my program only if the name of the planet is Uranus. OK, and start our debugging session again. Let's just hide this. Let's start our debug session. You can see the output is starting to appear, but we are waiting for our condition to fire. OK, we are suspended, and we are suspended in the correct place: the current planet name is Uranus. That means the condition for our breakpoint really worked, and we can check the names of the neighbors of Uranus. The neighbors are Saturn and Neptune. That's correct. Great. We can add breakpoints even during a debug session.
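A conditional breakpoint fits naturally into the tracing function we built earlier: when the tracer hits the breakpoint's line, it evaluates the stored condition in the suspended frame's own variables and only suspends when it holds. A minimal sketch (the condition string, line arithmetic, and `hits` list are illustrative, not PyCharm's implementation):

```python
import sys

hits = []
CONDITION = "planet == 'Uranus'"   # condition attached to the breakpoint

def visit_planets():
    for planet in ["Jupiter", "Saturn", "Uranus", "Neptune"]:
        greeting = "Hello " + planet   # imagine the breakpoint on this line

# absolute line number of the "breakpoint" line, two lines below the def
BREAK_LINE = visit_planets.__code__.co_firstlineno + 2

def trace_function(frame, event, arg):
    if event == "line" and frame.f_lineno == BREAK_LINE:
        # evaluate the condition using the suspended frame's variables
        if eval(CONDITION, frame.f_globals, frame.f_locals):
            hits.append(frame.f_locals["planet"])  # would suspend here
    return trace_function

sys.settrace(trace_function)
visit_planets()
sys.settrace(None)
```

Although the breakpoint line is executed on every loop iteration, `hits` ends up containing only `"Uranus"`: the tracer skipped the iterations where the condition was false.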
If you do a lot of data science work, and you probably do when you work with Jupyter notebooks, you might work a lot with NumPy arrays or pandas DataFrames. And sometimes it's quite difficult to check the values in these arrays. You can click here on your variable, and it will be opened in a beautiful window as a table, and you can inspect the values, type some slices, and do anything you want with your data. OK, so we've checked that the visual debugger for Jupyter notebooks really works; I didn't deceive you during my talk. That's excellent news. And then let's go back. OK, during my talk today, we learned how to build a visual debugger for Jupyter notebooks. And that means that now, after my talk, you can implement a visual debugger for Jupyter notebooks in your favorite IDE, if for some reason it's not PyCharm. And if it is PyCharm, I've already implemented it for you, so you can try it right now. Thank you very much for coming to my talk. Now I'm ready to answer your questions. Thanks for the talk. I've got one question. When you register a settrace function, the program runs slower. Is the program running slower when you have the settrace function? I didn't get it. Your debugger is based on the settrace function, which does a lot of things. When it is activated, the program runs slower. It's activated when you pass your function to settrace. It's activated in the next frame which is called. So as you can see in this example, where was it? Here it is. You can see it will be applied to the next frame which is executed. So here we're calling the next function, greet_neighbors, so we're entering the next frame, and it is activated in this function. And the tracing function should return, where is it? OK, here it is. It should return a tracing function for the current frame, or it could return None, and the tracing will be stopped.
And is it possible to unregister the trace function, to revert the effect of registering? Yeah, you should call sys.settrace(None), and it will be unregistered. So when you push the play button, that's what it does? Something similar, yeah. If you don't have breakpoints, maybe. OK, thanks. Thank you. I have another question. To connect to the kernel, to add a new connection, did you have to modify the kernel and build a custom kernel, or is it done with the settrace function? No, no, it's only the settrace function. We're silently executing a command which connects to our debugger, and we are storing the debugger instance in some kernel internals. So it doesn't modify the kernel; we're just setting this tracing function. And do you have an idea of the performance impact of this settrace function? Yeah, it's as usual for debuggers. Of course, it makes your program execution a bit slower, and sometimes much slower if you have a lot of computations. But I think it's still faster than adding print statements and rerunning your cell many, many times. So on average it's slower, but usually you don't even notice it. Thank you. Some more questions? Thank you very much for coming. You can always find me at the PyCharm booth during the whole conference, so feel free to come and ask me any questions about my talk, about PyCharm, or about anything you want. Thank you.
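The two answers above, that tracing is activated for the next frame entered and that sys.settrace(None) unregisters it, can be demonstrated in a few lines:

```python
import sys

calls = []

def tracer(frame, event, arg):
    calls.append((frame.f_code.co_name, event))
    return tracer

def traced():
    pass

sys.settrace(tracer)   # activated for the next frame that gets called
traced()               # this call is traced
sys.settrace(None)     # unregistered: the effect is reverted
traced()               # this call is no longer reported
```

Only the first call to `traced` produces events: the module-level frame that called sys.settrace is itself untraced, and after sys.settrace(None) no events are reported at all.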