And the next talk is — we welcome Anselm Kruis with the topic of post-mortem debugging with pyheapdump. Welcome. Hello, everybody. It's probably a well-known problem that every serious program has bugs, and there are various methods to handle that. My talk is about post-mortem debugging for Python. Just very briefly about me: I'm working for science + computing as a senior software architect, and in my spare time I do some Python hacking. Some program failures are very infrequent, and they are sometimes also hard to reproduce — think of a large compute cluster where once a day a job dies and you have no access to these jobs. In such cases it's a very common and very old approach to use some kind of post-mortem analysis to find the cause of the failure. The classical approach is to create a core dump. The core dump is a file, and you can later load it into a debugger and analyze it. Unfortunately, Python has no equivalent of core dumps yet. So I thought it could be a chance for a little project, and when I started I didn't know whether it would work out — but actually, it works to some degree. There's a lot of previous work: core dumps date back to the origins of computing. The oldest reference I found to the classical core dump is in the programmer's manual for the SHARE Operating System from 1959 — that's the second operating system ever created, so it's really old. Today almost every operating system has a feature to create a dump of the memory of a program that caused some fault condition. People have used these operating-system-level dumps to analyze interpreted programs running within a native-code interpreter, and on the internet you can find various reports of people trying this for Python — with mixed results. There are a few projects, and I think I found most of them; you can find them in the references section.
So it's complicated and highly dependent on the implementation, the compiler options, the compiler version and the operating system — it's not really practical. Then it's of course possible to move the feature to create a dump from the operating system into the interpreter. There are some reports about dump features for interpreted languages, and the most prominent example is probably Java: IBM's implementation of Java directly supports some kind of Java heap dump, and you can later debug the Java program with it. For Python there's also some prior work: in 2012 Eli Finer released the pydump module. Its idea is to catch an exception, pickle the traceback, and then use the pdb post-mortem function to analyze the unpickled traceback. In theory this works well, but in practice most serious tracebacks contain some unpicklable objects, and then it fails with a pickling error. And now we come to pyheapdump — that's the module I created. The name pydump was already taken, so I had to choose a different one, and I chose pyheapdump. It's still experimental work, and it's currently Python 2.7 only, because the sPickle library I depend on is 2.7 only — but it's possible to port it, and there's already an experimental port of the sPickle library, so maybe we will get it for Python 3 as well. The building blocks and the basic idea are similar to pydump: some exception-handling code, some serialization of the dump, and glue code to insert the dump into the debugger. And indeed I used a few lines of code from the pydump module — when I found it I thought, well, perhaps I can improve it, but it turned out there are just a few lines left. So it's time for a little demonstration. Think of the following situation: you installed a little Python game for your partner or your kids or a customer, and then she or he complains about crashes occurring every now and then, and you have to catch the bug.
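The pickling error mentioned above is easy to reproduce. Here is a minimal sketch using the plain pickle module (not pydump itself), showing that a traceback object cannot be pickled directly — and that even a frame's local variables can refuse to pickle once they contain something like a lambda:

```python
import pickle
import sys

def failing():
    callback = lambda x: x + 1  # an unpicklable local object
    raise ValueError("boom")

try:
    failing()
except ValueError:
    tb = sys.exc_info()[2]

# Traceback objects themselves are not picklable at all ...
try:
    pickle.dumps(tb)
    tb_picklable = True
except TypeError as exc:
    tb_picklable = False
    print("pickling the traceback failed:", exc)

# ... and even picklable containers fail once they hold
# unpicklable locals such as the lambda above.
try:
    pickle.dumps(tb.tb_next.tb_frame.f_locals)
    locals_picklable = True
except Exception as exc:
    locals_picklable = False
    print("pickling the locals failed:", type(exc).__name__)
```

This is exactly why a fault-tolerant serializer is needed for serious dumps.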
And please note: I used the game Block Fortress and I introduced the bug myself, so the upstream version is perfectly okay. And here we are. First we have to instrument our Python installation to create the dump. It's as simple as a pip install pyheapdump — it's already installed here, I didn't want to depend on the network. Then I created a little .pth file in the Python installation — here it is. You probably know what this kind of file does and how it works: if Python finds a file with the extension .pth in its site-packages directory during startup, it, let's say, executes this file in some sense. So it imports the pyheapdump package and then registers a dump-on-unhandled-exception handler; it's registered via the sys.excepthook. So let's play the game. Okay, I practiced a little bit to find a situation where the game reliably crashes — and here is the crash. We got a message, and here we have a heap dump file. That's fine, so we can now load this into the debugger; it's fairly simple. We simply call python -m pyheapdump, and there's a nice help option, so you can see the arguments. The idea is simply to tell it which file to debug. I want to use the pydevd debugger — that's the debugger for the Eclipse PyDev plugin — which has a nice remote-debugging feature, and therefore it's very well suited for this application. I also have to tell it where the debugger module is actually located; that's this long line here. So let's debug it. I go over to the debugger, and you see here we have a message: the debugger shows me the exception that was raised — an AttributeError, 'Ball' object has no attribute 'add_bonus'. Obviously that's the immediate reason for the crash, but we can still ask why it happened. Can you read it in the back? I think it should be possible.
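The .pth trick described above can be imitated without touching a real installation: site.addsitedir() processes *.pth files the same way interpreter startup does, executing any line that begins with "import". The real file from the demo would contain something like `import pyheapdump; pyheapdump.dump_on_unhandled_exceptions()`; this sketch just sets a flag so the effect is observable:

```python
import builtins
import os
import site
import tempfile

# Create a temporary "site-packages" directory with one .pth file.
hookdir = tempfile.mkdtemp()
pth_path = os.path.join(hookdir, "install_hook.pth")
with open(pth_path, "w") as f:
    # One executable line, as in the demo's .pth file.  Lines that
    # start with "import" are exec'd when the directory is processed.
    f.write("import builtins; builtins._hook_installed = True\n")

# Imitate what Python does at startup for site-packages.
site.addsitedir(hookdir)

print(builtins._hook_installed)  # → True
```

In the real setup, that one line in site-packages is all the instrumentation the customer's machine needs.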
So you see here there are some more or less complicated conditions for when this add_bonus method is called, and we can look into the variables. Here, for instance, is self — here are all the values you need. And we can see: okay, the combo length is 14, that's true, and some other values — and this complicated condition explains why the crash occurs only rarely. Actually, I introduced this code here. You see we can also look at other frames, and we get all the variables. And here, look into this — that's an interesting thing. We can look into the objects, and here we have an interesting one, because the game object was actually not really picklable — it probably depends on some resources or something like that — so we get a surrogate object instance. That doesn't hurt during debugging, because the surrogate objects inserted by the fault-tolerant unpickler of the pyheapdump module have all the attributes the original object would have, so we are still able to analyze the problem. Okay, so far the demonstration; back to the presentation. The application of the pyheapdump module is very simple. You have to set up an exception handler, and you have various ways to do it. Usually the most common and comfortable way is the function dump_on_unhandled_exceptions: it can register an excepthook in the sys module that is called if an unhandled exception occurs in the main thread, or it can work as a decorator for a function, so that if this function raises an exception, a dump will be created. There are also some lower-level functions available; it's all documented in the manual of pyheapdump. Then you have to instruct your customer or the operator to send you any heap dump files. And then you have to wait — and if you are lucky, you have to wait forever, because your program is not buggy. And finally you analyze the dump using a common debugger.
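As a rough illustration of the decorator variant, here is a minimal imitation. The names dump_on_exception and game_step are invented for this sketch, and a real pyheapdump dump contains far more than this rendered summary — but the control flow (catch, serialize, write a dump file, re-raise) is the same idea:

```python
import functools
import pickle
import sys
import tempfile
import traceback

def dump_on_exception(func):
    """If the wrapped function raises, save a picklable summary of
    the traceback to a .dump file, then re-raise the exception."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            exc_type, exc, tb = sys.exc_info()
            summary = {
                "exception": repr(exc),
                # Real frames are unpicklable, so store rendered text.
                "traceback": traceback.format_tb(tb),
            }
            with tempfile.NamedTemporaryFile(
                    suffix=".dump", delete=False) as f:
                pickle.dump(summary, f)
                wrapper.last_dump = f.name  # remember for the analyst
            raise
    return wrapper

@dump_on_exception
def game_step():
    raise AttributeError("'Ball' object has no attribute 'add_bonus'")

try:
    game_step()
except AttributeError:
    pass  # the game would crash here; the dump file survives

with open(game_step.last_dump, "rb") as f:
    print(pickle.load(f)["exception"])
```

The re-raise at the end is important: the dump mechanism observes the failure, it must not swallow it.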
So how does all this work? That's a complicated question, so let's divide it into some simpler ones. The first question is: what is the content, the information, in a heap dump file? Let's have a quick look — oh, I have to finish debugging here first. So, actually the file uses a kind of MIME message format. The idea was to make a file that a human could read to some degree, so it contains some headers with information about the Python process that created the file, and then there's a large binary part that contains the real information. What is in this binary blob? The content is a compressed pickle of a dictionary, and the dictionary contains the traceback of the exception, the stack frames of selected or all Python threads — or, in the case of Stackless Python, of all tasklets — and then the transitive closure of the objects reachable from these frames or tasklets. Optionally you can also include the sources of the code objects from all these frames, and some other interesting items like the process ID and the thread IDs, and platform information — because if you create a dump on a Linux system and analyze it on a Mac or a Windows system, you sometimes need to know how to interpret the paths and filenames of the source code files in the code objects. How does pyheapdump create this content? Well, the basic idea is: simply create a dictionary with the content and pickle it. And there's a challenge: you can't pickle most classes. The kind of data that is picklable in Python is fairly limited — typical data objects, and objects designed to be pickled, but certainly not everything.
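The file layout described above can be sketched in a few lines. The header names used here are invented for illustration (the real pyheapdump headers may differ), but the structure is as described: a human-readable MIME message whose binary body is a compressed pickle of a dictionary:

```python
import pickle
import platform
import sys
import zlib
from email import message_from_bytes
from email.message import EmailMessage

# The dictionary that would hold the real dump content.
dump_dict = {
    "exception": "AttributeError: ...",
    "threads": {},   # frames / tasklets would go here
    "pid": 4242,     # process id, as mentioned in the talk
}
payload = zlib.compress(pickle.dumps(dump_dict))

# Human-readable headers + binary blob, as a MIME message.
msg = EmailMessage()
msg["X-Python-Version"] = platform.python_version()
msg["X-Platform"] = sys.platform
msg.set_content(payload, maintype="application",
                subtype="octet-stream")
raw = msg.as_bytes()

# The analyzing side parses the headers, then unpickles the blob.
parsed = message_from_bytes(raw)
restored = pickle.loads(
    zlib.decompress(parsed.get_payload(decode=True)))
print(restored["pid"])  # → 4242
```

The headers stay readable in any text editor, which is exactly the point of the format.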
And the second challenge is multithreading: think of several threads running at the same time and changing the state of your program. If you get an exception, it could depend on the state not only of the thread that actually had the exception, but also of other threads. So how can we pickle arbitrary objects? You probably all know a little bit about pickling. On one hand, pickling is a data format — you can read the exact description of all the little opcodes and details in the source of the CPython library. And there's also the standard implementation, the pickle and cPickle modules. The basic idea of the standard implementation is to serialize data in a portable way, portable between different Python versions, and it's really fast. Well, there's a second implementation of the pickler, the sPickle module, which science + computing created and released as open source in 2010. The idea of this module is to serialize more than just well-behaved objects, though not necessarily portably between different Python versions. It's fairly slow, because it's written entirely in Python. PyHeapDump builds on this sPickle library and adds some additional features; the important one is fault-tolerant pickling and unpickling. The basic idea is: we are not required to serialize and restore the data exactly, because we are not interested in continuing the program. We just want to look at its state, so it's enough to preserve the state in a way that is useful for analysis. That gives us additional freedom in handling problematic objects. Then the second challenge is threading, and there's no perfect solution possible. In an ideal world, we would have a method to stop all other threads, create the pickle, and then let everything go on.
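The fault-tolerance idea can be shown in miniature. This sketch is not sPickle — it is a naive pre-pass that walks a structure and replaces anything unpicklable by a Surrogate carrying the original class name and the picklable subset of its attributes, which is the behavior the surrogate objects showed in the demo:

```python
import pickle

class Surrogate:
    """Stands in for an unpicklable object in a dump; it carries the
    original class name and the picklable subset of its attributes."""
    def __init__(self, clsname, attrs):
        self.surrogate_for = clsname
        self.__dict__.update(attrs)

def _picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False

def sanitize(obj):
    """Replace unpicklable objects by Surrogates, recursing through
    dicts and lists (a real implementation covers far more types)."""
    if isinstance(obj, dict):
        return {k: sanitize(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [sanitize(v) for v in obj]
    if _picklable(obj):
        return obj
    attrs = {k: v for k, v in getattr(obj, "__dict__", {}).items()
             if _picklable(v)}
    return Surrogate(type(obj).__name__, attrs)

class Game:
    def __init__(self):
        self.level = 3
        self.render = lambda: None   # stands in for an OS resource

# The Game instance is unpicklable, yet the dump round-trips:
restored = pickle.loads(pickle.dumps(
    sanitize({"game": Game(), "score": 100})))
print(restored["game"].surrogate_for, restored["game"].level)
```

The analyst loses the `render` attribute but keeps `level` and every other plain value — exactly the "useful for analysis, not for resuming" trade-off described above.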
But we can make a best-effort solution, and it's indeed possible to block other threads: as long as you don't release the GIL, no other thread runs. There's the sys.setcheckinterval function, and if you set it to a very long interval — or, on Stackless Python, you could use the atomic context manager — then you effectively block other threads. And while the threads are blocked, you make a copy of all the local variables of the frames you're interested in, and then you can actually pickle these copied frame objects. Pickling itself could release the GIL, because if you pickle a class that has a custom reduce or a custom getstate method or something like that, this method could actually call an external C function, which could release the GIL — and then you get all the mess again. So, a short final note about debugger support: pdb and pydevd already support post-mortem debugging. pdb has a nice API method, post_mortem, and in pydevd you need to hack around a little bit in the internals. pydevd supports the inspection of additional stack frames which do not belong to a thread, so-called custom frames, and that's a really useful feature here: if you have multiple threads, we can use these custom frames to make it very simple to access the other threads besides the thread that caused the exception. pydevd should probably add some API for advanced debugger features like post-mortem debugging or adding custom frames into the debugger console — it has all these features, but they're not accessible from the outside. Future goals: well, at the moment it is already useful — we are using pyheapdump in one of our products, but I have to admit we got just very few dumps that were caused by real bugs. That's probably due to the testing and quality assurance we have for this product; there are just very few bugs. And there are open questions: the memory usage, and how reliable is this concept?
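A sketch of the "block the other threads, copy the locals, then pickle" idea. sys.setcheckinterval is the Python 2 API from the talk; on Python 3 the equivalent knob is sys.setswitchinterval. Note that this only makes a thread switch very unlikely rather than impossible — it is the best-effort solution the talk describes, not a guarantee:

```python
import sys
import threading
import time

counter = {"n": 0}

def worker(stop):
    # A background thread that keeps mutating shared state.
    while not stop.is_set():
        counter["n"] += 1
        time.sleep(0.001)

stop = threading.Event()
t = threading.Thread(target=worker, args=(stop,))
t.start()
time.sleep(0.05)   # let the worker run for a moment

old = sys.getswitchinterval()
sys.setswitchinterval(1000.0)   # make a thread switch very unlikely
try:
    # Copy the locals of every thread's topmost frame while we
    # (almost certainly) hold the GIL without interruption.
    snapshot = {tid: dict(frame.f_locals)
                for tid, frame in sys._current_frames().items()}
finally:
    sys.setswitchinterval(old)

stop.set()
t.join()
print(sorted(snapshot))   # one entry per thread id
```

The copied dictionaries are then what actually gets pickled, so the slow, GIL-releasing pickling step no longer races against the live threads.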
And very, very important: security. These dump files contain an enormous amount of information, so you probably have to handle them carefully. And as always, if you pickle or especially unpickle something, you're running code — a pickle file is a program — so you have to be sure that you can trust the source of the file. And probably I'll ask Fabio to provide some better APIs for pydevd, and in the long term I plan to support Python 3 — probably not every version, but 3.3 or 3.4 and onwards. So, thanks to my boss, who approved the publication of this code, to my colleague Tanya for testing, and to my wife for a lot of patience during the many evening hours I spent on this. And here are the references; the slides will probably be available somewhere. So, many thanks for your kind attention, and I think we have two minutes for questions. If there are questions, please take this microphone. Thanks for the great effort you've done. My question is — it's probably only my impression — that it kind of duplicates the effort done for the Sentry project, because what you could actually do is extend their exception reporting a bit to also include other threads if needed. Because most of the time what you need is just a traceback, right, with local variables and some source code to understand the problem, and a core dump is a bit of overhead most of the time. Basically, for most projects you just need to report the exception to a Sentry server, and you can cover most bugs. Do you agree with that? Yes — with pyheapdump it's also possible, though I didn't show it, to limit the included information to just the local thread, and it really depends on your application what you need. And there are also other solutions — look at Django or something like that, where you get a very nice...
The thing is that with Sentry you see any exception — it doesn't matter, it can be any application; you just need some information, and it can be a local server that receives the exception. Yes, that's certainly possible, but it depends on the infrastructure and the situation you have. Okay, we're running out of time. There is a group photo afterwards — please join us outside. Thank you very much, Anselm, for the presentation.