Hello, I'm Christophe de Dinechin, and I work for Red Hat. Today I'm going to talk about the evolution of a software project over a period of practically 20 years, and I'd like to show you interesting things about how a project evolves when confronted with reality.

So what is a flight recorder, and why do you want one? Did you ever wish you had more information about a crash? Like, your rocket crashed and you don't know why. So you try to think: what was I doing when this happened? And maybe someone pressed the wrong key at the wrong time. As the quote says here, it's a competition between computer programmers and idiots, and so far, the idiots are winning.

So you wish you could reproduce, but what do you do when you can't? We have all had this kind of Heisenbug, which only appears when you're not looking at the screen. Or, something that also often happens, you are not the one reproducing it, so someone else needs to. And users, for some reason, think they have something else to do than reproduce your bugs. So what happens very often is that if the bug has been open for long enough, you close it. Some tools actually have rules to weed out old bugs like this. So you open a long chapter of unsolved mysteries and you make your customers rather sad.

The flight recorder tries to address this by continuously recording data in circular buffers with lock-free updates. I'm going to explain what this means. A circular buffer, you probably know what that is. And the current iteration of this project has multiple buffers for recording events at different speeds. That makes it easy to preserve old things, like, for instance, the boot-time parameters of your program, while recording fast-moving things that come along.

It's a fairly small project: two source files, easy to clone as a submodule. You just add the two files to your project. And it works mostly like printf. So instead of printf, you use record. And you see that there is a recorder name at the beginning.
And otherwise, it's a standard printf format with minor variations. I gave a more detailed talk about this last year. You can look at it if you want more details.

The main features today are, first, to record information while the program is running so that you can dump it when you need it. Typically that is a crash, or some other atypical event, and then you show what happened just before. You see here, for instance, that I received SIGINFO. So I'm running BSD or macOS, and I decided to respond to, in my case, Control-T by dumping the contents of what my program was doing.

Another feature is real-time tracing and graphing of the program. You have trace configurations that let you have the text output directly, but you can also export some data for graphing purposes. And there is a small graphing tool, and it has things like min, max, average, and so on. I'm going to demo all this.

Another interesting feature, and I'm going to explain why this came together, is being able to remote-control long-running code. You can basically tweak an internal value in your program while it's running and see what happens. A use case for this, which I'm going to explain extensively, is a talk I'm giving tomorrow called SPICE Smart Streaming, where I'm going to show how I use these features. But here I'm going to talk about why using this against SPICE brought me to adding features that don't seem to fit together too well. Or maybe they do.

So let me start with the demo. I hope this will do better than what you see here. I have recorder scope, which is a little graphical tool, and a small test. The way you configure it is with this environment variable, RECORDER_TRACES. I'm not going to show you all the details. Suffice it to say that it's recording some random data. And of course, the thing moved on the screen. You see it's recording stuff a little bit too fast for me to see. But there is a sleep time.
So I can basically adjust the sleep time in my program, and that's mostly what it does. I can add some random noise to it. And then I can decide, for instance, to see what the min and max values are. That gives me an idea of how things evolve. So that's basically what it does. And inside the code, it looks like a printf. I just decide to configure it at runtime. So it's always there, and when I need it, I activate it. That's basically all there is to it. Let me kill the other one because it's spinning and heating the CPU. So, well, the demo did not crash. Yeah, success number one.

OK, so let me give you an overview of how the design evolved over time, which is really what this talk is about. I'm going to convince you that I followed a design school that I call IDIOCY, which is: Initial Design Is Often Crummy.

The prehistory of that project is something that Karen Noel over here knows a little, because we used to work together on this. It's something called HP Integrity Virtual Machines, and this is the pre-printf iteration of this flight recorder. Integrity Virtual Machines was a virtualization technology for Itanium systems like Superdome. At the time, these were considered large machines.

The next iteration was in the XL compilers and Tao3D. Do you have any idea what Tao3D is? Don't all raise your hands at once. So yeah, someone knows. It's the software I'm using to present here. It's basically a 3D presentation software. It's open source, and there are no contributors. It's bit-rotting. So if you want to help me, you're very welcome.

And then there was another sort of pressure point on it, which was a small camera. It's really not that big. It's this. So please smile. This is a live interaction. Let me take a picture like this. With me on it, it would be better. This camera, as I'll explain, was in bad need of a printf that works. So I used a recorder for that. And then I joined Red Hat and the SPICE project.
So I'm going to explain what this is about. This brought a number of new features, like separate compilation and having multiple buffers. Then when I started working on the SPICE client, I needed to have some real-time graphing. That's the feature I just showed you, and I'm going to explain why. And finally, when I had to work on the SPICE server, that's when I really needed to add the remote control feature. And I'm going to explain again why in a minute.

So back to this prehistory. When I'm talking prehistory, it's like 18, 19 years ago. That was a really long time ago. This basically started as a kernel logging facility for a proprietary operating system. The problem statement is here: if you want to make an apple pie from scratch, you must first create the universe. And when you start software from scratch, running on bare hardware, well, basically, you need to create something that lets you log stuff.

So it was software written from scratch. It was running straight on the hardware, at least at times. And because it was a virtualization product, it had what we would call a host operating system, except that, the way we designed it, the host operating system was really living side by side, in complete isolation, in a different address space. This means we could not access its drivers. We could not access its I/O devices, et cetera. So if you're familiar with KVM or stuff like that, it's very different. KVM can use printk. We could not.

And it ran on what at the time was considered large hardware, like 64 CPUs, 1 TB of memory, that kind of thing. So we wanted this thing to be fast on a scalable system like this, so, SMP. And of course, we had no libc. We had no printf. We actually had broken compilers and no access to I/O drivers or devices. So we needed to do something that was extremely simple. The solution was a circular buffer recording events, very simple events.
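As a rough illustration of a circular buffer of very simple events, here is a minimal Python sketch. It is not the original code, which ran on bare hardware; the names `EventRing`, `record` and `dump`, and the choice of a label plus three integer arguments per event, are assumptions made for the example. It is single-threaded and leaves out the concurrency concerns of a real SMP system.

```python
from collections import namedtuple

# One fixed-shape event: a sequence number, a label, and three arguments.
# The field names are hypothetical, chosen for this sketch only.
Event = namedtuple("Event", "order where a0 a1 a2")


class EventRing:
    """Fixed-size circular buffer that overwrites its oldest entries."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.written = 0  # total number of events ever recorded

    def record(self, where, a0=0, a1=0, a2=0):
        # The slot index wraps around, so new events replace the oldest ones.
        index = self.written % self.size
        self.slots[index] = Event(self.written, where, a0, a1, a2)
        self.written += 1

    def dump(self):
        # Return the surviving events, oldest first.
        first = max(0, self.written - self.size)
        return [self.slots[i % self.size] for i in range(first, self.written)]
```

Recording is a constant-time store into a preallocated slot, which is what makes this kind of buffer cheap enough to leave enabled all the time; `dump` is only called when something has gone wrong.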
And it was very scalable because we used lock-free operations, using atomic operations plus a few tricks to make sure it did scale well. And it was extremely fast because it was just copying four pieces of data into that buffer, four 64-bit words, basically: a pointer and three arguments, and we decided what they were. It was stored in physical memory at a place that we could share with our host operating system, so the host operating system could simply dump that buffer if needed, at a time when it did have the drivers and the rest. So there was this feature that the host operating system would dump in case of a crash and in case of a hypervisor panic. We had this command, for instance, to collect all the data recorded that way.

By the way, I forgot to say something. If you have questions, please feel free to interrupt. I prefer to be interrupted while I'm talking than to have questions at the end.

At the time, we thought this was pretty cool. And actually, there's another Red Hatter who used to work on the project, named John Farland. When I returned to Red Hat, we crossed paths again after 10 years or something, and he said, you know, that was the best feature in HPVM. So you work to create an operating system, and some guy in the team tells you, you know, the best feature was printf. OK, if that's the case. At the time, I was young. I was naive. So, you know, that's the famous IDIOCY here. The initial design was actually pretty crappy, and I kept improving on it.

Next: XL is an extensible language that's the basis for Tao3D. Basically, these slides you're seeing are written in some kind of Lisp and LaTeX kind of mix; interactive 3D LaTeX Lisp, if you wish. That's a definition of Tao3D. And of course, it's a real-time system, so it's hard to debug. So we had tracing and we had recording. And at the time, I did nothing to put them together in the same thing.
So basically, we had two independent facilities, both written in C++ because Tao3D is in C++. One does the recording. This is an example of a dump; that's what happens when you crash, you get something like this. And another is for tracing purposes, where you can activate individual traces and see what your program is doing at any given time. But in practice, a lot of the code has both. So for instance, here you see, let me show off a little, you have this record here followed by this if trace. And of course, they don't use the same syntax, and it's lousy. So, different syntax, for starters. One is basically based on C++ iostreams. The other uses static data. That's really bad. Also, you can notice that the records were actually already conditional, so it looks a bit like the trace. So deep down, at some point, I realized, yeah, you know, something is wrong with this design. And we are back to this famous IDIOCY. Now, that's not the first generation of the design, so now it's Improved Design Is Also Crappy, with "also" written with an O.

The next step is the DxO ONE. It's a tiny camera with a lot of software. Basically, we inherited the software from a Chinese company, and they were writing everything to the serial port at 115 kilobaud. And they were really printf-happy, so the serial port was extremely verbose on this machine, including on the production units initially. That's what the drawing here says: "Let's solve our problem by using the big data none of us has the slightest idea what to do with." So you had this bunch of data. You did not know what it meant, and you had to filter things out.

Among other problems for this camera, it slowed down the boot. When you switched it on, initially it took over two seconds to boot, which was very slow, because you want your camera to be there when you need it. But it even slowed down shots, so you would end up with pictures like this. This is really annoying, just because you're dumping stuff on a serial port.
And at the same time, it was still necessary, because the team had developed a number of tools that relied on the output of this serial console. This is not the actual tool, which is proprietary, but it looked something like this, and you would know what the camera was doing inside.

So it was time to rethink the problem and try to make it faster. That was a case where the original code was using printf, and printf was sent to the serial port. So basically what I needed was to rewrite a printf that would be faster. I used, once again, a circular buffer, but this time much, much larger, like 128K or something like that. The other difference is that I was not writing one item at a time. I was writing a string, so I needed to advance the write position by the size of the string. And this is multi-threaded: the camera has tons of threads doing various things. So you need to be able to do a two-phase commit to advance your buffer as you go. The code for this is still in the project, in the recorder project. Now you know why. It's not used in many cases, but there is still this large ring buffer implementation and the testing that goes with it. And if you want to spend a little bit of time after this looking at the code, look at the test code for this ring buffer, because it's funny.

So if there was a crash, we would dump the buffer. Same idea. And you could configure traces individually so that you could have the graphing software work. So it was faster, but it was still not very fast. One of the reasons it was not very fast is that it was formatting everything for every printf. printf does format all the time, and you need that to compute the size of the string. So once again, you just need to find the next step.

And then I joined Red Hat, and I started working on SPICE. SPICE is a remote viewing software for virtual machines.
And when I joined the team, one of the things that surprised me is that this was real-time software that needed to do stuff like transmitting images, and they had no real-time debugging tool at all. So, like this cat, they were driving very fast, not knowing where they were going. There was some basic logging based on GLib logging, but that was both too much and too little. It was too much because, if you enabled logging, the settings were: nothing, a little, a lot, and a vast amount. That was basically the granularity you got. When I complained about this, I was told, oh, you should learn how to use grep. Yeah, go grep this.

So the issue with transposing my existing code into the SPICE context was that it basically did not fit the project, for a variety of reasons. It was written in C++, so I had to convert all the code back from C++ to C, which feels like this, really. And then you have to support the printf format. So I have a detailed screen snapshot here of how the internal architecture of printf looks, for those of you who have not written printf themselves. It's basically a bit like this. As most of you can see, this is a game from the '80s called The Incredible Machine, and it's really fun to use. Shows you how old I am.

One of the things that the team complained about with my initial approach is that there was a trace file where you would define the various traces that you could activate. And they said, well, that means you have to recompile everything when you change the trace file, and it's complicated. We don't want that. So I started thinking about how to define the various traces locally in each file. The new syntax is at the bottom. And I started thinking of the recorder as a library rather than as something that was part of the project itself. Multiple declarations meant it was easy to have multiple buffers. That's the feature I was telling you about at the beginning.
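To illustrate why one buffer per declaration helps, the following Python sketch keeps a separate ring per recorder name, so a slow-moving recorder is not overwritten by a fast one. The `Recorders` class and its methods are hypothetical and are not the library's API; storing the format and arguments and formatting only at dump time is likewise an assumption of this sketch.

```python
import itertools
from collections import deque


class Recorders:
    """One bounded ring per declared recorder name (illustrative only)."""

    def __init__(self):
        self.rings = {}
        self.order = itertools.count()  # global sequence number

    def declare(self, name, size):
        # A deque with maxlen drops its oldest entry when full.
        self.rings[name] = deque(maxlen=size)

    def record(self, name, fmt, *args):
        # Keep the format and arguments as they are; no formatting here.
        self.rings[name].append((next(self.order), name, fmt, args))

    def dump(self):
        # Merge all rings back into recording order, then format.
        entries = sorted(e for ring in self.rings.values() for e in ring)
        return [f"{name}: {fmt % args}" for _, name, fmt, args in entries]


recorders = Recorders()
recorders.declare("boot", 4)    # slow-moving: start-up parameters
recorders.declare("frames", 2)  # fast-moving: per-frame events
recorders.record("boot", "options=%s", "fast")
for frame in range(5):
    recorders.record("frames", "frame %d", frame)
print("\n".join(recorders.dump()))
```

With a single shared ring of the same total size, the `boot` entry would eventually be pushed out by the stream of `frames` entries; here it survives however many frames are recorded.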
So you could keep old entries around. That was really convenient, and I knew that from back in the DxO time: when you take a picture, you have hundreds of things that arrive at the same time, and so you lose all the data from the beginning of the configuration. I also added a number of callbacks for formatting, et cetera, because again, that's a library, so I did not know how it would be used.

And then I was very proud of myself, and I started using it on SPICE. And this happened. Like, what's this stuff? I can't make sense of it. It's all text. It's whaaaaa. OK, no way to understand that.

So the next step, for the SPICE client, where I was trying to understand why it was behaving the way it was, was exporting the data in real time using shared memory in order to obtain graphics. Why do graphics matter? Well, let me compare a game from the '80s at the top with a game from 2019 at the bottom. That's proof that fast graphics are better. So no discussion there.

Now, fast is the keyword. So some solutions can be eliminated, like, for instance, piping the SPICE client through grep, then through Perl to extract some data, then through gnuplot. That's just the best way to not get any results, if only because you have to relearn Perl, and, oh, what does this syntax mean? So instead, what I did was to have a shared memory area, another circular buffer, containing the data I wanted to graph: extracting the data in real time from the source and copying it for export into a shared memory area. The reason for this design is to avoid exporting the actual internal buffers, because that would export too much state from within the program, which I felt was not safe. And then there is a client API, and the client can connect to this shared memory and export it for display.

The way it works is, you have this record, my trace, blah, blah, blah. And then you run the program saying that my trace...
So you see my trace here has a %d as the first item and then a %f as the second one. And you say that my trace is two columns that you name index and value. Then you run the recorder scope with index and value. So you basically give them a name in the exported data that way. In practice, this proved really useful to get valuable insights about what was going on in the SPICE client.

And then I began to use it on the SPICE server, and I was rebooting my VMs, and rebooting my VMs, and changing the parameters, rebooting the VM each time I needed to change one of these damn parameters about what I wanted to export, et cetera. So, another idea. Time to look at it again.

The next step was for the SPICE server: sending remote configuration values via shared memory. Exploring behaviors with the previous setup was slow, because I would change a constant or the algorithm inside the SPICE server. Then I would compile the SPICE server. Then I would install it. Then I would restart the VM that was using the SPICE server. Then I would measure stuff. And then I would think, is it really higher or lower than before? Because the graphs are like this, so it's really hard to tell the effect. So I did not understand anything.

So what I did is, well, my recorder has this trace flag that says active or not active. It doesn't have to be a one-bit thing; it can be 64 bits. And I can use that basically as a configuration parameter. It actually had some value for normal debugging. Like in this case here, I'm using the value that I set for this trace and saying, if it's bigger than 1, then maybe I'm being more verbose in what I show. So that can be useful too. And this can be used to tweak your program at runtime. In that case, you would say something like max loops equals 250 before running the program, and then you'd get the verbose case. So now you're using the shared memory to send data back the other way. And I added a -c option, for command, to the recorder scope.
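The idea of a trace flag that carries an integer can be sketched as follows, in Python. The `Tweaks` class, the colon-separated `name=value` items, and the names `max_loops` and `verbose` are assumptions made for illustration, not the recorder's actual syntax or API.

```python
class Tweaks:
    """Named integer values that double as on/off trace flags (a sketch)."""

    def __init__(self):
        self.values = {}

    def declare(self, name, default=0):
        self.values[name] = default

    def configure(self, spec):
        # Hypothetical syntax: items separated by ':', each either
        # 'name' (switch on, value 1) or 'name=value' (set an integer).
        for item in filter(None, spec.split(":")):
            name, sep, value = item.partition("=")
            if name not in self.values:
                raise KeyError(f"unknown tweak: {name}")
            self.values[name] = int(value) if sep else 1

    def get(self, name):
        return self.values[name]


tweaks = Tweaks()
tweaks.declare("max_loops", 100)
tweaks.declare("verbose")

# Before the run, or later as a command sent to the running program.
tweaks.configure("max_loops=250:verbose")

for loop in range(tweaks.get("max_loops")):
    if tweaks.get("verbose") > 1:
        print("loop", loop)  # only with a value above 1
```

Because the program reads the value each time it needs it, calling `configure` again while it runs changes its behavior without a rebuild or a restart, which is the point of the compile, install, restart cycle described above going away.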
And that was basically it. And then I thought, oh well, why not add a slider? So there's the -s option, for slider. And there you are: you can tweak your program in real time. Very easy. And now you can basically, that's what I showed you, explore behaviors in real time.

There was still one more iteration. Because when I started discussing this, the first thing Paolo Bonzini from Red Hat said was, well, why can't you compute min, max, average? Let me do that. So I did this.

So the thing I want you to remember from this is: I think my initial design was not that bad. It's not that I did not know how to program to start with. It's that the environment changed on me. We are always faced with this need to redesign things, and we think, that's stupid, why did I do it this way? It's not a fault. It's not a failure on your part. It's really that the environment changes. So I really invite you to remember this IDIOCY. I think it's easy to memorize. When you look at your program: what is my next IDIOCY? What is the next thing I'm going to change? I don't want you to care much about the recorder. The thing I want you to remember is this.

And with that, I'm almost exactly on time, so we have two minutes for questions. If you liked it, please say so in the session feedback. If you did not like it, you can bypass it. No need to waste time, too. Any questions? No questions? OK. So if there are no, yeah.

Yes, that's where I was going here. There was a talk about this, I think, last year, or two years ago, I forgot. I gave a talk about that, but it was not recorded, unfortunately. So that's what it looks like. For instance, here, this is the page at the end that says motto, blah, blah, blah. And it says motto. And for instance, for the initial word, I have alternatives, which are defined by this initial add function at the bottom here. And I'm using the page time modulo 3 to index it.
So if the result is 0, I'll display initial, then improved, then incremental. And the motto function is defined here. What you see here is the famous XL language. I'm the only developer and only user in the world. So this may change. You're welcome to.

Most likely, this thing has bit-rotted, because it's based on LLVM. It generates code dynamically, and LLVM keeps changing. And now I'm doing other things. I started working on this DxO thing, then at Red Hat, and I have no time to follow all the changes that LLVM makes. So it has bit-rotted. If someone with an interest in compilers wants to help me put it back in shape, I'd really appreciate that. But it's not very complicated, since you asked me, right? It's not more complicated than AsciiDoc, I think. It's probably more well-defined.

No other questions? It's the end of the day. I thank you very much for not falling asleep. And I hope to see you tomorrow at 2 for the other side of the story, which is the SPICE smart streaming thing.