Hello, my name is Alexey Brodkin, and today I'm going to talk about one quite funny experience I've had: the story of debugging one problem throughout multiple software stacks.

In recent years the term "full stack developer" became very widely used in areas typically connected to web development. There you have front-end developers, who are typically responsible for whatever happens in the user's web browser, on his or her side; you have back-end developers, who deal with the server side of your application; and you have somebody else, called a full stack developer — the person who may deal with anything: front-end, back-end, whatever. But generally speaking, that kind of classification is applicable to many more areas, not only web development. In fact, the Wikipedia page which talks about the full stack developer says, and I will cite here: "In computing, a solution stack or software stack is a set of software subsystems or components needed to create a complete platform such that no additional software is needed to support applications." A full stack developer, then, is expected to be able to work in all the layers of that stack. And I think you'll see that that's pretty much the case here — that's exactly what I meant when composing the title of my presentation. We will literally take a journey from a very high level of software, which in our case is the open source automation server Jenkins, down to the guts of the Synopsys proprietary instruction set simulator, nSIM (ISS in the title stands for "instruction set simulator"). Now that the title is explained, let's finally get started.

First, let me introduce myself again. My name is Alexey Brodkin; I work for Synopsys and have been working there for more than 10 years now. I reside in St. Petersburg, Russia, and in the company I'm responsible for development of different runtime software, including operating systems such as the Linux kernel and some RTOSes, in particular the Zephyr RTOS. Zephyr is one of the important projects on which we are working quite heavily these days. Even though personally I'm not busy full-time with Zephyr development, I try to help my colleagues solve challenging problems, especially where it helps to have a broader view of different technologies and different software parts — and not only software, but sometimes even hardware. This presentation is about exactly that kind of situation, where I decided to help my colleagues and do something useful in the end.

But before I move to the real meat of the presentation, I'd like to thank my colleagues, without whom this presentation wouldn't have happened, because they helped me a lot with debugging, and sometimes even just with suggestions during hot discussions in the kitchen. In particular, I'd like to mention our master of automation, Mikhail Falalev, and our Jenkins instance administrator, Kyiv Sienkov.

At this point, I think we are ready for the technical part of the presentation. Today we are going to talk about debugging one interesting problem we faced during automated testing of the Zephyr RTOS on simulated ARC platforms. We'll start with a very brief explanation of what that thing really is — I mean the Zephyr RTOS — and why all of that happened in the first place. Then we'll see how the problem was initially observed.
We'll go through a couple of steps of how we got to the bottom — to the root cause — of that problem and figured out what was actually going on. Then we'll step slightly back and figure out how to live with it: how to solve the problem and get our setup up and running again. And obviously, at the very end, we'll come up with a couple of notes or suggestions which might be useful in those kinds of situations.

So why Zephyr? That's a good question. As I see it, Zephyr is a rapidly evolving real-time operating system developed by a brilliant community. Compared to many other RTOSes currently on the market, Zephyr consists not only of a simple task scheduler, memory management, and an interrupt handling part; it actually accommodates a lot of quite complex software subsystems like Bluetooth, networking, crypto, and many other things. To add to that, Zephyr also has a lot of device drivers, which you typically won't find in other real-time operating systems, and the upstream source tree supports multiple different boards and multiple architectures. That makes Zephyr very convenient and very useful when you are going to develop your new real-life application, because it's easy: you have everything, you probably just add a couple of bits for your particular platform, for your board — most probably even your SoC is already supported, or you may add support easily if your architecture is supported. Then you keep using the available libraries and components, build just your business logic, your application, and you are done.

Speaking about ARC and Synopsys involvement in the process: our processors have been supported since the very first public commit of Zephyr, and currently we're working hard to maintain what we have and to add newer features — with support for new processor features, hardware features which are not yet supported, we add software features and work on whatever is available and becoming available in the upstream sources. So that's good, that's interesting, and a lot of interesting things are happening there.

As I said, Zephyr development is moving very rapidly, very fast. For example, last month we saw more than 900 commits merged into the main source tree, and those were changes from 140+ developers. Think about how many changes that is — about 30 commits per day, and that's quite a lot; some of them might be far more than a one-line change. That basically means the rate of change is huge, and with so many different changes it is very hard to make sure that even whatever used to work yesterday still works today. Obviously, the only way to make sure your stuff still works is to test it again and again and again, and to do it as rapidly as possible. Ideally you would do that on each and every commit; in case of Zephyr we typically deal not with individual commits but with pull requests — one or a couple of commits which are accepted together as one logical unit. And in the Zephyr community we already have a very nice, very powerful CI infrastructure which does exactly that: we build for, if not all, then at least a major share of the boards supported in upstream Zephyr, and for them we build a lot of different tests. That obviously happens automatically, and then on some of the boards the tests are executed.
Now, speaking about why we need execution: what got built will not necessarily work when you execute it. That execution typically happens on platforms which are based on the QEMU emulator. The reason is that QEMU is free: you may even build it yourself, you may introduce patches which fix the things you need, and it can easily be deployed in an independent open source CI. That's what is done in Zephyr, and that's good. But the thing is, QEMU is not the only type of board we support. We have our proprietary simulator, and we have multiple different boards for which we also need to make sure that things work. Obviously, what works on one platform may not work on another, especially if it not only differs by name but has a different configuration of peripherals, or even a different configuration of the CPU itself — and in case of ARC that might very well be so, because our cores are very configurable. That means whatever we have in upstream CI is good, but we want more; we cannot imagine having all the boards tested in upstream CI. That's why every vendor, every developer, ends up building their own CI farm or CI infrastructure, and that's how we got into the business of having and running our in-house CI. In our case, for various reasons, we used Jenkins as the main automation server for it — one of the most popular open source tools for those kinds of things. So why not; that's a good thing.

Okay, so how is that testing done in Zephyr? Besides the code which does something useful — the kernel itself, different libraries and drivers — we also have a lot of different tests and samples. That is another important difference from other real-time operating systems currently available: for a lot of subsystems, for a lot of drivers, for a lot of boards and different parts of the tree, we try to create tests. Some already exist, and some we add whenever we decide they are needed. In the end that gives us a couple of hundred different tests, which cover different parts of the Zephyr RTOS. And it's not only that. In many projects you may have different tests and examples, but you would rarely use them, because that requires some special steps, and, well, you are often too lazy to do that. In case of Zephyr, we not only have those tests themselves, we also have a testing infrastructure: through execution of just one single Python script you get a lot of those tests executed. So basically, what that thing does: first, depending on the platforms you are focused on — it may be everything, all the boards which are supported and all the tests which are supported on those boards, or just a couple of platforms or a couple of tests of your interest — that script, which is called sanitycheck (for a good reason, I think), builds the tests filtered by the platforms and tests you specified. Then, once they are built, execution may happen — because you specified a simulated platform, or because you explicitly instructed sanitycheck to do execution on real boards; I think that is done with a dedicated command-line option, but anyway, there is quite good documentation where you may check all the parameters you may want to use.
So when there is a need to run something, that same script will execute the tests as well. And it does not only execute tests: it actually captures the test output and tries to figure out whether the test executed correctly or not. Again, that's quite a difference compared to many other projects, where you need to do that manually or write your own scripts; here you just get sanitycheck executed, and in the end you get something like this: a summary which gives you quite a good understanding of what is actually going on — whether you are doing well or not so well.

Now, if we move forward and look closer at the sanitycheck output we just discussed, you may actually see that there are a couple of problems — well, in fact not a couple, but quite a few: you see 17 tests failed. That's not something we want to have. Another interesting thing: obviously, whenever we get those kinds of results, or any failure, we try to reproduce it locally — and the beauty of sanitycheck is that the same commands you use in the automated infrastructure you may run on your own machine. So we tried to reproduce the problem, but surprisingly, whether we ran it on a local machine or on the same server where the automation runs, we were not able to reproduce it. And, well, that makes debugging of those failures not that easy.

Now, speaking of CI itself: I think it is most useful when you typically observe exactly zero failures in your CI executions, because then it is very, very easy to figure out when a problem got introduced. Yesterday you had clean results, today you have some failures, so you know where the difference came from: obviously, it's a change which was not in the sources yesterday. Even if you have, say, 10 different commits or 10 pull requests in between, it's quite easy to bisect down to the one which introduced the problem, and then, in the best case, you just ask the developer who submitted that change to take a look: here are the symptoms, please check. On the other hand, if you have failures in your CI which you cannot explain — and moreover not one failure but many of them, and moreover a different number of failures from execution to execution — that makes the CI system completely useless. That's my opinion on the subject. It gives you no idea of what is actually going on: are you getting better with your project of interest, or are you getting worse? Nobody knows, because today you have one more failure and yesterday you had five fewer — what does that really tell you? Nothing, I think, and people just stop looking at it at all.

Needless to say, we were getting exactly into that situation, because we had from 10 to 20 tests failing sporadically, and we had no idea why that was happening. And since we still wanted to get some use out of that CI setup, somebody had to bite the bullet, and since everybody else was busy with different other things, I decided to give it a try. And, okay, it looks like I was successful with it.

With all that said, we are now ready to start our journey through the software stack — but we'd better be well prepared for that long trip. Since we were not able to reproduce the problem manually, it was not even clear which part of the CI process affects test execution.
We started from a full-blown CI job, which was doing a checkout of sources, building, and running — everything was done from scratch every time. Even though that gave us a simple reproduction scenario, it was still not very convenient, because the turnaround time was about half an hour: whenever you make a change, you need to wait that amount of time. Also, given it was a CI environment, whenever you wanted to change something you had to do it in scripts and then wait for completion, then do yet another iteration — not very productive. And then you have to deal with artifacts, which you need to store properly, because otherwise the CI machinery will get rid of them on the next execution. So it was quite inconvenient. To make faster progress, we decided to make the execution time short, because otherwise it merely justifies coffee breaks and doesn't allow you to be productive in solving your problem. So we started to minimize the test case. We removed the checkout of sources — we decided to reuse the sources we already had — and that was useful. From a number of executions of the full flow we figured out that we had at least one test, on one platform, which reliably failed. That gave us a test case which reproduced the problem in 30 seconds. Again, this is important, because it allows you to try different things much faster and move forward.

That was already good progress. Once we agreed on the test we were going to work on, let's see what we could get from actually executing it. Typically, sanitycheck would output something like this — a typical error in a normal sanitycheck run. It is not very verbose, but it gives you some idea: it shows you the test which fails and at least one reason — it might be a build failure; here, as you see, it's a timeout. But this is not quite enough, because we want more data on what's going on there. So we add more verbosity to the sanitycheck script, and at this point we already see that it was indeed a timeout: execution lasted a little longer than 60 seconds. Why 60 seconds? Because, as you remember, that's the timeout which sanitycheck uses to kill execution of the simulator.

One thing I wanted to mention here: when sanitycheck executes a test on a real board, nothing actually happens when we want to stop execution — what can we do? The board will keep running, and in the best case we will reset it before execution of the next test; we do that on our boards, at least. That helps to make sure you are starting from scratch and no leftover internal state causes new problems. But in case of a simulator, what sanitycheck actually does is kill the simulator process, be it QEMU or nSIM or anything else. That's how it works, and I think it will be useful for us in our further discoveries; roughly, the idea is sketched below.
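This is not actual sanitycheck code — just a minimal sketch, in Python, of the run-capture-kill idea described above (the run_on_simulator name is mine):

```python
import subprocess

def run_on_simulator(cmd, timeout=60):
    """Run a simulator, capture its console output, kill it on timeout.
    (Illustrative sketch only, not actual sanitycheck code.)"""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    try:
        output, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Unlike a real board, a simulator process can simply be killed
        proc.kill()
        output, _ = proc.communicate()
    return output
```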
What's also good about sanitycheck is that it not only runs things, it also dumps a lot of data into different logs. In particular, there is a build log — but the build was not our problem — and there is handler.log, basically a dump of the console as captured by sanitycheck. And here we may see that at the very end of the execution we were indeed stuck on something. What we see there is output of the test, and that part is normal; the problem is that we stop right there and never get the final "PROJECT EXECUTION SUCCESSFUL" or "PROJECT EXECUTION FAILED" line. That's the problem. So at least we know that, for some reason, we stopped getting updates to that log.

Okay, so what might be the reason for that? What hypotheses may we have? The first thing which comes to my mind is: maybe the data being passed from the simulator to sanitycheck gets buffered somewhere. From my previous experience — and I think that's familiar to many embedded developers, and not only embedded ones — we know that standard output quite often gets buffered, and there is a good reason for that: we don't want to pay the cost for each and every symbol that gets printed. So typically data gets buffered, and from time to time that buffer is flushed: our letters appear on the screen in the terminal, our files get written to the real storage. And it turns out that in Python — sanitycheck is written in Python — there is exactly the same concept: data will typically be buffered. On the right part of the slide you may see a very simple snippet which demonstrates that behavior: if you execute it as it is, you will see "one" printed, then "three", and only then "two", which gets flushed on completion of the script execution (my reconstruction of that snippet follows below). With simple googling we figured out that there is a way to get Python working in an unbuffered manner: you just need to either export the PYTHONUNBUFFERED environment variable or add -u to the Python command line. We tried that, but apparently it didn't help — otherwise, by the way, this presentation wouldn't have happened at all.
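The actual snippet from the slide isn't in the transcript, but one classic way to get the described "one, three, two" ordering is to mix unbuffered stderr with block-buffered stdout, roughly like this:

```python
import sys

print("one", file=sys.stderr)    # stderr is unbuffered: appears immediately
print("two")                     # stdout is block-buffered when piped
print("three", file=sys.stderr)
# With stdout piped ("python3 demo.py | cat") you see: one, three, two --
# "two" only gets flushed when the script exits.
# "python3 -u demo.py | cat" (or PYTHONUNBUFFERED=1) restores the order.
```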
So what would be our next theory? Maybe the simulator is dead by that time, and that's why we are not getting updates out of it. To verify that hypothesis, we'll use strace. strace is a very powerful tool which nowadays may be used not just as a tracer but for many more use cases — you may even inject errors in between your application and the system libraries and the Linux kernel. For that, I strongly advise you to Google and watch a couple of presentations made by Dmitry Levin, the current maintainer of strace: they have a lot of interesting insights on both older and newer features, and that will definitely help you one day. So please go look for them and watch them. In the simplest case, what strace does is this: you start your application under strace, and whenever your application asks for some service from the Linux kernel or from the system libraries — it either calls those system libraries or executes a system call directly — strace captures that request, records it, passes it to the kernel, then records what the kernel returns and passes the result back to the application. That's how we get a very nice, non-invasive look at what's going on between your application and the underlying system. With that, we are trying to figure out if our application is doing anything useful.

Because if it is not just spinning in a trivial while loop, or not executing any instructions at all for some reason — being halted or sleeping all the time — most probably it will at least try to read files from the file system, or ask for some additional service, I don't know, ask for the time or something like that. And when we run our simulator under strace, we do see that something is actually going on; a lot of things happen. That suggests our simulator is alive.

A couple of other things I'd like to mention. If you execute strace without additional parameters, it will completely ruin your console output: you will have a lot of output from strace itself, mixed with your own output if you happen to print to standard output, and you won't see anything. But you may redirect it to a log file with the -o option followed by the name of your log file. Another thing which I typically use: by default strace will only monitor your parent process, and whenever other processes get created, or even threads within that process, their events won't be captured. So I typically run with the -f flag, which stands for "follow forks", to see what's going on even in processes spawned by the parent one. We'll see whether that was a good idea or not.

So we see that the simulator is alive, and that theory was not proven. Let's go one level down again: if the simulator is still functional, maybe our modeled CPU is dead. How are we going to check whether that's true? We may just record the instructions being executed — and capturing instruction traces is fortunately possible with that particular simulator. From the trace we can make at least two conclusions. First, the trace keeps growing, which means the CPU is executing something. Second, if we take a look at the trace itself, we may easily notice that at least we are not stuck in one trivial loop, like executing one instruction and jumping back to the previous one again and again — we are not seeing that. There might still be a larger loop, but you cannot tell just by looking at the log with the naked eye.

Okay, so something is going on in our simulator; at least it is not dead. But there is another interesting observation. Since we were not able to reproduce the problem locally, but we were able to reproduce it in the automated setup, we may try to capture two different traces: one in the local setup, where we know the execution is correct, and one from the incorrect execution in the automated setup — and then just compare the two logs and see if we get anything interesting out of that. And so we do.

What I'd like to note here is that not all simulators are born equal. For example, the aforementioned QEMU has a lot of different benefits, but also a couple of downsides. One of them is that it cannot record a trace of executed target instructions. The reason for that is in the first letter of its name, Q, which stands for "quick". To be quick, it implements something like JIT — just-in-time compilation. Basically, it splits the target instructions into so-called basic blocks — a few instructions which always execute linearly — and each block gets translated into host instructions, keeping the same semantics of the entire block.
Every subsequent execution of that basic block then runs the host instructions substituted for the target ones, which means we cannot actually dump the instructions executed by the target at all — except perhaps for the first time, but that typically doesn't make much sense. In that respect nSIM, our proprietary simulator, helps us: it can produce a complete and full trace. It might implement JIT as well, and with that it works much faster, but we were not running for too long anyway, and I didn't want to introduce an additional variable — who knows how that JIT works in the end; probably it might be buggy as well (I'm pretty sure everything is buggy). So to keep things simple, I didn't use JIT in nSIM.

And so we were able to capture those logs. Another interesting point is that QEMU is not instruction accurate in the sense of timing either. Even though there is such a thing as icount, which tries to mimic that accuracy, or at least to make sure instructions are executed at the same speed across different executions, this is still not quite true — there are many discussions on that topic. In case of nSIM, though, we may be sure that at least the interrupts caused by our built-in core timers, which increment synchronously with instruction execution, happen every time at the same location, at the same cycle — which we cannot say about QEMU. So with QEMU we wouldn't be able to reproduce the problem, at least not that easily, but with nSIM we may do that. Another interesting point here: in this particular case we knew for sure that no communication was happening between the external world and our test, which means no interrupt source would fire unexpectedly at a random point in time — everything should be perfectly predictable.

So we captured those two traces, even though they were quite huge, more than 2 gigabytes each, and then we tried to open them. Obviously you're not going to do that with GUI tools, but with Vim, or rather its diff wrapper vimdiff, you may get it done nicely. And here you may see the two executions side by side. One execution, the correct one, is on top: into the ILINK register we write the address where we're going to return on execution of the RTIE (return from interrupt) instruction, then we execute that RTIE instruction and we get exactly to that address. In the wrong execution we write the same value into ILINK, but then somehow we end up at a completely different address. If we disassemble the binary of our application, we would be quite surprised to see that that address is actually the entry point of an interrupt handler. So it looks like we are jumping from the interrupt handler right back into an interrupt. What also suggests this might be true is the IE bit set in the STATUS32 register, which says that interrupts are enabled. From those two facts we may conclude, or at least make an educated guess, that we are probably returning unexpectedly into the interrupt handler. And remember: we don't have any source of interrupts other than the timer, and the timer interrupts are served, so most likely we are not supposed to return there. That seems to be a bug in the simulator. So that's good — we are getting to the bottom of the issue, and we know who is misbehaving. But then, what do we have out of that? We have literally got to the bottom of the software stack.
That's the execution platform itself — not even the code of the target, but what plays the role of the real target, the hardware board; in our case, the simulator. And the root cause seems to be in the simulator — okay. But on top of that we have Python, then we have our code, then the Python script, then Jenkins, which is written in Java. So we see a lot of layers. And even though we have engineers who will be fixing that simulator, it is still not clear when we are going to get that fix, because that problem might not be fixed that soon. And still, we don't know why it only happens when executed from Jenkins but not when executed manually. As engineers, we want to get all the questions answered, and in particular we have the following ones. Why did nobody face the problem before? How does Jenkins affect execution of the simulator — because you typically don't expect that to happen? And how do we prepare a minimalistic test case for the simulator engineers to deal with, given that so far we were only able to reproduce the problem in the automated setup?

So let's try to get a solid explanation, and for that we'll go in the reverse direction and see what we may do. We return one level up — or back, depending on whether you look at it from the software stack standpoint or from our previous descent to the root cause — and check whether there is anything unusual we may want to examine. The only material things we have are logs, and even in those logs we may already make a couple of interesting observations. First, we see the line where execution stopped: that was the last line the sanitycheck script wrote into its log file. That's one thing. Another thing: you see there are, for some reason, two different PIDs, 979 and 712. That's also interesting. And we also see a lot of things happening, all within the same millisecond — that doesn't look right. And those reads returning nothing — that is all very suspicious.

To understand what's going on, we need to take a look at the simulator, so we go two levels down the stack again and see what's going on there. Luckily, we had the luxury of access to the simulator sources, and that's always quite a benefit. In that sense open source software is very good, because everybody may go check the sources, try to build them, and see what's going on; for us it fortunately looked like open source, because we got access to the sources. In those sources we decided to look for something which could be easily found — for example, the flags we saw used in the poll system call, and here I mean exactly those two: POLLIN and POLLPRI. If we do a simple grep, we may easily find the place where they are used, and there we see a couple of interesting things. First, we do see exactly what we expected: execution of poll in a while loop. We also notice that the function itself, even though it is not visible from here, is used in a completely separate thread. I think the explanation is simple: it is just easy to implement it that way, as a separate thread which keeps checking what's on the input. And that already explains the additional PID: one is for the CPU core emulation and one for the standard input polling. The shape of that loop is sketched below.
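The simulator sources are proprietary C, so I can't quote them; this is only a Python sketch of the shape of the loop we found, with raise_input_irq standing in as a hypothetical callback into the CPU model:

```python
import os
import select

def stdin_polling_thread(fd, raise_input_irq):
    """Runs alongside the CPU model: waits for data on stdin and hands it
    to the simulated UART, raising an input interrupt for the CPU.
    (A sketch of the idea only, not the actual simulator code.)"""
    poller = select.poll()
    poller.register(fd, select.POLLIN | select.POLLPRI)
    while True:
        poller.poll()            # expected to block until input is available
        data = os.read(fd, 256)  # expected to return freshly arrived bytes
        raise_input_irq(data)    # hand the data over to the UART model
```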
Still, even though that code looks quite valid, we see a couple of interesting things. First, we expect this poll to block until we get something useful; but what we really see, if we start debugging with additional instrumentation or with GDB, is that poll immediately returns 1 — the same thing we saw in our logs previously. It returns immediately. Why might that happen? It is quite unexpected. Another interesting thing: that read returns nothing — actually returns zero, as if there were nothing on our input. That looks very suspicious.

Now let's consider a couple of things, in particular that the problem was not seen in another attempt at that execution: while we were trying to create a minimal test case, we tried a so-called freestyle Jenkins job, and in that case the problem was not reproduced — it only happened in the newer pipeline Jenkins jobs. While we were discussing that with our automation people, they suggested that there is quite an interesting thing in Jenkins pipeline jobs: they use something called the Durable Task plugin. So I went to Google that; here you may see a couple of links you might want to get familiar with if that's of any interest. From the most interesting article — even though it is already archived, because that website is not available any longer — I was able to figure out that the Durable Task plugin uses nohup. So then I went and checked what that nohup actually is and what it does. Among other things, it connects the /dev/null device to the standard input of the process which runs under it. That is already quite interesting, but it doesn't give us the answer by itself. So I started to think about what's special about /dev/null. And if we look at its documentation, it says that reads from it always return EOF, and that data is always considered to be "available". That already explains a couple of things, I think: it explains why poll returns immediately — the device is always "ready", so we just return — and why the read reads nothing, because it only ever returns EOF.

And with that, we are able to reproduce the problem very easily: here is our simulator, nSIM; these are the properties describing the system and the processor; that's our executable; and that's /dev/null connected to standard input. That simple — and it allows us to reproduce the problem. So with that, we are done, and we now have our answers. The problem was probably never faced by anybody before because nobody in their sane mind would connect /dev/null to the standard input of the simulator; in Jenkins that happens because of that plugin and that nohup thingy. As for the effect on the simulator: in that polling loop we are looping infinitely and very fast, which is why a lot of requests to raise and reset the input interrupt happen there, and there seems to be a race exactly there. And a minimalistic test case is now available.

So what can we do then? Even if the race we found with the instruction trace gets fixed in the simulator, it still won't help with the situation where that infinite looping wastes host instructions for nothing. What we may actually do is accommodate those peculiarities of /dev/null: we check how much data the read actually returned, and if we read nothing, we go back to the polling cycle and do not proceed with the interrupt-handling code; we also add a sleep, so we relax a little and don't spin there infinitely. Both peculiarities of /dev/null and the resulting fix are sketched below.
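You can see both peculiarities of /dev/null with a few lines of Python:

```python
import os
import select

fd = os.open("/dev/null", os.O_RDONLY)
poller = select.poll()
poller.register(fd, select.POLLIN)

print(poller.poll(5000))  # returns immediately, e.g. [(3, 1)]: "readable"
print(os.read(fd, 256))   # b'' -- zero bytes read, i.e. EOF
```

And, continuing the earlier sketch (again, a hedged Python illustration of the fix in the proprietary C code, not the code itself):

```python
import os
import select
import time

def stdin_polling_thread(fd, raise_input_irq):
    poller = select.poll()
    poller.register(fd, select.POLLIN | select.POLLPRI)
    while True:
        poller.poll()
        data = os.read(fd, 256)
        if not data:
            # Zero bytes means EOF (e.g. stdin connected to /dev/null):
            # nothing to hand to the UART model, so don't raise an
            # interrupt, and sleep a bit instead of spinning at full speed.
            time.sleep(0.1)
            continue
        raise_input_irq(data)  # hypothetical callback, as before
```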
Even that fix allowed us to get our CI back into a normal state, and all the tests were passing completely normally again. So that also solved our initial problem, and that's good. And that's how we got back to the top of our software stack: we went down to the simulator and returned back, and now we have all the questions answered.

In the end, when the journey is done, I'd like to mention a couple of things. There is no particularly hard technical suggestion here — just a few things which really helped me a lot. First, in my opinion, the most important trait for any engineer is to be curious. Please don't leave questions without answers until you get to the bottom of the problem: whatever hack or fix or workaround you implement otherwise is just another hack, and it typically introduces more problems than it fixes. Because if you don't know the root cause, that root cause — or even your additional hacks on top of it — will inevitably bite you, or even your customers and users. So please be generous to them and try to figure out what is actually wrong. In our case we saw sporadic failures in the CI and they seriously got in the way; I really couldn't stop thinking about it until I got to the bottom, and now I have that peace of mind back, which is quite good, I think.

The next one: it's good to be persistent in your search. You have to be curious and persistent in getting answers to your questions. Here in this presentation we spent less than one hour — about 45 minutes by this moment — and everything got explained. But in reality it took me a couple of weeks to get to the bottom of it, and a couple of times I thought it was a complete dead end, with no explanation for anything. Still, I kept looking, kept talking to different people, kept trying to find the solution and the answers to those questions, and in the end I was able to do that.

Another important thing is luck. But here I don't mean pure luck, when you're sitting on your sofa and everything just falls into place — that probably happens sometimes, but not that often; at least in case of systems software it barely happens like that. Left to itself, everything rather goes wrong and sideways; nothing good typically happens on its own. For that luck to happen, you have to be curious, you have to be persistent: look in different places, try different things, talk to people — and then eventually that luck will appear, as some hypothesis finally being proven, and with that you may move forward. The more experience you have and the more challenging tasks you solve, the more developed and advanced your gut feeling becomes, and that luck will happen sooner rather than later. In our case, we were lucky because our instruction set simulator allowed us to capture those traces, and it was instruction accurate, so timing was the same all the time. That was good.
The logs were not that large — we had only two gigabytes, even though sometimes we have to deal with literally terabytes of logs and try to find something there; on that front it was quite easy in our case. And the instruction traces diverged quite soon, within less than 500,000 instructions — not that many; that's not even a Linux kernel booting.

What next? What else is important? To have access to sources and documentation, as well as to a knowledge base. In that sense, open source, which we are discussing at this conference, is where it really shines, because you have access to all the sources: you may study them, do your experiments, and even improve them — and that way improve the project you use. Our particular exercise was very similar to that, because we had access to the sources; but with truly open source projects, that's what you get from the very beginning.

Another important part is the people around you. Talk to the people around you, discuss the things you are working on. People will help you debug problems; if they cannot help with debugging, they will try to suggest something; and even if they don't suggest a solution, a discussion may seed an idea, which might be a good starting point for your future findings. That was exactly the case for me when I heard about that plugin in Jenkins — I had no idea what those Jenkins plugins were for, but it helped me a lot.

And one more important thing is a good toolbox of tools which you know quite well, because that helps you to be productive and successful and, again, to move fast towards your targets; otherwise you'd spend a lot of time solving purely technical problems. If you know your tools well — for example that vimdiff trick, your editors for looking through sources, the things which allow non-invasive monitoring like logs, strace, Linux perf and all that — knowing those tools and having experience working with them will obviously significantly improve your efficiency in debugging complex problems.

I think that's it, that's what I wanted to discuss today. Thanks a lot for attending this presentation, and take care of yourself and your relatives. Hopefully we'll talk next time. Again, thanks a lot for watching. Thank you. Bye.