So konnichiwa and hello, everyone. This session is about semi-formal verification of embedded Linux systems using trace-based models. Watashi wa Benno desu, my name is Benno Bielmeier. I'm a research student at the Technical University of Applied Sciences in Regensburg, currently in my master's degree. About two months ago, I physically left my home university to join the Tokyo University of Science to start a research internship. So that means, gladly, I have already had the opportunity to experience Japan, and Tokyo in particular, quite a bit. So if there is anything you do not dare to ask the locals or your Japanese colleagues, you can reach out to me instead, and I'll let you off that hook. And with that being said, I'll hand over to my co-speaker. Yeah, thank you very much. A warm welcome from my side as well. And we're doing this crazy experiment of switching speakers all the time and turning mics on and off. So I'm Wolfgang Mauerer. I've been told by my students that the name means "bricklayer". It may also mean "person who makes us rehearse our slides all the time, again and again". But who knows? I see many, many known faces here. So I work at Siemens Corporate Technology in the Corporate Competence Center Embedded Linux. And I also have a second hat at the Technical University of Applied Sciences in Regensburg, where I head the Digitalization Laboratory. And this talk basically came out of two interests that I'm following in both roles. One is the Civil Infrastructure Platform, where we, as Kate very nicely introduced this morning, work on various topics that relate to safety-critical and mixed-criticality systems, to what keeps our civilization alive. And academically, we are an associate partner in the ELISA project, where safety and Linux obviously likewise receive a great deal of attention.
So for those, and I guess that's very few people in the room, perhaps even zero, who have never heard of mixed-criticality systems: these come in many shapes and sizes and weights, from airplanes or trains to, say, medical imaging devices. What all these devices have in common, although they vary very much in form factor, is that they have some portions that are safety-critical. If these go wrong, then people will be harmed, people can die. And of course we do not want that, so that needs to be prevented at any cost. But they also have more traditional tasks to satisfy, functions to fulfill. In this plane, for instance, you have an onboard entertainment system that is probably not safety-critical, so no one will die if the entertainment system fails. Probably, except if you are on a 26-hour haul from Melbourne to London without any stop and the entertainment system fails and you get a mutiny on board by the passengers, but that's then beyond the control of technology. And actually in this case, it's very simple to separate the critical from the uncritical parts: just don't let the entertainment system play with the engine control or with the flight regulation software and you're good; you can afford that in such a big system. When we look at trains, the dividing line becomes a bit blurrier. You also have lots of software on trains, some of which is safety-critical, some of which is not. But say in the traditional control systems, people start to employ Linux more and more, so there's not such a clear separation anymore. And then you get to these devices.
Here you have the problem that you need to employ hard real-time control techniques, for instance, in the case of this machine, to control the electromagnetic pulses that are responsible for image generation, or in the case of computed tomography machines, the dose that a patient is subjected to. But at the same time, you need to do large-scale, throughput-oriented data processing on the same hardware, because there is a certain cost pressure in these systems these days, and of course you want these systems to be not too expensive, because obviously it's a good thing to have good healthcare available everywhere. So we are seeing more and more pressure to combine critical use cases with uncritical use cases. Now, these devices have very little in common with what we typically see embedded Linux deployed on these days, but again, as we have learned in many interesting talks at this conference, Linux is seeing more and more adoption in devices like this. So why are we doing this? Actually, that raises two questions: what is the difference, and why are we doing that? The difference is clearly that all these devices have safety requirements, which bring in security requirements. They very often have to satisfy real-time demands. And a point that we cannot go into in detail in this talk, but that should also be mentioned: sustainability is very important for such devices. You may throw away your mobile phone after two years in use; you certainly don't do that with a Boeing 747. Why are we doing that? Because consumers, users of these devices, always request more and more features, so we cannot use traditional engineering, building these systems from the first screw upwards, because cost and time are simply prohibitive factors. So, to summarize our motivation: why do we need mixed-criticality systems based on Linux? Because there is a trend that has been going on for years towards using commercial off-the-shelf hardware.
Industry is under pressure to build software-intensive control systems, so we rely on loads of software written before, because we want to combine convenience features with the strict determinism that's required in such systems. Since you're all experts in open-source software, I don't need to tell you that many elements of the traditional open-source engineering process pose challenges here: these self-organized, often highly informal development processes, at least informal from the point of view of a safety certification. The software is of high complexity; a Linux kernel is so infinitely more complex than a traditional QNX or VxWorks real-time kernel. And the software also changes rapidly. So we need new ways, and this has already been discussed by a number of talks at this conference, to arrive at systems that are, A, safe, and, B, for which we can prove that they satisfy their safety requirements. What makes it even more interesting is that this goes well beyond Linux. We need to deal with other real-time kernels. We may need to deal with hypervisors that are becoming standard parts of such software architectures. We need to include the middleware, so there's no kernel/user-space distinction; from this point of view, we just need to cover everything. And eventually, if you want to bring it down to two challenges, they are these: the development processes that we employ, which in some way or another need to be made compliant with safety regulations, so that's about people to some extent; but also the technology itself, which needs to be up to the challenges, needs to satisfy the demands, needs to provide sufficient real-time guarantees and so on. So there are two things we need to look at; for this talk we're focusing on the technology and our efforts in that direction. And Philipp Ahmann, since you are all very technologically advanced and have your time machines at hand, gave a very interesting talk on Monday.
So set them to Monday 11 o'clock, and you will hear that assessing whether a system is safe requires understanding the system sufficiently. And that is precisely the motivation for our work, which Benno-san is about to explain to you. All right, so we heard a little about challenges, problems and issues with mixed-criticality and real-time systems in general. And what we would like to have is something that gives us reliable statements, or even guarantees, about the runtime behavior and especially the worst-case latencies of these systems. And the main point preventing us from doing so, or from doing so in a straightforward manner using established tools, is the system complexity. So we came up with an approach consisting of four steps on a conceptual level, which is outlined here on the slide. At the core of this approach are two ideas. The first is simplification: in the first step, we simplify the system via modeling, abstracting away its complexity. This is done by leveraging expert knowledge about the system, which is also the reason why it is called a semi-formal approach. And well, as we are in a domain of zero-based counting, there is actually a zeroth step hidden in there that we just consider a precondition, which is: if you're going to model something, you actually have to know what you model. So we just assume for this approach that you have already identified the scenarios that you are going to consider for the verification. And the second main idea is to gather empirical data about the runtime behavior and use this data later on to extend the model and for stochastic analysis. And to gather that data, we need some kind of system instrumentation. This is what the second step, the annotation part, is about.
And so we then use that instrumentation to gather the data in a real-world context, in production, extend the model, and eventually derive statements about properties of the system. So let's dive into the individual steps. But before I again hand over to my co-speaker, I just want to mention that we are going to use deterministic finite automata as our modeling type, but you might also know them as state machines. Yes, deterministic finite automaton, state machine, tomato, tomato, it doesn't really matter what you call them, as long as we can all agree that the correct spelling of "colour" has an O and a U in it. But I see that everyone agrees, okay, maybe let's set that discussion aside. What are the properties that our model should have? So we are abstracting away from the system somewhat, and we'd like our model, ideally, to have three properties: generality, realism and precision. It's getting a bit academic here, but it makes sense to consider these three dimensions. Generality means that the model, and I guess we should have chosen a slightly larger font size, is applicable to multiple real-world situations. I mean, that's what the whole idea of abstraction is about: you take a system, you make it simpler, and then this pattern applies to many, many instances of different systems. We can have realism; that means the model accurately represents real-world phenomena, the model behaves in exactly the same way as the system does. Of course, this is a bit in contradiction to generality, because either we want to abstract or we want things to behave exactly the same way. But there's also the consideration of precision, and that means the model minimizes the error between its own behavior and the real-world behavior, meaning that we allow some differences between how the model behaves and how the real system behaves. But if we can keep these differences below a threshold that is acceptable for our use case, we are fine.
And trust me, regardless of what you are trying to certify, you will never, for no standard whatsoever, need a model that behaves completely in the same way as the system itself. So some amount of error is always tolerable, and that's why we chose to go for these two properties, generality and precision. We deliberately simplify systems, sacrificing realism, to make the models tractable in our analysis. So we don't go for a completely faithful model of the system, but we still ensure that precision is possible to the desired level. So what have people done previously to ensure, for instance, that real-time properties are satisfied? I guess many of you will know this nice test rack built by Thomas Gleixner in Germany at the OSADL foundation. Actually, they do not just have one; they have, I think, 18 of these, so I wouldn't want to pay the power bill. And what they do is run all kinds of embedded systems with the latest real-time kernel, subject them to some load tests, measure for a sufficient time duration, and then determine the maximum. That works fairly well. On the other hand, what academics would do is go to the source-code level: you just dispose of the whole system, all this hardware is just a minor detail, we don't care about that; let's go to the source-code level, ideally count all paths through the code, and then determine the maximum. Of course, both approaches have their strengths and weaknesses. The source-code approach, for one, is way too hard for any halfway realistic system these days. And the measurement approach is a bit simplistic, considering that there are quite a few things that influence the actual behavior of the system. So what our approach tries to achieve is, on the one hand, to benefit from the simplifications that the measurement approach brings; basically, that's a model that represents a system by an input, some time in between, and an output, and then measures the latency.
So we can go for more complicated models that represent the system more fully, but we still do not go all the way up to the source-code standard. We use our simplified model, then instrument the system to record information about the actual behavior, which is then connected with the model. And from the model, we can infer the quantities that interest us, be that the worst-case execution time, the maximum latency, or other properties. And there's lots of statistics that you can throw at this problem. We decided not to go too deep in that direction in this talk, since time is finite, but there is a lot of established theory that we can use for that purpose, and we're always happy to discuss it with you. So for this talk, we're focusing on the approach itself. Here comes again our highly innovative flying speaker-switch mechanism. So back we are to our previous model. As I already said, we use deterministic finite automata. Why? Because they are simple and yet powerful enough for our purposes and for the later statistical analysis. And I just want to highlight that the model states do not have to correspond to the system's states in a technical sense. These kinds of effective states can be abstract representations of anything, ranging from a tiny block of code to whole modules. The implication of that is that we are not limited to the technical boundaries of the underlying implementation. This enables us to model on a global and system-wide level, for example, combining some internals of the kernel with user-space things, or taking proprietary binaries like firmware into account in such a modeling approach. This model is actually modeling a synthetic real-time application that we are going to use during the next couple of slides to demonstrate the next steps. So now that we have a model, the next thing is instrumenting our system.
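To make the idea concrete, here is a minimal Python sketch of a deterministic finite automaton used as a trace model. The states, events, and the toy cyclic task are hypothetical illustrations, not the actual model from the talk:

```python
# Minimal sketch of a DFA used as a trace model. State and event names
# are hypothetical examples, not the talk's actual model.
class TraceModel:
    def __init__(self, initial, transitions):
        self.initial = initial
        # transitions: (state, event) -> next state
        self.transitions = transitions

    def run(self, events):
        """Replay a sequence of traced events; return the visited states.

        Raises KeyError if the trace does not conform to the model,
        i.e. the observed behavior deviates from the abstraction.
        """
        state = self.initial
        visited = [state]
        for event in events:
            state = self.transitions[(state, event)]
            visited.append(state)
        return visited

# A toy model of a cyclic real-time task: sleep until woken, compute, emit.
model = TraceModel(
    initial="idle",
    transitions={
        ("idle", "wakeup"): "running",
        ("running", "output"): "idle",
    },
)
print(model.run(["wakeup", "output", "wakeup", "output"]))
# visits: idle -> running -> idle -> running -> idle
```

Note that a state here need not be a kernel-internal state; as the talk stresses, it can stand for anything from a small code block to a whole module.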
As I already said, this is kind of, we merge our formal model into our system and defining there the states and transitions we are interested in. And it might already be clear, but I just want to stress that we are not only interested in the execution path token, but also in the precise timing, so time information and also some further information about the context of the execution, which might help us to bring a total order for the state changes when we deal with concurrent execution. So when it comes to tracing, Linux provides a bunch of mechanisms as well in the kernel as well as on the user space side and in the user space with and without the kernel involved. And certainly this is not, choosing the right one is not simple. Therefore, we're going to give you a short overview of the mechanisms we considered so far. So we, just a few of them, we had trace points, static trace points with custom handlers in the kernel. We used trace points with ftrace and the interface in debug.fs. We use custom probes and also custom handlers, so basically just inserting some C function calls. We tried k probes, user probes, user static defined trace points, user events, and also some more advanced mechanisms like LTNNG and EPPF. And as we are not only interested in the domain of what the tracer is capable to measure and trace as in the first column, we are also interested in, well, is it a dynamic approach? Does it have static probes? And also we don't want to restrict ourselves to one technical implementation or trace syndrome in the domain. So we are basically looking for some approach tracer that is able to gather or combine data from different sources, from different scopes and domains. And also we want to record data as realistic as possible. We want to keep the required changes at a bare minimum. So we try to use out of the box tools and approaches that is what their mainline column is for. 
And as a last note, on the very right-hand side, we outlined our experiences regarding the usability of the tracers for the annotation and measurement steps. So as you can see, it is quite colorful, and there is not one perfect solution. It highly depends on your model and scenarios which tracer and which instrumentation you should pick, or which suits best. So, is there any author of the involved tools here who would like to correct our highly subjective assessment of utility and ease of measurement? No? Then it's probably correct. All right, so we have a model, we have the system instrumentation, so we have some tracepoints marked in our system, in our code. Now we can start measuring the real-time behavior of the system. And just as a short reminder, this is our model of the synthetic real-time application, comprised of six states. They represent the essentials of the scenario that we consider. And what could measuring look like? Well, we basically get a log of events, as I already said, with information about the code path that was taken and also the timing information. We then take this data and, as I said, extend or annotate our model. And this could be visualized like this: now each transition is represented by one box, with its name in the first row, and in its body the latency values are drawn as a histogram. The blue line indicates the smoothed kernel density estimate, and the red vertical bar is the worst-case latency that we measured during our measurement scenario, consisting of 100 runs of the application. So now the big question is: how realistically does the data represent our modeled system? To clarify that, we increase the number of runs we measure. And as you will see, with an increasing amount of measurement data, the distribution of the latency values converges towards the actual distribution of the latency values.
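The annotation step can be illustrated with a small, self-contained Python sketch: given an event log of (timestamp, state) pairs, it groups latencies per transition and extracts the observed worst case. The log format, units, and state names are assumptions for illustration, not the talk's actual trace format:

```python
# Sketch: annotate model transitions with measured latencies from a
# hypothetical event log of (timestamp_us, state) pairs, then derive
# the per-transition worst-case latency observed so far.
from collections import defaultdict

def annotate(trace):
    """trace: list of (timestamp_us, state) in order of occurrence."""
    samples = defaultdict(list)
    for (t0, s0), (t1, s1) in zip(trace, trace[1:]):
        samples[(s0, s1)].append(t1 - t0)  # latency of transition s0 -> s1
    return samples

# Two runs of the toy application, flattened into one ordered event log.
trace = [(0, "idle"), (12, "running"), (15, "idle"),
         (100, "running"), (131, "idle")]
samples = annotate(trace)
worst = {t: max(v) for t, v in samples.items()}
print(worst)  # observed (not guaranteed) worst case per transition
```

From `samples` one could then draw the per-transition histograms shown on the slides; `worst` corresponds to the red vertical bar, which is only the worst case observed so far, not a bound.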
And also, the worst-case latency will shift to the right, which makes sense: when we gather more data, the probability is higher that we catch a rare case of an increased worst-case latency. So yeah, these are 200 runs, and you see, as I already said, the distribution sharpens a little. The follow-up question is: when do we have enough data? There are many options in statistical analysis, but certainly this is out of scope for now; we are open to discuss that afterwards or in the break. So, to get back to something that you are probably more familiar with from traditional real-time systems analysis: in, say, a cyclictest experiment, what you measure is point to point, some signal gets into the system, is processed, gets out of the system. From the automaton point of view, that would correspond to any path from where the signal comes in to a possible exit of the signal, of which we have a number. So we could follow the paths 1 to 3, or 1, 4, 6, or 1, 2, 5, 4, 6, and so on; in that simple model, the set of paths is limited. But eventually, if we combine the data ex post, once we have it, we can regain the usual diagrams of global latency distributions. Again, here you see how it converges once we increase the number of measurement runs, but that's very standard. The important thing is that we get this out of the measured data and out of the model, but we could also look at different things. We could look at more specific paths; we could look at whatever you desire, basically. One important thing is also that we can combine multiple models. So as Benno mentioned, we have some distributions that are wider, some that are sharper, and that's an indicator of how faithfully we typically model the system. So here we have lots of unobserved noise, which in the Linux kernel example can, for instance, be caused by interrupts perturbing the control flow or not.
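The convergence behavior described here can be mimicked with synthetic data: summing per-transition latencies along one path through the model yields an end-to-end latency per run, and the observed maximum over more runs can only move to the right. All numbers below are made up for illustration:

```python
# Sketch: compose per-transition latencies into an end-to-end latency
# for one path through the model, and show how the observed worst case
# shifts right as more runs are measured. All data here is synthetic.
import random

random.seed(42)

def simulate_runs(n):
    runs = []
    for _ in range(n):
        t1 = random.gauss(10, 1)    # e.g. wakeup latency (us)
        t2 = random.gauss(50, 5)    # e.g. computation
        t3 = random.gauss(5, 0.5)   # e.g. output
        runs.append(t1 + t2 + t3)   # end-to-end latency for this run
    return runs

runs = simulate_runs(1000)
for n in (100, 1000):
    print(f"runs={n:4d}  observed worst case = {max(runs[:n]):.1f} us")
# The observed maximum over 1000 runs can only be >= that over the
# first 100 runs, matching the "shift to the right" on the slides.
```

This is also why "when do we have enough data" needs statistical treatment: the empirical maximum is a lower bound on the true worst case and keeps growing as rare events are caught.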
You can also model that with an automaton, then analyze this automaton, get information on that, and see how it interacts with the other automaton, and so on and so on. So we have lots of possibilities to combine these, but that basically concludes the process that we are following and the approach to our measurements. So now we come to the final, fun part. So far we've shown you a model of a synthetic application, but of course the approach also applies to real-world scenarios. As an example, we took the IRQ handling process, which conceptually consists of three steps: we receive an interrupt, we process the interrupt, and based on the processing, we trigger some reaction. The system instrumentation of that model could look like what you see on the right-hand side. Again, the three steps are represented there: we get an interrupt, we handle it, and we react based on the processing in the handler. So we have the same basic pattern on the conceptual level and in the system instrumentation. When it comes to the measuring part, you might by now be familiar with the representation: again, we have the histogram of the latency values, with the worst-case latency indicated by the red line and the density estimate in blue. One might now wonder: why are there only three transitions here, but four back in the model? Well, this is because we are not interested in the latency between finishing the processing of one interrupt and the start of the next. So yeah. Yeah, so obviously it's impossible to capture here, like, 17 more realistic use cases that we could apply this model to, and actually that is something we would be interested to hear about from you, from the engineers actually building safety-critical systems: whether you have interesting cases for us where we could apply this. Benno-san will be in Tokyo until the end of March, I think, and he has nothing to do anyway.
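The three-step IRQ model could be sketched as a DFA transition table like the following; the state and event names are hypothetical, not the actual instrumentation points used in the talk:

```python
# Sketch of the IRQ-handling model described above, as a DFA transition
# table. State and event names are illustrative assumptions.
transitions = {
    ("idle",     "irq_raised"):    "received",  # hardware raises the IRQ
    ("received", "handler_entry"): "handling",  # kernel enters the handler
    ("handling", "handler_exit"):  "reacting",  # processing done
    ("reacting", "done"):          "idle",      # back to idle; the wait
                                                # for the next interrupt is
                                                # not of interest here
}

# Transitions whose latency we annotate with measurements: all except
# the idle gap between one interrupt and the next, hence three histograms
# for a four-transition model.
measured = [t for t in transitions if t[0] != "reacting"]
print(len(measured), "measured of", len(transitions), "transitions")
```

This mirrors the point made above: the model has four transitions, but only three of them carry latency distributions worth annotating.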
So I assume he'll be happy to assist you there. Actually, it turns out we did model a number of scenarios, and a few very simple patterns suffice: with a sequence like this, you can of course model loops very easily. To model branching code paths and control-flow constructs, you basically need a fork-like pattern where different code paths can proceed, and you need some mechanism to join code paths back together again. But with these four really simple, really elementary patterns we've seen so far, you can very satisfactorily provide models that represent real-world system situations reasonably well. So, to conclude, we've shown you our approach, which somewhat mixes the experimental, empirically based traditional real-time analysis with more academic approaches. We've shown you how to create such models conceptually. We've discussed that the communities actually provide many, many opportunities to perform the required measurements, upstream or not. And we've shown you some examples of analyses. Having said that, let me conclude with the following statement, provided to you by the great people at DeepL. So thank you very much for your attention, and of course we are open for questions. If there's no question, then maybe let me make the comment that of course you have all heard about the verification framework that... oh, so you have a question. So I'll skip my self-question, yes. So, is there any actual project to apply this model? The question is, is there any actual project? Yes, what we are looking at is indeed a real-time control system.
So that's a job done in conjunction with Siemens Technology, and basically the devices that we showed before, medical devices, are of interest for that; some communication devices, some devices from the telecommunications industry that are starting to face these issues; and basically everything that needs to be certified at a reasonably low level of assurance, so it's not SIL 4 or anything like that. These are devices that have safety requirements but do not necessarily kill people if things go wrong. Oh, okay, thank you. There are many cases where it can be applied, yes, all right. I have one question: how do you evaluate the difference between the model, for example the state machine in this slide, and the real system? I guess it's very difficult to evaluate whether this model is close to the real system or not. I guess it will be revised as a result of the analysis, is that correct? Yes, exactly. So of course the modeling step, this step here, requires expert knowledge to build the model.
So as Benno-san mentioned, that's the zeroth step: you need to know what you want to model, and that crucially depends on people already understanding the system sufficiently, coming back to my very first slide. Then we can create a model, then we can use this technique to populate the model and infer properties. Because certification authorities are typically not satisfied when we just say we understand our system sufficiently well; they'd like to have hard data. So the idea is to argue: we have this model that is a simplified representation of our system, yet in this and that test, and again here we're coming to the statistics that we left out of the talk, we were able to show that it responds, up to minor differences, as the system itself does. And that then gives you the assurance that the system works reasonably well. But yes, of course it's an iterative process of refining the model again and again. Thank you very much. So, Tim. Okay, I'm going to ask the question that I think you were about to ask yourself: how does this correlate with the recently introduced runtime verification subsystem? That's an excellent question, I didn't expect it, I have to say. So the kernel actually recently gained a runtime verification subsystem, by Daniel Bristot de Oliveira at Red Hat, that had been under development in the real-time community for a number of years.
Of course we've discussed these ideas together. So if we go back to this very first slide, the main difference would be that they're heading for a more or less realistic model of the system. If I understand the paper and the discussions correctly, they aim for something they can basically feed into a theorem prover, and their models are extremely large. I don't know exactly what they have in the paper, so don't pin me down on that, but it's about 9,000 states and I think 21,000 transitions between the states. That is something that you cannot really comprehend, so they generate the models, but then you can use theorem provers to make statements about that model. And that goes more in the direction of, I don't know if you're familiar with the seL4 research initiative, where they have a formal model from which they derive the kernel and say, okay, everything the model does is also reflected in the kernel; it goes a bit in that direction. So they're shooting higher than us. We, in contrast, go for these more or less simple models. Of course a real model has more states than this one, but it should still be understandable by humans, and especially, our intention is that we can explain the model to the certification authorities, and it is credible for them to believe us that this model accurately represents the system. Whereas with a 9,000-state model, that's then a totally different approach to certification. Thank you. Thanks. So I guess it's more or less time out anyway, so if there are no more questions, as I said, we are happy to hear use cases from you, for the student's entertainment and joy, and thank you again very much for your kind interest. Thank you.