Okay. Thank you, everybody. My name is Shuah Khan, and I am going to be talking about discovering the Linux kernel subsystems in use. I think Kate mentioned this earlier during the welcome, talking about how we are doing this for the OpenAPS case.

Let's start with why it is important to understand workload requirements. Why do we care? We care for several reasons. The first is sizing memory and storage for a workload: especially when we are talking about automotive or medical, we want to be able to size the system memory, storage, and so on. The second is configuring and tuning the kernel and the system. Those are two different things. Tuning the kernel means enabling the right configuration options, say for example stack protection, if that is important for the workload. Configuring and tuning the system means, say, our workload requires a large number of files and processes; we want to make sure the system is configured for the moments when it is under maximum load or in heavy use. And lastly, and this is a really important aspect, we want to be able to identify the tests we care about, to avoid regressions as new kernel releases come out, and also to be able to write tests if we think the existing tests need to be enhanced or new ones written. So that serves two purposes: avoiding regressions as we keep taking new kernel bits onto the platform, and evaluating safety.

So first of all, we need to understand what our workload requirements and resource usages are. We do system information gathering at three different points: before starting the workload, while the workload is running, and after stopping the workload. Those are the three points at which we take the pulse of the system.

What do we gather? First, the system calls supported on the system. We have a few tools for that. One is ausyscall, which is part of the audit tool set; it can dump all of the system calls for a particular architecture. The second is the checksyscalls.sh script, which is part of the kernel sources; it can tell us the supported system calls on an architecture. We also want to check the supported features, and we can do that using another script from the kernel sources, get_feat.pl. Then sysctl can tell us about system parameters, the limits and so on, and scheduling settings, priorities, et cetera. And lastly, we want to understand which kernel modules are loaded on the system, so we use lsmod for that. There are obviously other tools too: lsusb tells you about the USB devices available, and lspci about PCI devices, but for now we'll talk about these.

So that covers what we gather before we start our workload; it gives us a baseline of what is running currently. Then we start our workload and gather the system parameters and the currently loaded modules again.
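To make this concrete, here is a minimal sketch of what that gathering step could look like. The script name, output layout, and labels are illustrative, not our exact tooling:

```sh
#!/bin/sh
# Illustrative snapshot helper: capture system state at one of the three
# points (before, during, after) so the snapshots can be diffed later.
# Usage: ./snapshot.sh before|during|after
label="$1"
outdir="snapshots/$label"
mkdir -p "$outdir"

ausyscall --dump      > "$outdir/syscalls.txt"  # supported syscalls and numbers (audit package)
sysctl -a 2>/dev/null > "$outdir/sysctl.txt"    # system parameters, limits, scheduling settings
lsmod                 > "$outdir/modules.txt"   # currently loaded modules with usage counts
```

Diffing, say, snapshots/before/modules.txt against snapshots/after/modules.txt then shows exactly which modules the workload loaded and left behind.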
That matters because some workloads do change these things. The one that we played with extensively, OpenAPS, does go and load and unload modules. So we want to see the differences: how is the workload changing things? Is it loading new modules or unloading some? And how is it changing the system parameters, limits, scheduling priorities, et cetera? We let the workload run for however long we want. We'll get into how we trace the system a little later; right now we are talking about when we gather system information. After we stop the workload, whether we ran it for 30 minutes or two hours, it doesn't matter, we go and look at the system parameters and the currently loaded modules again. This tells us a couple of things. One is whether the workload is behaving correctly: a good, well-behaved workload would probably set the system back to its original state. Is it doing that? We also find out what is happening: is it loading tons of modules and leaving them loaded? Is it raising the system limits and leaving them raised, that is, not cleaning up after itself?

Okay, so with that, I will walk you through what the output of these tools looks like and how you can use it. I mentioned ausyscall. You can ask it, for example, to show all of the timer-related system calls that are supported. When I ran this on my system, it listed them, and the number associated with each is the system call number. System call numbers can change from architecture to architecture, so both the name and the number associated with it are important to us. checksyscalls.sh tells us the system calls that are not implemented. When I ran it on my system, it listed all the ones that are not supported or not implemented, and it's a longer list.

Now we switch to the other tool I mentioned, the feature list tool. You can list the features, and you can also check specific ones. I ran this on x86, and it shows which features are supported, including some important ones, for example the BPF-related features. Then stack protector: here you can see I am grepping the feature list for stack protector, asking what kind of stack protection options are enabled, and it shows me stack protector support is on. You can grep for other specific features the same way, DMA, memory leak detection, and so on.

And then lsmod. With lsmod you can look for specific modules with grep, or, if you leave the grep out, it will dump every single module that is currently loaded. In addition, it shows you the usage count of each module. For example, videobuf2_common has four users: four other modules are holding a reference to it. Until all of those modules are unloaded, you cannot unload videobuf2_common. I'll show these invocations together in a sketch below.
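Here are those queries gathered in one place, hedged as examples rather than exact invocations; checksyscalls.sh in particular expects to be handed a compiler, and the grep patterns are just the ones I happened to search for:

```sh
# Timer-related system calls and their numbers on this architecture:
ausyscall --dump | grep timer

# System calls not implemented on this architecture; run from the top of
# a kernel source tree, passing the compiler the script should use:
./scripts/checksyscalls.sh gcc

# Kernel feature support, filtered for the stack protector:
./scripts/get_feat.pl ls | grep -i stackprotector

# Loaded modules and their usage counts, filtered for the videobuf2 family:
lsmod | grep videobuf2
```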
Okay, so what we do is gather all of the information we talked about, the modules and the system parameters especially, and look for changes between two points: we snapshot before starting the workload, and once we start, we continuously save this information. Then we find the differences, so we can figure out what exactly the workload footprint is, for lack of a better word. Let's use a different word: what are the resources the workload uses? We can narrow down the subsystems that way and say: these are the subsystems we care about more when we are running regression tests, and when new releases come out, these are the ones we're watching. They are like our watch points.

Okay, so I will continue on to tracing. Tracing is not just for debugging; it is traditionally used for debugging, but we are using it in this context for a very different purpose. When I say tracing the workload, I'm talking about the event tracing feature available in the Linux kernel. You can enable system-wide tracing and say: give me all of the events you can possibly trace, generate a trace of them. We enable system-wide tracing before starting the workload and disable it after stopping the workload, so we get a snapshot of all of the events that happened while the workload was running. You will see the reference here: we have done a couple of blogs about this, and you can look at those for the details of how we gathered the traces.

Once we have that, we know the system calls that are supported on our system and the numbers associated with them, and in the traces we see the system calls that are used by the workload. Then we can identify function and system call usages and map them to kernel subsystems. For example, if you see a timer call such as getitimer being made, you know the timer subsystem is being used. That's one of the subsystems all workloads use, so it's not a great example. But if you see calls that set priority, for example, or any kind of scheduler changes it might be doing, priority levels on different processes and thread priorities, then you are getting genuinely useful information.

So far, this is all high-level information; we do not yet have the low-level, fine-grained picture. For example, we did an analysis of the overall OpenAPS load. OpenAPS has an init script, so we enabled system-wide tracing in the init script, let the workload run for a couple of hours, and then stopped it. We took the snapshot of the entire two hours' worth of log and analyzed the system calls and the modules it loads, and we saw that it does load and then unload two or three modules. We determined exactly what it uses, and we did these experiments on a Raspberry Pi running the OpenAPS workload.

However, we also wanted to understand what happens when individual OpenAPS commands are run. For example, the user might say: give me the pump history, what has happened with this insulin pump in the last hour or two; or, how much insulin is left in the pump? Those details get lost if we only have this coarse-grained system view; we cannot isolate the tracing or the resource usage for those individual command runs, or for when errors happen.
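Before moving on, the system-wide event tracing I described can be driven directly through tracefs. This is a minimal sketch, assuming tracefs is mounted at /sys/kernel/tracing and root privileges; trace-cmd record -e all is an equivalent front end:

```sh
# Enable every available trace event, then turn tracing on:
echo 1 > /sys/kernel/tracing/events/enable
echo 1 > /sys/kernel/tracing/tracing_on

# ... start the workload and let it run for as long as desired ...

# Turn tracing off after stopping the workload and keep the snapshot:
echo 0 > /sys/kernel/tracing/tracing_on
echo 0 > /sys/kernel/tracing/events/enable
cp /sys/kernel/tracing/trace workload-events.log
```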
For that fine-grained view, we used a different tool: strace. Again, strace is heavily used for debugging, but we wanted to use it for a different purpose. In this case, we want to run various commands under strace and ask: what system calls is this particular command using, what system resources is it using, the files it's opening, for example? And we also pay attention to the failure cases. The blog link I put here talks about all of that in more detail.

Okay, let's look at the strace command. It can trace system calls and signals, and you can run it in multiple modes. One is the full mode, where you get the complete information; the other is summary mode, where it summarizes the system calls and counts the invocations of each. I'll show a sketch of both modes below. What we did was look at the OpenAPS commands and pick the ones that made sense to us: get the insulin pump model, for example, which is a command where OpenAPS goes out and queries the pump, and then the time, the battery levels, and more of the commands we were interested in; you can also look at pump history, remaining insulin, and temporary basal rates. This is how we wanted to find out what happens when we run these commands, what system activity is associated with each command run. There is a list of the various commands here, and let's go to where we have information about these; I have some slides that show the pump information.

Like I mentioned, strace -c gives us a summary of the activity, and if you remove the -c option, it gives you the full output. When we ran the pump history command, the output on the left side is from the command itself: it reports that it is retrieving pump history since a given time, and then it tells us that it cannot connect to the pump. It's trying to open the /dev/spidev device and cannot, and it's also saying the pump is busy: the device is in use. So this is one of the busy paths we traced. On the right side is the analysis we did of what corresponds to that output: the process startup that of course has to happen, memory management, and then it opens the sysfs files associated with this device. It opens the pump device, and there are two GPIO attributes associated with this device that it tries to read, active_low and direction. That's when it detects that it cannot open the device because it's busy, and then it closes the files and exits the program. So that's how we mapped the subsystems to individual command runs.

In addition to that, we developed a flow chart of what happens. For anybody familiar with software engineering, this is a general software flow chart showing the command flow, or process flow: we try to open the device, we can't, we detect the device-busy condition, print the status, and close the files. So that's what one of them is.
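As a sketch of the two strace modes I mentioned, with a placeholder where the real OpenAPS command would go (the command shown here is hypothetical; substitute whichever pump command you are examining):

```sh
# Summary mode: counts, errors, and time spent per system call
strace -c openaps use pump read_history

# Full mode: every system call with its arguments and return values,
# following child processes and logging to a file for later analysis
strace -f -o pump-history.strace openaps use pump read_history
```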
There are two aspects to looking at this. We took the command output we had gotten and mapped it, and then from the strace output you can see everything else that happened on the system during that time. This output corresponds to that: you can see the device in use right here. It also shows us what percentage of the time was spent in each system call and how long each one took, so it gives us the activity of what has happened. You probably saw mmap in the previous slide too, and the corresponding munmap. So you can get a feel for everything that is happening while this runs.

Any questions at this time before I go to the next slide? Just go ahead and ask, or type them in the chat.

So this is the runtime context for what we have done. We ran this on a Raspberry Pi system, and during that time we identified the common kernel functionality; these are all the subsystems that were invoked. The spidev driver is the one you notice when /dev/spidev is being opened; that's where the spidev driver comes in. Then you have sysfs, the driver interfaces that the driver exports, and then you have GPIO, and over GPIO we are talking to the insulin pump. Right outside the rig you'll see the continuous glucose monitor and Nightscout, the remote monitoring piece. So this is the hardware part of it, and how Linux running on the Raspberry Pi is managing that device.

From looking at this, we can already identify a few important Linux kernel subsystems for us. One is the spidev driver: if there are any changes to the spidev driver, we know we have to go and look at our workload and verify that there are no regressions and that our workload is still functioning. Then the GPIO subsystem, for example. Those are the high-risk areas, besides the other things that could have regressions. The other areas are common, like process management; if you can't start a process, you would already know. I put some of the process management things in a different bucket of common functionality, whereas the spidev driver and its interfaces, and GPIO, are areas where our workload will not run if there are problems or regressions.

So I will leave you with these two blogs here to read more about this. The bottom one we upstreamed; it is part of the kernel documentation now. You can see that we went and documented this and upstreamed it so that it's available to people in the kernel community. So that's pretty much the end of my presentation. If you have any questions, go for the questions.

Sure, I have a question. If you go one slide back, there was the picture of the continuous glucose monitor and also the Nightscout. Is there some interaction between the Raspberry Pi and the Nightscout, or, I guess, also with the glucose monitor? They need some interaction, so is there some wireless connectivity there? That would mean that, depending on the commands, you would see the different subsystems then?

Correct.

And is there a wired connection then? It looks like there is GPIO and then a direct connection to the insulin pump; my understanding was it was a wireless kind of thing.

Oh yeah, so yes, the rig connection is a wireless connection. That happens over the Broadcom network interface, the network driver interface. When I'm showing this, I'm just showing the connections; these are not the real physical connections. These arrows indicate input and output flow. So yes, you're right.
It's a wireless connection.

Okay. And do you consider adding other tracing mechanisms or tools to get deeper knowledge? Or would you say: this is where we stand, and we would go for ftrace or add some other tools? In the automotive working group, we recently also discussed whether we might add SystemTap on top of this, to get another representation of it, because I know it's used in automotive for boot time optimization, tracking down bugs, and so on.

You can do more. With ftrace, for example, we can use function-level tracing and look at specific functions we are interested in at the kernel level, if we can identify them. And SystemTap is another good one to use. For us, what we are running into is that we might not do anything more with OpenAPS unless we have access to a rig; we are having some issues getting access to a rig. However, that is for the medical working group, right? The other area where we can start looking is the analysis I'm currently doing for real-time Linux. In that one, I might be able to do more tracing as part of the analysis, which we are planning to present. Ilana and myself are doing that work, and we'll be presenting it in Prague. So we are continuing this work, just not on OpenAPS. That is to answer your question completely.

Yeah, thanks a lot.