Hello. Good morning, everyone. There are only four people so far, so let's give it maybe a couple more minutes. I'm recording it, so please mark yourself as an attendee. Good morning, everybody. You have a nice view there, Quinton. My outdoor office twin; the weather is good. Yours is even better. Mine is in Hawaii. Does anybody want to bring anything up before we start? Yeah, I was just going to mention that the KubeEdge due diligence document has been prepared. I still need to go through it and make sure that it's complete and accurate, but I'm happy for the members of the SIG to take a look at it in parallel. My plan is to make sure that everything's in order there and then open it up for public comment, hopefully next week. That's great. Yeah, I can put the link in the meeting notes. Sure. All right, looks like we don't have that many people today, but we've got Dan. So, how are you doing, Dan? Good. Nice to meet you. I know I've met you before, Rico, but nice to meet you again. And nice to meet Quinton and Eric, I guess. Likewise. Do you want to get started? This presentation gets recorded. Okay, that's fine. Yeah, let me share some slides then. Can you see it? Yep. Great. Yeah, so thank you for asking me to talk and for the interest in the work that we've been doing. A little bit about myself. Can you hear me okay? Yeah, I was just going to mention, if you have dual screens, I think you might be sharing the wrong screen, not the presentation one. How's that? Yeah. Okay, great. So I'm going to talk about some of the research work that we've been doing at IBM Research around what we lately refer to as secure containers. Mostly this has to do with isolation between containers, and especially how virtualization fits into that picture. Okay, so just a little bit about me: I'm a research staff member at IBM.
That's just outside New York City; I've been there for about 10 years now. My research is mainly on operating systems, virtualization, and security in a cloud setting, which I guess is why it fits this topic. The past few years I've been working on unikernels, which I'll talk about a little bit, and more recently on how to take some of the things we learned in the unikernel work and apply them to the broader container landscape, to help containers become more secure. So I thought I'd just jump in; please feel free to stop me at any time. Okay, so I know I don't need to tell this audience that containers are great. Depending on who I'm talking to, sometimes we need to cover that, but I think a lot of their benefits are really obvious when you're doing a lot of development: their lightweight characteristics, how quickly they can start, how easily you can manage the images. Especially for me, when I'm building things, if I have a project I want to build, I love using containers to reproduce the environment whenever I want. Without a doubt, this is a huge advantage that containers have brought. And I know that this group I'm talking to now is perhaps one that also asks the next question: if containers are great for packaging and a lot of these build cases, are they also a good candidate for the unit of execution that we might use for bigger software deployments?
A lot of their lightweight performance characteristics are really attractive for that: the fact that they start up so quickly, the fact that they can share pages in the host for memory density, all of this sounds very attractive in a runtime, a unit of execution. There's also that giant tooling and orchestration ecosystem, which without a doubt is super valuable; think of those CNCF landscape pictures with the bajillion icons everywhere. So all of that seems really great. However, one thing has been a thorn in the side of containers from a runtime perspective for a while, which is the attack surface to the host. This stems from the level of abstraction that the applications within the containers use to talk to the host: they have the full 350-plus system calls in Linux that they can poke around at, looking for vulnerabilities and other things. The good news, though, is that when we have a large attack surface like this, we know some basic approaches to reduce it. Namely, you take that shared functionality that is super highly privileged because it's in the kernel, you reduce its privilege somehow, and you unshare it so not everybody has it anymore. Then you effectively get a much thinner interface. The most familiar way to do this is through virtualization. The way I'm positioning virtualization here, the way I'm thinking about it, is that you're essentially taking kernel functionality that was highly privileged and running it in a less privileged mode in a virtual machine.
So a guest kernel, for example, is a less privileged way of implementing the stuff that the host kernel may have implemented. For example, the network stack is now running in the guest; it's running in a less privileged mode than it would be in the host, same with the file system, same with a lot of things. This basic idea of thinking about virtualization as deprivileging and unsharing to reduce the attack surface has been recognized generally in the context of containers, and people have tried to apply it to containers to reduce their attack surface. Things like Kata Containers, for example, are attempts to take virtual machines, virtualization technology, and do exactly that: reduce the attack interface, thereby increasing the isolation of the containers. There are also other approaches that do the same abstract thing of deprivileging and unsharing. For example, gVisor has a user-space kernel, so in some sense the Sentry in gVisor is a way of taking some of this functionality that would be in the kernel and implementing it in a less privileged way. You could imagine doing something similar with User Mode Linux; that's another way to do it. The third bullet I have here, which is another instance of this idea of deprivileging and unsharing kernel functionality, is the library OS, or unikernel, approach. This is something that we looked at pretty heavily for a while. I don't know how familiar you are with unikernels; in the next slide I'll talk a little bit about them. But unikernels have this philosophy of being only what you need.
They're virtual machines with only what you need inside of them, which comes along with all these lightweight characteristics. So what we did in the past couple of years, in some of our previous work, was to try to take these unikernel ideas and apply them directly to containers, and Nabla containers were our effort to do so. Just a little bit more about unikernels. One way to think about unikernels is that they're just like virtual machines, except instead of having a guest kernel inside, it's just an application linked with only those library OS components the application needs in order to perform its task. So, again, you can think of them as VMs with the smallest possible thing inside of them. It's almost like the application runs directly on a virtual-hardware-like abstraction. Typically these things are single process, single CPU, etc. The first unikernels that came out and started to get a lot of attention were language-specific, in particular MirageOS, which was all based on OCaml. It was basically an OCaml runtime that ran directly in the Xen DomU, which is the guest in Xen. Everything inside was OCaml, which was great, except perhaps there weren't that many OCaml programs around, and maybe not everybody wanted to rewrite everything in OCaml at the time. They still don't. But the ideas in it were very interesting, and I think a lot of people jumped on this and started to think about how to support more things in the unikernel case. And so several more legacy-oriented unikernels came about. One of them was called Rumprun, which is based on NetBSD. You can think of that as a virtual machine where, instead of the traditional guest kernel, you stick a NetBSD-derived kernel straight in there and just link it with the application.
Other ones, like HermiTux and OSv, even go so far as to claim binary compatibility with Linux. In the case of HermiTux and OSv, the kernels are written from scratch; they're not reusing legacy kernels like Rumprun does. So anyway, just to wrap this up a little bit: think of unikernels as these tiny, super compact virtual machines. Yeah, I was wondering if I can interrupt with a quick question about the previous slide. Does this approach, the unikernel approach, fundamentally prevent multi-process containers, essentially? A typical application has a bunch of related processes running on the same machine. Is that not possible with unikernels, or can they share this library? Assume these are friendly processes that don't need to be that isolated from each other, but they do need to run in the same environment. Yeah, that's a great question. I think the answer really depends on how strictly you adhere to the unikernel religion. If you adhere very strictly, then you're absolutely right: running multiple processes, and having containers with multiple processes in them, is a challenge, meaning it will not work. Rumprun, for example, does not support fork in the normal way, and this can lead to all sorts of generality problems. Which actually is kind of the subject, so that's a nice segue into what I'm going to talk about next. And that is a key point, not just the multi-process part but other things as well.
So let's keep that question in the back of your mind as I go forward through the other stuff, because that comment really hit the nail on the head, and a lot of the stuff later on will address it to some extent. Right, so, taking a lot of the advances in the unikernel field: unikernels were demonstrating really great lightweight performance characteristics and things like that. Especially when you start thinking about function-as-a-service, the more limited the execution environment is, the more something like a unikernel seems like it could be a great fit. When you think about more general containers, things like running multiple processes start to be obviously pretty big concerns, but I don't want to say there's no place where having a restrictive computing model like that makes sense, because I think there are still some models where it does. Anyway, we took the unikernel stuff and tried to apply it to containers as best we could, and the Nabla container things were based on Rumprun, which was one of the legacy-based unikernels. I'm not going to talk too much about Nabla containers today, but what we learned through that process was that the virtual-machine-like characteristics, the virtualization-like characteristics, didn't really get in the way of these lightweight things: we could achieve very lightweight properties even though these things were little virtual machines, and even though virtual machines had a very heavyweight connotation at the time, less so these days.
So we found that to an extent, but, as you already recognized, it came at a high cost, mostly in generality. So the question we started to ask was: can we run more normal Linux applications on these things in some way? The thing I want to talk about today is mainly that: can we take some of these unikernel philosophies, the lessons we learned doing the Nabla container work, and apply them directly to normal Linux virtual machines? That's the subject of this talk. The project we worked on is called Lupine Linux; it's Linux in unikernel clothing. The basic idea is that we were trying to take a normal Linux VM and make it as unikernel-like as possible, so that any distinction about whether this was a traditional virtual machine or a library OS or a user-space OS or any of those deprivileging approaches I mentioned is taken out of the equation, and we can ask: can we, with a normal virtual machine, get the properties that the unikernels were getting? If you look at virtual machines, I mentioned that at some point there was a lot of negativity around them for being very heavyweight. And this is something that has started to be challenged. For one, there's the monitor process: I'm depicting the virtualization stack here with the host kernel, the monitor process, something like QEMU in a traditional virtualization setting, and then the big guest.
The monitor process has started to change. Maybe most famously, AWS Firecracker is something where they really reduced the functionality, in some sense the complexity, the amount it emulates when it provides the virtual machine abstraction, specifically so that it could have more lightweight characteristics. The same kinds of things have started happening to QEMU too. So that's one piece of the puzzle: if you want VMs that seem lightweight, the monitor is one thing that can become more lightweight, and people have indeed started to do that. The second piece of the puzzle, though, is the guest, and people have done this too, to various degrees. In the user space of your containers, for example, people talk about running full distro containers versus Alpine containers, the latter being lighter weight than the former. Down at the guest kernel level, if you're talking about virtualization, people have looked at different kernel configuration options: there's a project called TinyX, and if you look at Firecracker's microVM kernel configuration, that is also a reduction in what the guest kernel is going to do. And of course the unikernels we talked about before are almost the extreme form of this thin guest with only what you need. I might not have listed everything that was great about unikernels, but a lot of what they were boasting with their particular design was things like very small kernel sizes, very fast boot times, great performance, and security benefits from not having a lot of stuff you don't need in there; it's almost a way to get rid of potential vulnerabilities that may be there, a reduced attack surface. However, as we said, they suffer in particular from the lack of Linux support. So, just to give you a little more detail about some of these things.
HermiTux and OSv are two of the unikernels that claim binary compatibility with Linux. There's a bit of a caveat to that binary compatibility claim. For example, HermiTux supports 97 system calls, which is not the full Linux set, so if your application requires more system calls, you're a little out of luck there. OSv, on the other hand, has a whole list of caveats: if your application isn't compiled with PIE, if your application uses TLS, if your application is statically linked, if your application does fork or exec, and a number of other little caveats that make OSv difficult to use. In general, what happens is that the communities behind these unikernels end up curating applications, and it's a difficult process to keep all those applications curated. So what we really set out to do was to see if we could match a lot of the performance of these unikernels just using Linux, kind of a quote-unquote normal Linux VM. And the spoiler alert is that we could succeed in doing this. Here are some statistics that I'll go through a little more in a few minutes: we could get a small image size of 4 megabytes, a boot time of 23 milliseconds, and up to 33% higher throughput than the microVM configuration from Firecracker, but I'll go into those a little more. Okay, so basically what I'm planning to talk about is this Lupine Linux that we worked on, which takes this guest and applies a couple of what we call techniques to it. One of them is specialization, which we do through kernel configuration.
The second one is system call overhead elimination, which basically means that, as in a library OS, instead of system calls you get to use regular calls, which tends to improve performance, or at least there's the belief that it will improve performance dramatically. We also do that to Linux, through an existing patch called KML, Kernel Mode Linux, which I'll also talk about. And I'll show how we put it all together and how we got those numbers I mentioned. Okay, so first, specialization. If you think about unikernels, the primary philosophy they're organized around is really specialization: they only include what is needed for the application that is going to run, by design. If you look at Linux, it's a very general system. It's not typically specialized for a particular application. However, it is extremely configurable. If you're familiar with Linux, you'll recognize this menuconfig screenshot here. There are about 16,000 options in the Kconfig, I think, maybe more at this point; it's always increasing. Lots of them are for drivers, file systems, and processor features, but there's also a lot more stuff. So we started to think: can we use the kernel specialization that already exists in Linux to tailor the kernel for whatever application we want, in the same way that a unikernel would tailor its library for a particular application? What this picture is showing is how we broke down all the configuration options and thought about them in terms of making a Linux kernel that would be specialized to a particular application. We started with an already pretty specialized kernel configuration, which was the microVM configuration that comes with Firecracker.
In terms of how many configuration options: the numbers shown here have to do with whether something is selected as yes in the config. The microVM configuration is basically 833 configuration options selected as yes, and the vast majority of the options it drops are things like drivers, so cutting down to that much shouldn't be too much of a shock. Inside those 833 options, we then looked at which ones we had to have, which ones were required for any application to be able to run as a VM. We selected 283 that we simply had to have to boot our things; that's about 34% of them. The other ones we classified as not necessarily necessary, meaning they may not be needed by every application. And we split that group again, into things that some applications may need, and things that don't really fit the unikernel model, like multiprocessing and hardware management. But, to the question you asked: even though we're highlighting these things and taking them out now, later on we started to look at what happens when you put them back in, because once you're in a standard Linux environment this is no longer a black-and-white choice; it becomes a very slippery slope. What we wanted here was to get something as unikernel-like as possible, and then see how it would degrade over time as we changed that. I'll get to that a little later. Yeah, and just a quick clarification of your numbers here. If I understand correctly, of the 16,000 Linux configuration options that you can turn on, you took basically 5% of them.
And then you trimmed that down: you took 34% of that 5%, and then you identified that, I guess, 44% of those are actually fundamentally useful to everything except multiprocessing and hardware management. So you're down to 44% of 34% of 5% of 16,000. Not quite. The multiprocessing and hardware management options we said don't match the unikernel model, so if you're interested in running unikernels, for whatever reason, you don't need those, because they don't match your model. Furthermore, you may not need any of those 311 application-specific options either. I'll go into a little more detail now about what each of those categories is and why we categorized them that way. The idea is that if you have a unikernel that does require some of them, you put them back in; but we think that multiprocessing and hardware management won't be required by any unikernel you have. That's our assertion. Okay, so to give you a sense of what the application-specific options are: a very straightforward example is that some configuration options toggle whether or not a system call is present in the kernel. Here is a list of such config options, like CONFIG_ADVISE_SYSCALLS: if you configure your kernel with it, you get an implementation of madvise and fadvise64. Depending on your application, you may not need those. Similarly with a lot of these other system calls; you may not need them, though some of them, you'll notice, are ones you pretty much always need, like futex, for example. But nevertheless, that is one class of application-specific options that we took out. The other type is kernel services: you might have an application that does not use /proc at all, or sysctl. Was there a question?
Oh, okay. So if applications don't use various kernel services, you can just not put those in. Similarly, there are a bunch of library functions that may be in the kernel's library, as well as debugging functionality; these are the types of things we classified as application-specific options. Every time you want to run an application as a unikernel, you would select a certain number of these for the application you're trying to run. Then there are the other two categories that I mentioned. There are a bunch of features in the kernel that go away if you think about the unikernel trust model: everything inside the guest is kind of one thing, unikernels are used to having it all linked together, so there's absolutely no protection between the application and the kernel. So there are things like cgroups, namespaces, SELinux, seccomp, kernel page table isolation; some of these are very expensive, and if your model has everything in the same trust domain, you may not need them. That's an important point that I'm going to come back to a little later. Okay, so the question comes up very quickly: how do you get that application-specific kernel configuration? How do you know what your application may use? We don't have a great answer for this; it's quite a hard problem. For what we did, again, we were trying to find the lightest-weight, most specialized thing we could, to show that you don't need to throw out Linux; that's the point here. What we did was basically manual trial and error: we would run our application with this base configuration, the 283 options, which we call lupine-base, and see what happened. If it didn't work, then we would put something else in.
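To make the two categories concrete, a hypothetical fragment of an application-specific .config might disable entries like these. The symbols are real Kconfig options, but the particular set shown here is illustrative, not Lupine's actual configuration:

```
# Application-specific: system calls and services this app never uses
# CONFIG_ADVISE_SYSCALLS is not set
# CONFIG_AIO is not set
# CONFIG_SYSCTL is not set
# CONFIG_PROC_FS is not set

# Doesn't fit the unikernel trust model (one trust domain in the guest)
# CONFIG_CGROUPS is not set
# CONFIG_NAMESPACES is not set
# CONFIG_SECCOMP is not set
# CONFIG_PAGE_TABLE_ISOLATION is not set
```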
This is admittedly a terrible, time-consuming, soul-destroying process. Like many things, you get a little faster at it the more you do it. But this is sort of an open question: how can you figure out these types of things automatically? So, just real quick, I want to mention the system call overhead elimination. Yeah, let's go back to the previous slide quickly. Is this alternative to manual trial and error not as simple as just instrumenting the application and figuring out which system calls it triggers? So, that's a little more complicated than it seems. Depending on your application, it may load some library that does more system calls; some execution paths may not be triggered in whatever test you're doing. There are a couple of things we had to do: we had to figure out the success criteria, what the test is, what the load on the application should be, so that we would have a representative run of that application. Then, once we had that, we could certainly use strace to get all the system calls. But there are other things that are not just syscall-based. Sometimes it would break: if we took out /proc, for example, and the application in some execution path decided it needed to look in /proc, that's something we wouldn't catch with strace. It ends up being hard because you don't have a ground truth of everything this application will ever do, because that doesn't exist. It's difficult to do this in an automated way, but it boils down to test-coverage kinds of problems rather than something more fundamental.
I guess another thing to say is that if you're familiar with putting seccomp policies on applications, this is a very similar problem. Seccomp can be used as a way to specify system calls that are allowed or denied. By doing that, sometimes you deny a system call that some execution path will use, which again is a test-coverage issue. In general, I think what happens is you end up with more permissive, more conservative policies. It seems possible that some kind of static analysis could also help here. You're absolutely right: this is not a particularly new or different problem; it's a problem that comes up a lot, whether it's test coverage or seccomp or whatever it is. Okay, I didn't mean to ramble. I was just curious if there was more to it than that, but I understand; thanks, the answer was very comprehensive. Okay. Right, so we're going to use this specialization of the configuration and apply it to the Linux kernel in our guest, and this is something that we think is going to give us an only-what-you-need, quote-unquote, unikernel that is really just Linux in the end. The second piece is that we want to eliminate the system call overhead in the same way you would if you were running as a library operating system. I mentioned that there is a patch to Linux which allows you to run an application in kernel mode; it's called Kernel Mode Linux. Basically, it allows applications to run in kernel mode so that system calls can be replaced with regular calls. The patch exists; it's not upstream, though, and it certainly is not safe unless you're in a kind of single-trust environment.
And unfortunately, it has only been ported up to Linux 4.0. But it exists, and it gave us a good way to experiment with this without needing to do a lot of work. Just so you can see how this changes the system calls: we made some small changes to the libc, this is musl libc, and you can see that instead of making a syscall, we call to a particular location which is exposed by the kernel. As for the applications themselves: when Linux starts the application, it does not put the processor into user mode. Also, just so we're totally clear on this, this predated kernel page table isolation, so the user and the kernel were already in the same address space. So the only thing that was really happening at a system call in those kernels was the switch of processor mode, and that is what has been removed here. To use this, we had to relink static binaries, or we could dynamically link our version of the libc. This is less invasive than the build modifications for a unikernel, and it allows us to see what the benefits of doing so are. Hey, it's Dan, quick interruption again. Maybe I missed something fundamental, but by putting the application in kernel mode in the CPU, surely you just ruined all your isolation slash security properties within... So this is within the guest. So yes, the isolation boundary that was within the guest has been totally removed. Okay, so, to put this all together: we started with the Linux kernel source and some unmodified app that we want to run, with some libraries. The first step is the specialization, and the way we do that is we somehow get an application-specific Lupine configuration.
And so this is that manual process — this horrible process — which hopefully will somehow get better. Then we also do the system call overhead elimination, which basically means patching the kernel and patching the libc. From that, we get an application-specific kernel image that's going to be our guest kernel image. But that still isn't quite enough to run our application, because normally when you run a virtual machine you have a kernel image, and then you have the program you want to run and all of its libraries, which are normally in a root file system. For this, what we do is leverage existing container images, which in some sense are already root file systems: they contain both the application and the necessary libraries, conveniently packaged together for us already. There's one more thing, though, which is how the application is supposed to start. Sometimes your application might need the network to be initialized, or something like that, so there are initialization scripts, which we don't necessarily run here. But those can be application-specific: if your application does not require the network, or the disk or block device, or whatever, you might not need the initialization steps for those. That's the piece we add to the picture: we take the container image, which has all that stuff, but we also put in an application-specific startup script. And in this case, we're writing this by hand.
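The configuration step described above — layering an application-specific fragment on top of a minimal base config — can be sketched roughly like this, in the spirit of the kernel's own `merge_config.sh` fragment merging. The option names and the fragment are invented for illustration:

```python
# Hypothetical sketch of the config-merge step: start from a minimal
# "lupine-base" config and layer application-specific options on top.
# CONFIG_* values and the "redis" fragment below are made up.

def parse(cfg_text):
    """Parse KEY=VAL lines of a kconfig-style fragment into a dict."""
    opts = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, val = line.partition("=")
            opts[key] = val
    return opts

def merge(base, fragment):
    """Application-specific options override the base config."""
    merged = dict(base)
    merged.update(fragment)
    return merged

base = parse("CONFIG_TTY=y\nCONFIG_NET=n")
redis_fragment = parse("CONFIG_NET=y\nCONFIG_EPOLL=y")
print(merge(base, redis_fragment))
```

The resulting dict would then be written back out as the guest kernel's `.config` before building the application-specific image.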
Again, depending on whether or not you have a way to automatically generate the application-specific kernel configuration, you could potentially also do the same for the startup script, because it's basically: if you select this option, then you're going to have to set that thing up — just like with networking, you have to set up the network, that type of thing. Anyway, after that we can take all the files from the container image, the startup script, etc., and create a Lupine rootfs. So now we have the kernel image and the root file system, which can be run by a normal virtual machine monitor such as Firecracker — I don't know if Firecracker is a super "normal" monitor, but this could be run by QEMU or Firecracker or something. In our case, because we were going for lightweight, we went with Firecracker. Given all that, we ran some experiments to see if we could start to match the performance we were getting from the unikernels. We used basically just a single machine here with Firecracker. One thing I wanted to point out: for these experiments we're using a fairly old version of Linux, just because we wanted to use the KML patches and evaluate those — so this is Linux 4.0 that we're using. Time check — about 12 minutes left? Okay, great. So I won't spend too long on these, but a couple of interesting ones came up. The first thing we looked at was the configuration diversity: how much this application-specific configuration actually varied across a bunch of applications. We took the 20 most popular applications on Docker Hub, and we went through this manual process of finding out what their application-specific configuration was.
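The idea that the startup script could follow from the selected config options might look something like this sketch — the option names, init commands, and the generator itself are all hypothetical, not the real Lupine scripts:

```python
# Illustrative generator for the per-application startup script: only
# emit init steps for subsystems the app's config actually enables.
# Option names and shell commands below are examples, not Lupine's.

def make_init(options):
    steps = ["#!/bin/sh"]
    if "CONFIG_NET" in options:
        steps.append("ip link set lo up")     # bring up networking
    if "CONFIG_BLOCK" in options:
        steps.append("mount /dev/vda /data")  # mount the data disk
    steps.append("exec $APP")                 # finally start the app
    return "\n".join(steps)

# An app that selected networking but no block device gets only the
# network setup line before exec'ing the application.
print(make_init({"CONFIG_NET"}))
```

This is the same "if you select this option, you set that thing up" correspondence the speaker describes, just written down mechanically.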
And what we found — in this graph, the x-axis is the number of top apps supported, and the y-axis is the number of configuration options: the union of all the options necessary to run those top x applications. What you see is that when you get to 20, you only need 19 configuration options, in addition to the Lupine base config, in order to run those 20 applications, which was fairly low — it surprised us a little. The other thing is that, if you look, supporting the top 13 applications required the same set of options as the top 20. We only did 20 because, again, this was a manual process for us. But this gave us a sense that maybe there's a more general configuration, one that doesn't need to be specified per application, that we could use. There's one configuration which has all 19 of those options; in the evaluation we call that one Lupine-general. So for the configurations we have Lupine-base; Lupine, which is the application-specific one; and Lupine-general, which is the one with all 19 of these configuration options. "So those are the same 19 configuration options that cover all of those applications, or does each one have 19 unique ones?" No — this graph is showing the union of all those options. So all 20 of the top applications can be run with the same configuration, which has 19 options in it. Okay. So this table, which I know is quite small, has the actual number we found for each one. Some of them require as many as 13; some of them require zero, again in addition to the Lupine base. "So this says, pretty much, that with those 19 options you can run the 20 applications, right?" Yeah, yeah, the 19 options.
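The cumulative curve being described — a union of per-app option sets that grows and then plateaus — can be computed like this. The per-app option sets below are invented to show the shape; they are not the paper's data:

```python
# Sketch of how the cumulative-options curve is built: take the union
# of each app's required config options, one app at a time, and record
# its size. The apps and option sets here are made up for illustration.

apps = {
    "nginx": {"NET", "EPOLL"},
    "redis": {"NET", "EPOLL", "FUTEX"},
    "mysql": {"NET", "AIO", "FUTEX"},
    "hello": set(),           # hello-world needs nothing beyond base
}

union = set()
curve = []
for name, opts in apps.items():
    union |= opts              # options needed so far, across all apps
    curve.append(len(union))

print(curve)  # plateaus once later apps add no new options
```

In the real measurement the plateau happened at 19 options by the 13th app, which is what motivated the single Lupine-general configuration.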
So in some sense this is a little bit promising, because the application-specific approach has all those issues to do with state exploration and coverage, which are really difficult problems. If there's something general that we feel confident supports a lot of things, that would be much easier for people to get behind — though the question of whether it's general enough is always going to be there. In the interest of time, I'm going to go through these pretty quickly. We measured the kernel image size. The thing I want to bring out of this: in all these graphs, "microVM" is the baseline we use, which is the default kernel configuration that comes with Firecracker — and that's already been specialized to some degree by the Firecracker team. Rump, OSv, and HermiTux are the three unikernels that are legacy-compatible to varying degrees. And then we keep showing Lupine and Lupine-general — Lupine in this case being configured for Hello World, and Lupine-general being the one with the 19 options. A theme you're going to see is that Lupine and Lupine-general tend not to be far apart in most of these metrics, and they also tend to be very comparable with the unikernels. In this case HermiTux is the smallest; OSv and Rump are slightly bigger. I believe this is because OSv and Rump are a bit more extensive in their support than HermiTux is; if HermiTux evolves to support more things, it will probably start looking more like OSv and Rump. In any case, Lupine is competitive with them in image size. A similar story happens with boot time — here you see we're actually getting better performance than some of them.
Although there are various configurations of the unikernels which really change their performance. OSv, for example — the literature said sub-10-millisecond boot time, and when we ran it we were getting 50 milliseconds or something. When we looked into it, it had to do with the file system choice: if you change it to a read-only file system, you can get sub-10-millisecond boots. But again, the overall story is that Lupine is in the same ballpark as these unikernels, even Lupine-general. Memory footprint is another one — again, a similar story. System call latency gets a little more interesting, because this is where we start to see the benefits of the KML patch. Here we compare Lupine-no-KML against Lupine, so this is the advantage you get from removing the processor mode switch on system calls. And comparing to the unikernels, it's very comparable — better in some cases. This is a system call latency microbenchmark, which is actually the best case for Kernel Mode Linux, for this overhead elimination. But that KML benefit goes away very quickly: if you have work happening between your system calls, it tends to amortize away the benefit. This next one is also interesting — we were very limited, by the way, in what we could use to evaluate these things, mostly by what you can run on the unikernels — but here we end up with a 33% advantage over microVM. Having looked into this a little more, we think it's because a lot of those security options, like kernel page table isolation or seccomp, are fairly expensive; if you have a single trust domain, they can be removed, which buys you some more performance. Takeaways.
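The amortization argument — that the KML win shrinks as soon as real work happens between syscalls — is just arithmetic, and can be sketched with a back-of-envelope model. The nanosecond figures below are made up for illustration, not measurements from the talk:

```python
# Toy model of why the KML benefit amortizes away: the fraction of time
# saved per syscall falls as application work between syscalls grows.
# The trap cost and work times are hypothetical numbers.

def speedup(trap_ns, work_ns):
    """Fraction of per-syscall time saved by removing the mode switch."""
    return trap_ns / (trap_ns + work_ns)

trap = 100  # hypothetical cost of the user/kernel mode switch, in ns
print(round(speedup(trap, 0), 2))       # pure syscall microbenchmark
print(round(speedup(trap, 10_000), 3))  # 10 us of app work in between
```

With no work in between, removing the switch saves 100% of the loop; with even 10 microseconds of work per syscall, the saving drops to about 1% — matching the observation that macrobenchmarks showed very little KML overhead to eliminate.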
So, specialization of the guest kernel seems very important: we saw big improvements even over microVM, which already has some degree of specialization. However, specialization per application may not be that important — this is the difference between Lupine-general and Lupine. So whether it's worth figuring out how to derive the smallest possible configuration for each application may not be worth the trouble. The other thing is the KML patch: it may not be worth porting it to a new version of Linux, because the advantage you get is relatively small. For us it was surprisingly small, especially because when you move from microbenchmarks to macrobenchmarks you see very little overhead. And the other takeaway is that by using Linux, a lot of these common problems about not being able to support applications just go away. To the point we made before — and this is a really important point — Lupine is still Linux. So you get a graceful degradation of these properties. If you decide you want multiple processes in there, fork is not going to crash your unikernel, like it would in other unikernels. When we started measuring these things, adding in separate processes — especially control processes that don't have high context switch rates — had virtually no overhead that we could measure. And when we started looking into running on multiple processors, which also isn't typically supported in a lot of the unikernels, that also had fairly low overhead. So again, it becomes a sort of slippery slope, but you have some choice there.
I'm going to fly through this because I'm out of time, but there are a bunch of benefits — I'm not trying to say that unikernels are not good for anything; that's not what I'm saying here. There are benefits this doesn't compare to, especially for language-based unikernels: when you get to use the language, you get a lot from that. But yeah, I have to stop there, unfortunately. The next bit I wanted to talk about was how to get this into the container ecosystem. So, there's a question from Quinton: is the plan, longer term, to donate some of this work to the CNCF? Yeah — so let me give a little teaser of the next piece we're working on. We think there's a big advantage in having a lighter-weight guest in some of these microVM-type approaches to containers, like Kata Containers or something like that. What we're trying to look at is how pods fit into the picture, because some of the benefits we're getting here have to do with getting rid of the trust domain inside the guest. You asked whether we're throwing away all the protections inside the guest — and that's okay to throw away if everything is in the same trust domain. But once you have a pod with sidecars that may not be in the same trust domain, or agents that may not be in the same trust domain, it gets trickier. So really the question is how this can apply in the context of pods. If we get a good grasp on that, then I think we can get into a position where this will be a lot more interesting to the community. That's our goal.
Yeah, so Quinton said he needs to drop off, but — do you want to do a follow-up presentation on containerd, and how we can work together with the containerd community? I think we need to find a bit more time to get some of the initial experimentation done, to see how this stuff works, but in the future, yes. Sounds great. I think this is really useful, and it could be used in some of the runtimes like Kata Containers — they're using a guest with their own kernel, and maybe they'd be able to use a stripped-down kernel for these applications to improve performance. Yes, certainly — an easy target would be to say: hey, can you put these things into your kernel configuration? The problem we're having now is that we're not sure about the agent design — the fact that you have the agent inside the guest the way you do, the way pods are supported. We don't know the impact of that on a lot of these specialization plans, and that's what we're currently looking into. Once we have the answers to those questions, we'll be able to go much more easily to the Kata community and say: hey, here's a simple way to get a bunch of lightweight performance. Yeah — and I think they're working on a lighter-weight agent written in Rust. Yeah, I saw that; I think it's really cool. Part of what I'd like to be able to do is identify what "lighter weight" means for the kernel configuration, for example — because really it's about taking out code.
Sometimes it means you need fewer things from the kernel, and sometimes it doesn't, depending on what it is. So if we have more insight into what it is that gives you this performance, we can say which parts you should really try to cut out and which parts don't matter if you leave them in. Unfortunately, it seems like the answer to that question is probably going to be something around security — things like seccomp and kernel page table isolation. If you need those security domains within your guest, those are expensive. Yeah. And the other question is how this work can become some sort of project by itself, right? It's its own project, but maybe kernel specialization could become its own distinct domain. I'm trying to see how this can be something separate from Kata Containers or Firecracker, so that it has its own repository and can become a separate project. Yeah — so if people are interested, we do have the things that we used — oops, I don't know if you can still see it; it went off the presentation — we do have some of the stuff we did for the paper we wrote. A lot of these things I'm talking about — the scripts to run this stuff, a lot of the configurations — are open source. But right now we don't have anything concrete that we feel we can contribute to the community yet, because a configuration here or there that we can't say for sure won't break things doesn't seem like it's quite there. We're working on getting a bit more insight into this. Got it. Yeah, that's great. So, Eric has a comment here, or a question: runc is still used as the low-level runtime? Runc, yeah.
Runc — or are you talking about another container runtime? I don't know — I'm guessing Docker still uses runc, right? Isn't it still the default for Docker? Runc, yeah. But in your case, I guess you're using containers — you're pulling container images, right? Yeah. In this case, for this work, we're running Firecracker directly; we're not going through the OCI runtime or anything. What we are using containers for is to pull the images — so it's Docker's, what is it, the one where you get the tarball of the image from the Docker thing; I can't remember the command name. Got it, yeah. In the ongoing work where we're looking into the difference, we are actually running them that way — with Kata Containers, which has the Kata runtime, as well as something called runq. Runq stands for "run QEMU" in the same way that runc is "run container" — it's a more lightweight form of virtualized containers, not as fully featured. We're trying to understand the difference between running a full pod inside the guest and not running a full pod inside. Yeah. So, another comment from Eric here: there was a project created to help profile the syscalls needed for seccomp — it's a Red Hat container security project. I guess those are the Red Hat profiles. Yeah, right — there's been a lot of work where you have a kind of learning phase: you run applications, you strace them, and you figure out what they're doing. And over a certain number of runs — like I said, a learning phase — you can start to develop a seccomp profile.
These things always feel to me like they're only as good as your learning phase was. And because of that, I think it leads most projects, from what I can tell, to be not as strict about their seccomp policies as they could be. It's definitely related, for sure. The other thing I just thought about: maybe this is a good fit for a working group or something, so that a lot of the community members can collaborate and maybe come up with a better solution for isolation, and also for trimming down the Linux kernel within that isolation. Yeah — is there a general group that includes people from all of the isolation areas, like the Kata people and the gVisor people? There isn't right now. That's something I've thought about. The other thing I think is coming — I don't know when, but it seems like it'll be here before we know it — is things like AMD SEV, and I don't know if people are still working on putting these things into SGX, but once secure enclaves really start taking off, a lot of the container isolation will start taking on a distinctly different flavor. Yeah — some of the Intel folks, and the AMD people, yeah. It feels like that's a train that's coming, and I don't know what the Kata people are thinking about it. I was wondering if there are any groups talking about that as well, because it seems very related.
Yeah — would you be interested in driving some of this work on putting it together, or is there someone who might be interested in driving it? I don't know — I can ask around IBM; I probably don't have the bandwidth at this time to drive community stuff myself, but there might be some people. Sounds good. Let's sync up later, maybe — that'd be great. Because I think there's lots of different work out there that's not quite connected to Kubernetes and the cloud native community yet, and if we can pull a lot of that work together and make it some sort of project, like I was just thinking, it would be very useful. Yeah, that's a great idea. Sounds good. All right, so I think we're 10 minutes over. Sorry for going over — I always have a lot more; I could keep talking about this stuff. But thank you for inviting me, and thank you to everybody for listening. Let's keep in touch. Okay, we'll do. Take it easy. Stay safe, everyone. Thank you.