Hi everybody, I'd like to speak today about Unikraft, or how to cut your cloud computing costs in half. So the private and public cloud have been huge success stories, and for good cause. On the positive side, they're very easy to use. They provide great scalability, so you can grow your business very easily with them. And they provide a multitude of services; for instance, Amazon claims to have over 200 fully-featured services. But everything's not perfect, and on the downside, the virtual machines being run on the cloud tend to be extremely bloated. They're on almost all of the time, even though they're not needed most of the time. And all of this is of course bad for the environment. So we need to cut down on the fat that's being run on public clouds, and we do this in two ways: we need to de-bloat our VMs, and we also need to stop idle VMs from running all the time. The problem has two parts. The first one has to do with size. If you look at a virtual machine that you deploy on the cloud, it's made out of multiple components. There's your application, and this is the bit that you actually care to run. But underneath there are a lot of unused components being run, and all of those orange parts are wasted resources that are costing you money. The second part of the problem has to do with time. When you deploy your virtual machine, it immediately starts running even though it may be idle. At some point in time, some client request traffic comes in and it becomes active. Once it's done dealing with those requests, it goes idle again, and the cycle repeats. The problem, once again, is that all these idle orange parts are wasted resources costing you money. So one way we can deal with this is through specialization, achieving high efficiency.
And in the virtualization space, specialized virtual machines are essentially what we call unikernels, and I'll explain a little more about what unikernels are in a moment. The goals we have are these. First, we want them to be easy to build and run. In the past there have been a lot of unikernel projects that were efficient but had to be hand-built every time, taking months of expert work to run a single application. Second, there should be easy or no application porting; we shouldn't require modifications to the actual application. And the third one, of course, is that they should perform well. So just to give you a little taste of what unikernels can do: they can start and stop very quickly and have good migration times. There have even been some papers showing that these can boot in as little as a few milliseconds, which is not far from what a process takes to actually start. They have a very low memory footprint, so you can imagine running real-world applications such as Nginx with a few megabytes of RAM or less. You can have high density: because they consume little memory, you can fit thousands of such virtual machines on a single x86 server. They can have high performance, with as many as 300,000 requests per second on an Nginx server. And as a nice side effect, they have security features like a small trusted computing base and so forth. So let me just show you what a unikernel actually is in one slide. Imagine all you want to do is run a web server. Normally what you do is take a general-purpose OS distribution such as Linux or Ubuntu. On it you're going to have third-party libraries, and you only need a few of those to run the web server, but you get all of them. OS libraries, same deal: not everything is needed by the web server. And out of the kernel, you don't need all of its functionality, but you get all of it anyway.
In fact, what we'd like to do is, by some sort of magic, take those colored squares and build a custom operating system and software stack for that web server. That's essentially what a unikernel is. But it takes lots of development time: in the past we had to manually build this unikernel from scratch for each single application. So the question is, how do we transparently build efficient and POSIX-compliant unikernels, such that we can get the efficiency without having to spend a lot of expert time porting applications? So we have two design principles with Unikraft. The first one is that we wanted a fully modular kernel that makes it really easy to pick and choose which components we want and don't want for a particular application. The second one is that we wanted to provide high-performance, specialized APIs to get even better performance. On the modular kernel, one question would be: why not just use Linux and configure it? To answer this question, we put together a dependency graph for the major subcomponents of Linux. If you see a line between two such components, it means there is a dependency, and the blue numbers above them show how many dependencies there are. And it becomes pretty clear that it's kind of hard to take out one of these components; it requires a lot of engineering effort. So Unikraft is built from scratch to be fully modular. Could we do it with existing unikernel projects? The problem is that most of them require significant expert work, as I mentioned, to build for each application. They're often not POSIX compliant, so you cannot just run standard applications. And in the unikernels themselves, the kernel part is still monolithic; it's just smaller, and it's not necessarily easy to plug components in and out. Unikraft, in contrast, is built from scratch with full modularity in mind. The second design principle has to do with specialized APIs, and here's a little example to show you what I mean.
Imagine you have an application; it obviously compiles against some sort of libc, in this case glibc, and under that it needs to do some networking, so it uses sockets. The network stack is underneath, and eventually you get to some high-performance API. In Unikraft you could bypass all of that and go straight to UDP, and I'll talk a bit more about that later. So this is the generic Unikraft architecture: you have the application on top. Under that you have the libc layer; so far we support musl and newlib. Under that you have what's called the POSIX compatibility layer, a bunch of different libraries such that you can run standard applications on Unikraft. And under that you have the core of Unikraft, where all the black boxes are the actual specialized APIs. So for instance, this red box shows my previous example: if you want to do high-performance networking, you could bypass all the POSIX socket stuff and plug directly into uknetdev. Same thing if you're dealing with file systems: if you know all you want to do is retrieve some files for a very simple web server, you could design your own custom file system to go even faster, and I'll show an example of that later. You can also have different memory allocators in Unikraft; so far we support about five of them, because different applications perform differently when using different memory allocators. So let's go to the first goal, which is that things should be easy to build and run. To show you that, let me just play a little video. We have a tool called kraft that wraps around Unikraft, and the easiest thing you can do is say kraft up, choose an application, in this case Nginx, and give a name for the virtual machine that's going to get built. That single command is going to fetch all the sources it needs to build, including the Nginx sources.
And once it's done building, it actually launches the virtual machine, in this case on KVM, and there you go, it's up and running. Just to make sure it's working properly, we get the IP address and download a simple web page. Let's try it once again, and a third time, and you can see that it's actually up and running. So with a single command you've now built a custom stack for Nginx and are running it. The second goal is that we should be able to run applications without having to port them, or the porting should be easy. For this we have two approaches to POSIX compatibility. The first one we call auto-porting, and in this case we assume we have access to the sources, as with Nginx or Redis or SQLite. What we do is build them with the native application build system, but against musl, and then we link the resulting object files into Unikraft. We have our own ported version of musl; musl of course makes syscalls, so we have a syscall shim layer, and under that we have our own implementations of the syscalls that go to the different Unikraft kernel libraries. We also have a mode called binary compatibility. This is for the case where we have no access to the sources. Instead, we take an unmodified ELF binary compiled under Linux, and we have a library that is essentially an ELF reader and loader. We then trap the actual syscalls and redirect them to the syscall shim layer. So it all boils down to syscall support, essentially this little box down here. Regarding syscall support, so far we support about 146 syscalls. To put this in context, there was a paper back in 2016 that did an analysis of how many syscalls are actually needed to run useful applications. Linux has on the order of 350 syscalls, which is quite a bit, but as it turns out you don't need all of these to run applications.
So to take a little point in that study's space: if you support in the range of 145 syscalls, as is the case with Unikraft right now, you could run on the order of 50% of applications. Of course, it matters which of the syscalls you actually support, not just any 145. So to get a little more precise, we did our own study with Unikraft. We took the top 30 applications from the Debian popularity contest (popcon), the most popular ones, and we have a plot where the x-axis shows all the applications and the y-axis shows how much system call support we have in Unikraft for each of them. The takeaway message is that it's mostly green: we mostly support all of them, and with a bit more work we would support the remaining ones. But are all the syscalls we did the analysis for really needed? The reason we asked this question is that, from working with Unikraft over time, we noticed that sometimes, when we had no support for a syscall that an application was using in its source code, the application would still run fine. So we started to dig into why that was. The intuition is that applications contain a lot of snippets of code, and this one is from Redis, that are resilient to the lack of an actual syscall. In this case the syscall is getrlimit, and if the call is not there, if it fails, then Redis defaults to a sane value. There's a lot of this in application code. So it turns out that maybe you don't need all the syscalls in the source code to properly run the application. To dig into this a little more, we started putting together a tool that is Linux-based and dynamic. It starts by splitting an application's syscalls into which syscalls are in the source code and which syscalls are in the static binary. But then it also does a dynamic analysis that tells you: yes, this syscall is required, meaning if you don't have it, the application will not run.
It tells you which syscalls still work if you stub them, meaning that if you return ENOSYS, "I don't have this syscall", the application will still run even without them. And then we have another category we call faked, which means we return a success value but don't do any work, and we see if that also allows the application to run or not. The analysis is based on each application's own test suites and benchmarks. So here are the results. This is an analysis of five applications, but the results are similar for more applications as well: we have Redis, Nginx, Memcached, SQLite, and a web proxy. On the y-axis we have the number of system calls that are used or needed, and as I mentioned, the legend shows the categories I spoke about before. Just to give you a point in the space, I'm not going to go over all of this: if you take Nginx and look at the actual static binary, you'd think you need on the order of 110 system calls. But the picture is not as bad as that, because if you run the full Nginx test suite, it turns out you need about 50 or so syscalls; that's the pink or violet part. There are several syscalls you don't even need, and many you can actually stub or fake, and the application will still run the entire Nginx test suite correctly. In fact, you don't need to support everything that's in an application's source code. Finally, on the last goal, we want to obtain great performance, of course. And one of the questions is: does this auto-porting sacrifice performance? To answer this, we compared a manual port of SQLite on Unikraft with the auto-ported version. For the measurement, we measure the time it takes to do 60,000 insertions. As a baseline we take Linux, which takes about one second.
We compared against the manual port on newlib, which also takes about one second, just slightly worse than Linux, and against musl, also a manual port. The most interesting bar, though, is the auto-porting one: you can see that we take no hit from auto-porting, and so this is good news. So now let me show you a few performance graphs. The first one is image sizes versus other projects. On the x-axis we have Unikraft first, and then a number of other unikernel projects, but also Linux and Lupine, Lupine being a unikernel based on Linux. On the y-axis we have image size in megabytes for a few different applications: a hello-world application, but then also Nginx, Redis, and SQLite. You can see that Unikraft comes in at the bottom: the very basic hello-world image is a couple hundred kilobytes, and the actual applications are a few megabytes in size. Next, Unikraft boot times. We support different VMMs: we have QEMU, QEMU with a network interface, QEMU microVM, and then a few more specialized VMMs such as Solo5 and Firecracker from Amazon. We measure the total boot time, and we also split the measurement between the VMM itself and how long the Unikraft guest takes to boot. You can see that the worst-case scenario, with QEMU, is about 40 milliseconds, where most of the time is spent in the VMM. If we add a network interface, the time goes up slightly. If we use microVM, which is a faster version of QEMU, it goes down significantly, to nine milliseconds, although you can still see that this is dominated by the VMM. And then the fastest times are with Solo5 and Firecracker: just a few milliseconds. What about memory requirements? Again, we're comparing against other projects and Linux, with the same applications as before.
And you can see that Unikraft, when running the actual applications, only needs a few megabytes at most, in this case two to seven megabytes depending on the application. Okay, now some performance numbers to do with Nginx throughput. Again, we compare against a number of different projects and against Linux as well, and we measure throughput in thousands of requests per second on the y-axis. With Linux KVM as a baseline, we get about 100,000 requests per second; Linux native, so non-virtualized, is about 175,000; and Unikraft is all the way up at almost 300,000 requests per second. Redis performance, same setup as before: we measure how long it takes to do GETs and SETs. I'm not going to really go over this, but basically you can see that Unikraft is essentially at the performance of Linux native. Next, boot times with different allocators. I mentioned that one of the things you can specialize for in Unikraft is the memory allocator. We support five different ones, and there's an API that allows you to add more. You can see that the actual boot time varies with the memory allocator: the standard one, a binary buddy allocator, takes about three milliseconds, but we even built a custom boot allocator that is really, really simple; it's just meant to go fast. If you use that, then you can boot in less than a millisecond. Throughput is also affected by memory allocators. We did a test where we used four different memory allocators for Redis GETs and SETs, and you can see that there's a significant variation between tinyalloc, which is a simple memory allocator, and mimalloc from Microsoft. So it really does matter which memory allocator you're using. Now, just a brief word about specialized APIs. As an example, we targeted the file system area; in this case, we're building a simple web server that serves static files.
And for this, we custom-designed a file system based on a hash, so it can quickly retrieve files; that file system is called SHFS. In this graph we show how many cycles it takes to retrieve a file, both when it's there and when it's not. On the left-hand side is Unikraft; on the right-hand side is Linux. With the Linux VFS, you can see it takes on the order of 4,000 cycles when the file is not there and 2,600 when it is. We also disabled mitigations, because this affects performance: then you get 3,200 cycles for a miss and about 2,000 when the file is there. As a baseline, we use Unikraft without specialization, using our standard vfscore, and you already get some gains compared to Linux. But you get the biggest gains, of course, with SHFS, where you get really low times of about 300 cycles. We also took Unikraft to the cloud, and did a little test by deploying Unikraft on Amazon EC2; there you can see some instances running. For this test, we first took two large instances. On one, we put a standard Nginx Debian image with Linux, and on the other, we put the same Nginx version but on Unikraft. We ran a throughput test and got about two times the throughput. We then kept one large instance and compared it against a medium instance: on the large one we put Linux, and on the medium one Unikraft, and we got about the same performance. So this means it's 50% more efficient. Of course, we wanted to see if this translated into money savings, so we looked at our bill on the Amazon EC2 console, and you can see that we basically had 50% savings by using Unikraft in this scenario. So we'd like to move toward seamless integration and deployment, and to do this we are integrating with a few major frameworks. The first one is Kubernetes. We have this almost working, but it's not public yet, where we can basically plug directly into Kubernetes.
You use the standard kubectl commands and the standard dashboard, but underneath we're actually creating and running Unikraft unikernels. For monitoring, we now have support for Prometheus in Unikraft, so you can use your standard Prometheus dashboards to monitor Unikraft's statistics at runtime. And we're in the process of integrating Unikraft into Visual Studio Code, so you can do your application development, and the building of unikernels, directly from the... As I probably failed to mention so far, Unikraft is an open-source project under the umbrella of the Linux Foundation and the Xen Project. So if you want to check it out, go to www.unikraft.org, where you'll find lots of information, documentation, upcoming events, and so forth. Just to show you a little graph of GitHub star growth for Unikraft: we've been doing fairly well in the past few months, but we are always, always happy to receive more support, so please go check us out. If you want all the gory details, Unikraft received a EuroSys best paper award this year, so please go read the paper if you really want to see the full analysis of how Unikraft works. And just to conclude: we think high-performance POSIX unikernels are now a reality. As I mentioned, please go check out our website, please try it out, and give us feedback about what works and what doesn't. And if you'd like to contribute, we would be more than happy to have your contributions. And on that note, thank you so much for your attention, and I'd be happy to take any questions.