Good morning, everyone, and welcome to the last day of OpenStack Summit Berlin. My name is Samuel. I work for Intel and I'm one of the members of the Kata Containers architecture committee, so I think I can talk about it, and I'm going to give a project update today.

Kata Containers will be one year old in a month, so it's a fairly new project. But before Kata Containers we had Clear Containers and Hyper's runV, and those two merged into what is today called Kata Containers. So Kata Containers is one year old, but it's based on two projects that are probably three or four years old.

We created Kata Containers back in December 2017 and we did our first stable release in May 2018. It only took us a few months to go from project inception to the first stable release because we were basing the project on already stable projects. What we did from December 2017 to the first stable release was actually merging those two projects together. We didn't add a lot of features in between; we literally spent most of that time discussing how we could merge the two projects and make the first Kata Containers release.

That happened in May 2018, and this is what Kata Containers looked like back then. I'm not going to go through all the components here — it's boring, and it's actually even more complex than this. What I want to highlight is that there are a lot of components. Compare that to a regular Docker deployment with runC, which is just one binary, plus something like containerd on top. Kata Containers replaces that one binary, so it's complex: there are a lot of moving parts. When we do a release we need to release all these pieces together and make sure the CI tested everything. It's a complex piece of software.

We did the 1.0 release in May, and since then we've made three other releases, and we're going to make the next release, 1.4.0, in a week or so. My point with the previous slide was that 1.0 was stable and working, but we still had many gaps back then, and the update I'm going to give today will try to explain how we filled those gaps, partially or completely, with the releases that have happened since May 2018.

So we had a few gaps. First, security. Security is really the main feature of Kata Containers, and we always want to improve it: we want to add more layers of security inside the virtual machine that runs the workload and outside it, and make sure security stays the most interesting feature of Kata Containers. As I said, it's a complex piece of software, so we are also trying to make it simpler, and that is related to security: the more complex your software is, the more likely it is to be insecure. Making it simple means you can audit it and verify that it's not broken, that it's not insecure. Performance is also a very important feature for us, because we claim to be more secure than regular containers, but we don't want that to come at the cost of being much less performant.
So performance is really important. And finally, one of the goals of merging Clear Containers and runV into Kata Containers was to make sure the world understands that Clear Containers is not an Intel-specific project and runV is not a Hyper-specific project. We want to support other architectures, other companies, other silicon vendors — a multi-architecture project. Those were the main gaps that we had with release 1.0, and, as I said, I'm going to explain what we did over the past year to try to fill them.

In terms of security, what we're trying to do is add layers of security — mostly optional ones — inside the virtual machine. Inside the virtual machine you have the container workload running, and we want to add more security there, so that if you have a pod with several containers you can isolate your containers even further within your pod if you think it's necessary. You can also add seccomp as a layer of security: you can do system call filtering and make sure that your container workload is not calling into system calls it's not supposed to call.

If you think about it, if the workload does call a system call it's not supposed to, inside a virtual machine, it's not actually going to harm your host — it's only going to harm your guest. Nothing happens to your actual infrastructure. But some people really want to make sure that the container workload inside the virtual machine behaves exactly the same way it would outside of Kata Containers, in a regular Docker deployment for example. They want a system call that is not allowed on the host to also be rejected in the guest, in the Kata Containers virtual machine.

This diagram is the very high-level picture of a Kata container without seccomp: your container workload switches between rings by syscalling directly into the kernel. With seccomp you add one additional layer in front of the kernel to make sure that the only system calls the workload can make are the ones on your allowed list. If it tries to call something you don't want it to call, seccomp rejects it and your workload gets an error. We added that inside the virtual machine as one additional security layer.
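To make that concrete, here is a deliberately tiny sketch of a seccomp profile in the JSON format used by Docker and the OCI runtime spec. It is illustrative only — real profiles allow a few hundred syscalls, and an allow-list this small would break most real workloads:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["read", "write", "openat", "close", "futex", "exit_group"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

You pass it the usual way, e.g. `docker run --runtime kata-runtime --security-opt seccomp=profile.json ...`. The idea described above is that the same filtering is then honored inside the Kata virtual machine as well: any syscall not on the allow-list returns an error instead of reaching the guest kernel.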
Another security feature that we added very recently — well, 1.4.0 is not released yet, but it's going to show up with 1.4.0 — is support for NEMU. Melisa talked a bit about NEMU during yesterday's keynote: it's a very simplified version of QEMU, trimmed down to a lot fewer lines of code and a much smaller attack surface. This is part of the security story of Kata Containers. We want to support the most secure hypervisor out there, and right now we believe that NEMU presents a much smaller attack surface than QEMU does. So if you want something that is supposedly more secure than QEMU as your hypervisor, you can do that today with Kata Containers and switch to NEMU, which, as I said, is the very trimmed-down version of QEMU. With that setup you have seccomp, you have NEMU below it, and you've reduced your hypervisor attack surface: NEMU still runs on KVM, and it's QEMU-compatible, just a lot smaller.

And finally, we now also support virtio-rng, which gives your container workload a much stronger random source, fed from the host. That also adds more security, and it was added with 1.3.0, the latest released version of Kata Containers.

We also had, as I said, complexity and debuggability gaps — not issues, but gaps — where we thought there could be a lot of improvement. So we did some work there. We added support for vsock. Who knows what vsock is? One person — of course you know. If you look at vanilla Kata Containers, you have a proxy, and this proxy takes all the runtime and shim commands from the upper layers — from Kubernetes — and forwards those commands to the virtual machine through a virtio-serial interface, which is basically a virtualized serial interface. And because it's a serial interface, we need a proxy. The proxy takes many gRPC commands from the runtime and the shims — I'm not going to go into the details of what those are — but the idea is that you're getting a lot of gRPC commands and you want to stuff them into a serial interface. Even though it's a virtualized serial interface, it's still a serial interface, so you need to multiplex and demultiplex all your gRPC commands through it.

vsock is a way to do virtualized sockets. Instead of having a serial interface carrying all your gRPC commands, your virtual machine provides a socket-based interface, so you can have a gRPC server running in your virtual machine and you don't need a proxy anymore. That's one way of removing one of the many components of Kata Containers — namely the proxy — and it just simplifies everything: all your components talk directly to the virtual machine through the vsock interface.
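Operationally this is one switch — a sketch, and the exact section and option name vary between Kata 1.x releases, so treat the fragment below as an assumption:

```toml
# Fragment of Kata's configuration.toml (illustrative)
[hypervisor.qemu]
# Agent traffic goes over virtio-vsock instead of virtio-serial,
# so no kata-proxy process is needed on the host.
use_vsock = true
```

You also need the host's `vhost_vsock` kernel module loaded (`modprobe vhost_vsock`) before the runtime can open vsock connections to the guest.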
With 1.4.0 we're going to have a very interesting feature: distributed tracing. When you have so many components — the shim, the runtime, the agent inside the virtual machine, the proxy if you're not using vsock — debugging the whole thing proves to be quite difficult. You get logs from all the components, and you need to make sense of them if you want the big picture of what's happening. So we added support for OpenTracing with 1.4.0, all the way from the agent to the runtime. We will basically be able to see something like this — this is not a Kata Containers screenshot or anything, but that's the idea: you get a big picture of what's happening in Kata Containers when you run a pod, when you do anything with Kubernetes. You can trace it from top to bottom and get a real understanding of what's happening across all the components, instead of looking at each component individually and trying to work out how they interact. That's a very interesting feature for debugging, but also for performance. We will be able to say: when we start a Kata container and it takes 1.1 seconds, here is where those 1.1 seconds are actually spent, and here is where we should be working if we want to reduce that startup time. So it's both a debugging tool and a performance-improvement tool.

Live upgrade is also an interesting feature. It's not going to make it into 1.4.0; we hope it will make it into 1.5.0. This is more of an operations feature, for when you want to do a live upgrade of your Kata Containers infrastructure. It's something people actually running Kata Containers in production are looking for, and it's a very interesting feature for them.

Last but not least — if you remember that earlier picture, and you probably don't — you saw that in the virtual machine we have two containers running: one container and one container exec. When you do a docker exec or a kubectl exec — when you want to execute another command inside your pod — that actually creates a new container and executes the new command in it. When you do this you end up having to run two shims on the host. When you run a pod with five containers, you get five shims on the host, and if you exec commands into those five containers, you get five plus five shims. So shims add up, complexity adds up, and security gets lower as you accumulate shims. The idea with shim v2 — the containerd shim API v2 — is to have one shim per pod: you can have as many containers as you want inside your pod, and you'll always have one single shim that handles everything in it. We put it in the complexity section, but to me it's also a security improvement: things are simpler, you don't have to monitor that many components, and your code is centralized in one binary.
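For illustration, wiring Kata up as a shim v2 runtime happens on the containerd side; with a containerd 1.2-era CRI plugin the configuration looks roughly like this (paths and section names may differ with your versions):

```toml
# Fragment of /etc/containerd/config.toml (illustrative)
[plugins.cri.containerd.runtimes.kata]
  # containerd resolves this to the containerd-shim-kata-v2 binary,
  # which starts one shim per pod instead of one (or two) per container.
  runtime_type = "io.containerd.kata.v2"
```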
Performance, as I said, is really important to us. We provide security through Kata Containers, and when people realize the value Kata Containers provides, the next question is: what's the overhead? What am I going to pay for this added security? The answer is — I mean, we don't have one single answer for this, but you are going to pay for it. You're going to have a memory overhead, and a startup time that is slightly higher than for a regular container. It varies, and for most use cases it doesn't really matter. But we want to make sure it stays that way, and we want to keep improving: we want boot time to get as close as possible to regular container boot time.

One thing that we did to improve performance is VM templating, added in 1.2.0. It's an interesting feature. A Kata container is a virtual machine, so when we start a Kata container we start a virtual machine from zero, from scratch — that's what we call the VM boot time: creating and launching the virtual machine. What VM templating does is take a snapshot — a VM template — at some point in the boot process, and use that template to create new virtual machines after that. This reduces boot time significantly, and overall it reduces Kata container startup time.

In the big scheme of things, when you start a container with Kubernetes, between the moment you press the button in your Kubernetes dashboard and the moment the container workload is actually accessible, it costs you at least several seconds in the best case — most of the time 15 to 20 seconds. Kata Containers today consumes around one second of this, and we're trying to improve it. One second out of 20 is not a lot — it's almost noise in the overall Kubernetes startup time — but it's still something we want to improve. We don't want to assume people don't care about this just because Kubernetes takes that much longer. It's really important for us to make sure that our boot times always go down and never grow out of control, and this is the kind of feature we're pushing into Kata Containers.
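As a sketch, VM templating is switched on in Kata's configuration.toml — the section below reflects the Kata 1.x layout as I understand it, so treat the exact names as an assumption:

```toml
# Fragment of Kata's configuration.toml (illustrative)
[factory]
# Boot one template VM, snapshot it, then clone new guests from the
# snapshot instead of cold-booting the hypervisor for every container.
enable_template = true
```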
Next, TC mirroring — traffic control mirroring. Let me explain why we need it, since you may not know what TC mirroring is. As I said in the previous session — and I see some of you here were there — when you do networking with Kata Containers, you basically have a virtual machine, with all the virtual machine networking assumptions, talking to the container ecosystem, with all the container networking assumptions. The container and Kubernetes ecosystem uses CNI, a networking interface, and there are a lot of different CNI implementations depending on the network you want to build with Kubernetes. But all those CNI plugins — binaries that actually create your software-defined network — make an assumption about the kind of interface they're going to talk to. They assume a container-specific interface, typically a virtual ethernet interface, a veth: they expect to talk to one end of a veth pair whose other end is linked to the host. And when you have one end of a veth and you try to make it talk to a virtual machine, it just doesn't work — the virtual machine world is not veth-aware. We're trying to fix that, but in the meantime we need to bridge those two worlds. A virtual machine interface typically expects a tap interface, so on one hand you have a tap interface, on the other hand a veth interface, and what we do in Kata Containers is build a bridge between the two. We have several implementations: a macvtap-based implementation, an ipvlan implementation, plain Linux bridges — and TC mirroring is yet another way of bridging those two.

Depending on the kind of networking you want to do and the performance you want to reach, you may want to select one of those implementations. TC mirroring is not the default implementation yet; we're considering making it the default for bridging networking between the Kata Containers virtual machine and the host. When you run Kata Containers you can specify, on your host, which networking bridge implementation you want to use, and with 1.4.0 you'll be able to select one more implementation of this bridge. I hope that makes sense.
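The selection happens in configuration.toml; here is a sketch — the accepted values below are what Kata 1.x documents to my knowledge, but check your release:

```toml
# Fragment of Kata's configuration.toml (illustrative)
[runtime]
# How the CNI-created veth is bridged to the guest's tap device:
#   "macvtap"  - mirror the veth through a macvtap device
#   "bridged"  - plain Linux bridge between veth and tap
#   "tcfilter" - TC mirroring: tc rules redirect packets veth <-> tap
internetworking_model = "tcfilter"
```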
Here in the performance section I'm also mentioning NEMU again — the stripped-down version of QEMU that Kata Containers is able to use. NEMU was built with security in mind: with NEMU we're trying to drastically reduce QEMU's attack surface, and by doing so we simplify the whole hypervisor device model that Kata Containers uses. With NEMU, the Kata container workload sees a lot fewer devices — none of the legacy devices. You go from more than 200 devices that the workload would see in the Kata virtual machine with QEMU down to around 40 devices. That simplifies things and shrinks the attack surface, but it has a very nice side effect as well: a virtual machine with 40 devices instead of 250 boots faster. Just by using vanilla NEMU, without any optimization, we get better boot times, simply by simplifying the device model in the hypervisor. So that's a performance improvement as well.

Finally, the one we're really looking forward to integrating — and this is work in progress; I don't even have an ETA for it. For the place where your container workloads are stored, you can use either a block-based device backend or an overlay backend. When you use an overlay backend, we use what we call 9pfs, a virtualized implementation of the 9p filesystem. It works, but it has two very big drawbacks: it's slow, and it's not fully POSIX compliant. Those are pretty big drawbacks. With 9pfs you end up with poor I/O performance, so we strongly advise using a block-based device backend when you deploy Kata Containers in production if you want to see much better performance. And in some cases your workloads won't run at all: if a workload does some very specific filesystem operations, 9pfs won't support them.

So we're looking for a replacement for 9pfs, and this is something we're currently working on with Red Hat: a FUSE-based virtualized filesystem implementation. We're not trying to implement a specific filesystem specification; we're really virtualizing the filesystem operations through FUSE and virtio. It's a relatively complex implementation, but from the prototypes we're seeing, we're getting much better performance and full POSIX compliance. It's very interesting to us because it will allow us to say: you can deploy Kata Containers in production on overlay, or device mapper, or whatever backend you want for your container workloads and volumes. This is something we very much look forward to integrating into Kata.
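Until that lands, the practical workaround mentioned above is to give Kata a block device per container, so the rootfs can be attached as virtio-block instead of shared over 9pfs. With Docker, for instance, that means picking a block-based storage driver — an illustrative sketch, and devicemapper has its own setup prerequisites:

```sh
# Illustrative: switch Docker to a block-based storage driver so Kata can
# hand each container's rootfs to the guest as a virtio-block device.
cat /etc/docker/daemon.json
# {
#   "storage-driver": "devicemapper"
# }
sudo systemctl restart docker
docker info | grep "Storage Driver"   # expect: devicemapper
```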
And finally, multi-architecture. As I said, back before Kata Containers, when it was either runV or Clear Containers — I was on the Clear Containers side — every time I presented Clear Containers, people were really excited about it, but when they realized it was an Intel project they thought: oh, I'm going to be locking myself into a vendor. I like Intel, but I understand people don't want to be locked to it. So one really important thing we wanted to do with Kata Containers was add multi-architecture support and really show that it's not bound to any instruction set or any hardware virtualization implementation. You can run Kata Containers on Arm, on x86, on IBM Power (ppc64). This was a really important step for us. With 1.0 we already had arm64 support — Arm is a contributor to Kata, so we have native, full support for arm64 in Kata Containers — and with 1.1 we added support for another architecture, ppc64. People keep chiming in and adding more architecture support. If you maintain an architecture that is not one of those and you want to add it to Kata Containers, it's basically a matter of adding hypervisor support for your architecture. If your architecture is supported by QEMU, you can add support for it to Kata Containers very easily. It's all designed to be welcoming to new architectures.

Okay, so that was the technical side of things. I also want to briefly mention the community part of Kata Containers. Initially, Clear Containers was Intel, runV was Hyper, and the contributions were pretty much either Intel or Hyper. I want to describe how this has changed over time. If you look at the 1.0 contributions, it's probably 99 percent Intel or Hyper, because we were just the two companies trying to merge everything into a 1.0 release. But after that we started to see other contributors chiming in. Silicon vendors: Huawei, Arm, IBM, even NVIDIA contributed to Kata Containers. OS vendors: SUSE, Red Hat, and Oracle have been contributing, for many different reasons. It's very nice to see, because we went from Clear Containers and runV to a project backed by the OpenStack Foundation, and contributors started chiming in because people are more confident when they see that a project is not bound to any specific company — it's backed by a neutral foundation that can keep the project afloat even if one of those companies leaves and decides the project is not relevant anymore. This really changed the whole contribution picture when we moved from Clear Containers and runV to Kata Containers.

There's an architecture committee that makes the very big decisions on Kata Containers. Actually, that's not true — we don't make that many decisions; the community handles that very nicely. Yes — that slide shows the previous architecture committee. So the architecture committee is pretty much useless, that's what I'm trying to say: the community just handles everything by themselves, which is what I'm really happy about. Trust me, you don't have to spend all your time looking at PRs and issues; people just handle that themselves. This was the first architecture committee, formed when 1.0 came out, with people from Microsoft, Google, Intel, Huawei, and Hyper. As scheduled, we held another round of elections, with three of the five seats up for re-election, and we now have a new architecture committee, with Eric and John joining. What I'm trying to say here is that the process just keeps moving along. The election went really well; we had more than 50 people who were eligible to vote.

One really important thing for Kata Containers — and I think for most of the pilot projects under the OpenStack Foundation — is that you don't pay to play. You don't bring your money and get a seat on the board or the architecture committee, and you don't pay to vote for the architecture committee, which would be pretty much the same thing. The way this works is that you don't vote if you don't contribute: if you're not contributing in a meaningful way, you're not allowed to vote for the architecture committee, and the architecture committee members are themselves contributors. This is a very refreshing and healthy way of running a project inside one of these foundations. So by the time we had the second round of elections, we had 50 people eligible to vote — 50 people who, by our metrics, had contributed in a meaningful way to the project.
I also want to mention how the project has influenced the rest of the container ecosystem. We had some interesting discussion in the previous session about all the components of Kata Containers and why they exist. For many of them, they exist because we're trying to stuff a virtual machine into an ecosystem that is absolutely not virtual-machine-aware: the container ecosystem lives in its own world and makes strong assumptions about what a container is. Before Kata Containers — before this became a neutral project backed by many companies — I personally spent almost two years talking to specification bodies, Kubernetes SIGs, Docker, all those people, trying to say: a container is not necessarily just a process that runs on the host, isolated through namespaces. It can be something else. It can be a virtual machine. It can be a process running inside a hardware-isolated environment. Not a lot of people were listening to me. But when you go and talk wearing the Kata Containers hat, things change and you get a lot more influence, which is very healthy as well.

So besides the many presentations we did at major conferences, thanks to Kata Containers and to how the project has grown and evolved, we are now seen as a reference implementation for a very important piece of Kubernetes work: the SIG work defining RuntimeClass. It may sound minor, but for us it's very important to be able to say that a container runtime is not just Docker, not just runC — a container runtime can be a lot of things — and to define a set of classes for runtimes. You can have a runtime class for running your untrusted workloads, a runtime class for whatever you want: you define your class of runtime and you make Kubernetes aware of it. This is a big shift for us. It goes from Kubernetes talking directly to Docker, which is what was happening back in Kubernetes 1.3 and 1.4, to Kubernetes saying: we are runtime-agnostic, so runtime-agnostic that you can define the class of runtime you want to use. If you want to use a VM-based runtime, you can just go and define that, and Kubernetes is aware of it. Kubernetes itself is saying that containers are no longer just processes running inside namespaces — a container can be whatever the runtime defines. This changed thanks to projects like Kata Containers and also gVisor, and it's a very, very healthy change for us. It makes our life a lot easier.
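As a sketch of what that looks like from the user's side — RuntimeClass was still a young API at the time of this talk and its fields have shifted between versions, so take the exact apiVersion and field names as assumptions:

```yaml
# Declare a runtime class that maps to the Kata handler configured in the
# CRI runtime (e.g. containerd's io.containerd.kata.v2 from earlier).
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
---
# Any pod can now opt in to VM-based isolation with one line.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-kata
spec:
  runtimeClassName: kata
  containers:
    - name: nginx
      image: nginx
```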
We also changed the OCI specification. OCI is the Open Container Initiative: if you run a container in your cluster, it's very likely that it complies with this specification — all the Docker containers, all the Kubernetes containers comply with OCI. And OCI, in the same way, was making all those assumptions about what a container is. We influenced the specification to the point where you can now specify, as part of the OCI specification, which kernel you're going to run inside your virtual machine, if your container runs inside a virtual machine. So the concept of running a container inside a virtual machine is now part of the OCI specification, which is the foundation for all container runtimes. We tried to push this for, I think, 18 months — this PR sat for 490 days without any feedback from the OCI folks. Then they heard about Kata Containers, they saw the traction behind the project, and they realized they had to do something to cover containers that are kind of like Kata containers. So we worked with them, and the PR magically got merged into the specification once Kata Containers was born.
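The resulting addition is a virtual-machine section in the OCI runtime config; roughly, it looks like the fragment below (the paths are made up for illustration, and the authoritative schema is the spec's VM-specific documentation):

```json
{
  "vm": {
    "hypervisor": {
      "path": "/usr/bin/qemu-system-x86_64"
    },
    "kernel": {
      "path": "/usr/share/kata-containers/vmlinuz.container",
      "parameters": ["console=hvc0"]
    },
    "image": {
      "path": "/usr/share/kata-containers/kata-containers.img",
      "format": "raw"
    }
  }
}
```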
So We are running internally The top, I don't know 100 or I don't know can't remember which number But we're making sure that the most popular container workload are always running with with kata containers as is without without having to being Rebuild Yeah But it's not related to the to the container itself So if you want to do if you want to do host resources manipulation So if you want to get direct access to do to your host networking, for example This is not going to be supported with with kata containers and it's independent from the container workload itself It's just that you won't be able to reach your host namespace from your from the virtual machine So there are use cases that we don't support, but it's not related to the workload itself We don't have to rebuild you can rebuild the workload if you want to Get access to your host namespace You can rebuild the workload the way you want. It's it's never going to work So that's that's a different story. There are use cases that we we don't want to support And if you if you want to do host namespace manipulation, for example You you're doing something that is completely orthogonal to what kata containers trying to do so you're not doing something secure So you don't want to run that through through kata containers But yeah, that's independent from from the workload itself And if you are running a a container Workload that is not supported by kata containers, you please open an issue in on on kata containers github This should this is a bug on our side so the question is about the the OCI specification and I mentioned that part of the OCI spec now is Letting you specify the kernel that you're going to run inside the inside the virtual machine And the question is isn't that part of the build process the the when you build a container workload you don't You only specify what's your user space is going to look like so what you container workload itself is going to look like And you don't specify that as part of the OCI specification the OCI specification only lets you say I want to run this specific binary as my in it command on on the container And the build process is completely different So you you never specify a kernel or a guest image as part of the build process What you do when the kernel and the and the and the guest image are runtime runtime entities, so this is what you Run when you start your container workload, but it's not part of the container workload itself so you specify as part of the OCI specification you give a Specific path to a kernel Which is going to be your your guest kernel But it's only used when you run your container workload and you can use the same container workload on on completely different kernels So you can use you don't have to rebuild it Some more questions Yeah, that's a that's a good point. It's you don't the kernel and and guest path the the root fs basically Those are optional in the OCI specification if you don't specify any of those Cata containers will try to find a default one on the host So when you install cata container it comes with a default good working Kernel for for cata container so you don't need to specify one But if you want to use an alternative alternative kernel that you have installed on the host You can do that through the OCI specification Anything else? Well, thank you