I'm Graham. I work for Intel, in the Open Source Technology Centre, and I've been working on Kata Containers for a couple of years now. I'm Judy Strawpole. Who's heard of Kata Containers before this conference? The guys over here don't count, as they work on Kata — that's cheating. This is an introduction today. It's not going to be too technical; there are a couple of technical slides. It will be a balance of an overview of what it is and why you'd want it, and then a few technical bits and pieces, because you can't really skip those and still understand what it is. Then we'll move on to how you can get it, use it and contribute.

What is Kata? We'll do a little bit of the architecture. How do you integrate it? It has to be used in some sort of framework that you're probably already using. Where are we heading with it? It's an ongoing project. And then, how do you get it?

Who uses containers? Probably lots of people at this conference. Who uses virtual machines? Anybody use virtual machines and containers together? That's fairly common. Whether you know it or not, a lot of cloud services are VM based, and quite often you'll want to wrap a VM around that container or that stack of containers.

In the world of containers and VMs there's generally a trade-off between speed and isolation or security. Traditionally, over here we have our nice containers. Great, they're really quick; we love them: deployment, simplicity, tooling. On the isolation front, software isolation is great, but nothing's perfect. Then we have our virtual machines over on the left. Traditionally much bigger, much slower: boot times of minutes, sizes of hundreds of megabytes.

Kata Containers is really a blending of those two. We wanted to give people the security and isolation of virtual machines, but without that overhead, that bloat, that speed hit. So what we have is the isolation of a full virtual machine, with hardware-backed security, combined with the tooling of containers and a lot of the speed of containers. It is a VM, so there's always going to be some sort of VM penalty; that's probably never going to go away completely without a few tricks here and there.

How do you use Kata Containers? To all intents and purposes, it looks like runc. It's an OCI compatible runtime. You could rename it runc, stick it in your stack and it would work. That's a bit skanky. What we actually do, in the modern world of OCI layers and OCI compatibility, CRI and Kubernetes, is run it as a parallel runtime. Because it's OCI compatible, you can have runc, you can have Kata, and you can run them together. You can run different pods and different containers in different runtimes. You can have some containers running in runc, where you don't have such strict security requirements, and then you can have others running inside Kata.

So here we have OCI compatibility and Kubernetes; we plug in CRI-O — we'll have a diagram later. Docker is where we really began with this. Can anybody name the dolphin symbol? It took me a while to dig this up; I believe that is the symbol for Zun. If you're doing an OpenStack integration, you can use Zun to drive Docker, which can then drive Kata. My folks are now reviewing the slide set, going: that's new, what's with the dolphin? I thought, we're at the OpenStack Summit, I really should put Zun in there.

Some history. As I say, I've been working on this for two and a half years. Before Kata there was Clear Containers. In fact, it was two years ago in Berlin at the Open Source Summit.
We were doing a Clear Containers presentation and the guys from Hyper were doing a Hyper runV presentation, and we were like, hold on, we're basically presenting the same thing — we'd both been working on the same idea in parallel. A year later we merged those projects under the OpenStack Foundation as an open, upstream project. Then we hit our 1.0 release. 1.3.1 was released quite recently, and we're really pushing to do a 1.4 release. I'd love to say this week, but half the developers are sat in the front row, so that may or may not happen given network availability and beer. No drinking and releasing, Eric.

Let's explain a bit more about how Kata looks and what it is. With your traditional containers, you have a host, you have a kernel, and your containers run within namespaces on that host. You have some security: you've got seccomp, you've got capabilities, your namespaces, your cgroups. Which is great, unless somebody breaks out of a container — which has happened — at which point you have access to all the containers, particularly if you break into the host kernel.

Some of you said you run virtual machines and containers. Quite often that means you run a virtual machine with a container stack inside it, running multiple containers. Maybe you have a different virtual machine per customer, maybe you have noisy neighbour issues, maybe you want some isolation for multi-tenancy. That's fine, but you've still got a traditional VM: you now have to manage a VM, and then you have to manage a container stack in that VM.

Where Kata is different is that we have very lightweight virtual machines that we've optimised. They're based on the same virtual machine technology as the virtual machines you'll be running every day. I'm going to use the words container and pod interchangeably through the talk. Every container gets its own virtual machine, its own kernel, its own isolation. Effectively you're running a container in a VM, but all the orchestration is done out on the actual bare metal host. You plug into your normal Kubernetes deployment, you deploy it with the Kubernetes tools, but it's actually deploying your container in a VM. Your container shouldn't know.

In fact, I've got an annoying thing right now. We're just about to merge a pull request adding some seccomp support, and I'm not sure how I test that, because I'm sat in the container going, is it on? I can't tell. Brilliant. Annoying for a developer, but that's what it's meant to be like. Your container is now running; it doesn't know it's in a VM. It's just a container. So that's the quick overview: this is Kata. Kata is running each container or each pod in its own private VM.

Let's talk about how it's architected — how we actually do this at a slightly more detailed level — and how we integrate that into Docker and Kubernetes. I may or may not talk through all of these in detail. We are a VM, so these are the basic components we have in the Kata runtime. We do use QEMU: out of the box we use QEMU and KVM as the VM stack. We have a runtime, like you'd have runc under Docker; it's imaginatively called kata-runtime, and you can rename it to whatever you want. We have a kernel to sit inside the VM, and you can have a different kernel per VM, so if you have specific kernel features you want for one container but not another, you can optimise and refine your kernel on a per container, per pod basis. Then we have a root FS. The container never sees this root FS. It's a VM — it has to boot something, and it can't boot a raw container.
There's not enough stuff in a container. So we have a very small root FS; we boot into that root FS, which then sets up the container inside the VM. I'll skip the shim and the proxy for the moment. The agent: ultimately we need work to happen inside the VM. Somebody has to configure the mount points, the networks, the memory resources, the cgroups. We have a thing called the agent, and that's really the workhorse. The runtime talks to the agent and tells it what needs doing in the VM. I'm going to let people stare at that for a moment while I grab my water.

You might notice I've skipped the proxy on here. The proxy is something we're moving away from. Now that we've moved to using vsock, which is a fairly recent kernel feature, the proxy is less involved in the picture. So you have your virtual machine with your pod, with multiple containers or a single container running different commands, and here we have the agent, outside of the container namespace — that's really the controller of the container — plus the kernel and the hypervisor.

Runtimes: I don't know if most people realise that runtimes like runc are very transient. When you do a docker run or a Kubernetes launch, it launches a runtime, which does a load of setup and then dies. The runtime doesn't hang around; it's a transient thing. But then there's another process that your stack — say Docker — needs to monitor, to see if your container is still running and to ask it for statistics. Traditionally that would monitor your actual container process, but we can't do that. The default would have been to start looking at QEMU and go, I'll monitor QEMU. That's not really what you want to be looking at either; it's not going to give you the information you're after. So we have something called the shim. The shim does the I/O, and it allows the monitoring side to say, I'm going to watch the shim. The shim basically translates information out of the VM: you ask me a question, and I'll tell you what you should actually be asking. So that's really our stack.

Then we come to Kubernetes. Here we have an OCI compatible runtime, and Kubernetes now has CRI. This is a part where I should probably explain that a lot of the work we've done for the last couple of years has been with the Kubernetes SIGs and the OCI group. When containers began there was no notion of VM-based containers, so you'd find things in the spec, or in the implementation, that were VM-unaware or even VM-unfriendly, which made our life really difficult. We'd try to plug the VM in and it just wouldn't fit, or there would be presumptions. We found a line of code the other day that had a hardwired call to runc in the middle of the code base. That's not going to work for us — we're not runc. So we spend a lot of time working with the community and the people who are writing the specs to make sure that future implementations will handle virtual machines.

Part of that work was working with CRI. When CRI came out, we made sure that it would be able to talk to Kata Containers. And to clarify: you can have Kata Containers running as well as runc. It's not an either/or, it's not a replacement; you drop it in side by side. Particularly in Kubernetes, you can choose on a per pod basis which runtime you want to use. So you can say, well, this pod is fairly lightweight, I trust it quite a lot, let's just run it in runc, I'd like the speed. But this one is an unknown entity, I don't trust it — definitely run this one in a Kata container. So that's how we fit in.
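To make that per-pod choice concrete, here's a rough sketch of what it looks like on a recent Kubernetes, assuming Kata has been registered on the nodes under a RuntimeClass named "kata" (on older clusters the same thing was done with pod annotations). The pod name and image here are purely illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	kata := "kata" // assumed name of a RuntimeClass pointing at kata-runtime

	// This pod opts into the Kata runtime; pods that omit RuntimeClassName
	// keep using the cluster default runtime (typically runc).
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "untrusted-workload"},
		Spec: corev1.PodSpec{
			RuntimeClassName: &kata,
			Containers: []corev1.Container{
				{Name: "app", Image: "nginx"},
			},
		},
	}

	out, _ := json.MarshalIndent(pod, "", "  ")
	fmt.Println(string(out))
}
```

The same kind of split works with plain Docker: register kata-runtime as an additional runtime alongside runc and pick it per container with --runtime.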
And it's fairly seamless. You get all of the tools from Kubernetes and Docker and your stack that you'd normally have. You can do it at a global level — make the default runtime Kata — or you can do it on a per pod, per container basis in the configuration file.

Containers aren't really anything without networking and storage. If you have no network or storage, you've got a process, but it can't get any data unless it's in the image, and it can't store it anywhere — and if it's in the image, you can't write to the image without file system backing. So we'll cover networking and storage, which are fairly key components. They're also some of the places where we've had to spend time: VM networking is subtly different from container networking, and VM storage is different from container storage, so we had to do some work in these areas for compatibility.

So, the basic networking. My OSI model history is a bit fuzzy in my head, but I believe containers run at basically L3, whereas VMs tend to operate at L2. So you can't, by default, out of the box, plug the container network straight into the VM — there's a mismatch. So we run a macvtap in there, and we can then match between those two levels. Out of the box, in 90% of situations this works: you plug it in, your network appears, magic stuff happens, your networking is in your container.

But there are some situations where you want to optimise your networking or you have a different setup. I know I spoke to some people before — anybody use DPDK or similar acceleration techniques? We can plug DPDK directly into the virtual machine, so you don't have to go through a macvtap layer or the extra layers of the L2/L3 translation. If you're running an accelerated software stack for networking, there's probably a direct route into the virtual machine, so you can plug that straight in and get the full benefit of the speed. Anybody doing SR-IOV or NIC assignment? We also support NIC assignment: we can assign a whole raw hardware NIC, or a virtualised NIC with SR-IOV, directly into the virtual machine as well. So the acceleration techniques you might use as a medium to high end container user are available.

Storage. Storage is an interesting one; I'll work down from the top. Anybody used 9pfs? If you're using virtual machines, you may have come across 9pfs. It's a very easy to use network file system from the Plan 9 OS. It's used a lot in virtual machines because it's very simple to plug in and it pretty much just works. So out of the box, our default connection depends on your graph driver — in Docker they call them graph drivers, I'm not sure about the Kubernetes terminology — that is, on how you're doing your backing store on the host side. If you're using overlay, then we'll use 9pfs. It mostly works, but it's not quite a full POSIX UNIX file system, so once in a while you come across a small anomaly: we'll find a container doing something particularly strange that doesn't work, and when we track it down, it'll be 9p that's the issue.

So then we worked on block devices. If you're running a backing store, say devicemapper, as your graph driver, then as a VM we can go and find the block device presented on the host, map it directly into the virtual machine and then mount it. So now not only have you got a full file system mapped in, so you're compliant, it's also not going over a network-style connection, so it's actually a lot quicker as well.
So that's the preferred, optimised method we have for mounting file systems.

I should probably backtrack a little. I expect if I stopped now and asked for questions, somebody would ask: what's the overhead cost? It's a VM, so we do have overheads, but you can boot into a container in well under a second — runc isn't much quicker than that. And then footprint-wise, we consume, I think, somewhere in the region of 50 megabytes per container. If you look at a traditional KVM/QEMU virtual machine, you can easily stick a nought on the end of that; it's a much bigger footprint. One way we've optimised this is we use — I think it's NVDIMM, non-volatile memory mapping — a fake non-volatile memory device which we direct map; DAX is the direct access mapping into the guest. For our root FS, we basically map that directly into the memory of the VM, bypassing all the caches, so it's a one-to-one memory map straight from the host into the virtual machine. That saves us a bunch of boot time, and it saves us a huge amount of footprint and cache footprint. You can use that as a user too: if you had a fixed image you wanted to map into your container, we have the ability to do that through the same method. I'm not sure I've seen any customers using it in anger yet, but it's available.

And then most people are probably actually doing their storage over the network — over Ceph, Gluster or some other networked storage — so you don't generally have your storage on the host; you may have it remote. We have the network connection, and networked storage just works as you'd expect: you configure it in your container and it plugs in.

Roadmap. It's an active project; we have features we're adding and extra support we're doing. We are predominantly about security and isolation. You don't normally run the VM to get an extra feature — there are some extra things you can do, maybe SR-IOV or VT-d-type work, or you want a different kernel — but generally it's about that hardware-backed layer around your container. And then we distinguish between: are we doing the security inside the VM, around the container, or are we doing it host-side? Quite often we have a debate, like with seccomp. We've just added seccomp support. Do we put seccomp around the container inside the VM, or do we wrap seccomp around QEMU on the host? We tend to make a decision to do one or the other; we don't normally do a security layer in both places.

Quite often here, if I wasn't on a live broadcast to the YouTube audience, I'd have a picture of Shrek, to see if anybody gets the reference. Security is like an ogre? No — security is like an onion: it's made of layers. In the Shrek reference, ogres are like onions — they have layers. Not: they make you cry and they smell. So security is hard, and it's all about layers. We could have all the layers inside the VM and all the layers outside the VM, which would be fantastic for security, but each layer comes with a cost — a footprint cost and a speed cost. So we tend to make a choice and normally put them at one or the other. We've just added seccomp in the container; we're hoping to get that merged this week. And then on the host side we're doing more cgroup isolation and more namespace isolation. Namespaces and cgroups are one of the places we do actually have inside the VM — we use the same library runc does, so it's very much a runc-style container in the VM. So that's namespaces and cgroups.
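Just to illustrate what a "runc-style container in the VM" boils down to — and this is a generic Linux sketch, not Kata's actual code (Kata and runc share the library that does this properly, with cgroups, mounts and the rest) — the core primitive is simply starting the workload process in its own set of namespaces:

```go
//go:build linux

// Minimal namespace demo: run a shell in fresh UTS, PID, mount, IPC and
// network namespaces. Needs root (or the right capabilities) to succeed.
package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID |
			syscall.CLONE_NEWNS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWNET,
	}
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```

With Kata, that same kind of setup happens against the guest kernel, and the VM boundary sits around it as the extra layer.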
And then we also wrap some of those around QEMU on the outside. Sometimes that happens for us anyway — Docker, for instance, will place our runtime into a network namespace by default.

Rootless QEMU: this is one that's bugging Eric at the minute. We're pretty sure we can do it; we don't have it at the moment, but we'd like it. Why would you want your QEMU VM running as root on your host? We don't have to do that — let's make it work. And then an SELinux policy: we're going to work on an SELinux policy for the runtime and QEMU on the host side, to wrap it on the host.

Networking. We've just made a change, which is probably going to become the default instead of the macvtap setup: we're now using TC mirroring. That gives us some more compatibility benefits, at a very small performance cost. When we add new features like this — we have a large TOML-based config file for the runtime — quite often we'll put a switch in the config. So we may say that by default we're using TC mirroring, but there's always going to be a switch, so you can say: actually, I want to go back to macvtap, or I want to turn this feature on. We're very much about there being a spectrum of needs for Kata Containers: sometimes you want all the security and don't care about the speed, sometimes you don't care about the size, sometimes you care about one or the other — and you can tune it. So we're always optimising the default path on networking, and we're always looking for more performance.

Docker swarm DNS is an interesting one. When you do a Docker swarm, it sets up its own DNS controller, and that causes us some issues on the networking side, which we think we've now got a solution for. And then a new one: enlightened CNI plug-ins. Sometimes you're writing the CNI plug-in and you know you're going to be running on Kata Containers — you know you've got a VM networking stack and you know there's an optimisation you want. But by default there's no way for the CNI plug-in to pass that information across to Kata. So recently we've had a PR — I think it's merged, or open, or it might still be under discussion — where you can add, effectively, hints or attributes into your CNI network config, and they will make it through to Kata, and Kata can then tweak the system appropriately for the best use case. That's quite a nice add-on feature. And what normally happens with these things is that if you add a tweak and it lands at a runtime that doesn't support it, you still end up with a workable system; it's just not as optimal.

File systems. 9p works, but it's not our ideal solution, so I'll drop down one. We did look at NFS over vsock, which works and might be slightly better than 9p, but it's NFS — it always comes with its own caveats. But there's some work ongoing — it's under discussion, not widely published yet, and not inside Kata — on virtio-fs. It's basically running a FUSE file system over a virtio transport. That's going to give you a much more compatible file system over a much cleaner, faster transport, so we're quite excited to see that coming. If it works out, which it looks like it will, I imagine we'll flip that to being our default rather than 9pfs.

Cache enablement. A lot of my time is spent running metrics to see how we're performing and where the hotspots are, and it came to light that we're a VM: inside the VM we've got mount points, and they're being cached by the block cache.
But they're also mapped through to the host, through QEMU, which also has caching enabled. In some situations this works really well: you've now got a cache shared across containers on the host, but you've got your own local one as well. In other situations you're just duplicating the data: you now have to traverse an extra layer, so it's a little bit slower, and you're probably holding the data twice, so it's a little bit bigger. So I'm going to be adding some tweaks to the config so you can say at what level you want to do your file system caching — do you want to do it at both? — and then there are all the complexities of whether you want write-through or write-back. So that's going to be fun.

Other features. A feature we're working on right now is live upgrade. We've got a reasonably aggressive release cycle, and we don't want to put customers in a position where they can't migrate — where they have to shut everything down while we reinstall. You need to be able to live upgrade; it's a key feature for your stack and your live system.

More device mappings. Docker has this interesting privileged mode. Every now and then, probably once a month, we get somebody saying: I've run Kata with Docker --privileged and it didn't work, or not as I expected. Privileged mode basically says "give me access to the whole of the host", and we're trying to run inside a VM — how did you think that was going to work? We do support some features of privileged mode; sometimes somebody just wants a device passed through. But there are things we simply cannot do, so privileged mode may never fully work in Kata. What we will do is try to add more and more support for device passthrough.

More hypervisors. I was asked about this earlier on the Intel stand. By default we run QEMU/KVM out of the box, but we really want to support more hypervisors. Somebody will always name a hypervisor that's going to be really hard to do, but we're looking at other hypervisors we can slide into Kata. There's a whole abstraction layer in Kata called virtcontainers, and it's designed so that you can slot in more of almost anything — particularly hypervisors. If anybody has a favourite hypervisor they want to see supported, just come along and have a chat; we'll help you merge that code in.

Non-Linux workloads. Quite often somebody will come up and say: hi, my container isn't a Linux container, can you run my workload? The immediate answer is: not today. The moderately easy answer is: it shouldn't be that hard. The only thing, as far as I'm aware, that would really need doing as a first step is porting the agent. We need an agent to talk to inside the VM; we have a Linux agent, and we run a Linux wrapper around our containers. So if somebody were to port the agent to a different OS, that would be a really big step towards running non-Linux workloads inside the containers. And I guess there's a major upside here for Kata over your traditional soft container: we have a separate kernel which, in theory, could be from a different OS, so you could probably mix and match containers of different OSes under Kata.

Size and speed. We're always looking at size and speed. There's a trade-off: everybody wants new features and more features, but every feature costs you something — a bit of size, a bit of speed. So we try to strike a balance, and we do try to keep an eye on our growth. We don't want to bloat with featureware.

Getting, using, contributing. It's on GitHub; it's all open source.
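One aside before the practicalities, since that TOML config file has come up a few times: here's roughly the sort of switch it carries, together with a small Go sketch of reading it. The key names below are from memory and only illustrative — check the configuration.toml that actually ships with your packages.

```go
package main

import (
	"fmt"
	"log"

	"github.com/BurntSushi/toml"
)

// Only a couple of illustrative fields; the real config has many more
// sections and keys.
type runtimeConfig struct {
	Runtime struct {
		// e.g. "tcfilter" vs "macvtap" -- the networking switch mentioned above.
		InternetworkingModel string `toml:"internetworking_model"`
	} `toml:"runtime"`
	Hypervisor struct {
		QEMU struct {
			Path          string `toml:"path"`
			DefaultMemory int    `toml:"default_memory"` // MiB per VM
		} `toml:"qemu"`
	} `toml:"hypervisor"`
}

func main() {
	// Hypothetical snippet standing in for the shipped configuration.toml.
	data := `
[hypervisor.qemu]
path = "/usr/bin/qemu-system-x86_64"
default_memory = 128

[runtime]
internetworking_model = "tcfilter"
`
	var cfg runtimeConfig
	if _, err := toml.Decode(data, &cfg); err != nil {
		log.Fatal(err)
	}
	fmt.Println("networking model:", cfg.Runtime.InternetworkingModel)
	fmt.Println("VM memory (MiB): ", cfg.Hypervisor.QEMU.DefaultMemory)
}
```

The point is the one made earlier: every default has a switch, so if a change like TC mirroring doesn't suit your deployment, you can flip it back without rebuilding anything.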
We're hosted under the OpenStack Foundation, so it's Apache 2 licensed. We have a number of packages available; we're working with a bunch of the vendors and distros right now, so we have RPMs, DEBs, we've got an APK, and we're working closely with the SUSE guys at the moment. We have a snap, which makes it easy to install on a number of platforms, and we have it installed in our own Clear Linux distro. And then you can reach us: Slack is quite popular right now, but there's IRC too — we have a bot that translates between the two — or hit the mailing list. Or we're around here for the next two days; I think the Kata guys are going to be on the stand in the afternoons, so if you want to come and ask some Kata questions, come by after lunch.

Has anybody tried Kata? This happened this morning as well — I think I've answered my own questions about footprint and performance, so I'm going to say that's a good sign: I've told you everything you needed to know and everybody's happy. Oh! We have a microphone, but I don't, so I'll repeat your question if they can't hear you.

Compared to LXC or LXD, what are the key differences?

We don't normally do a lot of comparison with LXD; generally we're looking at Docker and runc, so I'm not an LXD expert. Do any of the Kata guys down the front have any input on LXD? No. Generally we've targeted the major cloud stacks: Docker, Kubernetes and CRI.

Hi, one question. Is there a plan to support type 1 hypervisors in the future? Is that something that's possible with Kata Containers?

Type 1. We've discussed it with the hypervisor people. It's quite difficult, just because of the level of mapping: things want to talk to each other, but due to the isolation you just can't see outside. So I think it would involve some modifications in the actual hypervisor, or maybe some sort of transport agent living at that level, so you could have an über-process that everybody talks to and that can talk back. We'd like to; it's just probably trickier than with the hypervisors we have today.

One thing I probably didn't say is that we're also multi-architecture — you can tell from the folks working on it, the IBM guys among others. There are some bigger companies besides Intel supporting the Kata project: we have people from IBM, Huawei, Hyper, Intel and SUSE working on it, so we have quite a growing community now. We're talking to a bunch of the ISVs, and then there are some cloud providers — some of the top tier ones — that we're working with as well. Am I done then? Any more questions? OK, I think I'm done.