Hey folks, we're here to talk about re-enabling Kata Containers in Moby. My co-presenter is a staff engineer from ENCOOP, and I'm Cory Snider, a software engineer at Mirantis and a maintainer on the Moby project. Today we're going to go through re-enabling Kata Containers with Moby, starting with the history: what happened to break Kata Containers with Moby, why things went south, how we fixed it, and the future, wrapping up with a nice little demo. Wish us luck.

So, starting off with the history. At the beginning, Docker was a single monolithic binary that did everything related to containers. From what I hear (it was before my time), even the command line was part of the same binary. There was a component called libcontainer that handled the actual low-level bits. In 2015, the Open Container Project, now known as the Open Container Initiative, was formed to make containers more vendor-neutral. As part of this effort, Docker modified libcontainer to run independently and donated it to the Open Container Initiative as runC. They followed up a few months later by releasing containerd, a daemon for controlling runC. A few months after that, in 2016, containerd and runC were incorporated back into Docker Engine version 1.11.
Due to that shift in architecture, it was now possible to substitute a different low-level runtime without changing the higher-level components, and first-class support for multiple runtimes was added to Docker in 1.12. With that, any number of runtimes could be plugged into Docker Engine, so long as they followed the same command-line interface as runC. That interface has since been formalized as the OCI Runtime Command Line Interface, which is not to be confused with the OCI Runtime Specification: the runtime command-line interface is merely one possible API which an OCI-compliant runtime may choose to support. Then in 2017, the Moby project was launched as the upstream open source project for the Docker engine, and Kata Containers launched as well.

Yeah, with Docker now open to supporting different runtimes, we started Kata Containers. It was a merge of two already open source projects, runV and Clear Containers. We merged the two projects into Kata Containers and hosted it in the OpenInfra Foundation, with the aim of providing the speed of containers and the security of virtual machines.

This is the initial Kata Containers architecture. The idea is that Kata Containers can drop in as a replacement for runC. As you can see, there are many components. We have a kata-shim, which essentially stands in on the host for the container process; a kata-runtime with the same command-line interface as runC; a kata-proxy to allow us to talk to the virtual machine over a virtio serial port; yamux to do the multiplexing; and the kata-agent inside the virtual machine to actually spawn the containers inside of it. So what happened?
Well, it all comes down to the runtime shim v2 interface. containerd does not directly launch low-level runtimes like runC; it launches a lightweight daemon subprocess known as the shim, and then issues RPC commands instructing the shim to drive the runtime. The reason for this architecture, this intermediate process, is that it allows containers to survive a sudden restart of containerd.

Yeah, with the shim v2 interface, Kata was able to simplify its own architecture. We combined the kata-shim, the per-container shims, and the proxy into a single containerd shim v2 process. This greatly simplified the Kata architecture and its deployment; simplicity, performance, and density all come from this new Kata architecture.

On the Moby side, Moby had supported the containerd runtime v2 interface for quite some time, but only in a limited fashion. Support for multiple runtimes was retained by hardcoding the use of the runC shim and instructing that shim to execute a different runtime binary. Essentially, Moby used the new API to implement the existing behavior, but nothing more, so the Kata shim v2 could not be used. Around that time we released Kata 2.0, the production-ready solution for VM-based containers, and because the initial architecture was too heavy for production, we decided not to support it any more; at that time we dropped Docker and Moby support as well.

So, yeah: things went south because Moby lagged behind in supporting other runtime v2 shims, and Kata needed that support, so with Kata 2.0 that support was lost. Now, how did we fix it?
Well, the first step in getting Moby and Kata Containers working together again was adding support to Moby to select other containerd runtime v2 shims. Interestingly, enabling Kata Containers was merely a side benefit of that work. The main reason I added that functionality was to allow MCR, Mirantis Container Runtime, our commercially supported Moby distribution, to ship distro packages for another runtime, crun, so that it could be installed side by side with runC and used without any additional configuration. crun is a drop-in replacement for runC, as I mentioned, and so could already be used with Moby without any modification, but side-by-side installation would have required configuring the Moby daemon to register the runtime with it.

Moby's configuration system is not really suitable for stateless deployments. There are no real affordances for configuration like drop-in files or drop-in directories or anything of that sort. You've got command-line arguments, you have a single JSON config file, and that's it. So a distro package that wanted to install a side-by-side runtime would have had to modify the daemon config file, which might have been customized by the user, with all the hazards that entails, because it is JSON.
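For context, registering an extra runtime means merging a stanza like the following into the daemon's JSON config file. This is a minimal sketch; the install path shown for crun is an assumption about where a package might put it:

```json
{
  "runtimes": {
    "crun": {
      "path": "/usr/local/bin/crun"
    }
  }
}
```

Any package trying to add or remove such a stanza programmatically has to parse, edit, and rewrite the user's entire config file, which is exactly the hazard being described.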
Yeah, installation could probably be made okay, but I shuddered to think of all the ways things could go wrong when uninstalling, so I just didn't want to go anywhere near that. In my first attempt, I did try to extend Moby's configuration system to support more stateless drop-in files, but it turned out to be a huge project where I would basically have had to boil the ocean and totally overhaul how config works in Moby, all of this just for additional runtimes. So I looked for another way to do it without having to touch the configuration problem.

Well, containerd doesn't need any configuration to use different runtime v2 shims. You just tell it the name of the shim and it automatically works it out; there really isn't much magic. When containerd is instructed to start a container using a runtime (take the example io.containerd.foo.bar), it just looks for a binary: it takes the last two parts of the name, prepends containerd-shim, and searches for that in the PATH environment variable's list of directories. So I chose to instead add support for additional runtime v2 shims so that Moby could leverage the same functionality directly, the same stateless configuration magic. Once we do get around to packaging crun for side-by-side installation with runC in Mirantis Container Runtime, we'll be doing so by pretty much packaging up a copy of the runC shim that's been tweaked to start crun instead.

So that was the first problem solved. Now we could select io.containerd.kata.v2 as the runtime, and Kata containers could be started, with just a handful of little issues that needed to be sorted out. You know, little edge cases, like networking. No one would need networking in a container, right? No one would miss that. Do you want to take this?
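The name-to-binary resolution described above can be sketched as follows. This is an illustrative reimplementation of the behavior, not containerd's actual code:

```go
// Sketch of how containerd derives a shim binary name from a runtime name,
// per the behavior described in the talk (not containerd's actual source).
package main

import (
	"fmt"
	"strings"
)

// shimBinaryName turns a runtime name like "io.containerd.kata.v2" into the
// binary name that gets searched for in $PATH: "containerd-shim-kata-v2".
func shimBinaryName(runtime string) string {
	parts := strings.Split(runtime, ".")
	if len(parts) < 2 {
		return ""
	}
	// Take the last two dot-separated components and prepend "containerd-shim-".
	name := parts[len(parts)-2]
	version := parts[len(parts)-1]
	return fmt.Sprintf("containerd-shim-%s-%s", name, version)
}

func main() {
	fmt.Println(shimBinaryName("io.containerd.runc.v2")) // containerd-shim-runc-v2
	fmt.Println(shimBinaryName("io.containerd.kata.v2")) // containerd-shim-kata-v2
}
```

containerd then searches the directories in $PATH for that binary, which is why dropping a containerd-shim-kata-v2 binary onto the PATH is the only "configuration" needed.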
So yeah, networking didn't work with Kata's QEMU-based hypervisor because of a little problem with the way Kata implemented OCI runtime hooks. For those unfamiliar, hooks are programs specified in the container's OCI configuration file that's passed to the runtime; the runtime then executes the different hooks at different stages in the container's lifecycle. As you can imagine, since I'm talking about networking, hooks are conventionally used to set up a container's networking, and in particular Moby relies on the prestart hook for its network configuration.

What's notable about this hook is that it predates the OCI runtime spec. Version 1.0 of the OCI runtime spec did codify the behavior of the prestart hook, but it also deprecated it in favor of more granular hooks. Unfortunately, it codified the behavior of the prestart hook in a way which differed from how runC implements it. So when Kata Containers went and implemented the prestart hook, they implemented it according to the spec (perfectly sensibly), and that's why Kata's networking did not work: Moby, having co-evolved with runC, which initially came from the Moby sources, expects the prestart hook to be invoked the same way that runC does.

The solution was to modify Kata Containers to invoke the prestart hooks the same way that runC does, at the same point in the lifecycle and in the same execution environment, and also to patch the OCI runtime spec to match up with the on-the-ground reality, so that no other runtimes will run into the same incompatibilities. My little pet theory for why no one else noticed there was a problem with the runtime spec is that all the other high-level runtimes, the ones that would interact with containerd and runC, just did not have the historical baggage of Moby, and so avoid the prestart hook and instead use the more granular hooks, which were all
implemented the same way across the board.

Another issue: one of Kata Containers' other hypervisor choices, Cloud Hypervisor, was just not working with networking, because it did not support network interface hotplug. As it was explained to me, that came from the Cloud Hypervisor support starting out as a copy of the corresponding Firecracker component, and Firecracker does not support network hotplug at all, so that limitation was just copied over, despite Cloud Hypervisor, the actual hypervisor, supporting it. So it was apparently just a simple thing to fix.

One of the other interesting issues was that if you tried to use docker exec on a Kata container started with Moby, it would work until the exec exited; then the container would exit and the docker exec command would hang. Not helpful. What turned out to be happening was that Moby was misinterpreting the exit events from containerd for the exec process. It received the exit event for the exec process, but it misinterpreted it as the container's init process exiting, and acted accordingly: cleaning up the container, killing the whole process tree, the whole usual thing, and then leaving the exec hanging, because, well, the exec hasn't exited yet, right?
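To make the ambiguity concrete, here is a toy sketch; the types and fields are hypothetical stand-ins, not Moby's actual ones. A containerd exit event carries both a numeric PID and a per-process ID string, and with Kata the PIDs alone cannot tell the init process apart from an exec:

```go
// Toy model of the exec exit-event ambiguity; these types are illustrative,
// not the real Moby/containerd types.
package main

import "fmt"

// ExitEvent mirrors the relevant fields of a containerd task exit event.
type ExitEvent struct {
	ContainerID string
	ProcessID   string // unique ID string assigned to every task/exec
	Pid         uint32 // host PID; with Kata, the hypervisor's PID for everything
}

// isInitExit decides whether an exit event is for the container's init
// process. Matching on the ProcessID string stays unambiguous even when all
// processes report the same host PID; matching on Pid would not.
func isInitExit(ev ExitEvent, initProcessID string) bool {
	return ev.ProcessID == initProcessID
}

func main() {
	const hypervisorPid = 4242
	initExit := ExitEvent{ContainerID: "c1", ProcessID: "c1", Pid: hypervisorPid}
	execExit := ExitEvent{ContainerID: "c1", ProcessID: "exec-1", Pid: hypervisorPid}

	// Both events report the same Pid, so only the ID string distinguishes
	// the exec exiting from the whole container exiting.
	fmt.Println(isInitExit(initExit, "c1")) // true
	fmt.Println(isInitExit(execExit, "c1")) // false
}
```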
So yeah, the root cause of that was a flawed assumption in Moby: specifically, that the PIDs on the host system of the container's init process and its exec processes are all distinct. That is true when all the processes are running on the host kernel. But Kata containers run in a virtual machine on a different kernel, so that assumption does not hold: Kata reports the same PID for all container processes, specifically the PID of the hypervisor. The fix was to modify Moby to instead use the process ID string, an arbitrary string in the containerd API that's assigned to every task and process started through containerd and reported back through the events. That way there's no ambiguity, and Moby can correlate each exit event to exactly which started container process it belongs to, without any trouble with VMs.

There was one more incompatibility, which we had only discovered a few days ago while preparing the demo for this talk. All currently released versions of Moby cannot start containers with Kata version 3.2.0-alpha3 unless the CPU shares option is explicitly set when creating the container. As it turns out, Moby has, as far as I can tell, always been unconditionally emitting the Linux CPU shares resource limit property in the container config (it's a cgroups thing), even if it was not set by the user and the user wanted to leave it at whatever the system default was. That property is optional in the runtime spec, so really the sensible thing to do is just leave it out if the user doesn't want it set. Well, Moby was unconditionally outputting it, and setting it to zero. Trouble is, the minimum legal value that the kernel accepts is two. And yeah, this has gone unnoticed forever
because runC sees the value of zero, says "okay, looks like it's unset," and runs along merrily. Earlier versions of the Kata agent running inside the VM were also more tolerant of those invalid values. But the current Kata agent does distinguish between set and unset for the CPU shares resource configuration, so it goes and tries to set the container's CPU shares to zero through systemd, and systemd correctly returns an error saying zero is out of range. Then the container fails to start, because Kata handles errors like well-behaved software should. Kata is arguably behaving correctly here and Moby is wrong, and I have gotten a fix into Moby that's now been merged into master, but it has not yet made its way into any release.

Now for the future: what is the future of Kata and Moby? Really, the diversity of runtimes is beneficial for the whole container ecosystem, as demonstrated by all those incompatibilities and issues that came up. Having a runtime implementation which does not attempt to be entirely compatible with runC, and which is not written in Go, is really helpful for challenging all these implicit assumptions in the implementations. That CPU shares issue I just mentioned is a great example of this. The Kata v3 agent is written in Rust, so it follows different conventions internally than the Go codebases of Moby and runC. In a Rust codebase the Option type is pervasive and ergonomic to use. Lucky Rust developers; I'm jealous.
And so it makes it natural to distinguish between an unset value and a value set to zero. In a Go codebase, zero means unset, except when it doesn't. Your other option in Go is to use pointers to represent optional values, which has its own ergonomic problems, not to mention the performance cost of spilling onto the heap. And the conversion between those two Go conventions is actually why the bug was happening in Moby and why runC was not falling over: Moby was using the zero-means-unset convention internally, but at the actual serialization to JSON, the nil-pointer-means-unset convention was used. Then on the runC side, the JSON is unmarshaled using the pointer-as-unset convention and converted back to the zero-means-unset convention. So runC couldn't tell the difference, because both cases get coalesced to zero.

Now, more concretely for the future: we'd love to go and support Kubernetes pods with cri-dockerd. In order to support pods, Kata will need to know which containers need to share resources so it can group them in the same virtual machine. This has historically been communicated to Kata through OCI annotations, and Moby has historically lacked support for setting OCI annotations on containers. I've recently gone and added that support to at least the Moby engine component in preparation, and all that remains is wiring it up to cri-dockerd; contributions welcome. Although, now with the containerd 1.7 sandbox API, maybe we want to leverage that instead. The thing is, Moby still targets the containerd 1.6 API, so we cannot depend on the sandbox API, which is newer, yet. And there's not been any work, no one's even thought about, how we would plumb the sandbox API up through Moby's engine API so that cri-dockerd can leverage it. So once again, contributions
welcome, please!

Lastly, would you like to take this one? Okay. There's also the docker network connect command, which connects a container to a network different from the one it was started on, and this requires modifying the container's network at runtime. We used to support this with a kind of network monitor component, but when we dropped Docker support we removed that legacy code as well, so the functionality ended up missing when we brought Docker support back. And since we are rewriting the Kata runtime in Rust, we may need a Rust version of the container network monitor for this to work. This work has been designed and proposed as a Google Summer of Code project for Kata Containers, and a university student is actually working on it. So this is to be expected; we should have it in the next release.

All right, now the demo. Where's the demo page? Demo time! This is going to be fun, because I don't have the screens mirrored. Okay, first up. Do this with three hands? No? Oh, okay, thank you. So first, to show there's nothing up our sleeves: there we go, no config file at all. Just for reference, there's uname -a, and that's in a container. And also, just to show there really is nothing up our sleeves, here it is running with runC, so you can see the Kata container is very much running in a totally different kernel. Thank you.

We still have a couple of minutes; are there any questions? "What's the target for GA on this?" You can try it out today, and let us know what bugs you run into; it's going to be very much an ongoing process. All right. Thank you. Thank you.