Okay, so next up we have Tycho Andersen and Mike McCracken, who are going to be talking to us about crio-lxc.

Which actually, according to the diagram earlier in one of the talks today, I think we should rename to something else, oci-lxc. When we named it we didn't understand what we were doing, and we still don't. Anyway, I'm Tycho, this is Mike. We work at Cisco; we do stuff. I gave a talk here last year about using OCI images without the tar and all the traditional stuff, putting other things in there instead, so I just want to go over that a little bit.

For starters, I'm going to talk a little bit about what the OCI format looks like. It looks roughly like this. In this oci-layout file you can tell that there's a version. I guess this is all not aligned again; we tried to fix this. Anyway, imagine the arrows are one up. Somehow the slides don't align. Basically, the index points to a blob, and that blob is really just a JSON blob, which itself points to a configuration, and that configuration points to some layers. The layers are these gzip-compressed tarballs with the file system state in them, whiteouts and all that stuff. So this is what a traditional OCI image looks like if you're using an OCI-based runtime today.
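
As a rough illustration of the pointer tree described above, here is a minimal Python sketch that builds a toy image in memory and follows it from the index down, checking each blob against the digest that names it. It is a sketch under assumptions, not any tool's real code: the helper names (`put_blob`, `descriptor`, `get_verified`, `walk_image`) are made up for this example, the `blobs` dict stands in for the blob directory of an OCI layout, and the layer contents are placeholder bytes rather than real tarballs. Note that in the OCI image spec the manifest references both the config and the layers.

```python
import hashlib
import json


def put_blob(blobs, data):
    """Store data under its sha256 digest and return the digest string."""
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    blobs[digest] = data
    return digest


def descriptor(blobs, media_type, data):
    """Build an OCI-style descriptor (mediaType, digest, size) for data."""
    return {"mediaType": media_type, "digest": put_blob(blobs, data), "size": len(data)}


def get_verified(blobs, desc):
    """Fetch the blob a descriptor points to, checking its digest and size."""
    data = blobs[desc["digest"]]
    actual = "sha256:" + hashlib.sha256(data).hexdigest()
    if actual != desc["digest"] or len(data) != desc["size"]:
        raise ValueError("blob does not match descriptor " + desc["digest"])
    return data


def walk_image(index, blobs):
    """Follow index -> manifest -> (config, layers), verifying every hop."""
    manifest = json.loads(get_verified(blobs, index["manifests"][0]))
    config = json.loads(get_verified(blobs, manifest["config"]))
    layers = [get_verified(blobs, d) for d in manifest["layers"]]
    return manifest, config, layers


# Toy image: two placeholder "layers" (not real tarballs), a config, a manifest.
blobs = {}
layer_descs = [
    descriptor(blobs, "application/vnd.oci.image.layer.v1.tar+gzip", b"fake layer 0"),
    descriptor(blobs, "application/vnd.oci.image.layer.v1.tar+gzip", b"fake layer 1"),
]
config_bytes = json.dumps({"architecture": "amd64", "os": "linux"}).encode()
manifest_bytes = json.dumps({
    "schemaVersion": 2,
    "config": descriptor(blobs, "application/vnd.oci.image.config.v1+json", config_bytes),
    "layers": layer_descs,
}).encode()
index = {
    "schemaVersion": 2,
    "manifests": [
        descriptor(blobs, "application/vnd.oci.image.manifest.v1+json", manifest_bytes)
    ],
}

manifest, config, layers = walk_image(index, blobs)
```

Because every pointer is a digest of the thing it points to, trusting the index is enough to detect a change to any blob below it: altering a layer's bytes makes `get_verified` raise.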
This is what you're using, and there are some drawbacks. For example, each layer is a tar file, so there's no dedup. If you include the same file in two different layers, there's no intelligence there; the file just gets included twice. Or say you modify one byte of a particular file in a container image, and the file is a gigabyte long. The way the OCI format is designed today, you include that whole file again verbatim, not just the one-byte change, which is sort of annoying. The whiteouts are painful: a whiteout is a specially named file, but what do you do if you have a file in your image called .wh? Large layers are painful. You can't seek in tarballs. Actually, if you know Aleksa, he wrote a large blog post about why tar is a bad format for this.

So if you're trying to think about what you actually want in a container image format: you want to be able to discover image provenance. In particular, people care about signing images: when I downloaded this, did it come from the nginx container image maintainer?
So that's important. The guy who's building the image signs it at build time, and it would be great if you could then verify that signature at runtime. Unfortunately, what happens is your container runtime downloads this tarball, gunzips it, extracts it, puts the bits on the file system, and then throws the tarball away. But the tarball was the thing that the maintainer signed. So unless you do some additional work with IMA or something, you lose that.

You want to be able to update stuff. A design annoyance is this: suppose you're a container administrator running some giant Kubernetes cluster with gazillions of containers. There are 50 different applications, you work for a big organization like Cisco, and there are 50 different teams that each maintain one of those. Then all of a sudden there's a vulnerability in SSL. You, the operator, know you need to go patch SSL everywhere, but you really have no way to do that. You have to go hat in hand to every dev team and say, "Hey, can you please rebuild your image with this new SSL?" That's annoying.

It would be great if you could use less space.
It would be great if you could just distribute the SSL patch. It would be great if you could dedupe within an image.

So for the image provenance problem, what would be ideal, since there's this whole pointer tree, is if the guy just signs the index. Since all this stuff is content addressed, you don't need to know anything more than that the signature on the index is valid. And if you're not destroying the tarballs when you extract the stuff, then you're not destroying the signature metadata, and that's useful.

So for example, in these two little things here you could use squashfs instead of tar, just as a straw man. Squashfs, if you're familiar with it, is basically a mountable read-only file system. I think the kernel documentation says it's intended for general read-only file system use and for archival use, for example in cases where a tar.gz might be used. So it's basically intended to be a drop-in replacement for the thing we're already doing.

The nice thing about this is that the metadata is stored separately. For tar, the metadata is stored inline. If you want to read a file at the end of the archive, there's nothing at the beginning that says "there are 15 files in this archive and the 15th file starts here", so you have to just keep reading the whole archive, which is annoying. With squashfs the metadata is stored separately, which means the whole thing is seekable. You can also do parallel compression. If you're familiar with downloading gigabytes and gigabytes of container images, the single-threadedness of gzip becomes very painful.

So how do we implement something like this? Basically, we want to use squashfs for the actual layer parts, and we don't want to extract anything and throw the tarball away.
We would just mount it directly out of the OCI image. So nothing is lost with the signature on the index. We haven't destroyed that squashfs image, so we can verify that the thing that guy built is the thing that we're running now.

One of the other goals of this was updating. If you think about it, there's a kind of spectrum. On one end of the spectrum, when you get an image from your development team, it's a Docker image, say, and that is bit for bit what the development team ran and tested. That's one of the things people really loved about Docker, because they know this is exactly the thing you want to deploy. On the other end of the spectrum there's traditional application packaging, where you say "I want SSL" and you don't specify a version at all, and maybe that version of SSL doesn't provide a symbol that this thing needs. That's where you can get into weird dependency hell issues, and that's why people like Docker: they never have to solve that problem. The ops people never have to solve dependency hell for the developers, or ask whether the version on the developer's machine was different from theirs.

But unfortunately, what you really want out of Docker is something in the middle, where I can change out specific versions of things if I know where they are in the layer stack, but it's not all the way back to dependency hell.
Maybe you say, "I'm using this version of Python", or whatever, and the administrator can go build his own version of Python with a single patch applied and then stick it in, and that would work. That's kind of what we want.

So if we're trying to come up with a strategy for container updates: you might have two Dockerfiles for two different applications. They both require OpenSSL and Python. You clone one of them, you clone the other one, then you run their install scripts, and now you've generated a container. If you're just straw-manning a different way to do this, you might do it like this. You have two different layers that you're describing: one is "we start from CentOS and we add SSL", the other is "we start from CentOS and we add Python". And then on this side, for the application install, you might say: start from CentOS, grab this SSL, grab this Python, and then install my application. This way, if you propagate this information in the OCI metadata, you can tell the operator:
"Hey, we have this different way of annotating things, and we know that this particular layer corresponds to this thing."

So in pictures: if you have something that looks like this, with Python 3, I'm going to use blue for SSL and green for Python. The end result, if we do this apply thing, is that we stack the SSL on top, then we stack the Python on, then we put the diff from the install. So ultimately this is how the colors correspond. The idea is that the administrator can then go in and say, "I know this blue layer has a bug. I'm going to swap it out for this other layer with the one patch applied that I'm really worried about." Or you can do it with Python 2, I guess, is the point here.

Thank you. So Tycho mentioned some strategies for building images that allow you to apply updates more flexibly, and to be more in the middle of that continuum between dependency hell and everything being static, where I have to go back to 50 dev teams and ask them to rebuild their images when I need to fix an exploit. That could take weeks or months, and in the meantime I'm running a cluster that's exploitable.

So I titled this "Let's use a container runtime". I actually mean let's use Kubernetes: let's use a runtime system that supports this image building and mounting scheme. In a nutshell, Tycho is suggesting a layer format that uses squashfs for the layers, so instead of unpacking a layer you mount it. That's basically it. You mount the squashfs layers in successive overlayfs mount points, and you don't have this extract step. The extraction happens as you read files from the image, just like with mounting any squashfs.
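
The mount-instead-of-extract idea just described can be sketched as a small Python function that only prints the commands it would run, so it needs no privileges. This is an illustrative sketch, not the speakers' implementation: `mount_plan`, the directory names, and the blob paths are all assumptions made up for the example. It relies on two things from the Linux documentation: `mount -t squashfs -o ro,loop` mounts an image file read-only, and overlayfs takes its `lowerdir` entries with the top-most layer first.

```python
import shlex


def mount_plan(layer_blobs, workdir, target):
    """Return shell commands that mount squashfs layers and stack them.

    layer_blobs is ordered bottom (base) to top, like a manifest's layer list.
    Paths are assumed not to contain ':' or ',', which overlayfs treats as
    separators in its option string.
    """
    commands = []
    lower_dirs = []
    for i, blob in enumerate(layer_blobs):
        mnt = f"{workdir}/layer{i}"
        lower_dirs.append(mnt)
        # Each layer blob is mounted in place; nothing is unpacked or deleted.
        commands.append(
            f"mount -t squashfs -o ro,loop {shlex.quote(blob)} {shlex.quote(mnt)}"
        )
    # overlayfs lists lower layers top-most first, so reverse the order.
    lower = ":".join(reversed(lower_dirs))
    # A writable upper layer holds the container's own changes.
    opts = f"lowerdir={lower},upperdir={workdir}/upper,workdir={workdir}/work"
    commands.append(
        f"mount -t overlay overlay -o {shlex.quote(opts)} {shlex.quote(target)}"
    )
    return commands


for cmd in mount_plan(
    ["blobs/sha256/base", "blobs/sha256/ssl", "blobs/sha256/app"],
    "/run/demo",
    "/run/demo/rootfs",
):
    print(cmd)
```

Since the layer blobs are never modified, they can still be checked against the digests in the signed image after the container is running.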
So this should fit into a container runtime really well. If I've got Kubernetes running CRI-O and a CRI-O-compatible runtime, such as the one that we've built, crio-lxc, which uses LXC, or runc, the more common one, we should be able to just plug in these images that we've built with squashfs layers, ship them, use the regular OCI image management tools, and make minimal changes, because this is all compliant with the OCI spec. Basically, the OCI spec for container images lets you say the blobs are any type you want; the index points to the blobs, and it doesn't care what happens.

In general, though, everything has been written and tested for the way that Tycho described first, where the layers are gzipped tarballs, because that's what the original Docker format was and that's what everybody uses.

So ideally the kubelet will talk to CRI-O. The red text here is where we have to do something different.
It'll pull and mount. This might be the first time we've mentioned atomfs, right here. Basically, if you think of a series of overlayfs-mounted squashfs file systems as a different kind of file system, you can call it atomfs, and you could label each of those layers as an atom in a molecule. But Tycho just skipped straight to the implementation details of the thing that we have called atomfs, which is fine.

Then the final point is crio-lxc, or runc if you're using runc. It will just run with that, because the OCI runtime spec interface is "hand me an unpacked rootfs", or really "hand me a mounted rootfs".

But the way I just described that points to some of the bumps along the way. Like I mentioned, there are a lot of assumptions in the tooling, because there's only one way that people have really used this, and that is with compressed tarballs as the layers. For example, skopeo is a tool for moving around and inspecting OCI images, both local and remote. Basically, it assumes that layers should be compressed. If you use it with its defaults to ship around OCI images with uncompressed layers, it will helpfully compress them for you, which is not what you want if you're shipping squashfs layers. Some of the tooling sometimes rejects layer types that aren't common, even though the spec says you should accept unknown layer types. That's being worked on.

This is another example: umoci, another image management tool, just assumes that things need to be compressed. So these tools try to be helpful by compressing things by default.
It's not what you want.

And so I did a slip there where I said the OCI runtime spec expects an unpacked rootfs, right? There's an implicit assumption of a decompress step there, which we don't actually have with the squashfs layer type. You're just mounting it. A lot of the architecture of the CRI-O project, and of the libraries that it's factored into, containers/image and containers/storage (that's github.com/containers/image and so on), assumes a "decompress and apply diffs" step: you're going to decompress this blob, you're going to apply its diffs here, and at the end you have an extracted thing. We don't need to do any of that work. So there's a bit of friction: what we're adding is not just another way of storing images, because of the way the code assumes you have to do this.

The containers/storage library implements graph drivers. "Graph driver" is terminology that started out in Docker. If you have multiple versions of a container, or multiple containers that were built off of an Ubuntu base image or a CentOS base image, there's actually a directed graph of layers: there's one base, then two different things were built on top of that, and so on. If you think of the layers in terms of that graph, the graph driver is the storage driver that models that graph of image layers. It's definitely focused on the extract step, whether you're extracting images to run them or building. When you're building, you have a container that was running, you did some commands, you make a snapshot, then you do some more commands, and you have a diff: you need to
discover the differences between the two file systems and make your new layers. That's what the "render diff" mentioned here is referring to.

There are actually a lot of different graph drivers, all for different ways of storing images. You can use thin-provisioning-type file systems, regular file systems, or even just be really inefficient and do lots of copying. And again, all of these primitives are very extract-step oriented. The interface for adding a new graph driver into CRI-O is: you do a create, you do a delete, and when you get a new layer you apply the diffs from that layer to get a new thing. Then, when you get the final image for the container, that's when you actually do a mount. Everything is in terms of individual layer IDs, and it's expecting that you will be unpacking and extracting. Like Tycho mentioned, that's a lossy thing: the tarball you extract is the thing that was signed, but what you have at the end of the process is something that you can't check the signature against.

So what we want to be able to do is use our new squashfs-style layers in CRI-O. The question is how we patch it, and what approach we're taking now. What you'd like is a high-level function that says, "I've got this image URL and I'm going to create a container from it", so that you can just cut in there and do your special squashfs thing. But it's not quite that easy. There isn't really one place to patch.
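
To make the mismatch concrete, here is a deliberately simplified Python sketch of the primitives listed above (create, apply diff, delete, mount), with one extract-style driver and one mount-only driver. Everything in it is hypothetical: the class and method names follow the verbs used in the talk, not the actual Go interface in containers/storage, and dictionaries stand in for directories on disk.

```python
class ExtractingDriver:
    """Hypothetical extract-style driver: each layer is unpacked on arrival."""

    def __init__(self):
        self.layers = {}  # layer id -> {path: bytes}, standing in for a directory

    def create(self, layer_id, parent=None):
        # A new layer starts as a copy of its parent's extracted contents.
        base = self.layers[parent] if parent else {}
        self.layers[layer_id] = dict(base)

    def apply_diff(self, layer_id, diff):
        # diff maps path -> bytes, or path -> None for a deletion (a whiteout).
        for path, data in diff.items():
            if data is None:
                self.layers[layer_id].pop(path, None)
            else:
                self.layers[layer_id][path] = data

    def delete(self, layer_id):
        del self.layers[layer_id]

    def mount(self, layer_id):
        # The original blobs are gone; only the merged result is left.
        return self.layers[layer_id]


class MountOnlyDriver:
    """Hypothetical mount-only driver: blobs stay intact until mount time."""

    def __init__(self):
        self.parents = {}  # layer id -> parent layer id (or None)
        self.blobs = {}    # layer id -> path of the untouched layer blob

    def create(self, layer_id, parent=None):
        self.parents[layer_id] = parent

    def apply_diff(self, layer_id, blob_path):
        # Nothing is unpacked; just remember which blob backs this layer.
        self.blobs[layer_id] = blob_path

    def delete(self, layer_id):
        del self.parents[layer_id]
        self.blobs.pop(layer_id, None)

    def mount(self, layer_id):
        # Return the blobs to stack, top-most first, by walking up the graph.
        chain = []
        while layer_id is not None:
            chain.append(self.blobs[layer_id])
            layer_id = self.parents[layer_id]
        return chain
```

In this sketch the second driver's `apply_diff` has nothing left to do, which is one way to picture the friction: the primitive exists because the interface assumes an extract step.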
There's code that implements the process of pulling images and saving them to storage, and there are some assumptions around that extract step in there, using the graph driver primitives. Then there's separate code, called the runtime storage server, that handles actually getting to a place where you have a usable rootfs, and that does the extract and then a mount. These are two separate things, when in fact we're proposing something that doesn't need two steps. Right: it grabs the image, the library downloads it, then we also extract here, and then mount. It's two steps.

The answer here is that we don't actually know the best way to do this right now. We have an approach. We have some ideas. This is certainly something we'd be interested in talking about with the people who maintain CRI-O, to find the best way to do this. I don't want to say that they did it wrong or that they made bad assumptions. These were very valid assumptions until we came along and wanted to do something kind of weird with it. So maybe there's a better way than where we're hacking it in now.

Also, just to quickly introduce it, since it's not entirely related to the image storage format discussion from before: we wanted to mention that there is a drop-in replacement for runc that uses LXC. It's just another box.
Well, I guess maybe not everyone was here for the diagram talk earlier; it was a couple of hours ago. The way containers get run in a Kubernetes cluster that uses CRI-O is that there's a format for the kubelet to talk to CRI-O: that's the CRI, the Container Runtime Interface. CRI-O then does some work and translates that to the OCI runtime spec interface, which it passes to a compliant runtime. runc is one of them, and we've written another one that, instead of using the libraries that runc uses, uses LXC. You may have your own reasons for wanting to use LXC. One reason might be that you're already using it in the cluster for other things and you don't want to have to maintain multiple runtimes.

Okay, and so now it's time for a demo, which Tycho will motivate.

Something like that. Anyway, this is the updating thing. Suppose that I have an image with some known vulnerability. The one I cooked up is that when you run the Python web server, it forks a task that just serves up /, so you can look at whatever you want. We call that "python 3 rooted". So suppose you have this image and you'd really like to fix it with some other version. If you notice, the bottom two layers here stay the same. (There's actually a bug in our tooling for generating this, which is why the second layer is there, as we'll see.) So the bottom two layers stay the same, but the top one is fixed. The idea is that the admin knows how to patch some particular thing. This one is in the Python standard library; I just hacked the file. But you could imagine doing this for any bug, SSL or whatever.

So if I go here, I can start this. Oh, oops, I have to run all this as root, because I'm not cool enough to have unprivileged containers. There, it works now. Yeah, boo. So I log some things;
these errors here are really not errors. This is just for debugging purposes for the demo. This is actually in the backwards order from what I displayed in the slides, but this is how CRI-O actually thinks of it. So this is the image, the actual base rootfs. This is that empty layer, and here's the layer with the bug in it. What I want to do is a wget. (I don't even have unprivileged wget.) So if I do this wget, you can see that with my rooted image I can grab /etc/issue or whatever file I want off the system.

So now, if I look at this replace script, you can see that I'm doing a create with this fix.json. So what's in there? There we go. Okay, so this fix.json basically points at a fixed image, and this fixed image is the one where the administrator has massaged the layers. If the administrator knows what's in each layer, because of the OCI annotations, he doesn't need to go back to the developer. He can just say, "Here, I generated a fixed image." So I have this script here that does a replace, which basically just takes one layer out and swaps in the other one. Again, ignore these error messages. Now you can see these two are the same: this is the same 77, this is the same empty layer here. But now I've got this fixed one. So if I try to do my wget again, now I get a connection refused, because that little bit wasn't there and it wasn't exploited.

So the idea here is that with a little bit of annotation, and if we all as a community start thinking a little bit differently about how we design our images, we can make life a lot easier for operators.

With that, I think we're done.
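
The layer swap shown in the demo can be sketched in a few lines of Python. This is not the replace script from the demo, whose contents are not shown in the talk; it is a minimal sketch assuming that each layer descriptor carries an annotation naming the component it provides. The annotation key `com.example.component`, the function name `replace_layer`, and the placeholder digests are all made up for illustration.

```python
import copy


def replace_layer(manifest, key, value, fixed_descriptor):
    """Return a copy of manifest with every layer annotated key=value swapped.

    The original manifest is left untouched so it can still be verified.
    """
    patched = copy.deepcopy(manifest)
    hits = 0
    for i, layer in enumerate(patched["layers"]):
        if layer.get("annotations", {}).get(key) == value:
            patched["layers"][i] = copy.deepcopy(fixed_descriptor)
            hits += 1
    if hits == 0:
        raise LookupError(f"no layer annotated {key}={value}")
    return patched


COMPONENT = "com.example.component"  # hypothetical annotation key

# Same shape as the demo: base rootfs, an empty layer, then the buggy layer.
# The digests are placeholders, not real sha256 values.
manifest = {
    "schemaVersion": 2,
    "layers": [
        {"digest": "sha256:base", "annotations": {COMPONENT: "centos"}},
        {"digest": "sha256:empty"},
        {"digest": "sha256:rooted", "annotations": {COMPONENT: "python3"}},
    ],
}
fixed = {"digest": "sha256:fixed", "annotations": {COMPONENT: "python3"}}

patched = replace_layer(manifest, COMPONENT, "python3", fixed)
print([layer["digest"] for layer in patched["layers"]])
```

In a real image the patched manifest has a different digest from the original, so the index entry pointing at it would need to be updated and signed again by whoever produced the fix.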
We'll take questions.

Towards the start of your talk you had Python and OpenSSL kind of mixed together. How would you deal with conflicts if you had two things that somehow touched the same files?

Very good question. Conflicts are a reality, and we encounter conflicts in two different places. One is with RPM databases: one layer updates some binary database to say "I've installed Python", and then the other one updates some different binary database to say "I've installed SSL". So that's a conflict. This is again a bit more of a philosophical leap, but if you're building everything from source, because you're a big enterprise and you have to ship things, there's no real reason you need to ship things as a package, as an RPM package. You could just build the OCI layer and ship that instead. Philosophically, that's what we're doing.

The other conflict we've run into, which that doesn't help with, is the ld cache. When you install something, it regenerates the ld cache, and that always conflicts. Basically, there isn't a great solution to that, so right now we just whitelist it. A third thing, which I've thought about (we actually only do this with libraries right now), is that some packages add users or add other stuff to text files. You could imagine doing some sort of automated diffing. We don't do any of that, mostly because delving into our application code is scary. So I just did the libraries.

Other questions?

So when you patched CRI-O... for example, containerd recently started work on a remote snapshotter interface, so different implementations of the snapshotter could exist. Would that be possible for CRI-O? Have you investigated that?

I'm not sure I understand your question.

containerd recently started to add (it's not merged yet) a remote snapshotter interface.
Okay. So that different implementations of how to prepare, unpack, and mount the layers can be implemented. Would something like that be compatible with CRI-O?

So potentially. The question is what that plug-in interface looks like. Initially, when I first set out to do this, I thought, "Cool, I'll just implement another plug-in for containers/storage or whatever, and then it'll all work just fine." But the problem really is that the primitives are phrased in such a way that everybody assumes you're going to have to process the image in some way, and as a design goal we don't want to have that step. So I hope so, but, given the way it's worked out (in fact, when we started writing the patches for this talk I thought it was going to be easier than it was)... yeah, I hope so. I haven't seen this particular pull request, but ideally that would be awesome, because I think it would mean much less work for us to try to land some of this upstream. If you look, we have some pretty brutal hacks in places in order to make this work.

A step that processes the image exists, but in our implementation, not the upstream containerd one, we just skip it, because we do a FUSE mount anyway, so we don't need to process anything. We just need to mount directly.

Yeah, so you're doing the same thing that we are. So all the stuff about diffing and all that, do you have to implement all that? Okay, so then maybe yes, this will actually help us.

Other questions? One more over here.

When you're doing all this stuff with mounting several layers of squashfs, am I right in thinking that you can only do that if your container runtime is itself privileged?

To do the mount, yes, because nobody has chosen to fight the battle to whitelist squashfs. I believe overlay you can do unprivileged, so we're half done.
The other thing is that squashfs isn't necessarily the best format for this. I could imagine some other ones that would be even better. For example, one of the problems I mentioned that this design does not actually solve is the deduplication problem. We're just using squashfs here as a sort of straw man to play around with this idea. If somebody came by and said, "I'm going to write a kernel driver for this other format that's way better", I'd be happy to switch to that. So yes, you're right that it needs to be privileged.

Other questions? All right, we're out of time. Okay. All right. Thank you.