Welcome, everybody. My name is Vincent Batts; I go by vbatts most places, GitHub or otherwise. I've worked in open source for quite a while, and it's interesting how a fine golden thread will lead you to whatever projects you end up working on. If you're new to or interested in open source, it's a very common problem to ask which problems or teams you want to work on, or how to find a project you'd like to contribute to. I remember asking the same things, and I found that just having fun led the way. I started having fun with systems in general. Then I started enjoying and seeing the power of Ruby and things like that, and asking what I could do with Ruby. I left it off here, but that even led me into contributing to KDE, some of my first projects, which led me into Slackware Linux and becoming a contributor there. At some point I got a job that paid me to work on some of those projects, and I got involved in Go pretty early, more on the consuming end. That led into Docker and containers, which has been the last ten years or so. I'm currently at Azure, by way of joining Kinvolk, working a lot with containers and Linux and otherwise. So first off, the current situation. I'd love to hear people's input on what you hope for or look for from a talk like this one, about pushing things to container registries that aren't necessarily containers. But the current situation is that we have cloud native.
If you're deploying, or even reading, other people's YAML, if you're touching anything with Kubernetes, or if you're just working with container runtimes directly (Docker, containerd, whatever it is in the Kubernetes space), there is so much YAML. You see lines like this one, "image: nginx:stable". With mountains of YAML, everything means something, hopefully; otherwise, why is it there? But it's interesting how such a small line, which is easy to typo, carries a lot of implications about what's happening behind it. So now you have an image, and you think: it's super easy. What's going on behind that image? What's hidden behind it is, potentially, dereferencing that tag. Some of this you might already know, and if you're not familiar with the mechanics, we'll dive a lot deeper into them. If you pulled that nginx:stable right now, the tag would dereference to a very specific image, a very specific pointer to that image as of right now. Ideally, for a more deterministic deployment, if you wanted to deploy the same thing in a couple of different locations, you should actually point at the exact image digest. Also encoded in that reference is the registry whose API is used to pull the image down. People hear about Docker Hub, and it's even implied: if you say docker run nginx, it's implied that the image comes from docker.io. But there's a whole registry API behind that. You might be using Quay, GCR, or Azure's ACR; there's a whole API defined for that. And then there's the actual packaged image: what is it actually doing? How many of you here have ever put your hands on a tar archive? Okay. It's super exciting. It's technology from about 40 years ago.
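To make the "implied docker.io" point concrete, here is a minimal sketch of how a short reference like nginx:stable might be expanded into a registry, repository, and tag or digest, and turned into the distribution-spec manifest path (GET /v2/<name>/manifests/<reference>). The parsing rules are a simplified assumption modeled on common Docker behavior (the "library/" namespace, the "latest" default, the "looks like a hostname" heuristic); real tools handle more edge cases, and the names Reference and parse_reference are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_REGISTRY = "docker.io"  # implied when a reference names no registry


@dataclass(frozen=True)
class Reference:
    registry: str
    repository: str
    tag: Optional[str]
    digest: Optional[str]

    def manifest_path(self) -> str:
        # Distribution-spec endpoint: GET /v2/<name>/manifests/<reference>,
        # where <reference> may be a tag or a digest. Prefer the digest.
        return f"/v2/{self.repository}/manifests/{self.digest or self.tag}"


def parse_reference(ref: str) -> Reference:
    """Simplified reference parsing (hypothetical helper, not a library API)."""
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)
    first, sep, rest = ref.partition("/")
    # Heuristic: the first component is a registry host if it looks like one.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, name = first, rest
    else:
        registry, name = DEFAULT_REGISTRY, ref
    tag = None
    base, colon, candidate = name.rpartition(":")
    if colon and "/" not in candidate:
        name, tag = base, candidate
    if registry == DEFAULT_REGISTRY and "/" not in name:
        name = "library/" + name  # Docker Hub's namespace for official images
    if tag is None and digest is None:
        tag = "latest"  # the implied default tag
    return Reference(registry, name, tag, digest)


print(parse_reference("nginx:stable"))
# Reference(registry='docker.io', repository='library/nginx', tag='stable', digest=None)
```

The digest form (for example name@sha256:...) takes priority in manifest_path, which is the "point at the exact image digest" advice from the talk expressed in code.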
Tar was literally meant for putting things on tape, and it's still at the crux of everything being pushed around the internet. We've tried to remove it. I promise you we've tried to remove it and think of better things. Sometimes that works; sometimes tar sticks, and we find ourselves using it again. But it's surprisingly at the center of the hot spot. It's built into the Kubernetes cluster. Even if you're just doing a one-off deployment, as soon as you move to some cloud, one of the first things you try to figure out is: what's the container registry? How do I trust who's pushing there? How do we audit? How do we fetch? Then you start down all the different processes. So it's at the crux of this whole cloud native movement. A bit of background: the container ecosystem as we know it moved very fast, and in the early days there were parties, specifically Docker the company, that rightfully wanted to keep a tight grip on how things evolved. And, also rightfully, there were a lot of companies and individual contributors who wanted to be involved, and figuring out how to have an open governance conversation around that took time. Meanwhile, the containers, the APIs, and the format were quickly evolving and went through several pretty drastic iterations. Does anybody who touched containers in the early days remember when the checksum wasn't actually a real checksum? Anybody? No. In the early days, when you ran docker build, it just read a few bytes off of /dev/urandom and checksummed that, and that was the ID of the image. It was not content-addressable at all. So there were big iterations over time to get to where you could actually trust it and have predictability about it.
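The difference between those early random IDs and a content-addressable digest is easy to show. In this minimal sketch, random_image_id mimics the talk's description of checksumming bytes from /dev/urandom (os.urandom stands in for it), while content_digest derives the ID from the content itself, in the "sha256:<hex>" form registries use. The function names are hypothetical.

```python
import hashlib
import os


def random_image_id() -> str:
    # Roughly what the talk describes early docker build doing: checksum some
    # random bytes. The resulting ID says nothing about the image's content.
    return hashlib.sha256(os.urandom(32)).hexdigest()


def content_digest(data: bytes) -> str:
    # Content-addressable: the ID is derived from the bytes themselves.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected: str) -> bool:
    # Anyone holding the bytes can recompute the digest and check it.
    return content_digest(data) == expected


layer = b"the same layer bytes, built twice"
print(random_image_id() == random_image_id())          # False: a new ID every build
print(content_digest(layer) == content_digest(layer))  # True: same bytes, same ID
```

Only the second scheme gives the trust and predictability mentioned above: two builds of identical content get the same ID, and a tampered blob fails verify.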
So, open governance. As we moved into open governance, some of those formats went into a project called the Open Container Initiative. It's a Linux Foundation project with a governance board, a TOB, and maintainers. At first it was just specs. First came the runtime, runc, the thing underneath containerd and Docker, the low-level execution. Then came the image format, and various things got packed into the image format. The last thing to get defined under open governance was the API. For a long time that was still held by Docker, and people basically had implied, de facto standards, but it was not really an open governance conversation. So if you ever find yourself reading the specs and trying to make sense of them, understand that some things in the image spec probably belong in the distribution spec now; that's just how it evolved. There's a lot of overlap between the maintainers and the conversations, and some of them are here this weekend. In particular, I'll touch on the OCI media types. As we got into these open governance conversations, the de facto standards, the Docker image formats and so on, were basically translated drop-in into the new open governance formats and media types, or MIME types. The first v1 of the OCI format came out in 2017, and still today you'll find a hodgepodge of media types: whether they're still Docker manifest lists and the like, or OCI image indexes, they're pretty interchangeable. Just to emphasize: Docker came out in 2013, Kubernetes came out in 2014, and some of these open governance format changes happened in 2017.
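As an illustration of that hodgepodge, here is a small sketch of the kind of mapping tooling uses to treat Docker schema 2 media types and their OCI counterparts as equivalent. The four pairs are the standard Docker/OCI correspondences; the helper names are hypothetical. Note the hedge: the formats are interchangeable in meaning, not byte-identical, so converting a manifest from one to the other produces a different digest.

```python
# Docker schema 2 media types and their OCI image-spec counterparts.
DOCKER_TO_OCI = {
    "application/vnd.docker.distribution.manifest.list.v2+json": "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.v2+json": "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.docker.container.image.v1+json": "application/vnd.oci.image.config.v1+json",
    "application/vnd.docker.image.rootfs.diff.tar.gzip": "application/vnd.oci.image.layer.v1.tar+gzip",
}

OCI_INDEX = "application/vnd.oci.image.index.v1+json"
OCI_MANIFEST = "application/vnd.oci.image.manifest.v1+json"


def normalize_media_type(media_type: str) -> str:
    """Map a Docker media type to its OCI equivalent; pass anything else through."""
    return DOCKER_TO_OCI.get(media_type, media_type)


def kind(media_type: str) -> str:
    # Decide how to treat a fetched object regardless of which family it uses.
    mt = normalize_media_type(media_type)
    if mt == OCI_INDEX:
        return "index"
    if mt == OCI_MANIFEST:
        return "manifest"
    return "other"


print(kind("application/vnd.docker.distribution.manifest.list.v2+json"))  # index
```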
That 2017 release was the v1 where we all said: cool, we agree on these standards. And here we are six years later, still with a hodgepodge of media types. It wasn't a quick switch. Most of the time that's not a problem; it just shows that some things move fast and some things move painfully slowly, as people come to depend on them. So, container images are basically Merkle trees. Has anybody played with Merkle trees before? Have people heard of Merkle trees? A little bit of head-shaking. I know this might be something of a beginner track, but it's fun to at least see and hear about some of these concepts. Generally speaking, in a Merkle tree, you see here L1, L2, L3, L4: those are the chunks of data involved. The arrows point up because it cascades: the chunks are checksummed, those checksums are combined and checksummed into the next node, and the next node, until eventually you have a single digest at the top that covers all the components. So you could say, I'm only interested in L1: you'd have its hash and could find it in the content-addressable store. Or you could say, I've already got that one chunk, and I just need to know the top, so we both know we're looking at the same thing. It bubbles up: if anything in L1, L2, L3, or L4 changes, the top checksum changes, and it's fundamentally a different object. And it's not Angela Merkel; I just think that's funny. In containers, these are fundamentally the different components. I know this is hard to read. You have the image index, which is usually what something like nginx:latest points to. That index might list different architectures (arm, amd64, a whole subset of them).
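The L1 to L4 diagram described above can be sketched as a textbook binary Merkle tree: hash each chunk, then repeatedly hash pairs of child hashes until one root remains. This is the generic structure, not the exact OCI layout (an OCI image is a shallower tree in which each JSON node lists its children's digests), and duplicating the last node on odd-sized levels is one common convention among several.

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(chunks: List[bytes]) -> str:
    """Hex root of a binary Merkle tree over the given chunks."""
    if not chunks:
        raise ValueError("need at least one chunk")
    level = [_h(c) for c in chunks]  # leaves: one hash per chunk (L1..L4)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pad odd levels by repeating the last node
        # Each parent is the hash of its two children, concatenated.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


chunks = [b"L1", b"L2", b"L3", b"L4"]
root = merkle_root(chunks)
tampered = merkle_root([b"L1", b"L2", b"L3-changed", b"L4"])
print(root != tampered)  # True: changing any chunk changes the top digest
```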
The index references one or more image manifests. An image manifest says: here's the config for how this will launch (entrypoint, environment variables, all that kind of stuff), plus a list of tar layers that will be assembled into a filesystem. That's all happening behind the scenes. So each manifest has exactly one config and one or more layers. The pile of text here is a number of the different MIME types, or media types, that are interchangeable at this point. Each of those objects in the stack, in the order they're in, is part of the Merkle tree. So the dereferenced digest at the beginning is actually the checksum of the top of the tree, and from it you can calculate, or at least verify, the rest of the stack. I think we'll have time for questions at the end. I usually like people to interrupt me throughout, so if you do have a question, feel free to ask and I'll repeat it for those streaming. So, briefly (and this goes way off the screen because SHA-256 digests are long): when you fetch something from a registry, and here I was playing with a Debian testing image, the first HTTP API call dereferences the tag. I said nginx:stable earlier; here it's debian:testing. The client hits that and gets back a checksum, which is the next object it steps to in the tree. In this particular example it skips over the image index and goes straight to the manifest: it fetches the manifest by its checksum, and what it gets back is a JSON object.
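Here is a minimal, self-contained sketch of that flow: push a config and layers, write a manifest whose descriptors carry their digests, tag it, then pull by dereferencing the tag, fetching the manifest, and fetching and re-checksumming every referenced object. FakeRegistry is a hypothetical in-memory stand-in; a real client talks HTTP and fetches manifests and blobs from separate endpoints, and the layer and config bytes here are placeholders rather than real gzipped tarballs or complete image configs.

```python
import hashlib
import json
from typing import Dict, List, Tuple

OCI_MANIFEST = "application/vnd.oci.image.manifest.v1+json"
OCI_CONFIG = "application/vnd.oci.image.config.v1+json"
OCI_LAYER = "application/vnd.oci.image.layer.v1.tar+gzip"


def digest_of(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def descriptor(media_type: str, data: bytes) -> dict:
    # An OCI descriptor: what the blob is, its digest, and its size in bytes.
    return {"mediaType": media_type, "digest": digest_of(data), "size": len(data)}


class FakeRegistry:
    """Hypothetical in-memory stand-in: content by digest, plus tag -> digest."""

    def __init__(self) -> None:
        self.content: Dict[str, bytes] = {}
        self.tags: Dict[str, str] = {}

    def put(self, data: bytes) -> str:
        d = digest_of(data)
        self.content[d] = data
        return d

    def get_verified(self, digest: str) -> bytes:
        data = self.content[digest]
        if digest_of(data) != digest:  # the client re-checks every object it receives
            raise ValueError(f"digest mismatch for {digest}")
        return data


def push_image(reg: FakeRegistry, tag: str, config: bytes, layers: List[bytes]) -> str:
    manifest = {
        "schemaVersion": 2,
        "mediaType": OCI_MANIFEST,
        "config": descriptor(OCI_CONFIG, config),
        "layers": [descriptor(OCI_LAYER, layer) for layer in layers],
    }
    for blob in [config, *layers]:
        reg.put(blob)
    # The manifest embeds its children's digests, so its own digest covers them all.
    reg.tags[tag] = reg.put(json.dumps(manifest).encode())
    return reg.tags[tag]


def pull_image(reg: FakeRegistry, tag: str) -> Tuple[str, bytes, List[bytes]]:
    manifest_digest = reg.tags[tag]                           # 1. tag -> digest
    manifest = json.loads(reg.get_verified(manifest_digest))  # 2. manifest
    config = reg.get_verified(manifest["config"]["digest"])   # 3. config
    layers = [reg.get_verified(d["digest"]) for d in manifest["layers"]]  # 4. layers
    return manifest_digest, config, layers
```

Because each parent embeds its children's digests, changing a single layer yields a different manifest digest, which is the Merkle property from the previous slide.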
The manifest's JSON then tells the client where the config and layers are, and it can fetch those as well. Then it can checksum the whole thing and say: I've got this checksum, I've got these objects, and this is exactly what I expected to receive. We're good to go. Then it unpacks everything and sets it up. All of this happens every time you say docker run, or use nerdctl with containerd, or podman, whatever. But deployments now are much more than just containers. You're seeing a lot of this evolution. You're hearing NIST and government policies talk about attestation and so on, so many more of these artifacts are going to tease out and become important. For most people it's Helm charts: a deployment isn't just a single container, and it might not be directly this YAML, but I need to figure out what the service actually looks like when it's deployed. It's a constant evolution, and at the crux of all of it, in the cloud native world, are still containers. So what needs do you have in that space? Storing state and related configs. How do you start attaching different metadata? Sometimes outside services just store the container's digest and correlate data with it. Sometimes people use the data structures for labels and annotations. You're seeing lots of linking. So when you're creating new metadata, how do you not reinvent the wheel? We're having interesting conversations with whole other ecosystems that are now in the cloud native space, and effectively they're trying to reinvent containers for their space. So how do we do this? (All my slide formatting is whacked out.) To put it simply: one does not simply do this.
Eventually, sure, you could create a new place to store objects and a new way to deploy; I get it. The APIs we have evolved quickly, then slowed down, and now they're heavily watched. I'll be perfectly honest: they're not always great, but they work. People have processes around them, they've started to harden them, and they've kicked the tires. In many of these conversations about the image format or the APIs, we all wish things were more ideal, but they're also pretty ironed out, and changing them would mean starting over and doing that whole lift again. That's completely possible, and some people might want to do it; just know that it's a bit of a Sisyphean task that we've already done once. How can we reuse what we already have? Even recently there's been a discussion about a new, more generic media type, so that something doesn't look like a container image when it's actually a signature. I want to push a signature to a registry and reuse all those mechanisms; how can I have something more generic? We've had a lot of those conversations, and it might still happen; we just recognize there would be a long on-ramp to bring in new media types. So, currently, the practical solution is: how do we bolt this onto the existing container image format and continue the onboarding we started six years ago, to enable new use cases? With that, the ground is set. You've seen how some things move slowly, and how there's a slow-moving, boring component at the center of cloud native that most people use and whose behavior is completely taken for granted.
So how do we start extending that? If it's a given, either I start at the bottom of the hill and create something new, or I figure out how to hack on the thing that's already available. Now, some of the tools I'm going to show next. I want to emphasize that my slides are already online. I intentionally did not paste these examples as text but used screenshots, so you have to type them in yourself; maybe that's interesting as a learning exercise, so you can put your hands on some of this. But most of what I'm about to show you should be transparent to you as a user. You'll just be using various tools, components, and workflows, probably even from companies that will offer value-add services or make these workflows easier; you should just be aware of what's happening behind the scenes. Very simply: while I was at Red Hat I worked on a project where we wanted to make the sources of many of the RPMs we published in container images available in a like manner, because certain licenses call for it. If you're fetching a container image, how would you fetch the source RPMs for it without hitting different APIs, effectively using the same APIs?
Some licenses are a little more particular about this; if I'm remembering correctly, it was the LGPL, which wants the source made available in a like manner. So this is basically the state of the art, and it probably will continue to be for some folks, because it checks the box and they're fine with it. Over there (I can't get my pointer) you see FROM scratch, and you have some local file like hi.txt, and you just ADD that file to the filesystem and push it. The whole thing is pretty tiny. Well, that's not the whole thing, but still. Then you can push it somewhere. Now, this thing is not going to run; it's not an executable at all. You would need tools to fetch it, unpack it, and see the content, and there's more metadata you'd want around it. But at its crux, this is the state of the art for packaging non-container content. What you don't see, but is implied with hi.txt, is the filesystem: there's still something to unpack. This is still a file sitting inside a gzipped tar archive; that's the whole object. Stepping through it: here's the first use of one of these tools. I've pushed this container image to the registry, and I'm using crane, a tool I reach for pretty often. crane manifest reaches out to the remote registry, I get back the JSON manifest of the image I just pushed, and I look at the media type of the thing that holds hi.txt. Here you see exactly what I just said: that object is a tar archive, so if you fetched it, you'd still have to unpack it. Just to show some variety, I'll use this other tool, ORAS, or "oras", however you pronounce it; it's always funny saying things out loud when you've only ever read them. It stands for OCI Registry As Storage. So here I'm using ORAS to push the same
file. I gave it a different image name, but the big difference is that I'm pushing that file, hi.txt, and ORAS's syntax for this command puts a colon after the file name, so I can give the file an arbitrary MIME type, or media type. If it were JSON or plain text, I could tell it exactly what the media type of that file is and push it to the registry. Then here's basically the same command to see the media type of that layer, and it's no longer a tar archive: it's the raw text file itself, with my own made-up vendor media type. If you fetched it, there'd be nothing to unpack; it's just the text file. Where this gets interesting: imagine that example of pushing source RPMs. You might have many, many layers that are just the source RPMs themselves, not files sitting in tar archives that have to be unpacked over and over in redundant steps, and you start seeing a lot more reuse of space. When you see the checksum, or digest, of that object, you can compare it to the checksum of the same object locally or in a yum repo, without saying "oh, it's gzipped and in a tar archive, so my content-addressable storage doesn't line up." Now you see how it lines up a lot more. This next example is almost the same as the one I just showed, but without the jq command that extracts the media type: it's the actual manifest that got pushed when I ran oras push. So, as I was saying about bolting things on and just making it work: I've pushed a single text file to a container registry, and there's still this boilerplate config that has to be there, because that's how all the tools know to interact with it. If you want portability between different cloud registries or local registries, certain things
have to be in place. So in that config object, with its media type and digest, the config is null and void. There's no actual config: no environment variables, nothing to run. It's literally just curly braces, two bytes. Some of the tools that now work with these kinds of features are getting smarter about it: when they get something with different media types, or possibly different annotations, they don't try to run it; it's just the file you're looking at, maybe with some annotations around it. Again, this should be transparent in the future, but it's good to know about. Next, a similar example. Here I have that Debian testing image locally on my computer, and I use crane to push it to the registry so you can see its whole checksum. The important part of this first command is that once it pushes what I called debian:testing, it comes back and says: here's the actual checksum, b75… bc75… something. Then I'm going to edit the thing I pushed earlier. I pushed this text file to the registry, and now I want to add some metadata to it and relate it to that Debian testing image. This is a contrived example, and I'll circle back; if you're confused, I understand, I've been there too. I'm going to add this block of JSON to the hi.txt manifest so that it relates to the Debian image. I could have laid this out better. crane edit manifest is another neat command: it fetches the content from the registry, opens it in your local editor so you can edit it in vi or whatever, and I add the JSON to the object. It then re-pushes and gives you the new checksum, because with the Merkle tree it's fundamentally a new object. So now hi.txt has a new digest, the tag is updated, and the manifest has the JSON I was interested in. Regardless, these are some of the tools you can use to put your hands on this. So, having seen a few examples (Merkle trees in action, editing the JSON, content addressability), why would you ever want to play with this, and how does it all relate? Like I said earlier, you have deployment configs and Helm charts. At this point Helm charts can be pushed to OCI registries. Most of the time those charts are somewhat independent of the images they reference: they might just use tags rather than pointing at specific instances. But there may be Helm charts that are very explicit, saying "this exact image digest", and at some point they would need to be linked to those images so you can see the cross-reference more explicitly. Signatures: how many people would like to know that the person they expected to build an image actually built it? Hopefully everybody in the room. Most of the time people rely on an implied trust: they fetched it over HTTPS and nobody was in the middle. But you couldn't actually prove that the MariaDB maintainers built MariaDB; you need signatures for that. So now you've got a new object that is deeply related to the exact build of the image. Where are you going to store that signature? You can store it in the registry and link it. A software bill of materials also relates deeply to the exact image; push that to the registry too. We're going to continue to see new types of packages: JARs, RPMs, Wasm, BPF, all kinds of stuff. Those objects don't always need to be stored in a tar archive; they might just be referenced directly.
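To tie the hi.txt examples together, here is a sketch of the objects involved, built locally rather than fetched with crane or ORAS. It shows that a FROM scratch style layer is a gzipped tar whose digest differs from the raw file's, builds an ORAS-style manifest whose single layer is the raw file under a custom media type with a two-byte "{}" config, and adds a subject descriptor pointing at another manifest, the link the referrers API indexes. Assumptions: the vendor media type and the Debian manifest bytes are hypothetical placeholders, the config uses the image-spec's application/vnd.oci.empty.v1+json type (the exact config media type an ORAS version writes may differ), and the referrers function only imitates the question the real API answers.

```python
import gzip
import hashlib
import io
import json
import tarfile
from typing import Dict, List, Optional


def digest_of(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def descriptor(media_type: str, data: bytes) -> dict:
    return {"mediaType": media_type, "digest": digest_of(data), "size": len(data)}


def tar_gz_layer(name: str, content: bytes) -> bytes:
    """Roughly what FROM scratch + ADD hi.txt yields: a file inside a gzipped tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))
    return gzip.compress(buf.getvalue(), mtime=0)  # mtime=0 keeps output reproducible


EMPTY_CONFIG = b"{}"  # the two-byte placeholder config
HI_MEDIA_TYPE = "application/vnd.example.hi.v1+text"  # hypothetical vendor type


def artifact_manifest(content: bytes, media_type: str, subject: Optional[dict] = None) -> dict:
    manifest = {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "config": descriptor("application/vnd.oci.empty.v1+json", EMPTY_CONFIG),
        "layers": [descriptor(media_type, content)],  # raw file, not a tarball
    }
    if subject is not None:
        manifest["subject"] = subject  # the manifest this artifact is "about"
    return manifest


def referrers(manifests: Dict[str, dict], target_digest: str) -> List[str]:
    """Digests of manifests whose subject is target_digest (an imitation, not the API)."""
    return sorted(d for d, m in manifests.items()
                  if m.get("subject", {}).get("digest") == target_digest)


hi = b"hi\n"
tarred = tar_gz_layer("hi.txt", hi)
debian_manifest = b'{"schemaVersion": 2, "note": "placeholder for debian:testing"}'
debian = descriptor("application/vnd.oci.image.manifest.v1+json", debian_manifest)
hi_manifest = artifact_manifest(hi, HI_MEDIA_TYPE, subject=debian)
hi_digest = digest_of(json.dumps(hi_manifest).encode())

print(digest_of(tarred) == digest_of(hi))  # False: the tarball hides the file's own digest
print(referrers({hi_digest: hi_manifest}, debian["digest"]) == [hi_digest])  # True
```

Because the raw layer's digest is simply the SHA-256 of hi.txt, it lines up with a checksum computed anywhere else, which is the reuse argument made for the source RPM case.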
And you'll see the media type be some kind of BPF bytecode, or Wasm, or otherwise, and I'm certainly missing a lot more; these are just a few projects that have bubbled to the top recently and will be in that situation. All of this leads to another feature the OCI has been working on recently, called the referrers API. We already have a release candidate of the distribution spec that adds it. The text I added to my hi.txt image earlier, the subject field, is a keyword there: with the referrers API, an image manifest can point to another manifest, so you can explicitly link these things rather than contorting a workflow, and the thing doing the referencing can have a different media type. Here I just want to show that with some registries, when you push content, you might see that the registry notices the new subject and adds that link as a new referrer, so you can see how they're connected. So why do all this? You've seen some of the building blocks, and how this is intentionally boring and slow-moving. Why not just build a whole new service, or iterate more quickly? At this point the big effort is to keep this simple: what are the most common building blocks across a lot of these different challenges? When different communities get involved, they rightfully have additional services or semantics they need for their language or project. What are the common problems we're all trying to solve? Keeping it simple is tough; it requires a lot of conversation, deliberation, and debate. But how can we use this content-addressable store, make it extensible, and build on what we already have at the crux of most cloud native deployments? That's basically it. Probably the most useful slide in this
whole deck is right here, and again, you can find it online, because this is too small to take a picture of. From the top: the ORAS project; the crane utility; additional projects in the space, like regclient and skopeo, which cover different aspects of interacting with and manipulating images; the two different signing projects that are going on; information about how Helm charts interact with OCI; information about Fermyon Spin and how it works with Wasm, and there are other tools in that space as well. Bumblebee is a project that stores BPF code in a registry, so you can pull it down and deploy BPF applications as if they were containers. There are blogs on how to use this upcoming functionality on, for example, Azure Container Registry, and a really good deep dive by Chainguard into what all these referrers mean. explore.ggcr.dev: Jon Johnson, one of the lead people behind crane, put together a really simple and really useful website where you can paste a reference and click through; most things are hyperlinked, so you can click into images, see what's actually inside them, and figure out what's going on. And a couple more. Most recently, if you've ever found yourself running docker run registry just to have something to play with, there's even a pull request adding referrers support to that registry, distribution/distribution (formerly docker/distribution). And with that, I'm done. We have a few minutes for questions, and I'll be around this week, and in this room right now, if you're working on or interested in this space. Just like I was saying about onboarding: as an open source user and consumer who learned in this space, it was only through people I was almost afraid to talk to when I first got started, and to whom I'm permanently indebted. So please reach out to anybody who's involved. There's
several folks in the room, whom I keep looking at, who maintain some of these projects. They're involved in the standards and specifications, they worry about backwards compatibility, and they're out there fighting the fight for people who will never know their names. So if you have use cases you're interested in, or you're worried about something that these kinds of features would break, we want to hear about it. Seek us out, because we want to hear from you. Otherwise, I'm vbatts online; feel free to find me. Any questions? Cool. All right, thank you.