Hey, everybody. So our talk today is going to be on securing your container supply chain. First, we're going to introduce ourselves. I'm Katie, and this is my wonderful teammate Diego. We work for Microsoft. We are what's called app innovation global black belts. Does everybody know what that means? No, huh? We'll lay it out. Do you know what that means? Oh, hey, there's a former GBB, right? So basically what we do is we work with customers out in the field to kind of innovate on the products that Microsoft sells. We have a lot of deep technical knowledge, and we work with a lot of big customers on everything around app modernization. So we do serverless, we do Kubernetes, we've even gotten into the low-code space. It's a lot of fun, and we see a lot of different customer scenarios. On a personal note, Diego and I enjoy cooking and eating. I CrossFit very, very poorly. And Diego, what can you say about yourself? I like to ride tractors in Prince Edward Island, around corn mazes of all things. And then we have a colleague who is kind of like our honorary co-presenter, Ray. Ray is here represented as a coyote; you'll see the coyote across different slides from this point on. Yeah. So one of the things that we see in Kubernetes, right — this is a really advanced path that we don't see a lot. A lot of times we just see the dev build and push their image right to Kubernetes, right? And everything always works so well, and everything is perfect and very easy. And Diego and I were like, huh, you know, it turns out that's not always how it works, right? No. And one of the things that we see, again, working with customers, is there's a period of time where everything is unicorns and rainbows, things just seem to work out, and it's lovely on my machine. I think I just saw a "works on my machine" sticker. You can still pick those up.
And then when Katie was telling me, hey, there is something here that we should take a look at — it's not that simple. How do you actually ensure that the container image you have on your laptop, or in your CI, is actually the one that's running, that somebody didn't tamper with it? Think about npm and all of the madness that can go on around that. And, you know, not to pick on JavaScript or any of that, but that's one of the things we've seen. There was another case you mentioned, too, with SolarWinds and all that — thinking about malware being inserted. Now, this might not have solved that problem, but if you think about it, it's just another layer in your onion that can provide some type of support, even if it's not gonna catch everything. It's a start, and we wanted to look at it — these aren't the only tools available to help do this. These are just what's currently out there, something that we found interesting and adapted to some customer scenarios that we had. So it was a really fun thing to investigate: see what's out there, help other people think about how to secure their container supply chain and what they can do, and look at the different tools that are out there right now and what might be coming. Yeah, so maybe we wanna go real quick through an overview here before we dive into the concepts. Go ahead. And maybe just a couple of caveats to this. We're probably all familiar with how CNCF projects come along and evolve, right? Whatever we're talking about is probably applicable today; six months from now, chances are — because we have a repo — you're probably gonna have to update. We're gonna try to update as well, but versions go up and all that. So look at this as a snapshot in time of what can be done today.
If you look at that picture there, we have a developer — again, to your point — coding, pushing stuff from the local laptop, running Docker. And what we wanted to do is sign that image that I'm creating locally. So you can see that I have Notation there, from the Notary Project. We're signing that. We're keeping certs in Azure Key Vault, so we're making sure that we can reference this back later on, and then you can see that we have a signature. That signature gets pushed from our laptop into a secure location. The moment we have Azure Container Registry running for us, we can now verify in two ways. From a developer perspective — we're gonna show you, and Katie will go through all of the concepts — hey, is that actually valid in there? If I manually wanna run a command and confirm that it's valid, we can do that. It's gonna be in ACR. And then later on we have components in Kubernetes. We're looking at Gatekeeper and Ratify at the bottom there, if you keep going down. Those will also check. What are we checking for? We're checking pods. Pods that we're running are based on images, and these images will have a SHA-256 digest with a signature attached to them. So if there is a mismatch, we just don't want it to run. We're gonna intercept that call and not allow it to run. So two ways here: if it's signed and approved, you can run. If it's not signed, just deny at that level. Yeah, and this is kind of where the magic happens — the admission controllers are really what's allowing this to happen. So we're gonna take you through some concepts. Here's our coyote; you can enjoy this for a moment while you think about what concepts we're gonna go through. So the first one is OCI and ORAS artifacts. I'm assuming everybody's kind of familiar with it, but we just wanted to throw it out there: the Open Container Initiative just sets standards for container images.
And one of the big things it looks for in this is the config.mediaType. And then everything depends on an OCI-compliant container registry, so everything that runs through this has to be compliant. The second part of this is ORAS — OCI Registry As Storage — which extends what kinds of artifacts can be saved that relate to your container. It's not just a container image; non-container artifacts can also be stored with ORAS, and you use it to discover artifacts, using the OCI registry for storage. I just wanted to bring up those concepts in case you're not familiar with them, so you can see what's going on. So one of the big pieces of the signing that we do is something called Notary. It solves the problem of trusting content within and across all the different registries. It just makes sure that whatever we put in the registry is what we expect and nobody has tampered with it. It keeps a record of the changes that happen to the artifact, so that we know if something has happened to it. So it's kind of like a super detective that sniffs out if anything happens. These are just the high-level concepts of what it does: the artifacts are signed with private keys, and they're validated with public keys. And a little bit more about it — what else does it do? You're thinking about concepts like content trust: it ensures integrity and provenance of your container images, and it does that because they're signed with keys. You have transparency: it maintains that log, so it's easier to track what changes have happened. Scalability: one thing that's nice about this is that it's set up to handle large scale. Yeah — the only thing that I want to mention specifically on Notary here:
The moment we have these components running in the cluster, it's kind of a fire-and-forget situation. Without this, you would sign one time, and then the next time — let's suppose we're deploying NGINX here — somebody tampers with that image, or something goes wrong somewhere along the CI/CD pipeline. Notary would actually deny it. Even though you might have gates throughout the pipeline that need to be approved, and everything looks okay from a YAML perspective, Notary can stop that. So it is transparent in that respect, because you do have the cryptographic key. It will say, hey, there's something here that doesn't match, and it will alert the user: I'm just not gonna run this for you, right? And as it stands today, there are a lot of small components that, if they're not all lined up together, this won't work. That's the difficulty here — but the beauty, when this all works, is what we have here: transparency end to end for the user. Because I guarantee you — and maybe I should put that as a question here — if we were to start enforcing too many things on developers, they would bypass it. They're like, I'm just not doing this. So I think that's the beauty of having something like Notary as part of the solution. Yeah. It also handles offline operations — you can sign without necessarily relying on an external service — and for flexibility, it supports a variety of different container registry types. Okay. The next piece of this is OPA — Open Policy Agent — Gatekeeper. This has two big components to it. One is the constraints, which are just the representation of your security policy — statements in declarative form, kind of like a YAML file saying, what is your security policy? And the other is the constraint templates.
So you might have a constraint that declares the allowable seccomp profiles to be deployed to a specific namespace. That's your constraint, but your constraint template is what allows you to go and extract those values and apply logic to them, to see, hey, can this really get deployed to that namespace or not? So — is anyone actually using Gatekeeper? Are you using it in prod or development? Both? So if you're already doing that, you already know how to write Rego at that point. You can write statements like, maybe I'm trying to block, I don't know, a pod running a remote OpenSSH server, or you name it, right? So that component sits in here to ensure that policies are applied. You kind of need all of these side by side for this thing to work, and we'll demonstrate that. We have a little demo; we can also show the repo. What we have is Gatekeeper with some pre-canned policies — we didn't create anything new. We're relying on what all of the other components already put in place when they come up and spin up. So at this point, we're not really writing any Rego statements. You could, but we're taking the vanilla approach: whatever's out of the box, we're gonna go with that. And at that point it's pretty secure — it's actually locked down in many ways. And it's really nice too, because obviously it's customizable to whatever security policies your organization feels are important. I know there are all these regulations out there now with industry-specific requirements, so this would be one way to think about applying those policies: have them centralized, have reusable code snippets, instead of maybe having logic all over the place that does it.
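To make the constraint/template split concrete, here's a minimal sketch along the lines of the Gatekeeper policy library's allowed-repos example. This is illustrative only — the names and the allowed registry are hypothetical, and it's a generic image-repo policy, not the pre-canned policies we actually relied on:

```yaml
# ConstraintTemplate: holds the Rego logic and declares the parameter schema
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
      validation:
        openAPIV3Schema:
          type: object
          properties:
            repos:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sallowedrepos
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          satisfied := [ok | repo := input.parameters.repos[_]; ok := startswith(container.image, repo)]
          not any(satisfied)
          msg := sprintf("image %v is not from an allowed repo", [container.image])
        }
---
# Constraint: the declarative instance that applies the template to Pods
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: repos-must-be-ours
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    repos: ["myregistry.azurecr.io/"]  # hypothetical registry
```

The template carries the logic once; you can then stamp out as many constraints as you like with different parameters and scopes, which is what makes the policies centralized and reusable.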
And this is just some more of, hey, what else can Gatekeeper do? What else are people using it for? Auditing: it will periodically evaluate your environment, validating all the resources against the constraints you have declared, to make sure that nothing has changed. Drift, as I said. Yeah, config drift. Yeah. And you have dry runs, so you can test canary releases in a cluster; namespace exclusion; and scoping of the resources that a policy can be applied to. So there is another component here. We've touched on two of them; there's a third component we're going to go to next, called Ratify. So Ratify is — just the workflow engine. Yeah, it's an actual workflow engine. And I think, speaking for both of us, this is where we spent more of our time. I mean, Gatekeeper, to be honest, is a no-brainer: it's a Helm chart, you fire and forget. There's a little bit of configuration, but not much. Notary and Notation, on the client side, have some configuration; you've got to be a little careful there. But Ratify is where I spent a lot of my time trying to understand what was going on. So if you do deploy this — and I suppose you're going to try some of this — and you say, oh, I got that wrong, whatever it is: the moment you remove that Helm chart, it will leave behind a handful of CRDs. Just FYI. If you don't remove those, you're not going to be able to just reinstall with a different configuration, because those CRDs point to some secrets and config maps in different namespaces. So you yank the whole Helm chart, and you probably also have to remove the custom resources yourself — yeah, it's all the bash glue that we had to write, to remove the rest of the stuff that's in there. Just so you know. So in case you're testing and getting frustrated, that's probably why.
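For reference, the Ratify configuration that caused us trouble lives in custom resources like this one. This is a rough, version-dependent sketch — the API group, field names, and resource names here are our recollection of the deislabs/ratify charts and may differ in your release, so treat it as illustrative only:

```yaml
# Sketch of a Ratify Verifier custom resource wiring Notation signature
# verification to certs held in a certificate store (names are hypothetical)
apiVersion: config.ratify.deislabs.io/v1beta1
kind: Verifier
metadata:
  name: verifier-notation
spec:
  name: notation
  artifactTypes: application/vnd.cncf.notary.signature
  parameters:
    verificationCertStores:
      certs:
        - certstore-akv   # a CertificateStore resource backed by Key Vault
    trustPolicyDoc:
      version: "1.0"
      trustPolicies:
        - name: default
          registryScopes: ["*"]
          signatureVerification:
            level: strict
          trustStores: ["ca:certs"]
          trustedIdentities: ["*"]
```

Resources like this are what can survive a `helm uninstall`, which is why a reinstall with different values may fail until you delete the leftovers by hand.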
So if you have to change one little parameter: remove everything, put everything back. Yeah. And so those were the big pieces that we worked with to get this working. So I guess next — show me the money, right? Let's go and show a little bit of the demo. Yeah. So let me go here. I hate not having them on. Maybe we just want to do full screen. Do it. Yeah. Okay, and hit play. Can you see? Can you see? You want to move? So, a couple of things in here. This is just a recording for now — it's coming from ttyrec, for some of us old FreeBSD users. That's where it's coming from. So what am I doing here? A couple of things. This is the local laptop, from the developer perspective. So Katie is developing this. We're logging in to a container registry through Notation. Think about the stuff you would do with Docker, the local binary on your laptop — now we're using notation for that. That's our binary. So we're logging in, getting those credentials. And at this point now, you're going to see that — I don't know, I couldn't get rid of it — but anyway, what do we have now? Now I'm going to build an image. So I've logged in. And this is a caveat here: if you're familiar with the Azure CLI, we're not going to be doing a login to ACR through the Azure CLI or through Docker. You need to do this through notation, otherwise this won't work. That's a little bit of the caveat there. But the moment you're logged in, you can now build that image. And with that command right there, we're building directly against the container registry. I don't have that on my machine; it's pulling the source code from somewhere, the Dockerfile is also there, and it's building everything at that point in time in Azure directly — so not on my laptop. And you can see it's a regular Docker build at this point, so it should look familiar.
Now, the second thing that happens after that is that we're going to sign it. So remember, we built this image — think about your NGINX or whatever we're building here. Now we need to sign it, because we built it. Think about automating this through a CI/CD pipeline: you have that build, and you're now pointing at that SHA-256 digest back there, so the signature is actually going to be attached. This is where, when Katie was talking about ORAS — we're not just putting up an image of a container per se, we're now storing other artifacts too, and that's one of them. We need that for later on. So for now: we log in, we build, we put the signature up there. From this point, there are a few directions we can go. What I'm showing you is just: can I verify the images I have? You're gonna see I'm typing notation ls, and I'm pointing this again at ACR. So the tool is local, kind of like Docker, but I'm pointing it there. The more you do this, the more that tree will grow with different signatures. You can also remove them, and you can prune and clean those. The important thing is that it is there, and I can see it as a user. Now, let's suppose a different user comes in. Can that person verify it? You can, as long as you have access to the actual keys that were used here. Remember, we're using certificates for this that are stored in Key Vault. So the moment the user — the developer — has access to that, he or she can try this out. And this is what we're showing at this point: we're verifying whether or not that was signed. So the command is notation verify, you point it back up there, and you can see it says successfully verified. If you try this with an image that was tampered with, it will just say, no, I cannot verify that. That's one scenario.
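Behind that notation verify call, the client reads a trust policy file that says which cert stores to trust for which repositories. A minimal trustpolicy.json sketch — the registry scope and trust store names here are placeholders, not the ones from our repo:

```json
{
  "version": "1.0",
  "trustPolicies": [
    {
      "name": "my-acr-policy",
      "registryScopes": [ "myregistry.azurecr.io/my-app" ],
      "signatureVerification": { "level": "strict" },
      "trustStores": [ "ca:my-keyvault-certs" ],
      "trustedIdentities": [ "*" ]
    }
  ]
}
```

With strict verification, a missing or mismatched signature fails the verify — which is the behavior the demo shows for tampered images.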
Scenario two, which you should also be aware of — this is the caveat right now — is that if you keep rebuilding this, as you're tinkering and trying different things, it will fail, because it takes 24 hours for this to be refreshed in the cluster. There are ways to get around this, but by default, that's what it is. So just be aware. I have some notes in our repo about all of the little things we bumped into, and that's one of them, unfortunately, at this point in time. So right now, everything is looking good from that perspective. It is signed; we know what it is. And then if I just do kubectl run --image and point to either :v1 — so colon v1 — or to the entire SHA-256 digest, that will run against the cluster, and it will work. The opposite of this is, if I try anything else, it will just say, no, cannot run, at this point. So what we get out of this: if I already have a private cluster, if I already have all private endpoints to that cluster, how do I make sure that my developer, when that person is using this image — or the CI/CD, which is probably another developer — what we're showing here is baby steps. When I automate this, I want to make sure that the signatures match, right? And that's what that is there. This can all be copied — I mean, it's asciinema — but we also have all of the commands and everything automated in the repo. Let me just quickly go back to this. So if you want to see the resources associated with this, we have our own GitHub repo — we're AppDevGBB — and it's called The Chain. I'm a big Fleetwood Mac fan, so this became the chain joke. It's all in there, right now, as it stands. So Katie put some extra links in here. The first one — and we can probably open it so folks can see — it is all done in Terraform. That's the first link that we have from our team.
So it's a bit of bash script and the rest is just Terraform, just to spin everything up. You don't have to use Terraform for this; you can do it through bash if you want to — that's fine, I have an example of that too. Because what we have there is mostly from a learning perspective: I just want to kick the tires and see how this actually works. That's what we have. But also from a learning perspective, that link there — build and sign a container image, from Microsoft Learn — is where you can actually get a step-by-step. It will go through all of the things that we're showing here, locally on your machine. And then the second chunk of that points out to Deis Labs. I don't know if folks are familiar with them, but it points to their GitHub, and on their GitHub they show how to interface this with AKS and all the things that we talked about — Notary to Ratify, it's on that second chunk. So it's a two-sided tutorial: one is all local, and the second one is, how do we actually take this to the cloud and scale it out? Now, maybe some questions. I mean, we still have time — I went in a bit of a hurry — but are there any questions or thoughts about some of these approaches? Try to get the mic. And I'm gonna leave this up while we're in the questions, as a little bit of — well, this wouldn't be a Kubernetes talk if there were no memes. Hopefully that's not you by the end of your experimentation with this. Question about the signing phase: with what identity do you sign, and where do you keep the private key? Okay, that's a good question. There are a few examples in the actual repo. We are using workload identity for this. So the moment I create the AKS cluster — and maybe I'll go back here and open up the Terraform for us — it's using workload identity. It's in the diagram, too, that we have. But if you were to go back and look at the docs, they have a service principal being used.
So when we decided to have our own repo — oh, it's not showing there; maybe we can put it in there — we decided that workload identity is where we wanna go, right? So as the cluster gets bootstrapped, we also have a user-assigned managed identity, and then we use workload identity with that. I extract that object ID and client ID, and that's what I use. Now, in terms of the certs themselves, right now they're self-signed certs, so we're generating them through Terraform. You can also generate this through a JSON file: point it at Azure Key Vault, and Key Vault will generate that for you. So you have the CRT and the PEMs, you can download those, and this is what I'm doing at this point. And we use Azure just because we're Microsoft, but this is really a cloud-agnostic solution. Yeah. Again, we'll share all of that here with the talk and everything, but it's all a chunk of this here. We also have a jump box, so in case you don't wanna install any of this on your machine, I provide a jump box you can SSH into. It will firewall through your public IP — just so you're aware, it will create the firewall rule — and you can access that. And I have kubectl, Helm, Notation, all of that pre-installed as an image in there, in case folks wanna go there and just check it out. Yeah, but that's a good question, though. So again, two paths: the one that you will see in the Deis Labs documentation is using a service principal initially, and then they have a second set where they mention workload identity. So this is what we have automated — it's all using workload identity, because I think that's more realistic for anybody who runs it in production, versus dealing with or babysitting a service principal. I don't think anybody wants that. But thank you for that question, yeah. Any other questions? Okay, well, thank you for coming. Thank you for your time, everybody. Thanks for attending. And if you have any questions, we'll be happy to chat — we'll be around here for a while.
Thank you.