I think it's 11:30, so we'll get started. Whoa, whoa. All right. Thank you. Wow. So, excited to speak to you today. This is a talk about Argo CD, but it's actually also a talk that involves Argo Workflows. So if some of you are Argo Workflows refugees who washed up in this room, that's okay. You'll have something to look at too. We're going to talk about being secure by default with GitOps, and this is just kind of taking a different approach to security. My name is Dan Garfield. I am the co-founder and chief open source officer of Codefresh. Raise your hand if you know what a chief open source officer is and does. Okay. I'll check with you afterwards. I'd like to find out. But yeah, I work at Codefresh, and I help lead our open source efforts and our contributions to the Argo project. I am an Argo maintainer. Most of the work that I do is, let's just say, less code. I don't do as much code these days. I do a little bit here and there. I'm also a founder of, and helped create, the OpenGitOps project in the GitOps Working Group, which created the GitOps principles, which I hope many of you are familiar with because we're going to be applying those principles today. If you're not familiar with Codefresh, we are a continuous delivery, continuous deployment platform built on Argo. We have an enterprise version of Argo that I would love to have you check out and hear what you think about it. I wanted to start really quick with a brief story. When I was a kid, I had to skateboard everywhere to get places. And I started a business building and selling computers, fixing networks, and doing networking for people. On one of my jobs, somebody actually paid me in an old laptop. Suddenly I had a laptop; for the very first time I had portable computing. It had 128 megs of RAM.
I threw NetBSD on that thing. I stuck it in my backpack, and around two or three in the morning I would skateboard out into the neighborhood. I had about 40 minutes of battery life. I'd put on my headphones (I had to reinstall the audio driver for NetBSD), and then I would listen while wardriving for networks. Then I would find WEP networks and, allegedly, allegedly, I would try to break into these networks. The trick with WEP is that, as long as there are lots of clients connected, you can send out a signal that says they need to reauthenticate. Then the WEP clients will all spray their keys back to the server, and you can get in the middle of that, intercept those keys, and gain access to the network. I tell you this, first, so that you know that my security credentials are very outdated and not that useful for today. And second, because getting in the middle is what we're going to try to stop people from doing: getting around Argo CD. So what are the problems we're going to solve today? First, do you know where your software comes from? It's 6 p.m. Do you know where your software comes from? This is a real issue in the community: the supply chain. How do you know that an image or a binary hasn't been intercepted? How do you know that a tag hasn't been overwritten? You know, Docker images are supposed to be immutable, but when I point at a tag, you can switch the binary that's behind that tag. So how do I know that what I'm getting is what I want? How do I know a programmer's machine wasn't compromised? I know of a company where a programmer clicked on a phishing link and suddenly attackers started deploying to his clients' websites. So this stuff happens. The second issue is how do you enforce rules on your clusters? You do lots of trainings. You do lots of automation, maybe. Maybe you audit and you check and make sure.
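To make the overwritten-tag problem concrete, here is a minimal sketch of a toy registry in Python. It is not how a real registry is implemented; `ToyRegistry`, the tag `app:1.0`, and the byte strings are all made up for illustration. It only shows that a tag is a movable pointer while a content digest is not.

```python
import hashlib


def digest(content: bytes) -> str:
    """Content-addressed identifier, in the same style registries use."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


class ToyRegistry:
    """Hypothetical in-memory registry: tags are mutable, digests are not."""

    def __init__(self):
        self.blobs = {}  # digest -> content
        self.tags = {}   # tag -> digest (can be re-pointed at any time)

    def push(self, tag: str, content: bytes) -> str:
        d = digest(content)
        self.blobs[d] = content
        self.tags[tag] = d  # silently overwrites any earlier pointer
        return d

    def pull_by_tag(self, tag: str) -> bytes:
        return self.blobs[self.tags[tag]]

    def pull_by_digest(self, d: str) -> bytes:
        content = self.blobs[d]
        if digest(content) != d:
            raise ValueError("content does not match requested digest")
        return content


registry = ToyRegistry()
trusted = registry.push("app:1.0", b"trusted build")
registry.push("app:1.0", b"tampered build")  # same tag, different content

tag_result = registry.pull_by_tag("app:1.0")
digest_result = registry.pull_by_digest(trusted)
print(tag_result, digest_result)
```

Pulling by tag returns whatever was pushed last, while pulling by the digest recorded at build time still returns the original content. Signing, covered next in the talk, goes one step further by tying a digest to a key.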
I know another story of a big credit card company that went to go debug a service that was failing, and they found out that the team that had been working on it hadn't checked any code in for the last year. But they'd been pushing. What the heck were they pushing? How do we know? And then third, GitOps is really great. So how do we make sure that people are actually using it? It's very easy to get lazy and drop out of the mode of making Git commits for all of your changes. So how do we make sure that people are actually following the process and using the tools we want them to use? When it comes to security, training is a great first step. It should never be the last. If it is the last step, then it might be your last step, to turn a phrase. So, approaching security, what we're going to do today: one, we want to verify our software supply chain. Two, we want to enforce security policy. And three, we want to do it all with GitOps. To do this, we're going to be using Argo CD. We're going to be using the GitOps principles. We're going to be using a project called Sigstore Cosign. And we're going to be using Open Policy Agent as well as Kyverno. So, starting off with Sigstore Cosign. How many people are already familiar with Cosign? Have you heard of this? A pretty good chunk of the audience. Probably most of you have played with it. This is basically a tool for doing supply chain security. So in this presentation, here's my first gift: I have a free workflow template for you to use. Basically, the way that Cosign works is you can sign images, Helm charts, and other binaries with a key, and then you can verify that signature later. So you can make sure that the artifact you're looking at was signed using the key that you expected. This way you can verify the provenance, or where an artifact came from.
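As a rough sketch of that sign-then-verify flow, the Cosign CLI calls can be assembled as argument lists in Python. The subcommands `generate-key-pair`, `sign`, and `verify` and the `k8s://namespace/secret` key reference are part of the Cosign CLI; the namespace `secure`, the secret name `cosign-keys`, and the image reference are placeholders, not values from the demo. Nothing is executed here; the commands are only printed.

```python
import shlex

# Placeholder values for illustration only.
NAMESPACE = "secure"
SECRET_NAME = "cosign-keys"
IMAGE = "docker.io/example/demo-app:1.0.1-secure"


def cosign_generate_key_pair(namespace=None, secret_name=SECRET_NAME):
    """Generate a key pair, optionally stored as a Kubernetes secret."""
    target = [f"k8s://{namespace}/{secret_name}"] if namespace else []
    return ["cosign", "generate-key-pair", *target]


def cosign_sign(key_ref, image):
    """Sign an image and push the signature to its registry."""
    return ["cosign", "sign", "--key", key_ref, image]


def cosign_verify(public_key_ref, image):
    """Check that the image carries a signature made by the expected key."""
    return ["cosign", "verify", "--key", public_key_ref, image]


key_ref = f"k8s://{NAMESPACE}/{SECRET_NAME}"
commands = [
    cosign_generate_key_pair(NAMESPACE),
    cosign_sign(key_ref, IMAGE),
    cosign_verify(key_ref, IMAGE),
]
for argv in commands:
    print(shlex.join(argv))
```

Each list could be handed to `subprocess.run` in a pipeline step. Signing by digest rather than by tag is generally the safer choice, for the tag-overwrite reason raised earlier.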
So to do this, first you need to generate a key pair. You just run cosign generate-key-pair, and you can specify a Kubernetes namespace that you want to put that secret into if you want to put it directly into the cluster. Two, you need to sign that image and upload the signature to the registry. And three, we can actually automate this in a workflow template. So I'll just show you really quickly. I don't know how many of you are familiar with Codefresh Hub for Argo, but this is a collection of workflow templates that are focused on CI/CD, DevOps, and basically automating your software delivery chain. And there is a new one in here called Cosign. You can grab this template, throw it in your workflow, and use it to start signing stuff. And don't worry, there'll be links to all this stuff at the end. We're gonna be using that one today as well. So the next part of this equation is gonna be OPA. OPA stands for Open Policy Agent. Many of you are probably already familiar with Open Policy Agent, especially from its use in service meshes. It allows you to create general-purpose policies, like this pod can talk to that pod, or this thing can't reach outside the network, those kinds of things. A newer implementation of OPA is OPA Gatekeeper. This is a really interesting tool that sits on your cluster; it basically sits on admission webhooks, inspects resources, and makes sure they meet policy. These policies can do things like protect namespaces, prevent privileged containers, block and allow image repos, and limit replicas. They can do all kinds of things, and there's a huge library of these already available called the Gatekeeper Library. At the end of this, I will also show you how to use it to prevent people from deploying things without using Argo. So if they try to apply things directly, we can actually stop them from doing that.
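A minimal sketch of what an allow-listed image repo policy can look like, built as a Python dictionary and printed as JSON (which `kubectl apply -f` accepts as well as YAML). It assumes the `K8sAllowedRepos` constraint template from the Gatekeeper Library is already installed on the cluster; the constraint name, namespace, and repo prefix are placeholders rather than the values used in the demo.

```python
import json


def allowed_repos_constraint(name, namespaces, repos):
    """Build a K8sAllowedRepos constraint restricting Pod images by prefix."""
    return {
        "apiVersion": "constraints.gatekeeper.sh/v1beta1",
        "kind": "K8sAllowedRepos",
        "metadata": {"name": name},
        "spec": {
            # "deny" rejects the request; "dryrun" or "warn" only report.
            "enforcementAction": "deny",
            "match": {
                "kinds": [{"apiGroups": [""], "kinds": ["Pod"]}],
                "namespaces": namespaces,
            },
            # Each entry is matched as a prefix of the container image string.
            "parameters": {"repos": repos},
        },
    }


constraint = allowed_repos_constraint(
    name="only-trusted-repo",         # hypothetical constraint name
    namespaces=["secure"],            # scope the rule to one namespace
    repos=["docker.io/example-org/"],  # placeholder repo prefix
)
print(json.dumps(constraint, indent=2))
```

Because the match is a plain prefix check, ending the prefix with a trailing slash avoids accidentally allowing a different organization whose name merely starts with the same characters.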
So with OPA Gatekeeper, basically you apply your manifest, the admission webhook picks these things up, it checks the constraints, and then it either blocks things or allows them through to the scheduler. If you're using this with Argo CD, the first thing you'll notice is that for each policy template, OPA Gatekeeper actually creates a custom resource definition. Now, everybody knows that if you're using Argo CD and you have a bunch of custom resource definitions, you just need to think through it, because you need to make sure that your custom resource definitions are applied before your custom resources. So, for applying these resources, I have a couple of policies that I've already set, and I'll just show you. I've got a policy, for example, that only allows images from my todaywasawesome repo. This is a policy I've already set up on my cluster. And if I were to try to apply this, and let's just do this with... let's see. Examples, pod, restrict. And I'm gonna send this into my secure namespace. Now, you can apply it to all namespaces; for the sake of the demo I did it this way. And on this one, I'm gonna do a dry run equals server. So this will try to apply my pod that shouldn't be allowed. And you can see that it's been forbidden. An admission webhook has picked it up and said, hey, you're actually using the latest tag? Is that a good option? No, no. Do not use floating tags, right? So we've detected that. We've blocked it. And we've detected that it's not from the expected repository. So you can have these mixed and matched, and you can see how it works. Now with Argo CD today, in 2.4 and lower, the way that we actually validate resources is we do a dry run client. Now what happens if I do a dry run client? Is it gonna pick it up? No.
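The two checks that fired in that server-side dry run, a floating tag and an unexpected repository, can be imitated with a toy function. This is only an illustration in Python of the kind of rule involved; Gatekeeper itself evaluates Rego inside the cluster, and the image names and prefix below are made up.

```python
def parse_image(image: str):
    """Split an image reference into (repository, tag).

    A reference with no tag and no digest implicitly means 'latest'.
    """
    name, _, image_digest = image.partition("@")
    last_segment = name.rsplit("/", 1)[-1]
    if ":" in last_segment:
        repo, tag = name.rsplit(":", 1)
    else:
        repo, tag = name, ("" if image_digest else "latest")
    return repo, tag


def violations(image: str, allowed_prefixes):
    """Return human-readable policy violations for one container image."""
    _, tag = parse_image(image)
    found = []
    if not any(image.startswith(prefix) for prefix in allowed_prefixes):
        found.append(f"image {image} is not from an allowed repo")
    if tag == "latest":
        found.append(f"image {image} uses a floating tag")
    return found


ALLOWED = ["docker.io/example-org/"]  # placeholder prefix

rejected = violations("nginx:latest", ALLOWED)
accepted = violations("docker.io/example-org/demo:1.0.1", ALLOWED)
untagged = violations("docker.io/example-org/demo", ALLOWED)
print(rejected, accepted, untagged)
```

A client-side dry run never reaches checks like these, because they run in the cluster's admission path rather than in the client.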
There's nothing for it to test against, because the constraints are on the cluster. That means that when we sync these resources with Argo CD, we're going to have failures, and it's going to flag them as failed. It's not gonna allow them through, but if it's a pod, it will show up as just progressing. It'll be stuck in progressing. If a specific resource failed to apply, then it will flag that resource. But starting with Argo CD 2.5, and we should be able to start cutting the release for 2.5 this week (that doesn't mean it's being released this week; I mean we're starting the release branch this week), it will actually do server-side dry runs, which means that it will pick up and throw errors earlier, and have better support for admission webhooks and things like that. Okay. All right. So we just saw how we can use Gatekeeper to prevent things from getting deployed. So what about with CI/CD? I'm gonna show you a slide, and it's gonna be really intimidating. I'm so sorry. Delivering software is complicated sometimes. There's a lot going on here. So starting off with number one, we've got our application repo. This is where my application engineers are working. They're making changes. When they cut a release, this is gonna trigger CI/CD. In this case, Codefresh has a hosted version of Argo Workflows that I'm gonna be using, because it's sort of optimized for DevOps and it takes care of my events and stuff for me, so it's convenient. We're gonna run our build. This is gonna build our image.
Then we're gonna run the Cosign template to sign that image, which will then be pushed up to Docker Hub, at which point we can trigger three, which is to add a new release to our GitOps repo, where we've actually defined what should be deployed. At that point Argo will try to deploy it, and Gatekeeper will step in and examine the policies that are there. Once that's done, it'll deploy the resources or reject them, depending on what I'm deploying. So I've got a release here ready to cut, and I've marked it as v1.0.1 secure. I'll go ahead and publish this release. Now, once I publish this, I should see my workflow kick off relatively quick here. I said relatively quick here. The Wi-Fi is, you know, an issue sometimes. All right, we can see that this has kicked off, and it's building. It's gonna build that Docker image, and as soon as it does, it's gonna kick off Cosign and push it up to my Docker registry. Now while we're doing that, we can actually prepare the release. I've got my OSS secure repo here with links to everything from this talk, as well as links to the slides, so feel free to use that. I'm gonna go into my source repo, manifests, and in this case I'm using the demo accept app, and I'm deploying it to the OSS secure server, and I will edit this kustomization. When I do, I'm gonna specify the new version, which is, I think, 1.0.1 secure. If it's done baking... yes, it looks like it's done baking that. We can see that it signed the image here and pushed the signature. All right, so once I've done this, I can go and kick off deploying the new secure version. Now actually, normally I would use, like, Argo CD Image Updater, or maybe some other CI/CD, to automate the creation of the deploy, but everybody likes looking at pull requests being merged live. So, we just did it. Merge your own pull request, give yourself a high five, feel good about yourself.
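The kustomization edit made a moment ago amounts to bumping an image tag. A minimal sketch in Python of that change, assuming the manifest uses kustomize's standard `images` field with `name` and `newTag`; the image name, resource file names, and tag strings are placeholders rather than the contents of the demo repo.

```python
import copy
import json


def set_image_tag(kustomization: dict, image_name: str, new_tag: str) -> dict:
    """Return a copy of a kustomization with image_name pinned to new_tag."""
    updated = copy.deepcopy(kustomization)
    images = updated.setdefault("images", [])
    for entry in images:
        if entry.get("name") == image_name:
            entry["newTag"] = new_tag
            break
    else:
        # No existing override for this image, so add one.
        images.append({"name": image_name, "newTag": new_tag})
    return updated


# Placeholder kustomization, shaped like a typical kustomization.yaml.
current = {
    "apiVersion": "kustomize.config.k8s.io/v1beta1",
    "kind": "Kustomization",
    "resources": ["deployment.yaml", "service.yaml"],
    "images": [{"name": "example-org/demo", "newTag": "1.0.0"}],
}

released = set_image_tag(current, "example-org/demo", "1.0.1-secure")
print(json.dumps(released, indent=2))
```

In a GitOps flow the result would be committed back to the repo, by hand as in the demo or by a tool, and Argo CD would pick up the change from there.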
And now we're gonna go over and look at our applications, and our demo accept app. Now, we can see this last synced 14 hours ago, and we'll trigger a refresh. Even though it would do it automatically and would get there, we won't wait for it, and this should trigger and kick off. Now, what am I expecting to see? It should just work. So this is actually not that fun to look at. So what if we deployed something that wasn't gonna work? How would that look? Let's do this one again, except now instead of deploying the new secure version, I'm gonna be deploying an out-of-date version, right? To do this, we'll go back and edit our file again. And instead of doing the secure version, I'm gonna be doing the blue version. Go blue. It's not secure. Fun. And we'll commit that one directly, because nobody laughed at my joke about merging my own pull requests. Okay, so we'll trigger the sync again really quick. Again, it would pick it up automatically, and now it's going to start going. And you can see it's gonna be stuck in progressing because, guess what? We can't deploy it. It is denied because the check of the image signature has failed. No matching signature was found. So you saw that it worked, and I showed that I could break it. Okay, so everybody's... oh, thank you. Yes, smooth demos. Smooth demos are good. And you can see this is actually fairly simple to implement. From your developers' perspective, once you've implemented it, it's basically in the background. They don't even know about it. It's just getting verified automatically. And you can apply this to a lot of things; there's a big push to get more community images signed so you can verify their signatures. Of course, you could also build them yourself, or push them into your own repo while verifying them. You could also make verification contingent upon security checks.
You could do image scanning and all these other things and then say... and we're not really going to go into this, but within Cosign you can make attestations. Attestations would allow you to say who made it. In this case, this secret lives on my Argo Workflows instance that is sitting on my AWS cluster that I have managed in Codefresh, and so I've verified its provenance. But I can make attestations and say this has passed these kinds of security scans. I can verify that all the commits were securely done and that they were all signed. So you can do all kinds of interesting things. You could actually take it even farther. So, we just did it live. Oh, yeah. I was supposed to share that slide before I did it live, because it's a cool slide. All right. So, using OPA with Argo CD. This is one of those talks where I do the demo in the middle and then I explain it. So hopefully everybody's feeling, oh, yeah, that made sense, everything seems smooth. Within Kubernetes, I think there are essentially two kinds of resources. And I made up these terms, so you're not going to Google this and be like, oh, what kind of resource is this? But this is the way I think about it. There are fragile resources. These are things where, once they have started broken, they will stay broken. A good example is an ingress getting the wrong class because it's been applied out of order. It's a common issue, making sure these resources are applied in order. The other kind are what I think of as resilient resources. These are resources where, if failures occur because things are out of order, they will eventually correct themselves and it'll be fine. A good example of this is custom resources. Custom resources will fail to start, but once the definition is found, they will work. So for resilient resources, for your Argo CD application policy, you can just use a retry.
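A minimal sketch of such a retry, written as a Python dictionary shaped like an Argo CD `Application`. The `syncPolicy.retry` fields (`limit`, and `backoff` with `duration`, `factor`, `maxDuration`) are part of the Application spec; the app name, repo URL, path, and the specific numbers are placeholders. The helper that lists the delays assumes the backoff doubles from the base duration up to the cap.

```python
import json


def application_with_retry(name, repo_url, path, dest_namespace, limit=5):
    """Build an Argo CD Application that retries failed syncs with backoff."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "path": path, "targetRevision": "HEAD"},
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": dest_namespace,
            },
            "syncPolicy": {
                "automated": {"prune": True, "selfHeal": True},
                "retry": {
                    "limit": limit,
                    "backoff": {"duration": "5s", "factor": 2, "maxDuration": "3m"},
                },
            },
        },
    }


def backoff_delays(limit, base_seconds=5, factor=2, cap_seconds=180):
    """Delays before each retry, assuming exponential backoff with a cap."""
    return [min(base_seconds * factor**attempt, cap_seconds) for attempt in range(limit)]


app = application_with_retry(
    name="policies",                                    # hypothetical app name
    repo_url="https://example.com/org/gitops-repo.git",  # placeholder repo
    path="policies",
    dest_namespace="gatekeeper-system",
)
delays = backoff_delays(app["spec"]["syncPolicy"]["retry"]["limit"])
print(json.dumps(app["spec"]["syncPolicy"], indent=2))
print(delays)
```

With these numbers a custom resource that arrives before its definition gets roughly two and a half minutes of retries to settle, which is usually enough for a CRD to register.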
You just set a retry policy and eventually it'll work. I'm not worried about it failing to start right away. It'll get there. But for fragile resources, you really want to use some kind of dependency ordering. This is where sync waves, sync windows (sorry, not sync windows), and sync hooks become really useful. Because when you rely on admission controllers, it actually means that every resource you use becomes a fragile resource, meaning that your policies have to be applied before anything else. Now, this is really only an issue at cluster bootstrap, right? Because how often are you going to be applying a new policy along with a new service or new resources that you expect that policy to catch? That could potentially create a little bit of a race condition if we didn't apply them in order. So bootstrapping is a really critical part of this, because in order to be GitOps compliant, I want to be ready to bootstrap, tear down, and restart all my services at once. So we can actually demo that. Well, let's talk about sync waves for a second. When you're using Argo CD with Gatekeeper or another security tool like Kyverno, you can use retries on things like policies, because they rely on custom resource definitions. You can apply the policies before other applications, so you can make sure that you bootstrap safely. We can also monitor application sync status, because if you're applying a resource like a service, it will fail to apply if it violates policy, and that will show up as a sync error. But if the violation is on a pod, that will show up as progressing, and it'll just be stuck in progressing forever. This also works with multiple providers. I haven't really touched on Kyverno. I'll bring that up again in a second, but this is a competitor to OPA Gatekeeper.
And you can actually stack these tools, because they just add hooks to your admission controllers and they don't fight each other; whichever policy blocks, that blocks the process. We don't have to go farther than that. With OPA Gatekeeper, it is possible to run audits, so you can have an audit report that says, hey, these things already exist on your cluster that are violating security policy. I would prefer not to have that happen. This is security by default, not proactive security because I'm really good at paying attention to my emails. And then finally, this is gonna be a lot better in Argo CD 2.5, because we'll actually do server-side apply, so the behavior for pods failing policy versus other resources will suddenly become the same. They'll both show up as synchronization errors. So let's talk about sync waves for a moment and how we did this. Many of you are probably familiar with sync waves, so I probably don't need to go too deep into it, but if you're not aware of it, there's a great blog by Christian Hernandez, who's sitting in the front row over here, on how to do application dependencies using sync waves. You can mix and match these with application sets. You can use app of apps with application sets under it as child apps. You can make it work. It's definitely possible. So, for applying the policies first, because we're short on time, I actually pre-recorded this demo, because it takes a little bit longer to happen. Ba-ba-ba. Oh, yeah. Well, yeah, let's do the demo first. So I'll just show you what it looks like. Here I'm using Argo CD Autopilot. If you haven't used Argo CD Autopilot, it's an opinionated, open-source implementation of Argo CD that basically bootstraps your Argo CD instance so it's self-managing, and it creates a directory structure for you.
So here I'm using Argo CD Autopilot app create source, and then I specify the app source and the project, and it's going to generate my application for me. And this is basically what that's going to do. I have an app of apps. I have a source app that specifies a system requirements app, which contains all the things that I need bootstrapped before I can go into the user space, and then I have a user space app, which is: okay, users can all deploy to this, anything they add into this repo, that's where they're going to work. So I have system requirements that can bootstrap, and everything happens after that. When I do this app create, what I'm doing is creating that parent application that references a folder that already contains those two applications. Once I've done that, very quickly you'll see all the apps start to show up, and if I look in my source app here, you can see that system requirements is going to completely finish syncing and deploying all of its elements before the user space. That means that even if, for some reason, I'm bootstrapping and there was something in my user space that violated policy, it would actually be caught. You could also make it security only if you wanted to be extra. You could just say, this is the security layer, and this gets applied before anything else that's in the system, so you could go farther with it. To do that is pretty easy. You need to enable application sync status to stop progression if apps aren't working. When app of apps and sync waves were first created, by default the behavior was that if an app was stuck, it wouldn't move on to the next app and start deploying it. We changed that in Argo CD 1.8, because for many users that was unexpected behavior, and oftentimes you don't want applications to necessarily be blocking other applications.
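A rough sketch of the two pieces that ordering relies on, built as Python dictionaries: child Applications annotated with `argocd.argoproj.io/sync-wave`, and an `argocd-cm` entry that restores a health check for the `Application` kind so a parent waits on its children. The Lua snippet is reproduced from memory of the Argo CD health documentation and should be checked against your version; the app names, repo URL, and paths are placeholders.

```python
import json

# Health check for argoproj.io/Application, as recalled from the Argo CD docs.
APPLICATION_HEALTH_LUA = """\
hs = {}
hs.status = "Progressing"
hs.message = ""
if obj.status ~= nil then
  if obj.status.health ~= nil then
    hs.status = obj.status.health.status
    if obj.status.health.message ~= nil then
      hs.message = obj.status.health.message
    end
  end
end
return hs
"""


def argocd_cm_patch():
    """ConfigMap fragment that re-enables health assessment of child apps."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "argocd-cm", "namespace": "argocd"},
        "data": {
            "resource.customizations.health.argoproj.io_Application": APPLICATION_HEALTH_LUA
        },
    }


def child_app(name, path, wave):
    """Child Application of an app-of-apps parent, ordered by sync wave."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {
            "name": name,
            "namespace": "argocd",
            # Annotation values must be strings; lower waves sync first.
            "annotations": {"argocd.argoproj.io/sync-wave": str(wave)},
        },
        "spec": {
            "project": "default",
            "source": {
                "repoURL": "https://example.com/org/gitops-repo.git",  # placeholder
                "path": path,
                "targetRevision": "HEAD",
            },
            "destination": {"server": "https://kubernetes.default.svc", "namespace": "argocd"},
        },
    }


children = [
    child_app("system-requirements", "apps/system-requirements", 1),
    child_app("user-space", "apps/user-space", 2),
]
print(json.dumps(argocd_cm_patch()["data"], indent=2))
print([c["metadata"]["annotations"] for c in children])
```

Without the health customization, the parent treats a child Application as done as soon as the object is created, so the wave numbers alone would not hold user space back until the policies are actually in place.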
So you can re-enable this behavior with a config map change, which will then monitor whether an application is progressing and count that as part of the sync status that needs to happen as a prerequisite to the next step in the process. This allows us to set the order and make sure that resources are always applied in order. If you're using this with Argo CD Autopilot, you'll need to use a kustomization. It's the same code; it's just in a kustomization, in case you haven't seen how to merge that. And then you just set a sync wave: you can set sync wave 1 on your system requirements and sync wave 2 on the user space. Now remember, this only applies to the parent applications. So if you have within the user space a whole bunch of applications defined that have a sync wave of minus 1, in normal operation those will get deployed before the applications under system requirements. So the use case that I was really talking about here is bootstrapping. That's why it's useful. That race condition that could happen if you're trying to apply a policy and an application change at the same time is like a bizarre scenario that I can't imagine happening. You'd probably have to take some extra steps to mitigate that if you think that's an issue. But it would require an attacker to be pushing a change and adding a policy to restrict itself at the same time. So why would anyone ever do that? It's counter to your incentives. So I don't think it would be an issue. But like I said, I'm just a kid on a skateboard. So if you think that's a bad policy, let me know. Okay. So next, there we go. The next thing is to prevent applying resources without Argo CD. I mentioned at the very beginning of this thing that the whole approach to security, in my mind, is: how can I get in between things so that I'm passed off as authentic?
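A rough sketch of what a policy for that can look like, again as a Python dictionary. It assumes the `K8sRequiredAnnotations` template from the Gatekeeper Library is installed and that Argo CD is configured for annotation-based resource tracking (`application.resourceTrackingMethod: annotation` in `argocd-cm`), in which case managed resources carry `argocd.argoproj.io/tracking-id`. The constraint name, message, and namespace are placeholders, and the small `admitted` helper only imitates the check locally; it is not Gatekeeper's implementation.

```python
import json
import re

TRACKING_ANNOTATION = "argocd.argoproj.io/tracking-id"


def require_argocd_tracking(namespaces):
    """Constraint requiring Argo CD's tracking annotation on Pods and Services."""
    return {
        "apiVersion": "constraints.gatekeeper.sh/v1beta1",
        "kind": "K8sRequiredAnnotations",
        "metadata": {"name": "restrict-argo-managed-apps"},  # hypothetical name
        "spec": {
            "enforcementAction": "deny",
            "match": {
                "kinds": [{"apiGroups": [""], "kinds": ["Pod", "Service"]}],
                "namespaces": namespaces,
            },
            "parameters": {
                "message": "Resources in this namespace must be deployed through Argo CD",
                "annotations": [{"key": TRACKING_ANNOTATION, "allowedRegex": ".+"}],
            },
        },
    }


def admitted(resource, constraint):
    """Toy local imitation of the check: scope first, then required annotations."""
    spec = constraint["spec"]
    meta = resource.get("metadata", {})
    kinds = {k for entry in spec["match"]["kinds"] for k in entry["kinds"]}
    if resource["kind"] not in kinds or meta.get("namespace") not in spec["match"]["namespaces"]:
        return True  # out of scope, so the constraint does not apply
    annotations = meta.get("annotations", {})
    return all(
        re.fullmatch(rule["allowedRegex"], annotations.get(rule["key"], "")) is not None
        for rule in spec["parameters"]["annotations"]
    )


constraint = require_argocd_tracking(["secure"])
direct = {"kind": "Service", "metadata": {"name": "svc", "namespace": "secure"}}
elsewhere = {"kind": "Service", "metadata": {"name": "svc", "namespace": "default"}}
print(json.dumps(constraint["spec"]["parameters"], indent=2))
print(admitted(direct, constraint), admitted(elsewhere, constraint))
```

A rule like this checks only that the annotation is present, not who set it, so it works as a guardrail against accidental direct applies rather than as a hard security boundary.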
How can I just listen in, or how can I inject something? So, first of all, if I apply a manifest directly to the cluster without Argo CD, will OPA Gatekeeper not function? No, it'll function, right? OPA Gatekeeper will still apply policies. But what if I'm just trying to deploy stuff? In this case the question is: is my team actually using GitOps? What resources are being deployed that aren't tracked within Argo CD? To do this we can create a simple policy, and this is what that policy looks like. If you can see it here, I have an enforcement action of deny. This relies on a Gatekeeper Library... so many buzzwords all at once, it's hard to get out. The Gatekeeper Library has a constraint template called K8sRequiredAnnotations. To use this, we need to make sure that Argo CD is tracking with annotations, and then we can just set our policy. In this case I'm restricting services and pods in the secure namespace, and I actually don't have that enabled right now, so let's enable this and push it. Restrict Argo managed apps. We'll push that. Oop, merge conflicts are fine, nobody minds. I'll re-push that. Okay, so now I'm pushing this policy. Again, all these policies are managed in Argo CD. You probably don't care that much about seeing constraints managed in here, but you can see it just deployed, and each of these shows up as a resource in here, just like you'd expect it to. So now, if I try to apply an application directly, and let's do... not pod not allowed, let's do service. I think it's service not allowed. What do I have in here? I have reject non Argo CD application, and we'll do this in the secure namespace, and we'll do it as a dry run against the server so you can see the policy applies. So you can see it's forbidden: you must provide annotations. Basically it's looking for the annotation IDs that Argo CD sets, to make sure that those are set. Otherwise you can't deploy it. If
I were to apply this outside of my secure namespace, let's say in the default namespace, then it should work just fine, because I scoped my permissions. See, it works just fine, because I scoped it to only that namespace. So, all right, last thing, bonus tip, if you are using Kyverno. You're probably thinking, what, why are you talking about Kyverno? Kyverno is the competitor to OPA Gatekeeper. They're both CNCF projects, they're both great projects, and I recommend checking them both out. If you are using Kyverno, it mutates resources and adds a lot of information, so you will need to account for that in your application checking. Instead of creating CRDs, it creates these mutations, so you just need to remove those. There's a link on how to remove that. So what did we accomplish? We verified our software supply chain. We verified that no kid with a skateboard is going to come get into the middle of my supply chain and deploy something I don't want deployed. We enforced a security policy and made sure that people couldn't deploy stuff that wasn't signed. And we did it with 100% GitOps. We made sure that people are using Argo CD, that they're not trying to bypass the process for how we deliver software. We're able to verify all these things and ensure it, so even if I have failures upstream, like somebody getting access keys they shouldn't have, or somebody having permissions that are looser than they're supposed to be, or any of those kinds of things, we can still enforce and make sure that this stuff happens. And when it works, it's transparent to the developer experience. They don't notice it, because they're not violating the policies. And if they are violating the policies, it means they didn't follow the training. But remember, training was just the first step, not the last one. So with that, there's a bunch of links in here. Again, this is all in the Git repo. I'll give you a second to take a picture, and then I want to give a shout-out, if you haven't already registered for it
or if you haven't already done it: we have an amazing GitOps certification that is the number one in the world. We have over 10,000 students; it's probably over 11,000 as of today. We are doing a live run-through tomorrow, which is the 101, GitOps Fundamentals, and then we're also doing the 201, GitOps at Scale, and all of the creators of that are here. And with that, I'm happy to take any questions. So thank you. Have you ever seen a demo run so smooth? I mean, come on. So yeah, we've got a second for questions. If anybody has a question, feel free to raise your hand. I'll keep you after. No, I won't keep you after. Oh wow, okay, threats work. Go ahead. What kind of skateboard do I have? It's just a blank board. I'm not that interested. There's one over here. Yeah? Is it not trivial to add the annotations that it's expecting? Yes, that's a very sharp observation. Is it not trivial to add the annotation that it's expecting? It is fairly trivial to add the annotation that it's expecting. In this case, this works as a pretty soft rule enforcement. So I imagine a scenario where maybe your hair is on fire, everything's exploding around you, and you want to go interact directly with the cluster, and you've broken all the glass and you've gotten permission to do it. You can work your way around it. That is not a security enforcement. When it comes to making sure that resources are applied with Argo CD, it's really about making sure that people are following the training and the policy of how things should be deployed. If they want to get around it, they could, but you would also be able to audit it and follow up later and say, hey, why are you always doing this? So it wouldn't be in the dark, at least. So, what kind of policies can you enforce, where do you specify them, and when do you enforce them: at run time, deploy time, or build time? So, Gatekeeper works off of admission webhooks, and so there are lots of policies that are
designed around making sure that resources don't get onto your cluster. It also has policies around pods and how they function. So, for example, you can have a policy that says don't deploy any pods that allow privileged access, and that kind of thing. In the screenshots there, where we show the constraints, you can actually see how those policies are written. They're pretty short, they're pretty easy to write, and they're pretty easy to modify, so they're pretty accessible. So yeah, in general: when resources are applied, and then also when pods try to start. Those are kind of the two main areas of policy for Gatekeeper. And then OPA in general, if you're using sidecars or things like that, can also have policies around how things communicate with each other on the cluster, but that's separate from Gatekeeper. That's just OPA, oriented on, not an NGINX proxy, an Envoy proxy. Thank you. All right, with that, if you have any other questions, feel free to talk to me after, and thank you so much for coming to my talk. I appreciate it.