Hello everyone, and welcome to our webinar today. We'll be giving an introduction to Cloud Custodian: simple rules for your cloud, cluster, and Terraform.

Hi everyone, I'm Sonny Shi. I'm a staff engineer at Stacklet and a maintainer of Cloud Custodian.

And I'm Jorge Castro, community manager at Cloud Custodian.

So what exactly is Cloud Custodian? Cloud Custodian is a YAML DSL policy engine for the cloud. It scales up from the startup level of just having a few resources, and it is used at massive enterprise scale in production by many large organizations. The intent is to drive behavioral change and tighter feedback loops for your developers.

But what does that actually mean? I'll give you my plain explanation for it: you write your policies in YAML, and Cloud Custodian is a rules engine that runs in the cloud's control plane and ensures that the policies you're writing get enforced on your cloud. A typical example we like to use is making sure that we're not opening a database and leaving it on the internet.
So you have rules that manage all of your resources. Typically you check these into Git or some other kind of version control, and then Cloud Custodian ensures that those policies are enforced on your cloud. If you have a policy saying you are not supposed to serve open databases on the internet, then we make sure of that. In a lot of ways, the analogy we like to use is a seatbelt for your cloud resources: if you accidentally do a thing manually, you have automation in place to keep you safe. Next.

Recently, as cost has become more important for people over the past 18 months, using these policies is also a great way to ensure that you're managing cost in your cloud. People are using Cloud Custodian not just to ensure that they're meeting compliance needs, but also to answer questions like: do you have a bunch of unused EBS snapshots somewhere, or resources that might not be tied to the specific account you were expecting? By defining all of these rules, you can manage your entire cloud deployment more smartly, and Cloud Custodian can garbage collect the things that aren't supposed to be there. That's where the analogy of the custodian comes from: you define what resources are supposed to be in a certain place, and their limits, and Custodian enforces that for you. This is especially useful on the cost side, because resources that you are not tracking tend to pile up. For a lot of organizations, having that garbage collection ends up being a significant cost saving, by ensuring that what they think is running in their cloud is the actual thing.
That's right. And of course, compliance. One of the great things about this tool is that you can capture your compliance rules, and by version controlling them and using them in CI/CD, Cloud Custodian enables a GitOps workflow that allows you to manage all of that in a tight feedback loop, because it does real-time compliance checking of these rules. So if today I were to try to deploy a resource and it was violating a policy, Custodian, if you set it up that way, can remediate immediately and notify me: hey, there was a resource that I asked for that isn't getting made, for these reasons. What we are trying to do, as we alluded to earlier, is drive that behavioral feedback. Where do I fix this? If I tried to set up a thing that wasn't compliant, where's the actual issue that I need to fix? Does it need to be in my Terraform? How can I enable my developers, instead of running into these guardrails, to have that self-service, in order to help change the organizational behavior to be more compliant? Next.

And correctness. It's inefficient to set up a bunch of stuff, find out that some of it is non-compliant, and have to tear it back down. That costs time, that costs resources, developer time especially. So the mantra behind a tool like this is to enable that tight feedback loop, driven entirely by the automation you already have. And that's what we're going to talk about here today, specifically around Kubernetes clusters and around your Terraform.

Yeah, so what does all this look like exactly? You start out with a policy. The first thing you do is specify a name for your policy, and you also have to select a resource. In this case, we're looking at S3 buckets in AWS. Then you can define any number of filters that you want to apply to those resources.
Here, we're saying we want to find any buckets that have HeadBucket and GetObject actions that allow the account listed here to access them. Then you can specify what actions you want to run. In this case, we're saying we want to notify the resource owner and also send a Slack message using a certain policy template. That way you can send these notifications directly to the people who are violating your policies, instead of having to keep a list, track people down, and pass around a CSV to your engineering teams. Finally, to do all this, you just run the custodian run command, where you pass in the name of the file and give it an output directory, and then you'll start to see your policies running.

Here's another example policy. In this case, we're looking for IAM roles that are over-provisioned. You can see that we also support nots and ors, so any sort of boolean expression that you want to have. We're saying ignore any roles that are named IAM provisioner, and check the permissions to find any roles that have the iam:ChangePassword action inside of the role itself. And again, we want to notify on that. In this case, instead of sending it to the resource owner, we're sending it to the security email distro and copying the cloud team as well.

Finally, Custodian policies can be run in two different types of modes. There's a pull mode, where you are querying the cloud itself, or the cluster, directly. In this case, every single time you want to check for those over-provisioned IAM roles, you're checking everything that's currently out in the cloud. There are also event-based modes, which utilize things like CloudWatch event triggers, CloudTrail, and Config on the AWS side, and we have equivalents for that in Azure and GCP. These modes allow you to trigger off of events that happen in your cloud, as well as in your cluster.
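
To make the boolean filter idea from the IAM role example more concrete, here is a minimal Python sketch of how nested "not" and "or" filters could be evaluated against resource records. This is not Cloud Custodian's implementation or policy schema: the `matches` and `select` helpers, the `has_action` filter, the `AllowedActions` field, and the role names are all hypothetical stand-ins chosen for illustration.

```python
from typing import Any


def matches(resource: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Evaluate one filter from a tiny, made-up filter language."""
    if "not" in flt:
        # Negate the conjunction of the child filters.
        return not all(matches(resource, child) for child in flt["not"])
    if "or" in flt:
        # At least one child filter must match.
        return any(matches(resource, child) for child in flt["or"])
    if "has_action" in flt:
        # Hypothetical permission check against a precomputed action list.
        return flt["has_action"] in resource.get("AllowedActions", [])
    # Fallback: plain key/value equality.
    return resource.get(flt["key"]) == flt["value"]


def select(resources: list[dict[str, Any]], filters: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the resources that match every top-level filter (implicit AND)."""
    return [r for r in resources if all(matches(r, f) for f in filters)]


# Hypothetical role records; a real run would query the cloud provider.
roles = [
    {"RoleName": "iam-provisioner", "AllowedActions": ["iam:ChangePassword", "iam:CreateUser"]},
    {"RoleName": "app-role", "AllowedActions": ["s3:GetObject", "iam:ChangePassword"]},
    {"RoleName": "read-only", "AllowedActions": ["s3:GetObject"]},
]

# "Ignore the provisioner role, then flag anything that can change passwords."
filters = [
    {"not": [{"key": "RoleName", "value": "iam-provisioner"}]},
    {"has_action": "iam:ChangePassword"},
]

over_provisioned = [r["RoleName"] for r in select(roles, filters)]
print(over_provisioned)
```

In pull mode, a check like this would run over everything that currently exists. In an event-based mode, the same filters would be applied only to the resource named in the triggering event.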
That way you can be much more reactive, and do things like remove non-compliant resources while they are net new, instead of waiting for the resource to exist in the cloud for a while and then taking some sort of action, because that can lead to situations where you potentially take down live running services, for example.

Cloud Custodian in Kubernetes has support for those two modes. The first is pull mode, so you can query your cluster with the same policy language as your cloud. Basically, this means that if you're familiar with running Custodian policies for AWS, Azure, or GCP, you'll feel right at home. In addition, there's a k8s-admission mode, where you can run Custodian policies in an admission controller to allow, deny, or warn on any sort of object lifecycle event. It's easy to deploy to your cluster with the Helm chart, and you can also do things like auto-label objects as they come into the cluster to determine resource ownership, for example.

Finally, we have Terraform support as well. Not only can you govern your infrastructure that's already out there, you can also use Custodian to govern your infrastructure as code. This allows your developers to know ahead of time that the things they're deploying are not going to be compliant, or are not going to be in line with the guardrails that you've set. This way they can make those changes early on and not have to deal with the headache of going back and potentially having to do things like stop a database, schedule downtime, and recreate it. In addition, c7n-left will also annotate these policy violations inline, which is really nice: you see the exact thing that you have to change according to the policy itself, and that makes it a lot easier for developers to do the right thing.

So we'll go off and do a quick demo. Let's see. The first thing that we'll start with is a Kubernetes pull mode example.
On the left of my screen, I'm just running the Kubernetes admission controller, which we'll get to in a second. But first let's run the policy for Kubernetes. Like I said, this is pull mode, so this is pulling directly from my cluster. If I take a look at the resources.json that comes back, this is basically all of the information that you would expect to see if you did something like a kubectl describe pod for every single pod. This is a great way to see the attributes that you can filter on, for example.

So let's go and take a look at the event-based modes. The first thing that we'll do is take a look at the policies that we have here. The policies are just in a ConfigMap that we've deployed to our Kubernetes cluster, and you can see that we have a few. The first one here denies pod exec based on the pod. We have another policy checking for missing recommended labels, another one restricting service account usage on pods, and one last one requiring at least three replicas on any Kubernetes deployment that we have.

The first thing that we'll do is try to create a pod. Take a look at our pod manifest right here. If we try to deploy that, you can see we get a warning saying: missing recommended labels, all pods must have foo and bar labels. You can see in our manifest that we only have the foo one. If we take a look at the pod that we created, not only do we see that we only have this foo equals bar label, but we actually used the policy itself to append the owner contact label here. So we can see that the Kubernetes admin was the one that created the resource. And then we also have this additional message that we appended as a label, saying it's missing labels.
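
The warn-and-label behavior just shown can be sketched in a few lines of Python. This is only an illustration of the idea, not the admission controller's code: the `review_pod` function, the `owner-contact` label key, and the shape of the returned dictionary are assumptions made for this example.

```python
# Labels every pod is expected to carry, as in the demo policy.
REQUIRED_LABELS = ("foo", "bar")


def review_pod(manifest: dict, username: str) -> dict:
    """Warn (but still admit) when recommended labels are missing,
    and record who created the object as an extra label."""
    labels = dict(manifest.get("metadata", {}).get("labels", {}))
    missing = [key for key in REQUIRED_LABELS if key not in labels]

    # Hypothetical ownership label appended by the policy.
    labels["owner-contact"] = username

    warnings = []
    if missing:
        warnings.append("missing recommended labels: " + ", ".join(missing))

    # A warning never blocks the request; it only informs the user.
    return {"allowed": True, "warnings": warnings, "labels": labels}


# A pod manifest with only the foo label, like the one in the demo.
pod = {"kind": "Pod", "metadata": {"name": "demo", "labels": {"foo": "bar"}}}
result = review_pod(pod, "kubernetes-admin")
print(result["warnings"])
print(result["labels"])
```

Returning a warning while still admitting the object is what lets a team introduce a rule gradually before switching it to a hard deny.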
So if we delete our pod there, and then go ahead and add our bar label, you can see that we don't get any warnings and the pod was created successfully. This is a great approach if you want to ease developers into doing the right thing before you apply a hard restriction.

Actually, let's keep that pod up. The next thing we'll do is try to exec into that pod. If we run kubectl exec, we see that we actually get an error saying that it failed due to this policy, which says you can't connect to any pods with database in the name or in the namespace c7n-system. This is really great for giving you more fine-grained control over some of the actions that developers can take against the resources.

The next thing we'll do is check out what happens if we try to create a pod with a restricted service account. First we'll create this service account here, which is called cluster-admin, and then let's try to apply the pod with the service account. Actually, let's take a look at what that looks like first. The main thing here is that we're using the service account called cluster-admin, which, as I'm sure you can assume, has all sorts of permissions that you don't want everybody to use. So if we apply the pod with the service account, you can see that again we get this restriction saying you can't use that service account.

Finally, we had that policy that restricted deployments, saying you have to have at least three replicas on your deployment. If we take a look at our deployment YAML, we see that this one has three, so this should work just fine.
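
As a rough illustration of the deny rules in this part of the demo, the sketch below models the service account and replica checks as simple predicates over a manifest. It is a hypothetical model, not how the admission controller is implemented: the `DENY_RULES` table, the `admit` function, and the message strings are assumptions for this example.

```python
from typing import Callable

# Each rule pairs a message with a predicate that returns True on a violation.
Rule = tuple[str, Callable[[dict], bool]]

# Hypothetical deny rules keyed by object kind, mirroring two demo policies.
DENY_RULES: dict[str, list[Rule]] = {
    "Pod": [
        (
            "service account cluster-admin is restricted",
            lambda obj: obj.get("spec", {}).get("serviceAccountName") == "cluster-admin",
        ),
    ],
    "Deployment": [
        (
            "require at least three replicas",
            # Kubernetes defaults a Deployment to 1 replica when unset.
            lambda obj: obj.get("spec", {}).get("replicas", 1) < 3,
        ),
    ],
}


def admit(obj: dict) -> tuple[bool, list[str]]:
    """Return (allowed, violation messages) for one incoming object."""
    rules = DENY_RULES.get(obj.get("kind", ""), [])
    violations = [message for message, violates in rules if violates(obj)]
    return (not violations, violations)


print(admit({"kind": "Deployment", "spec": {"replicas": 3}}))
print(admit({"kind": "Deployment", "spec": {"replicas": 2}}))
print(admit({"kind": "Pod", "spec": {"serviceAccountName": "cluster-admin"}}))
```

Only objects whose kind has a matching rule are evaluated, which reflects the point that policies match only the events they care about.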
If we go ahead and drop that down to two and run a kubectl apply on deployment.yaml, you can see here that it failed admission due to the policy: require at least three replicas. So let's go back in and change that two into a three, and you can see that our deployment was able to go through just fine. Again, all the stuff you see here is basically what you would see in your logs for your deployment when you deploy this on your cluster. Basically, it'll match against only the events that you actually care about from your policies.

The next thing we'll take a look at is our c7n-left demo. c7n-left is a separate CLI from Custodian. It has one command, so we'll just look at the help here. There's a c7n-left run command.

Sorry, can you increase the font up one on this one?

Sure. Yeah.

So if we go to the c7n-left run help, basically you can pass in a policy directory, which will be your Custodian policies, as well as a directory for your actual Terraform itself. If we take a look at the policies, you can see we have a policy here that says all resources should be tagged, and specifically each needs to have this environment tag. And then we have one saying that all SQS must be encrypted. If we run c7n-left run and give it our policies directory as well as our current Terraform directory, we can see that we failed two of these policies. The first one is saying that SQS must be encrypted, and the second one is saying all resources should be tagged.

If you look at our main.tf here, this first one is an SQS queue that we just have here. It's not in a module; it's directly in the main Terraform. So if we add our tags here, that should fix the first one. And then you can see in the second one that we're actually using a remote module.
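
The two Terraform checks in this demo can be approximated with a short Python sketch that runs over already-parsed resources. This is not how c7n-left works internally: the `check_resources` function, the resource addresses, the `Environment` tag capitalization, and the use of the `sqs_managed_sse_enabled` attribute are assumptions for illustration, and treating a missing attribute as unencrypted is a simplification of this sketch.

```python
def check_resources(resources: list[dict]) -> dict[str, list[str]]:
    """Map each failed policy name to the resource addresses that failed it."""
    failures: dict[str, list[str]] = {}
    for res in resources:
        attrs = res.get("attributes", {})

        # Policy 1: every resource needs an Environment tag.
        if "Environment" not in attrs.get("tags", {}):
            failures.setdefault("all resources should be tagged", []).append(res["address"])

        # Policy 2: SQS queues must have encryption turned on.
        # Assumption: a missing attribute counts as "not encrypted" here.
        if res["type"] == "aws_sqs_queue" and not attrs.get("sqs_managed_sse_enabled", False):
            failures.setdefault("SQS must be encrypted", []).append(res["address"])
    return failures


# Hypothetical parsed resources: one declared locally, one from a module.
resources = [
    {
        "address": "aws_sqs_queue.local",
        "type": "aws_sqs_queue",
        "attributes": {"sqs_managed_sse_enabled": True, "tags": {}},
    },
    {
        "address": "module.queue.aws_sqs_queue.this",
        "type": "aws_sqs_queue",
        "attributes": {"sqs_managed_sse_enabled": False, "tags": {"Environment": "dev"}},
    },
]

failures = check_resources(resources)
for policy, addresses in failures.items():
    print(policy, "->", addresses)
```

Running checks like these before anything is deployed is what lets developers fix the configuration instead of rebuilding a live resource later.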
So rather than only being able to test the Terraform that you have directly inside of your local Terraform workspace, you'll actually be able to look up the module references as well. Here our problem was that we had SQS managed SSE enabled set to false. If we set that to true, that should fix it, and then when we run c7n-left again, you can see that we pass all of our policy checks. You can also look at the summary based on the resources. In this case, we have some IAM documents, and those pass, as well as our SQS queues.

So those are the demos. I'll go back to the slides here.

And that's basically a tour of Custodian on a cluster and in infrastructure as code. If you're interested in this, you'll find us at KubeCon + CloudNativeCon Europe in Amsterdam coming up. We don't have any information now, but we're hoping to also have a maintainer session, if you're interested in contributing and checking out all the cool stuff that an open source project has to offer. And with that, Sonny, thank you very much, and thanks everyone for listening. Feel free to join us at cloudcustodian.io. Thank you.

Thanks, everyone.