Hello everybody and welcome. We're going to go ahead and get started; it's the top of the hour. We've got a lot of material to cover, so we want to give you as much of it as possible and also leave time for questions. This is the Kyverno overview and what's-new session. Thank you everybody for being here. My name is Chip Zoller; I am a Kyverno maintainer, and I'm joined by my colleague Zach Swanson. "Hi, I'm Zach Swanson. I'm a staff engineer at Wayfair." And this is us, if you wanted to confirm our physical identities.

All right, before we get started, if I can get everybody's attention for two minutes: if you've got a phone or laptop, put it down for one second. By a show of hands, who here does not know anything about Kyverno, or has never heard of it? Okay, cool, hands down. Another question for you: who here is using no policy whatsoever in production, in at least one environment? All right. Well, we've got a lot of good stuff to talk to you about today.

So first of all, for those that raised your hand that you aren't using policy today: why do we need policy? Number one, Kubernetes is not secure by default. A lot of people who haven't heard this are surprised to hear it's the case. There are security elements and security tools in Kubernetes, of course, but by default it does not have the security posture that's acceptable to most production environments. RBAC, which is a pivotal piece of Kubernetes security, and indeed of any type of platform, only takes you so far; it has a runway, and you can't do some things that you necessarily need to do. Policy can be used to extend RBAC to your specific use cases.

Your governance is your business. Kubernetes is just an engine; it has the tools that are available, but ultimately what you need for your environment and your organization, your governance structure, is on you to figure out. You can use the tools to do that, but nobody else knows what labels you need, or what you shouldn't do in situations that go beyond basic security.

And not everyone is an expert at this stuff. Kubernetes is hard. Anyone here think Kubernetes is easy, like you've got it in the bag, no problems with it? Yeah, I don't see too many hands going up. So this stuff is difficult, and not everyone's an expert. It's also about creating guardrails that help people do the things they should do, not because they have malicious intent, but because sometimes we forget, and we need a little bit of help.

Automation of operations can be a game changer. Policy isn't just for security; it's not just for preventing bad people from doing bad things. It's also about helping you in your jobs as DevOps engineers, or whatever your role might be, by eliminating some manual efforts you might otherwise be performing. And we need less complexity, not more. Policy can help us reduce complexity, consolidating workloads and tools into one, depending of course on the policy engine. If you're not familiar with Kyverno, if you're one of the ones that raised your hands, you'll quickly see that it's quite capable and allows you to reduce that complexity.

All right, for those that do not know: what is Kyverno? The word Kyverno is Greek for "govern."
This is a CNCF incubating project, and it is growing very rapidly; it started in the sandbox and is now an incubating project. It is an admission controller, and then some, as you'll see, but at its heart Kyverno is an admission controller for Kubernetes. It is purpose-built for Kubernetes; it was not originally designed for use cases outside of Kubernetes. We'll be making some announcements here that will change that, but it was purpose-built for Kubernetes, not a general-purpose policy engine. As a result comes the next bullet, which is why most people tend to like and use Kyverno: there is no programming language, or knowledge of one, required. One of the first questions we get is how this differs from something like OPA or Gatekeeper; both of those require Rego, and Kyverno does not. Kyverno uses a simple policy syntax and language that does not require that you do any programming. It's also the most popular policy engine we've found for Kubernetes, and it has the largest policy library of any policy engine out there; I think we currently have just over 300 sample policies, so a lot of stuff to get you started.

These are some of the Kyverno adopters. A lot of large companies are using this in production, with more added every single day.

And here are some of the broad use cases, not just for policy in general, but for what policy inside of Kyverno can enable. Pod security: this is the quintessential use case where most policy engines begin and end, and it's what most people think of when they think of policy. "Oh no, security, we need something so that pods don't run as root, so that they don't use hostPath," etc. Kyverno does this very elegantly and very simply, but that's really just the beginning. Fine-grained RBAC: you can do things like "only users that have this role can create a Secret that has this specific label key and value."
So again, I talked about augmenting RBAC. This isn't something you can do with Kubernetes RBAC today, but with a policy engine like Kyverno, you can.

Cost control: there are many elements of cost control out there, but a predominant one: say you're running in a public cloud provider, and you want to be able to say something like "only one Service of type LoadBalancer can be created" in AWS. Why? Because that has cost, and you want to be able to limit things with cost implications in your infrastructure.

Ops automation: you can do things like "sync this ConfigMap everywhere when my cert is updated." Kyverno can watch for that, perform synchronizations, and create new resources; we'll get into that.

Multitenancy: multitenancy is a big deal. It doesn't necessarily mean multiple customers, Coke and Pepsi, coexisting on the same cluster; a lot of organizations internally operate like they're a multi-tenanted business. Kyverno can be used to help you along your journey to multitenancy by eliminating some manual steps: for example, every time a new Namespace is created, create these three additional resources in it and do some things to them.

And also supply chain security, a very hot topic. Kyverno has deep support for all sorts of elements of supply chain security. For example: "all images that have this name, or that come from this registry and are of this repository, must be signed and attested using this key," and so on. So these are some of the broad use cases that Kyverno supports. Now I'm going to turn it over to Zach, and he'll give you a little bit about how Wayfair is using Kyverno in their journey.

Thanks. Hello everyone. If you're not aware of Wayfair, we are the destination for all things home.
We are an e-commerce platform in the home goods market. We've got about 14,000 employees right now, around 2,000 engineers. Most of our compute is on GKE, and everyone pushes to Kubernetes: we have about 15,000 deploys into production every month, around 500 every day, and that all gets processed through the Kyverno admission policies that we use.

A little bit on the scale we're running. This is an aggregate of all of our Kubernetes clusters, from a snapshot taken last week. We run at a fairly large scale: large multi-tenant clusters, multi-tenant being all of our developers; we treat each of them as an isolated tenant. We're running about 56 validate rules; the number fluctuates depending on what's going on at any given moment, and we add and remove them as necessary. And around 20 mutate policies, with about 10 of those rules firing at any given moment. Okay, just a bit of scale.

So what are we doing with Kyverno? We broadly categorize it into two things, and the first is protecting the platform. There's pod security, as Chip alluded to; that's the usual thing everybody thinks of. But we're also doing things like preventing someone from re-declaring an Ingress host and routing traffic to a different service than was intended, or re-declaring a TLS cert via cert-manager that is actually managed external to the cluster; stuff like that. We use it to prevent some difficult-to-debug situations, preventing users from enabling certain features that make it hard for us to track down what they've broken. Another point Chip made is locking down the types of Ingress that users can create, so that they're locked into the platform. Security: if you attended the ingress-nginx session yesterday and learned about the CVEs they've had recently, we're using Kyverno to help lock down those annotations.
Other stuff: making sure that if you're using an HPA or a PodDisruptionBudget, you're using it in a way that works for the platform, since you can inadvertently shoot yourself in the foot there. We use namespace labels a lot in the policies to target them, so not every policy is broadly applicable to every namespace and every application; we use the namespace filtering in Kyverno to help with that.

Another big thing we do is change the platform out from under the devs, without the devs really having to do anything. When you're running with two thousand-plus developers and four to five thousand applications that are flowing and earning money for your company, we try to minimize having to go out to the developers and ask them to change things, because asking them to update four thousand-plus GitHub repos to change an ingress class is going to take months, and at some point you're going to end up having to do it for them anyway.

So here are just some examples of things we've done with mutating policies to change things. We've swapped from the deprecated ingress-class annotation over to ingressClassName without requiring devs to update their Helm charts or change any of their configurations. We automatically set image registries for pods; that allows us to fail over from one registry region to another if we have to, but also means the devs don't have to worry about what the end state of the image registry looks like. They just know "my image is tagged wayfair/my-app," and that's all they have to worry about. The Istio team has used this to seamlessly migrate teams from a gateway class they're deprecating over to a new one, to change how the architecture of the network works.
We recently adjusted the Kubernetes default scheduler a little bit to improve its impact, and once we had proved that out, we mutated everyone's workloads so that they automatically use the new scheduler configuration, and improved our efficiency on our compute platform. Another big win: through a kind of add-on we run for our workloads, we discovered that we were over-provisioning resources, and with a simple mutation policy, without having to ask anybody to update a chart or do anything, we automatically reduced everyone's usage of those resources, and we save tens of thousands of dollars a month for our application developers.

Okay, a quick one. I'm not going to go over all of this for you; it's a lot. Why did we migrate to Kyverno? We were on Gatekeeper before, and we didn't have a lot of Rego experience. Rego is difficult, and the documentation for it is rather tricky. We found there are distinct differences between Gatekeeper and OPA; they're not obvious, and the documentation is not very clear about them. We came to have a very high reliance on Stack Overflow examples, where you discover there are ten different ways to do any single thing in Gatekeeper.
And a key thing for us was that we didn't have a centralized policy team organizing all policy across all of Wayfair. Infrastructure was using Terraform Sentinel, security has their own tooling, so we didn't have a requirement to stay in OPA for the sake of doing this centrally.

After we moved to Kyverno, there's an enormous policy library publicly available that we can refer to and use where applicable. We drastically reduced the resourcing we were spending on Gatekeeper: we went from running about 14 to 20 Gatekeeper pods on average, just to keep it happy with acceptable admission latencies, to running the minimum three-pod HA deployment of Kyverno, while reducing the CPU and memory allocated to it at the same time, with no impact to the platform.

We found the Kyverno community is very active. I'm a part of it; I'm in the Slack channel a lot. It's responsive: you get answers to questions, you get help, and the community is willing to engage with everyone. We had an engineer who spent over a week trying to write a fairly simple policy in Gatekeeper, just working through the syntax and how to test it. When we went to Kyverno, he wrote the exact same thing in under a day, with tests, and validated it without having to actually read any Kyverno documentation; he was able to just look at examples and discern how to make it work. It was fantastic.

Real quick, how we migrated. It's fairly straightforward. We started with a proof-of-concept demo to the team, where I took the most complex Gatekeeper constraint we had, one that was perceived to be very complicated, and proved that yes, it does translate into Kyverno; even though Kyverno is not a full programming language, it is able to handle this big constraint that everybody was very concerned about.
We took that, and then one by one started retooling Gatekeeper constraints into Kyverno, and deployed Kyverno in parallel with Gatekeeper. We started building confidence in what we had built using the Kyverno testing utilities, and then started, one by one, switching Kyverno policies from audit mode, where they take no action, into enforce mode. That way, while Gatekeeper and Kyverno were both running at the same time, we could verify that they have the same output: they're blocking the same things, allowing the same things, nothing unexpected. At that point we were able to slowly start disabling the Gatekeeper constraints one by one. The point here being: if you're already on OPA and you'd like a simpler solution, it's not hard to switch to Kyverno. It was a fairly straightforward process. All right, over to Chip.

Thanks, Zach. With that, let's take a look at a feature walkthrough and get into some more details.

First up, let's look at the components of Kyverno. I won't belabor this slide, but on the left are the cluster-wide resources: like a lot of tools, you get all of your RBAC components, and Kyverno will create and dynamically manage the webhooks based on the policies you have deployed. Each of the colored sets of boxes corresponds to a different type of controller; Kyverno is broken out into different controllers. There's an admission controller, and there are controllers that handle background generation tasks, report tasks, and cleanup tasks; I'll go over what these are in just a minute. Naturally, each of those has its own set of services and a bunch of other assistive resources that make it up.
So, policy structure. This is what a Kyverno policy looks like, if you've never seen one; we'll show examples of that. A policy is basically a container for rules, where each rule can be any of the types down at the bottom: validate, mutate, generate, verifyImages. Don't worry, we'll go over those. Each rule also has to have a match block: you have to match on whatever it is you want to do something to. You can optionally have an exclude block, and the bulleted list to the right shows all the things you can both match and exclude on; as you can see, it's fairly extensive. You can even use labels and annotations. One thing to call out here is that Kyverno will actually allow you to match on roles and cluster roles, because it will determine those at the time the request is made; this is not something Kubernetes normally surfaces by default in an AdmissionReview request, for those who may be familiar with that.

These are some live examples of what a typical Kyverno policy looks like. On the left you have a validation rule; this is the yes-or-no response. This one is just checking for a label, and I'm willing to bet you can probably determine what it's doing even if you've never seen a Kyverno policy before. As you can see, there's no Rego, there's no programming; it's just a simple overlay syntax. Policies can be more complicated, but these are very simple examples. The one on the right is a mutation: it's going to add fields, if they do not exist, with the values you see there. Again, no programming language.

Validation: let's get into a little more of what validation means. This is the most common policy type, the yes-or-no response. Here's a resource that the Kubernetes API is sending; you have a matching policy; do you allow it? It has to be one of the two responses, yes or no. This is the most common rule type.
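As a concrete sketch of the structure just described — a match block plus a validate rule using the simple pattern-overlay syntax — a label-checking policy might look like this. The policy and label names here are hypothetical, not taken from the slides:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label         # hypothetical name
spec:
  validationFailureAction: Enforce # or Audit
  rules:
    - name: check-team-label
      match:                       # every rule needs a match block
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "The label `team` is required on all Pods."
        pattern:                   # overlay syntax; no programming required
          metadata:
            labels:
              team: "?*"           # any non-empty value
```

If an incoming Pod lacks a `team` label, admission is denied (in Enforce mode) with the message shown.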
So every example of pod security is going to fit into the validation space. Here's a Pod: what do you think of it? Should we allow it because it's good, or deny it because it's bad?

Validate rules can be written in two primary ways. You can use a pattern overlay, which is what the previous example showed; this is for simpler policy styles, and many of the policies we see employed use these simple patterns. Or they can get more complex by using more advanced deny expressions. There are currently two failure behaviors that validate policies are capable of: audit mode, in which the resource will be allowed no matter its disposition, or enforce, in which it will be blocked if it's bad.

Another cool capability of validate that some folks may not know about is that Kyverno also has the ability to validate YAML manifest signatures. Without going into too much detail: there's a project out there called Sigstore, which many of you may have heard of; they have a tool called Cosign, and a subproject that allows you to sign YAML manifests. Kyverno can actually verify the signature on a manifest, so this is great for, for example, tamper-proofing critical fields inside that resource, if you wanted to employ that.

Mutation. A mutation modifies a resource, and this always occurs first in the admission chain: whenever an AdmissionReview comes through, all of the mutations happen first, followed by validation second. There are two ways you can write these rules: using a strategic merge patch, which is that simple overlay-style pattern shown at the top on the right side, or using a JSON patch. No funny business here.
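As a sketch of the first style, a strategic-merge-patch mutate rule might look like this. The names are hypothetical; the `+(...)` anchor syntax means "add this field only if it is not already present":

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-labels         # hypothetical name
spec:
  rules:
    - name: add-env-label
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        patchStrategicMerge:
          metadata:
            labels:
              +(env): production   # added only when the label is absent
```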
This is your standard RFC 6902 JSON patch; you've probably used them with other commands and other tools, and you can copy and paste those into Kyverno and use them there if you like. Another nice thing about Kyverno that, to my knowledge, other policy engines don't really have: you can mutate existing resources. You can do something like watching an event and then mutating a ConfigMap that already exists in the cluster. So not only can you mutate something that pre-exists, which is not something that comes through the admission review process, you can mutate something different from what triggered it. Also, if you're interested, at the bottom there's a link (and these slides are already uploaded, by the way) to a blog I wrote that takes this to the nth degree: setting up a one-time-passcode system using quotas, all with Kyverno. Kind of neat to look at that generate-plus-mutate-existing case.

Generation. This is what will create a new resource in your cluster in response to something else that happens in the cluster. The source of this can be a clone, as in: here's an existing resource.
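Jumping ahead slightly to what such a generate rule looks like on a slide: this sketch defines its content inline rather than cloning, creating a default-deny NetworkPolicy in every new Namespace. The policy name and the generated content are hypothetical:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-networkpolicy          # hypothetical name
spec:
  rules:
    - name: default-deny
      match:
        any:
          - resources:
              kinds:
                - Namespace        # trigger: a Namespace is created
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny
        namespace: "{{request.object.metadata.name}}"
        synchronize: true          # revert downstream tampering
        data:                      # contents defined in the policy itself
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
```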
It's already out there, and I want you to create the new thing for me, but base it off of that other resource. Or you can define the content in the policy itself. The right side there shows an example of one of these generate policies, in which Kyverno is generating a new NetworkPolicy every time a Namespace is created, and you can see that the contents of that NetworkPolicy are defined in the policy itself.

This also has a synchronization ability: any change to the source resource, if you're using a clone, or to the triggering resource, will have an influence on the downstream resource. This is naturally good for tamper resistance, because if somebody tampers with the downstream resource and you have synchronization enabled, Kyverno will revert that resource back to its previous state, declaratively defined in the policy. This is nice because it obviates, or can obviate, the need for specialized or platform-specific operators; there are several operators out there that do something very similar to this, and you can converge several of those into just Kyverno and eliminate some of those other tools.

Kyverno also has the ability to generate for existing resources. All of us have some definition of what brownfield means: you've got an existing cluster with some workloads already there. You can do something like introduce a new Kyverno policy and say, "for the existing namespaces that are there, not just the new ones, give me a NetworkPolicy that looks like this." That's generation.

Image verification. Kyverno has the ability to verify image signatures. So these are
So these are Container images that have been signed either with cosine or notary This is able to also verify attestation So if you're attesting images if you're using salsa provenance if you're using an attested s-bomb if you're using attested Vulnerability scans etc etc Coverno can verify not only that the attestations exist But that contents in the structure look a certain way This is all integrated There's nothing additional to install if you don't want this you simply don't write a policy that requests it There's no other components to install Any OCI registry is supported Coverno is not is not specific about which registry And also with the new changes that are coming The OCI 1.1 specification that has what's called refers API Coverno supports this won't go into detail on that There are other sessions going on this week that talk about refers API Coverno supports that this supports keys Including KMS from different providers certificates and key lists that that cosine offers you can also perform decision caching This is a new enhancement that that we're bringing out So whatever the decision was Coverno can cache that and not have to go look up those signatures time and time again And you can do multi-way checks so very granular you can do things like any of these Keys must match all of them at least one whatever the case is so that's an image verification Cleanup Coverno has the ability to help you in your job by keeping your clusters nice and tidy reducing costs Reducing sprawl. There are two different mechanisms that you can do number one Surprise a policy you can write a policy that enables you to clean up these resources through a definition And also something new that we are bringing out in the next version You can simply apply a label a time to live label and based on the the time that's there Coverno will expire that resource see it and then remove that This uses the same Coverno policy concepts as all the other rule types. 
You will need explicit permissions: Kyverno attempts, as best we can, to ship with minimum privileges, so if you want to clean up specialized things, you may need to grant it additional privileges. It'll tell you, by the way, if you try to create a policy and it doesn't have those. Some use cases for this are things like removing cruft in your cluster (we've all got some bare pods sitting around from troubleshooting), resource expiration dates, if you want something to be short-lived and automatically removed, and eliminating things like violating resources. This dovetails nicely with Kyverno's other rule types, like validate, mutate, and generate: you can say only these types of people are able to assign that label, change its value, or remove it; you can mutate a resource to automatically add the label; etc. So that's cleanup.

Policy reports. This is an in-cluster report on the results of validate and verifyImages rules. It's an open standard from the Kubernetes Policy working group, so this is not a Kyverno-specific thing, but an open standard that other tools in the ecosystem also adopt. This decouples policies from the results of those policies. What that means is you don't have to go parsing through policy engine logs, and you don't have to look at the status object of a policy to see its effects; you look at this resource in the cluster called a PolicyReport, and it gives you all of that information. Which means you can do things like entitle it to different users and roles. This empowers things like developer self-service, because now nobody has to even look at a policy to know the results; you can just have them look at the policy report. This provides results from admission mode, i.e., resources coming in through the admission review process, or from background scans: if you enable it, Kyverno will periodically scan the cluster and look at the resources
that match policies, and tell you how they line up in these policy reports. This will assess rules that are in audit mode prior to enforce, so as Zach mentioned, this can be, and often is, a way to migrate to Kyverno. If you're using something else, maybe you're using PSPs and you still haven't gotten off them, put your policies in audit mode, look at the policy reports, and when you feel confident they're correct, flip over to enforce mode. There are namespaced and cluster-scoped variants, and you get a bunch of different results in the policy reports. There's an open-source tool in the Kyverno organization called Policy Reporter; it's got a nice UI and works with any policy report. You can do things like sending alerts to remote destinations and emails; it's got metrics and a bunch of other things. I encourage you to go take a look if you're interested.

And policy exceptions. Policy exceptions are a custom resource type, and on the right you can see an example of one. This decouples the ownership and lifecycle of exclusions from policies. Kyverno has the ability to exclude things inside a policy, but you can also break that out into its own separate resource, which allows self-service: developers and other users can say, "I don't know anything about Kyverno, I don't have access to look at the rules, but I need an exception, so I'm going to request a policy exception," and you can accept that resource as a result. This also factors into policy reports, so you can see when a policy exception has been applied. And it allows pretty granular exclusions, even per-image. You can use RBAC, GitOps, and even other Kyverno policies to provide guardrails for those, including things like YAML manifest validation.
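A PolicyException of the kind just described might look like the following sketch, which exempts Pods in one namespace from a particular rule. The policy, rule, and namespace names are hypothetical:

```yaml
apiVersion: kyverno.io/v2beta1
kind: PolicyException
metadata:
  name: team-a-exception           # hypothetical name
  namespace: team-a
spec:
  exceptions:
    - policyName: require-team-label   # the policy being excepted
      ruleNames:
        - check-team-label
  match:
    any:
      - resources:
          kinds:
            - Pod
          namespaces:
            - team-a
```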
I just talked about that on the validate slide. So, testing. Kyverno has a couple of nifty tools you can use for testing. It has a CLI: a standard, fully compiled Golang CLI that allows you to test policies outside of a cluster, so it's great for that shift-left. You can test for specific results using the test command, or you can check arbitrary manifests and see the results of policies using apply. As I mentioned, you can use it in pipelines. There's a kubectl plugin available through Krew, a GitHub action you can quickly plumb in, and of course you can just manually download it.

And then, fairly new here, the Kyverno Playground. This is a web-based graphical policy editor and tester, and there's a public instance available at playground.kyverno.io. You can also run it on premises if you want; we have a Helm chart that allows you to do that easily, or a standalone binary. It gives you access to some more advanced Kyverno settings that the CLI didn't. It's super nice, with a bunch of extra capabilities, and some of those capabilities are shown here.
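For the CLI's test command, expected results are described declaratively in a `kyverno-test.yaml` file. A sketch follows; the file, policy, rule, and resource names are hypothetical, and the `apiVersion`/`kind` header reflects the reworked test schema mentioned later for 1.11:

```yaml
apiVersion: cli.kyverno.io/v1alpha1
kind: Test
metadata:
  name: require-team-label-test
policies:
  - policy.yaml                    # the policy under test
resources:
  - resources.yaml                 # manifests to evaluate
results:
  - policy: require-team-label     # hypothetical policy/rule names
    rule: check-team-label
    resources:
      - good-pod
    result: pass
  - policy: require-team-label
    rule: check-team-label
    resources:
      - bad-pod
    result: fail
```

Running `kyverno test .` in that directory evaluates the policies against the resources and asserts the listed results, which is what makes it useful in CI pipelines.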
So here's just a quick little demo. On the left side is a policy, and on the right are some resources. Click the start button, and it gives you the validation results; you can see the result of applying that validate rule. Now you can do a mutate rule, for example: same sort of situation, and we'll see what the mutation is going to be when we click start. It shows you that it passes the mutation, and if you want to see exactly what Kyverno did, open the details page, and boom, right there you can see the highlighted fields: what your source resource was, and what Kyverno did to change that resource. All very simple and easy to use.

A few extras: these are additional things we don't have time to cover, but if you're interested, the slides are available, as I said; these all link to different things on the Kyverno homepage. Kyverno's got a bunch of other abilities that assist you in writing policies: the policy library with 300-plus policies, distributed tracing, Pod Security admission libraries baked in. I'm not going to drain this list.

And now, what's new. This is the meat for a lot of folks. Kyverno 1.11 is our next release; it should be available very soon, and we have release candidates currently available. Validating admission policy: Kyverno is the first policy engine that has ValidatingAdmissionPolicy support, in four different ways. You can write validate rules using CEL expressions, which is what validating admission policies support today.
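A validate rule written with CEL, as just described, might look like this sketch (hypothetical name and replica limit):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: limit-replicas             # hypothetical name
spec:
  validationFailureAction: Enforce
  rules:
    - name: max-replicas
      match:
        any:
          - resources:
              kinds:
                - Deployment
      validate:
        cel:
          expressions:
            - expression: "object.spec.replicas <= 5"
              message: "Replica count must be 5 or fewer."
```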
You can have Kyverno generate and manage ValidatingAdmissionPolicies from those CEL-based validate rules, if they make use of the qualifying policy language. You can also test validating admission policies in the Kyverno CLI: if you didn't want to use a Kyverno policy at all and just wanted to test your ValidatingAdmissionPolicies, the Kyverno CLI supports testing those outside of a cluster. And you can generate policy reports from those validating admission policies.

Some other enhancements in 1.11: policy report results are now per-resource, where previously they were per-policy. Cosign 2.0 support has been added; this includes Notary updates and also that Referrers API I mentioned earlier. The cleanup TTL label I also talked about is a new addition in 1.11, so you can assign that label and Kyverno will automatically track and scrub those resources when they expire. And there's been some massive CLI refactoring, along with a new test schema that'll help keep you on the guardrails.

Two new projects we'd like to announce today. One of them is called Chainsaw: a new testing tool, inspired by another tool in the CNCF, that allows you to test any sort of operator you want. There's no dependency on Kyverno; you can use it to test other operators and other resources. And Kyverno now supports JSON. In the past, Kyverno was a policy engine limited to Kubernetes; we're glad to announce that's no longer the case. Kyverno now supports arbitrary JSON, so you can use it in pipelines to test anything you want: any JSON you can provide it, you can write a Kyverno policy to validate. Follow those links for more information.

What's next? We're currently planning for graduation, so we're working towards that. We're going to be looking at resource caching, v2 APIs so our policies are up to date, a new warning mode, foreach loops for generate rules, some policy exception enhancements, and aggregated APIs for policy reports.

Here's how to get involved: come see us at booth F19, and there are a couple of other sessions happening on Thursday. The rest of this you can see on the slides when you get them. With that, we have about two minutes for questions. If you have a question, please step to the mic so we have it on the recording. Afterwards, if there are any folks with lingering questions, let's take those out in the hallway so we can let the next presenter come up and get set up. Glad to take your questions now, if there are any; otherwise, please scan the QR code and give us some feedback on this session. We'd love to hear it. Thank you very much.

All right, just to repeat the question for the recording, since it didn't pick it up: how do you balance or reconcile mutations with state that could be stored in Git? Good question. The answer is it really depends on what you're doing. There are some things that you really don't want to mutate, where you want to either provide validations or make sure that you shift that as far left as possible. In other cases, though, there is runtime data that you cannot get any other way; depending on what you're doing, you may need an ID or a UID, or something that's just not possible to have codified in Git. It's also generally fine, when it comes to mutations, to add fields; where you start running into problems is when you change fields that your GitOps controller believes it owns.

Audience question: is Wayfair shifting to Argo as the bootstrap?
Yeah, we're doing the standard kind of Terraform, do-it-yourself approach at the moment, and we're moving away from that. Well, I think I can address that. The question is: when we're bootstrapping clusters, do you also include all the policies? For our purposes, the answer would be yes; Kyverno would be a later dependency that would come in on the Argo chain. Yeah, Kyverno would come in a later sync wave, yes.

All right, and we are unfortunately out of time for questions. So thank you everybody again, and if you'd like to follow up with us on questions, we're going to take them outside; give us just a minute to get wrapped up here. Thank you very much.