Hello, welcome. I am Andy from ControlPlane, and I am standing in for my right honourable head of security, Dr Rowan Baker, who very sadly has gastroenteritis and, very sensibly, did not deem it fit to get on a plane. A lot of the work in this talk is also part of a collaboration with Henry Mortimer, one of our cloud native engineers. So huge thanks to both, and greetings, gentlemen, from the past. Thank you for all the work you've put in, and I hope to do it as much justice as possible.

We will be talking about infrastructure as code potholes with cloud controllers. This is, essentially, the intersection of GitOps and event-driven automation in reverse: when we're trying to do infrastructure as code with event-driven automation, where are the foot guns, how do we fix things, and where can existing tooling come into play? We're going to talk about some pain points we've seen around configuring infrastructure securely at scale within large organisations, and about using cloud and policy controllers in Kubernetes to address some of these issues. And, as I say, a huge thanks to Rowan and Henry.

Very briefly: Rowan runs the security team at ControlPlane. ControlPlane is a cloud native security consultancy, based out of London, UK, with offices in North America and Asia Pacific, and we mostly do consulting for regulated industries. We are, as always, hiring. If you're a security engineer, a supply chain specialist, an architect or a threat modeller, please do come and have a word.

And now, the great disembodied Dr Baker. [That hasn't quite worked how I hoped. I'm going to try it one more time. It should be coming out of here. Wish me luck. There it is.]

"As you may have gathered, I had food poisoning a couple of days ago, so I've been unable to travel. I just wanted to introduce the talk before my distinguished ControlPlane colleagues take over. Today we're looking to show how Kubernetes provisioning infrastructure, such as load balancers, can cause issues for large organisations in regulated industries. Their security architectures are typically dependent on infrastructure as code pipelines featuring policy checks, so Kubernetes provisioning infrastructure just isn't part of the plan. We'll describe how that Kubernetes-provisioned infrastructure can be secured using policy agents at cluster admission time. Then we'll look onwards at how we can start using this pattern to secure infrastructure provisioned by an exciting new generation of cloud infrastructure controllers, such as Crossplane. Towards the end of the talk, we'll show how we can work towards securing that infrastructure in a manner where we achieve both automated compliance against NIST standards and simultaneous enforcement of preventive controls, preventing misconfigured infrastructure from being launched, all via a single OSCAL document that forms the single source of truth for both. And, illness notwithstanding, we hope to open source the repo of example policies and OSCAL component definitions for AWS services shortly. So with that, I'll leave it to my colleagues. I hope you enjoy the rest of the talk and the rest of the conference."

Thank you, Rowan. Right, there are some demos. Fortunately they have been asciinema'd for me, so I can't butcher them. Let's rock and roll. So: using infrastructure as code with policy gates to ensure infrastructure is securely provisioned should form the backbone of any cloud security program.
We can create infrastructure as code templates using Terraform et al. that hard-code secure-by-default configurations, which we can distribute and reuse for consumers of cloud infrastructure in large organisations. And we can back this up, while allowing users some degree of customisation, by adding policy gates into pipelines using tools such as Sentinel, OPA, Checkov, et cetera. However, this does not account for infrastructure provisioned on an event-driven basis by cloud infrastructure controllers.

Most organisations have some non-Kubernetes-based workloads that move into cloud first, and it's often months or years before a cloud migration includes Kubernetes, which then throws a metaphorical hand grenade into the enterprise security architect's office. Most obviously, Kubernetes provisions Services of type LoadBalancer. This happens rapidly and flexibly, at a developer's request, via the cloud controller APIs. However, this can bypass infrastructure as code controls. Coming back to everything being defined as code and GitOps: these things just appear in our dashboards, rather than being defined and subject to policy. Annotations define that load balancer's configuration and, if unrestricted, can create insecure infrastructure by adding routes and punching through otherwise theoretically secure policies and firewalls. We can see from this example (a reconstruction follows below) that the SSL cert is commented out and the load balancer is internet-facing. It is easy to footgun, or to intentionally bypass policy, in this way.

Trying to provision a load balancer via infrastructure as code like Terraform and then integrate it with Kubernetes, which would give us something we can apply policy to, is more difficult. Pods and nodes scale with non-static IPs, of course, and encoding those targets into the balancing pool of a load balancer is hard; in fact, you end up using a controller to do it anyway. Whilst you might end up with something that's compliant with policy, you complicate life for developers and operators.

So the logical way to ensure these Services of type LoadBalancer are secure is to use a policy engine such as Kyverno or OPA Gatekeeper. We install those as admission controllers, and those admission controllers intercept requests as the objects are written to the Kubernetes API server, prior to the persistence of the objects. Today we'll talk mostly about validating admission controllers, which simply send back a positive or negative response to allow or prevent an object being persisted. That persisted object then feeds into the provisioning engine, and the infrastructure is either permitted to deploy or not based on the presence of that entity. Using admission controllers and cloud controllers in tandem, we have a single enforcement point for applications and infrastructure, with policy defined as code.

For completeness: cluster admission is not the best time to discover that something is misconfigured, of course, and the reality is that we can't always run admission controllers fail-closed if there are errors or issues with them; an example being an admission controller running outside of a cluster, experiencing high latency and essentially breaking the cluster. So we also need to integrate checks into the deployment pipelines for this infrastructure. With Kyverno and Gatekeeper, tooling exists that allows us to integrate those checks into CI, via the Kyverno CLI and the gator CLI respectively.
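To make that concrete, here is a sketch of the sort of Service manifest referenced above. The annotation names are from the AWS load balancer integration for Kubernetes; the application name and ports are invented for illustration. With the certificate annotation commented out and the scheme set to internet-facing, a developer can stand up a plaintext, publicly reachable load balancer with nothing more than a kubectl apply:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: payments                        # illustrative application name
  annotations:
    # TLS termination effectively disabled by commenting out the ACM cert:
    # service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:...
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  type: LoadBalancer                    # the cloud controller provisions the LB
  selector:
    app: payments
  ports:
    - port: 80                          # plaintext HTTP, no TLS in front
      targetPort: 8080
```

A validating admission policy that matches Services of type LoadBalancer can require the certificate annotation and an internal scheme before the object is ever persisted.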
This is also helpful if we're looking to prove to stakeholders that we have security gates equivalent to infrastructure as code pipelines, where tooling runs statically on the repository, before requests are made to the cloud APIs. So we're looking to gain parity between statically analysed, declarative configuration and dynamically provisioned, ad hoc infrastructure.

Looking beyond the Service of type LoadBalancer, engineers have started to realise the benefits of using controllers to provision back-end infrastructure for Kubernetes-hosted applications, such as storage buckets or databases. There are now quite a few cloud infrastructure controller projects, such as AWS ACK (AWS Controllers for Kubernetes), Google Config Connector and Crossplane, and these broadly follow the operator pattern: custom resources within the cluster, paired with reconciling controllers, for managing cloud infrastructure.

The benefits of this approach: there is potentially less developer complexity, as cluster tenants consume a single Kubernetes deployment pipeline for applications and related infrastructure. There is a "you build it, you run it" concertina of runtime SRE/DevOps-style risk and management here; this won't work for all organisations. It also enables GitOps for infrastructure configuration, not just applications. This is, of course, then reliant upon a lack of mutation for these entities coming through the Kubernetes API, because we get back into the same situation where, if we don't have an offline representation of the realised cluster state, we've got a reconciliation gap, even if the operator thinks it's done the right thing, because mutating admission control has added in those extra annotations or sidecars, or, in this case, mutated the custom resources. The controller pattern provides some drift protection, with those caveats; with infrastructure as code, by contrast, you have to rerun the pipelines to stamp out drift. And finally, we can secure this with policy engines, which is the counterbalance to statically analysing the infrastructure as code configuration as we would traditionally.

Some early projects in this space defined a single controller per cloud service API. But the reality is that creating infrastructure securely is not only about creating one service and setting specific configuration options; these are the guardrails, removing knobs and dials in order to stand up secure-by-default configuration. Often we also require other pieces of infrastructure to be in place. Take S3, which we will discuss a little more today: not only do we need to create the bucket and configure it properly, but additionally we need to create the IAM roles and users for access to the bucket, the KMS keys for back-end encryption, et cetera, and presumably some population of the bucket as well.

Crossplane as a project looks to address some of this for us. Incidentally, this is a project that ControlPlane is often mistaken for. Crossplane is a CNCF incubating project that enables platform teams to build abstractions that tenants can consume, backed by providers to provision the infrastructure. These abstractions are called composite resources. In our hypothetical organisation, a platform team defines the composite resources that developers and human operators can consume. The composite resources group managed resources together, and a managed resource correlates to an individual cloud resource, for example an S3 bucket or an IAM role.
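For a flavour of the shapes involved, here is a minimal sketch of a managed resource from the Crossplane community AWS provider: an S3 bucket with KMS server-side encryption switched on. The names and the key ARN are illustrative, and the exact schema varies between provider versions:

```yaml
apiVersion: s3.aws.crossplane.io/v1beta1
kind: Bucket                             # one managed resource per cloud resource
metadata:
  name: example-bucket
spec:
  forProvider:
    locationConstraint: eu-west-2
    serverSideEncryptionConfiguration:
      rules:
        - applySSEByDefault:
            sseAlgorithm: aws:kms
            # illustrative key ARN; in practice this references a KMS key
            kmsMasterKeyID: arn:aws:kms:eu-west-2:123456789012:key/example
  providerConfigRef:
    name: default                        # credentials the provider should use
```

A composition would group this Bucket with the IAM role, KMS key and bucket policy it depends on, so that tenants claim the whole set as a single unit.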
Providers turn managed resources into external resources in the cloud, e.g. actual S3 buckets. So we have our resources grouped into composite resources, which are then managed by providers, which then deploy them. This simplifies the life of cluster tenants, application developers and system deployers, who can claim a composite resource for their applications, for example to procure S3 buckets and IAM roles as a single entity. And there are providers for AWS, GCP and Azure.

So, why is adoption not more widespread? Well, these enablers have only been around for a few years. The controllers themselves are maturing, in much the same way as Terraform providers take some time to come to maturity; we see the same sort of underlying integration point here. Ultimately, we are configuring distributed systems, and so we expect things to fail; resilience needs to be built into controllers, and shaking out bugs just takes a bit of time. We might not be able to migrate everything under a cloud account into Crossplane just yet. There is also the cluster zero problem: we need a cluster on which to install Crossplane itself. Ironically, that's provisioned via Terraform in the repo that we will open source to support this talk.

But the biggest issue that we see: many large organisations have all infrastructure defined as code, with configurations vetted using policy engines. They then build teams around this pattern, which satisfies the C-suite and auditors. Kubernetes provisioning infrastructure does not fit into the model that we have taken so long to educate upwards to the audit space. So we need to establish trust in a security model for provisioning cloud infrastructure via Kubernetes controllers, and we will look at how we do that. We have a proof of concept project to demonstrate that we can address these controls and be in a good place from a security standpoint when widespread adoption occurs.

The repo we will open source here is to be made public soon; we will do that after the talk. It contains libraries of validating policy for S3 and RDS provisioned using the Crossplane community AWS provider. These libraries are aligned to NIST 800-53 rev 5, and instead of generating mountains of policy YAML that might ultimately be forgotten, we also started to look at how to make those policies enable the documentation of compliance in an automated manner.

So, let's see. I might have to pop this in the window. Can anybody see the bottom of that scroll? There we go. What we have been doing first is making sure there are no existing cluster policies; the cluster has just had Kyverno and Crossplane installed on it. The policies we've created for S3 are listed here: preventing public access, enforcing encryption, denying unencrypted HTTP requests, et cetera. This is what a typical Kyverno ClusterPolicy looks like (a sketch in its spirit follows below). In this case, we're ensuring the S3 bucket object is configured with KMS server-side encryption by default; if validation fails, the object is rejected from the cluster, preventing the launch of insecure configuration. This policy is now applied, and we can see it's correctly installed. This is our minimal S3 bucket configuration; you can see, within the forProvider fields, that encryption at rest using KMS was not defined. We've tried to apply this to the cluster, and it has been rejected, on the basis that S3 encryption was not enabled, of course. Now we'll apply the rest of the policies shown earlier to the cluster, and we'll just check the cluster policies are applied. There we go.
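Stepping out of the recording for a second: the policy just shown is in the spirit of this sketch, a Kyverno ClusterPolicy that rejects any Crossplane Bucket whose spec does not default to KMS server-side encryption. The rule name and message are ours for illustration, not lifted from the repo:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-s3-kms-encryption
spec:
  validationFailureAction: Enforce       # reject, rather than merely audit
  rules:
    - name: check-sse-kms
      match:
        any:
          - resources:
              kinds:
                - s3.aws.crossplane.io/v1beta1/Bucket   # the managed resource
      validate:
        message: "S3 buckets must default to KMS server-side encryption."
        pattern:
          spec:
            forProvider:
              serverSideEncryptionConfiguration:
                rules:
                  - applySSEByDefault:
                      sseAlgorithm: aws:kms
```

A bucket that omits the KMS default, or the encryption configuration entirely, fails admission and never reaches the provider.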
There is the deploy; we'll check momentarily. We've validated that they exist. We've created a bucket, called the authorised bucket, that satisfies all of these policies; this is the first 30-odd lines of that document. But when we apply it on its own to the cluster, it will fail, because a bucket policy denying unencrypted HTTP requests needs to exist first and reference our bucket, to satisfy the security requirements. Next we will look at the required Crossplane bucket policy. This is attached to our authorised bucket through the bucketNameRef field, coming in on lines 8 and 9 there. As you can see, the policy denies all requests that have aws:SecureTransport set to false, on lines 18 to 19 and 24 to 25. So we will now apply these prerequisites, and the bucket will deploy successfully to the cluster without being rejected by the security policy. In practice, we'd expect the bucket definition, the policy definition and the other prerequisites to be wrapped up into a Crossplane composite resource for tenants to consume.

We will then request the bucket itself. It will take a moment for the bucket to show as ready, because the S3 server logging is on a best-effort basis and it can take some time for the bucket to report it's ready. There we go; that is the bucket, which we expected. And finally, when applying the basic bucket, it is rejected on multiple counts. We also have a Kyverno test file to ensure the policies are behaving as expected (a sketch of one follows below). To be clear, we expect these policies to be triggered by the platform teams when they're creating compositions; compositions should never contain violations, and if they did, they would be self-defeating and useless. These were a selection of demos, and that's a first view of how this can be achieved.

So far, we've just created more YAML to throw at the cluster, without fully defining where our controls come from. And I appreciate the frustration this brings, and how difficult managing YAML policy at scale truly is; there are entire talks on that. One of our colleagues at ControlPlane, Mr Chris Nesbitt-Smith, has a fantastic talk called "Versioned Policy as Code", which looks at how to make policy changes across massive government organisations. The TL;DR is: ensure people are allowed to test, and have warning before enforcement, in various different ways. But one thing we did look at here is how to create traceability between controls on the cluster and the compliance requirements that give meaning to the controls in the library.

Here we get into OSCAL: using OSCAL documents as a source of truth for the enforced policies. Huge thanks to Brandt Keller, who is the author of the tooling that we'll look at in a moment. OSCAL is the Open Security Controls Assessment Language, a collaborative project run by NIST: machine-readable representations of control catalogs, control baselines, system security plans, and assessment plans and results. TL;DR: automated policy verification. It enables an ecosystem of tooling for creating, manipulating and validating compliance artifacts around that machine-readable format. This removes reliance on out-of-date documents, spreadsheets, screenshots and manual assessments for compliance activities, which I'm sure any of us who have gone through an audit are all too painfully familiar with. And we leverage a tool called Lula to validate controls in OSCAL documentation against reality, and to generate enforcing policy from OSCAL.
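The test file mentioned a moment ago would look roughly like this. It is a sketch of a Kyverno CLI test; the file names and resource names are illustrative, and the exact schema differs a little across Kyverno CLI versions:

```yaml
apiVersion: cli.kyverno.io/v1alpha1
kind: Test
metadata:
  name: s3-policy-tests
policies:
  - require-s3-kms-encryption.yaml       # policy under test
resources:
  - basic-bucket.yaml                    # known-bad manifest
  - authorised-bucket.yaml               # known-good manifest
results:
  - policy: require-s3-kms-encryption
    rule: check-sse-kms
    resources:
      - basic-bucket
    result: fail                         # the insecure bucket must be rejected
  - policy: require-s3-kms-encryption
    rule: check-sse-kms
    resources:
      - authorised-bucket
    result: pass                         # the compliant bucket must be admitted
```

Running `kyverno test .` in CI gives the same verdicts the admission controller would give at runtime, which is the parity with infrastructure as code pipelines discussed earlier.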
So there are three documents that we will dig into here. Catalogs are collections of controls maintained by an organisation or by a standards body; in this case we'll be referencing NIST 800-53, which has been codified into OSCAL for us, at some 173,000 lines of YAML. Profiles are baselines of selected controls; NIST publishes high, moderate and low baselines for NIST 800-53, but we can also create profiles for specific components, such as an S3 profile that details all of the NIST 800-53 controls applicable to S3. Finally, component definitions are a more detailed description of the controls supported in a given implementation, referencing controls from a profile.

To build OSCAL documents for S3, we were lucky enough to have this information available to us: the AWS Config ruleset mapped to NIST 800-53 rev 5 (I'm sure he's taking the mickey out of me now) is public, shown in the top-left table here. From this we distilled the NIST control IDs into an OSCAL profile for S3, shown on the right, and we wrote Kyverno policy equivalent in intent and spirit to the AWS Config rules, placed into an OSCAL component document for Lula to leverage. I'll just repeat that last bit, as I semi-butchered it: we wrote Kyverno policy, for Lula, into OSCAL.

So the documentation here shows how the aforementioned NIST control IDs in the S3 profile are addressed. You can see on this slide references to the S3 profile document, the NIST control IDs within that profile, followed by a description of the control and a Kyverno policy block. This block is an amendment to the OSCAL component definition, leveraged by Lula and proposed by the Lula team. In context, OSCAL is still a format under development, designed to stimulate discussion and an ecosystem of tooling, so NIST are collaborative when taking suggestions from industry around modifications to OSCAL document structures. Lula is a tool by the right honourable Mr Brandt Keller and team at Defense Unicorns, built to bridge the gap between the expected configuration, the required compliance, and the actual reality of configuration. It takes the component definition shown previously and leverages the Kyverno CLI to run generated Kyverno policies against the cluster or against static manifests. My colleague Henry recently submitted a PR to this project to simply add a generate flag and save those rendered policies to disk, so we can then get them running in Kyverno, enforcing them at admission time.

OK, demo time again. We're going to reset the cluster back to its original state. There's the button again; lovely. So we have a profile here that we've drafted for S3. The important lines we're looking for are line 28, when they turn up again (yep, line 28), where we reference the NIST 800-53 catalog, and then from line 30 downwards, where we can see the list of control IDs from NIST which can be satisfied with the correct configuration of S3. The rest of the file continues like so; yes, those are the further controls.

So, to remind you, component definitions show how the control IDs in the profile are interpreted. This is a cut-down version of the component definition, to reduce time in the demo. Under the control-implementations field on lines 28, 29 and 30, we reference the S3 profile shown previously, and then the implemented requirements for that profile are listed below. Each requirement has its own UUID, on line 34.
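Putting those pieces together, in outline the component definition looks something like this. The UUIDs, file paths and the control ID are placeholders, and the required metadata section is omitted for brevity; the Kyverno block that Lula consumes sits under the implemented requirement:

```yaml
component-definition:
  uuid: 00000000-0000-0000-0000-000000000000        # placeholder document UUID
  # metadata omitted for brevity
  components:
    - uuid: 11111111-1111-1111-1111-111111111111
      type: service
      title: AWS S3
      description: S3 controls enforced via Kyverno and assessed by Lula
      control-implementations:
        - uuid: 22222222-2222-2222-2222-222222222222
          source: profiles/s3-profile.yaml          # the S3 OSCAL profile
          description: NIST 800-53 rev 5 controls implemented for S3
          implemented-requirements:
            - uuid: 33333333-3333-3333-3333-333333333333
              control-id: cp-10                     # illustrative NIST control ID
              description: s3-version-lifecycle-policy-check
              # ...the Kyverno policy block leveraged by Lula follows here
```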
The control ID references the control from the NIST profile, which is on line 35, followed by a description field, where we place the name of the corresponding AWS Config rule that we have rewritten in Kyverno, and the rules block, which is the ported Kyverno policy used by Lula. In this component definition we have implemented just one requirement, and that is S3 versioned lifecycle policy enablement.

So, moving on, we will look at the basic bucket, which does not define what we're looking for, the S3 versioned lifecycle. Yes, it is absent. So without policy, we can launch this into the cluster and have a vulnerable bucket in our estate. That is not what we want; we're proving the negative case in order to lock it down. Bad news for the cluster. We'll now run Lula to assess compliance against the requirements in the component definition, and see that it fails as expected. There is a compliance violation, which, when you look up the UUID in the component definition, is the lack of an S3 versioned lifecycle policy. We should also note that Lula writes a compliance report to disk as well, so Lula could be leveraged in a number of ways to assess compliance in an automated manner. So let's proceed to remove the vulnerable bucket. There we go.

We can also run Lula in generate mode, this time against the full S3 component definition, because it is quicker. This will create the Kyverno policies from the S3 component definition; thus we have the S3 component definition becoming the source of truth for compliance and, ultimately, enforcement. OK. These policies are currently named by their UUID, and there are duplicates, as rules such as the versioned lifecycle policy map to many NIST control IDs, so we don't want to push these into a cluster just yet. We've written a script that uses the UUIDs to match the policy files with the implemented requirements in the S3 component definition, and renames the file and the Kyverno policy according to the first word in the description of the corresponding implemented requirement, which is our AWS Config rule name, on line 37. Yes, there we go. We will run that script and observe the policies it creates, as if by magic. There we go. And these are the policies for the cluster.

Let's install them as we did previously. We would expect the basic bucket to fail once these are deployed. I'll zoom forward slightly; so, the policies are created. We're now deploying the basic bucket, which we do not expect to be permitted, because we've applied the policy that we generated. And finally, when we deploy the authorised bucket (a sketch of the relevant fields follows below), it is permitted into the cluster because it is compliant with all of our policy. Magical, fabulous and wonderful. We can subsequently validate compliance using Lula, with the added benefit of preventing a vulnerable bucket being launched in the first place.

Okey-dokey. So obviously, at this point in the demo, we've just scripted and glued a few things together with the majesty of Bash. What we're looking for is for that final policy-generation step to be done by something slightly better tested, of course; we'd like to understand whether that's something we can build into the tool, if it is found useful. We welcome any feedback on the next steps. This already gives us some utility for compliance, and it would be logical to cover all of the AWS Config rules mapped to NIST 800-53 rev 5.
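For contrast with the basic bucket, the authorised bucket sets the fields that the versioned-lifecycle check looks for, roughly along these lines. The schema follows the community AWS provider, and the rule name and retention period are illustrative:

```yaml
apiVersion: s3.aws.crossplane.io/v1beta1
kind: Bucket
metadata:
  name: authorised-bucket
spec:
  forProvider:
    locationConstraint: eu-west-2
    versioningConfiguration:
      status: Enabled                    # the basic bucket omits this entirely
    lifecycleConfiguration:
      rules:
        - id: expire-noncurrent          # illustrative rule name
          status: Enabled
          filter:
            prefix: ""                   # apply to the whole bucket
          noncurrentVersionExpiration:
            noncurrentDays: 90           # illustrative retention period
```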
We'll also be looking at GCP and other controllers, such as Config Connector. In conclusion: policy engines for event-driven workflows and infrastructure as code are a must from day one. We need to build them into security architectures for Kubernetes and, beyond that, any of our cloud infrastructure deployment techniques. They enable not only the initial case, the LoadBalancer Service, but can also be used more widely across a large organisation to enable self-service infrastructure provisioned by infrastructure controllers, and there are a lot of developments in that space that could prove exciting in the coming months and years. We have some policies enforcing compliance to NIST 800-53 for AWS S3 and RDS that you can use with Crossplane-provisioned infrastructure, under the Collie repo that I will open up shortly. And through OSCAL, Lula, Kyverno and cloud controllers, we're starting to get to a place where you can, in an automated way, tell whether all your infrastructure and Kubernetes applications are compliant with policy standards, while simultaneously enforcing those standards from a single source of truth. Thank you very much to the authors of this talk; I hope I did them justice. And thank you for your attention.

Would anyone like to ask questions? I may delegate them to somebody else. Do we know if anyone has implemented anything like this, OSCAL, with any other standards? My first delegation: Mr Keller.

"…in a runtime environment. But definitely, from the perspective of establishing those control catalogs, those are available for a variety of different standards. And then from there, I think we've seen a couple of different entities that are starting to dive in and own a standard: PCI, for instance, has been explored pretty recently, as well as other OSCAL-related tooling, such as Compliance Trestle, exploring these same spaces."

Thank you very much. Is there another hand somewhere? Hello. So, is the question: is it possible to see a macro view of all policy across the cluster?

"Yeah, from this perspective, I'd say no. Maybe there are tools out there, or different ways of accomplishing this, that can do the job, and maybe the answer is yes in those cases. Lula, from this perspective, is very highly focused on an inheritance model: this is my infrastructure, or this is my application; here are the controls that it satisfies or that you can inherit. It's very focused in nature, so you might still utilise a GRC tool, Governance, Risk and Compliance, to provide you with that high-level abstraction. Lula might be the runtime component looking at very specific composable modules, producing compliance reports and exporting those out to a GRC tool that defines relationships, defines how to look at the high-level picture of your whole environment, or provides filtering into different aspects you might want to focus in on, to find what's currently validated and compliant versus what is not. So it's one tool in a possible ecosystem that could be developed around OSCAL, system security plans, and all the rest of the OSCAL models that are available."

Thanks again. We might have time for one more question, if anybody would like to pose something to Brandt. Hello.
So the question, perhaps, is: when different standards have similar requirements for enforcement, is there a way of deduplicating them?

"I'd say largely the answer is no. Right now, it's going to be based upon how you're crafting the inputs that you're providing to a tool such as Lula, and, if there is duplication across those, how you're handling that. Another thing that we want to look at in the very short term is having this evolve from being not only CLI-driven but also something more event-driven in your environment, that can perform this validation at a lower layer. So that if only one piece has changed, we can not only identify where there are duplicates, but also prevent the need to re-run validation against the whole environment over again: hey, only this application in this namespace has changed, so we can go and find the compliance information for it and validate just that."

Right. Huge thanks to my co-presenter, Mr Brandt Keller, and thank you to everyone else. Have a wonderful day.