Welcome to our KubeCon talk: Everything You Should Be Doing but Aren't — DevSecOps for Kubernetes Workflows. My name is Dan Papandrea. I'm the field CTO at Sysdig, as well as the host of the Popcast. Hey, everyone. My name is Steven Terrana. I'm a chief engineer at Booz Allen Hamilton. So we're going to walk through some fun with you all. And again, this is everything you should be doing but aren't. So take it away, Mr. Terrana. There's a lot to do when it comes to securing your application development and actually implementing DevSecOps. So this is our entire talk in a slide. The first thing is: what is DevSecOps? I've had the pleasure to talk to hundreds of people about that question, and it turns out no one actually agrees. So here's a slide that tries to level-set what exactly we mean when we say DevSecOps. The shortest definition I can come up with is integrating security into every step of the software development lifecycle. What does that actually mean? You've got application dependency scanning, static code analysis, container image scanning, continuous compliance, dynamic application security testing, accessibility assurance: all of those things work together to build a trusted software supply chain. I'm going to dive into more detail about what each of those things actually means. The end result of all that is a container image. We're at KubeCon; those get deployed onto a cluster. And then there's still more to do. All the security in the world is not going to help you if you're running your containers as the root user and privileged. And new CVEs come out every day, so you need continuous runtime security, and we're going to walk through how to actually accomplish that. And one thing to note here, Steven, for everybody out there watching: you don't get these things out of the box. These are not inherently possible out of the box with Kubernetes. So that's a big thing for the 101 folks out there who haven't deployed a Kubernetes cluster yet.
And that's a perfect segue into our next slide here, which is: pipelines are hard. I don't think we planned that, Steven. No, we just have so much synergy that it pains me. So, how to build a pipeline? If you're like me, you learned through Stack Overflow and Google. And what you're going to learn on Google is: open your IDE, write five lines of code from a tutorial, and that gets you to step two here, which is automating how to run some unit tests on a pull request. But what does reality look like? When you're in an organization, you need to implement these defense-in-depth security and quality scans for every single team, and those teams are all running different types of applications across an entire organization. So this is my favorite meme. I'll keep it PG-13 for KubeCon, but look up the owl meme and you'll see some inspiration for this. And the point here is that there's a huge gap between the industry's tutorials for how to get started with pipelines and the reality of what it takes to automate a DevSecOps software development lifecycle for more than a single team. And why is that so hard? The biggest challenge, in my experience: most of us using Kubernetes have also adopted microservices as a way to decouple our applications and allow autonomous teams to work in parallel. The downside is that each of those microservices needs a pipeline, and every CI/CD tool in the industry is focused on building a pipeline for a single application. And what we realized was that different types of applications use different tools, but the process rarely changes. So usually you'll write your 700-line Jenkinsfile that does all the different types of quality testing, from unit testing to integration testing to browser-based test automation — pull a word out of the dictionary and put "testing" on the end of it. Then you throw in all your security scanning and you're like, awesome, I built a pipeline for one team.
It kind of sucks that there are 75 other microservices that you need to do this for. So what do you do? You copy and paste your Jenkinsfile across every single source code repository, and you tweak it, right? If it's a front-end app, you're probably using npm or Yarn or Gulp to run your unit tests. If it's a Java app, you've got Maven and Gradle to run your tests. If it's a Python application, you've got PyTest. So even though, when you drew your awesome DevSecOps pipeline on the whiteboard, you drew some circles that said things like unit test, application dependency scan, build a container image, and deploy it out to an application environment, it took you like three seconds to draw, 700 lines to write for one app, and then you copied and pasted it 100 times for each microservice and tweaked it depending on the types of tools being used. So the reason this is awful, besides the fact that we're copying and pasting things over and over again: first, time. You need to go onboard each individual application. Complexity: you're tweaking the same thing you've copied and pasted over and over again to make it work for a specific app. Then there's standardization, especially if you're in a highly regulated environment like FinTech or federal application development, where you've got a lot of auditability and compliance requirements. How do you actually know that all of those teams are following the same software delivery process? A little less academic way to say that: I said you all have to go do container image scanning. How do I actually know that you're doing that? And then sustainment. No one gets the pipeline right on the first try. I know I don't. So over time, as you want to make updates to what this automated workflow looks like, you've got to go make those changes across every single pipeline definition, across every single branch, of every single source code repository. And it goes back to our title, Steven. It goes back to everything you should be doing.
But I think it's everything you should be doing, but it's too damn hard for you to be able to do all of these things. That's kind of the problem statement here. It is. And let's talk about a better way. Despite the fact that all these different teams are using different tools, what we drew on the whiteboard didn't care what type of application it was. We were able to say, in tool-agnostic terms, what's the business process to get code from a developer's laptop to production as quickly as possible without sacrificing code quality or security? So we've spent the last couple of years working on an open source project called the Jenkins Templating Engine, which is a plugin for Jenkins that we're going to demo in a second here. What it allows you to do is take that pipeline definition out of each individual source code repository, define it in one place in tool-agnostic terms, and then plug and play with which specific tool we're going to use to implement the steps of that agnostic process. So here's a pretty fancy GIF that I made that sort of shows the process. We've got test, build, scan, deploy. And that exact same scaffold can be used across every team, but you swap which tool is being used to implement those steps. So let's take this over to some code. And let's kind of set the stage here, Steven. We deployed a cluster in GKE, you and I, and we put some more tools on it, but it's pretty much just an out-of-the-box GKE cluster. That's exactly right. So here we've got Jenkins, installed using the upstream Jenkins Helm chart. Nothing fancy here. We installed the Jenkins Templating Engine and configured it to point to a template. So let's actually kick off a build here, refresh the page. And while this is running, let's go take a look at what a pipeline template looks like in the Jenkins Templating Engine and how this actually gets implemented. So I'll zoom in a bit. This is probably pretty small right now. There we go.
So if you're like me, you've written a lot of Jenkinsfiles in your time, and they usually balloon out into 700-line files where you're trying to represent the business logic of your pipeline. When we draw pipelines on the whiteboard, they're linear. You say build, test, scan, deploy. In reality, they are not linear. They map to the branching strategy, the way developers are collaborating on a code base. So if you want to change what types of tests happen on a pull request to a development branch, or you want to change which application environment you're deploying to when you merge a pull request, all of that logic typically ends up in the same place where you're defining exactly how you're going to perform the different types of tests we talked about. So that turns into a 700-line file. The Jenkins Templating Engine sort of turns that on its head. The key value here is that we've been able to separate the business logic of your pipeline, meaning what should happen and when, from the technical implementation of exactly which tool is going to implement those steps for a particular microservice. So if we walk through this 17-line pipeline template: in parallel, we're going to run unit testing and static code analysis. Then we're going to run application dependency scanning. Then we're going to build and scan a container image. We're going to deploy it to a production environment. And then, in parallel, we're going to do penetration testing and accessibility compliance scanning. One of the most important things to realize about this template is that those step names, unit test and static code analysis, are generic on purpose. It doesn't matter if unit test comes from a Maven library or a Gradle library or an npm library. Application dependency scan can come from OWASP Dependency-Check. It can come from Nexus Firewall. Build can come from Docker or any of the 17 tools out there to build a container image these days.
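To make that concrete, here's a minimal sketch of what a tool-agnostic pipeline template like the one described might look like in the Jenkins Templating Engine. The step and environment names are illustrative, not taken from the actual demo; JTE resolves each call to whichever library the team's pipeline configuration supplies.

```groovy
// Illustrative JTE pipeline template: pure business logic, no tool names.
// Each step (unit_test, build, etc.) is contributed by a pipeline library
// chosen per-application in the pipeline configuration file.
parallel "Unit Tests": {
    unit_test()
}, "Static Code Analysis": {
    static_code_analysis()
}

application_dependency_scan()

build()                   // produces the container image
scan_container_image()

deploy_to prod            // 'prod' would be an application environment defined in config

parallel "Penetration Testing": {
    penetration_test()
}, "Accessibility Compliance": {
    accessibility_compliance_test()
}
```

The same scaffold then works whether `unit_test()` is backed by Maven, Gradle, or PyTest underneath.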
Scanning a container image can come from Twistlock or Sysdig or Anchore or Clair. So the point is that this single tool-agnostic pipeline template can be reused for every single microservice, or even every single application in your organization, but still be flexible enough to choose the right tools for the job based upon what type of application you are. Alongside these pipeline templates, you can define hierarchical pipeline configuration files. So here's what a pipeline configuration file looks like. This is where we define which tools are being used in that template for a given application. At the top, we've got allow_scm_jenkinsfile = false. That's where I, as a pipeline administrator, can say you're not allowed to bring your own pipeline; we have strict rules around what this workflow should be, and you're going to inherit it. The libraries block is where you specify the modular pipeline libraries that implement those steps in the template. So for example, pytest here on line 14 is what's going to contribute the unit test step. That could be Maven. It could be Gradle. The whole point is that you're able to plug and play with these different library implementations. These libraries can take configuration options. There's no reason that every single DevOps engineer who needs to write a pipeline should have to Google "SonarQube plus Jenkins" and find the 15 lines of code it takes to implement static code analysis from a Jenkins pipeline. By modularizing our pipelines into these pipeline libraries and externalizing configuration through these library parameters, we can now configure our pipelines instead of building them from scratch for each new application. So if we head back to Jenkins, we'll see if our pipeline is done yet. It's still running, but we've got some examples that are done. So let's talk about each of these steps really quickly and why they're important.
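As a rough sketch of the pattern just described, a pipeline configuration file for a Python app might look like the following. The library names and parameters here are illustrative placeholders, not the exact ones from the demo.

```groovy
// Illustrative JTE pipeline configuration (pipeline_config.groovy).
// Pairs a tool-agnostic template with concrete tool implementations.
allow_scm_jenkinsfile = false   // teams can't bring their own Jenkinsfile

libraries {
    pytest                      // contributes the unit_test() step
    sonarqube {                 // contributes static_code_analysis()
        enforce_quality_gate = true   // hypothetical library parameter
    }
    owasp_dep_check             // contributes application_dependency_scan(); name illustrative
    docker                      // contributes build() and image push
    sysdig_secure               // contributes scan_container_image(); name illustrative
}
```

A Java team would swap `pytest` for a `maven` or `gradle` library and inherit everything else unchanged.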
Unit testing, at this point, I hope that one's pretty clear. We need to make sure that our code actually works at the unit level. From a security standpoint, with application dependency scanning, we need to make sure that the materials we're bringing into our application, all those third-party dependencies, don't have known vulnerabilities. There have been some pretty significant data breaches in the last couple of years. I won't name anyone by name, but a lot of times, if you go look at what exactly caused the breach, it was an insecure application dependency, and there was no application dependency scanning being done. The next step there is static code analysis. Application dependency scanning told me my raw materials, my dependencies, are secure; static code analysis says, cool, now let's make sure my code itself is secure, and SonarQube is a great tool to be able to do that. With containerization comes a new artifact: the container image. There are a lot of tools out there today that can scan those container images to make sure they aren't bringing vulnerabilities into the ecosystem. Pulling an image from Docker Hub is the same thing as finding a sketchy van in a Walmart parking lot and putting that in your production environment. So you need to make sure that those images are secure, that they're as small as possible, and that they only have what you need to run your application. Continuous compliance: everything up until now has been the app layer, but we're running this app on infrastructure. We need to make sure that those underlying servers are compliant, and there are a ton of profiles out there with best practices baked in for how to configure your infrastructure. A lot of the tools that do container image scanning can also do continuous compliance to make sure those hosts are secured.
Dynamic application security testing is a really long way to say pen testing. That's actually attacking a deployed application and seeing if it's susceptible to common exploitations. And then accessibility assurance: we need to make sure that the applications we're building are accessible to everybody. An obvious example is including an image but never including its alt text. There's no tool out there that can tell you with 100% certainty that your site is accessible, but it can definitely tell you if it's not. So what this does is give developers fast feedback to fix the obvious accessibility issues, so that manual testing can focus on the more complex areas of accessibility and compliance. So if we go back to Jenkins: each of the steps of the pipeline is generating results, and we're archiving those artifacts, right? So we'll get reports from OWASP Dependency-Check, and we'll get reports from Google Lighthouse and OWASP ZAP, which are doing accessibility compliance testing and pen testing. And this gives us an audit trail. Every time we make a change, we can then go see what the resulting scans were for that change. So if I've got organizational thresholds around code quality or security, I now have the artifacts that prove my change has met those requirements, so that I can get approval to deploy to production as quickly as possible. So at the end of the day, new CVEs come out every day. Even if you did all of these great kinds of security testing, you still need to make sure that you're monitoring your production environment, and there are some really great tools that can help with that. And Pop is going to walk through how we can make sure our clusters are secure at runtime. And so the term, I would say, is more runtime security than monitoring. Obviously, we have that capability with Sysdig. But if we're talking full runtime capability, here's the idea. Steven, if you could stop your share.
So Steven mentioned runtime security being important, and that's where Falco comes into play. Falco is an incubating CNCF project. Hey, we're at KubeCon, right? It's a CNCF event. And the way that Falco works is: there's your app, which sits on Kubernetes. In this case, we have a GKE cluster. And then you have your runtime. And what's happening is we have this eBPF probe or Linux kernel module that's tapping into the system calls, ptrace-style, to be able to evaluate what's happening at runtime. You have a set of security rules, and these can be written as YAML. We give you a bunch of them out of the box, and if you'd like to contribute more security rules, it's a pretty easy syntax for you to do so. And then you can take this and send these alerts out. Right now, we're probably going to do it to standard out, but you can send it out over gRPC, client-go, client-rs, Prometheus. Another project that we've integrated, and that I'll show an example of, is Falcosidekick. Shout out to the team for that. And so you have all of these pieces that you can tie together. Again, you get your runtime assessment. You can also tap into the Kubernetes audit log. So anybody creating a namespace, things like that, can be set as Falco rules that you get output for, and then you can use this kind of notification system if you need to. All right, I'm going to be honest: when I heard about Falco, I was like, that's awesome, but I don't know what all this syscall and ptrace stuff is. But the way Falco has created these abstractions, with YAML files to create rules, makes it accessible to anybody. Even I can write these YAML rules for Falco. And the Security Hub makes it even better, so I don't have to write these rules from scratch. I can go to community-sourced rules to get best practices right out of the box. Steven, it's like we've done this before. It's amazing. What's your name again? All right.
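For flavor, here's roughly what one of those out-of-the-box YAML rules looks like. This is a simplified sketch in the spirit of the stock "Terminal shell in container" rule that ships with Falco, not the exact upstream text; the condition and output fields use Falco's filter syntax.

```yaml
# Simplified sketch of a Falco rule: alert when someone opens an
# interactive shell inside a running container.
- rule: Terminal shell in container
  desc: A shell was spawned in a container with an attached terminal
  condition: >
    spawned_process and container
    and proc.name in (bash, sh, zsh)
    and proc.tty != 0
  output: >
    Shell spawned in a container with an attached terminal
    (user=%user.name container=%container.name shell=%proc.name
    parent=%proc.pname cmdline=%proc.cmdline)
  priority: NOTICE
```

Something as simple as a `kubectl exec -it <pod> -- bash` would trip a rule like this, and Falcosidekick can forward the resulting alert to Slack, as in the demo coming up.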
So I'm going to share my screen again; just let me clear my terminals. All right, let's go ahead and do some attacking here. So here's what I'm doing. I've tailed the logs of the web server that's sitting there, and I've got this HTML file that I'm going to copy into its directory. So let's go ahead and do that, and see, there's an attachment to it. All righty. Now I've exec'd into the pod and copied this in here, and you see the alert; again, it's one of the stock rules. I'm going to do a refresh. And guess what? I just WTF-bombed Mr. Terrana's precious pod. Look, I'm out. So with that being said, we just got attacked. And we also used the power of Slack, or rather the power of Falcosidekick, to send that output out to here. And I'm saying, oh my God, Steven, we got hacked. Now we've got to figure out what happened. So again, we've used the Falco tool to do that runtime detection, and then we have the capability to take those rule sets that are there and send the alerts to standard out, or out over gRPC, or out to the other things that I showed on the slide. So that's pretty much, in a nutshell, all the goodness that you can do from a runtime security perspective. Let me pass it over to my friend, Mr. Steven. Thanks, Pop. I'm still recovering from my feelings being hurt, but we've got one more quick demo. WTF-bombed, Steven. How are you going to recover from that? All right. So you guys are really getting your bang for your buck at this session. I've got one more demo for you. We talked about application security and building a tool-agnostic DevSecOps pipeline that you can share across teams. We talked about how that's still not good enough: you need to do runtime security with Falco. And even that is not good enough, right?
All the best security in the world is not going to help you if you still have insecure configurations. So the last thing we wanted to show you today was using Open Policy Agent and Gatekeeper to bring some governance to your Kubernetes manifests, to be able to control exactly what types of things can be created. Kubernetes RBAC can only go so far. I can say this user is allowed to create a deployment or allowed to create a namespace, but I can't get granular enough through regular old Kubernetes RBAC to say this user can create a namespace, but it must have these labels. So to get super granular, Open Policy Agent allows you to define policies as code that get down to the specific field level of a Kubernetes manifest. If we take a look at some examples here with Gatekeeper, you create what are called constraint templates and constraints. So if I take a look at this constraint template that we've got here, I'm able to create generic Rego policies, called constraint templates, that define a new custom resource definition. In this case, it's called K8sRequiredLabels. And down there at the bottom, in the Rego, we have a policy that says every single object that conforms to this constraint template must have a corresponding label. But the constraint template is generic, right? So we're going to be able to create multiple constraints from the single constraint template to lock down our cluster even further. One particular example of that: if we wanted to say that every single namespace must have a gatekeeper label on it, I can create a specific constraint from that constraint template. And you'll notice that the kind of the object being created corresponds to the dynamically created CRD from the constraint template. And we're saying here, under match kinds, where the kind is Namespace, that every namespace must have a gatekeeper label.
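For reference, the pattern just described looks roughly like this. This sketch is adapted from the standard Gatekeeper required-labels example, so the exact fields in the demo's version may differ slightly.

```yaml
# ConstraintTemplate: defines a new CRD (K8sRequiredLabels) plus the
# generic Rego policy that checks for required labels on any object.
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }
```

Because the template is generic, the same Rego can back many different constraints, one requiring labels on namespaces, another on deployments, and so on.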
So right now, I have a namespace file with no labels on it, right? This would not be allowed through OPA and Gatekeeper. So if I do a kubectl apply -f on that namespace, I'm going to get an error message that tells me this request has been denied: you must provide the labels for gatekeeper. So I can go edit this namespace file and make it conformant to the policy that we just created. I can create labels, I can make sure that I have the gatekeeper label that is required, I can get out of here, and I can try again. And now that the namespace is conformant to organizational policies, I was able to successfully go and create that namespace. So when does this come into play? Common use cases for OPA would be things like: every deployment must have resource requests and limits. I get a lot of calls of, hey, my cluster is broken, come help me fix it. And most of the time it's because people never put resource requests or limits on their containers, so the scheduler blows up. You'll have things like privileged containers and pod security policies, right? So there are all these best practices that we've learned along the way, as we've bricked our clusters, that we want to turn into guardrails on future clusters to help people deploy resources that are going to be reliable. And Open Policy Agent gives us a way to codify those best practices and security policies and then enforce them throughout the cluster at a super granular level. So in this talk today, we learned about all the different kinds of security testing. Hold on. I want to show one more thing. I'm going to interrupt you. I'm going to hack your talk. Is that okay? Can I show one more thing? I suppose. Go for it. I want to show you this. Notice when Steven did his creation of this testing namespace. Another beautiful thing. Remember, I told you about the Kubernetes audit.
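Concretely, a constraint instantiated from that template, and a namespace that satisfies it, might look like the following. The names (`ns-must-have-gatekeeper`, `testing`, the label value) are illustrative, following the standard Gatekeeper example rather than the exact demo files.

```yaml
# Constraint: every Namespace must carry a "gatekeeper" label.
# The kind matches the CRD created dynamically by the ConstraintTemplate.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-gatekeeper
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["gatekeeper"]
---
# A namespace that conforms to the policy and will be admitted.
apiVersion: v1
kind: Namespace
metadata:
  name: testing
  labels:
    gatekeeper: "enabled"
```

Applying the namespace without the label is rejected at admission time with a denial message listing the missing labels; adding the label lets the same apply succeed.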
We were able to basically look at this and see that you're not allowed to be creating namespaces. So this is an example of using OPA and Falco together to understand what's going on in our environment. We're able to use the power of OPA, for instance, to enforce the rule sets around the creation of that namespace we created. So, I mean, pretty cool stuff. So we covered a lot today in a half hour. Everything we talked about was how to integrate security into every step of your software development lifecycle and build a DevSecOps pipeline through things like application dependency scanning, static code analysis, container image scanning, continuous compliance, penetration testing, and accessibility compliance testing. Then we talked about configuration governance, right? Being able to have really granular control over the types of Kubernetes objects that are being created. And then we talked about runtime security. So, Pop, you want to debrief here on runtime? Absolutely. And again, using the power of Falco: it was a DaemonSet that we deployed to this GKE cluster, with rule sets already out of the box. We saw it fire when we terminaled into the container; we saw it in the OPA example, where we could look at what Steven went in there and did, using the kube audit rules we had attached; and then we sent that out, either to standard out on your screen, or over gRPC, or to Slack, or any of the other integrations, using the Falcosidekick project. So all of that, again, we've done through the power of open source. And, you know, we want to be transparent: there's a ton of content here, and it can seem overwhelming. But like Pop just said, all of this is open source. The community is super welcoming. And we would love to talk to you all in the Kubernetes Slack. Thanks, everyone. Thank you, everyone.