All right. Well, thanks for coming guys, and welcome to Automated Cloud Native Incident Response with Kubernetes and Service Mesh. Matt and I will introduce ourselves momentarily; for now we would like to thank the Cloud Native Computing Foundation event team for yet another great KubeCon. Cool. So I'm Matt Turner, I'm a software engineer at Tetrate. We are a company that looks at helping enterprises with identity-based application microsegmentation. So we use Istio and Envoy and all those tech stacks as an implementation detail to help you scale your organization, and scale your security as you've got multiple clusters, multiple regions, multiple meshes. And I'm Francesco, I'm a security engineering manager at ControlPlane. I spent a few years at university, then I moved into the IT/OT security space. Then I stood up the security operations center and incident response capability for a large satellite mobile provider. I work for ControlPlane now because I didn't want to see a network cable any more in my life. ControlPlane is a cloud native security consultancy, established in 2017, based in London, but we have offices in North America and APAC as well. We're security specialists in Kubernetes, containers and open source in general; we train on all of the above as well. And we focus on deeply threat-modeled security design and security audit of cloud native architectures, and more recently we stood up a new service which consists of helping customers bridge the gap between cloud native infrastructure and security operations. So what will we be talking about today?
We'll go through some basic concepts about security incident response, to make sure that people in the room have the basic concepts and we speak a common language. Then we'll go through a story of how organizations more recently started adopting a more proactive approach to incident response, and brought in some automation as well to help them counter attackers. And then the fun will begin: Matt will walk you through some cloud native technology and concepts through an incident response lens, and then there will be a walkthrough, literally, of a response on a reference architecture on a Kubernetes cluster. Cool. So, apologies to the security professionals in the room, but we have to do this. I'll start from the definitions that most professionals disagree on: incident, and response. I like these two. An incident: an event that could lead to the loss of, or disruption to, an organization's operations, data, services or functions. A security incident is an event that may indicate that an organization's systems or data have been compromised, or that measures put in place to protect them have failed. Response to such incidents: a set of people, process and technology to identify, contain, eliminate and recover from such events. What do we mean by people, process and technology? I can't stress enough the importance of healthy security operations.
They rely on having the right people in the right seats: your analysts, your engineers, your forensics team, unfortunately managers too. You should define your processes well, document them well, rehearse them. Define your runbooks to respond to incidents. Make sure you cover processes like threat intel dissemination to your sensors, asset isolation, and evidence gathering from allegedly compromised assets. And then get the technology right. Come up with a stack that works for your organization, sustainable and scalable. Include the classics: the sensors, IPS, and EDR agents. Include your cloud native sensors: Falco, or whatever your cloud service provider provides to you, CloudTrail or VPC flow logs in Amazon for example. And then stand up your security information and event management platform, or SIEM, to collect all these logs and try to make sense out of them. And then, as we'll see later, introduce some automation, gently maturing your security response capability, to try to take the humans out of the picture as soon as possible. Yeah, time out Francesco, time out. Okay, we're not all hackers; I had to do a little bit of research to understand what this guy was talking about. So for those folks who don't work in security all the time, yeah, just a few of the terms you're going to be hearing. When we talk about an IOC, this is important: this is an indicator of compromise. So this is anything that sort of points to the attack.
Yeah, this is kind of a useful thing; this took a lot of googling. This is anything that will point to the sort of attack that's going on. So that's the malicious payload that's causing the thing, or like a syscall or disk access profile from, like, an owned workload, something like that. If we say SOC we mean the security operations center, which is, you know, the room where the people are sat with eyes on glass, looking at all the information coming in, making decisions. A signal is any sort of security-related event that we monitor, that comes into that room, that those folks look at. And then, yeah, we've got your SIEM, which is essentially your dashboard of all these security-related events that are coming in, so that humans can look at them and decide whether they're, you know, real security incidents and what to do. And then SOAR is essentially a workflow engine: it's essentially a set of, you know, sort of scripted playbooks for automated responses that we've practiced. Thank you Matt. In terms of existing frameworks to respond to incidents, there are a few out there. For this talk we will use the NIST incident response framework. There is another one from SANS, which you find quite often in organizations. The NIST incident response framework is articulated over four steps. Preparation: your teams have the right knowledge about the infrastructure and they're upskilled, and then you have your playbooks and runbooks defined, implemented and rehearsed, and then your technology and infrastructure is
They are implemented and rehearsed and then your your technology and infrastructure is Well observed you have that observability right so that events come in into the security operation center So step two is about detecting threats and analyze those threats once at the outcome of the analysis is basically Label an event as a false positive or true positive and some variations in between But overall if it's a true positive then you move to step three you move to contain the threat Eradicate the threat from your environment and recover from it if there was a service disrupt service disruption Step four past the incident activity how this happened how can we try to make sure it doesn't happen again? Again just to do a little translation of this because as some of these terms are used in ways that I found a bit counter-intuitive I guess as a dev not a security person Yeah, so that detection step is watching for those signals in the scene as we talked about Analysis is looking at the alerts are looking at the information that's coming through trying to work out if it's a real incident and You know working out deducing those those indicators of compromise You know what is the bike string that's being used to to try to pop me or how do I tell when a workload has been compromised? And what we're looking to actually produce at the end is is one of these indicators of compromise Which is often a checksum actually of whatever the bike string is so that we can feed it into say a firewall and just have it blocked Containment is is maybe a little counter-intuitive. So when they say containment they mean Stopping the attack going further kind of limiting its its blast radius. So that does mean stopping privilege escalation stopping lateral movement But it also means actually this is where we would actually block the attack, right? So stopping the attack going further means stopping other copies of the same workload falling Fowl of the same exploit. 
So this includes the thing that you might not see on this list: you block the attack. It's actually included in containment; it's maybe not obvious. Eradication then says: okay, so we blocked it, it can't move, it can't happen again, but we have had some things that have been compromised, so we have to go clear them out. That's eradication. And then recovery is about restoring the normal service, because, and we'll come on to this, implicit in the NIST response framework is the fact that your response is going to involve kind of turning everything off, like panicking and shutting things down and re-imaging things. So recovery is about getting your normal level of service back, and a lot of what we're going to talk about is maybe ways to sidestep that and not have too much service disruption when you're responding to an incident. But that's what the recovery part means: it means turning all the Windows servers back on again. Correct, thanks Matt. So as I mentioned during the agenda walkthrough, organizations at some point recognized that the classic incident response process was a little reactive, and then they started moving, or they recognized the need to move, into something more proactive. So they introduced something called intelligence-driven defense. So we drop the 101 approach in our response capability, and we start adopting something called a kill chain perspective: a kill chain, step-by-step approach that identifies and stops enemy activity proactively, implements intent-based response and behavior-based detection, to get a step ahead of adversaries. Of course, to do that it's critical to have the right intelligence, such as indicators of compromise. But what is actually this kill chain I'm referring to? Well, it was broadly identified that attackers, to get to your crown jewels, go through seven steps, those seven steps captured in the Cyber Kill Chain: reconnaissance, weaponization, delivery, exploitation, installation, command and control, and then actual actions on objectives. But the tricky bit is that they have thousands of ways to perform each step, right? And as responders we only have a finite amount of resources; unless you are a large organization, but even then you have a finite amount of resources. So how do we deal with that? Thankfully, thankfully, we don't have to, because some organizations like MITRE came up with a great framework like ATT&CK, which consists of a taxonomy of tactics and techniques, talking about the technical stuff that attackers implement to actually breach your organization and, again, get to those actions on objectives, try to get to those crown jewels. It's completely open source, a goldmine of information, again, to figure out how your most likely threats, the ones most likely to target your organization, are going to operate, and therefore how you can break that kill chain based on the tactics and how they implement them. So to summarize, intelligence-driven defense is all about one thing: knowing your enemy. Which actually is not really a new thing, is it? Because, apologies for the Art of War quote, but Sun Tzu's Art of War quote is: if you know the enemy and yourself, you need not fear the result of a hundred battles. But also one of the greatest bands in the world said the same: Know Your Enemy. So they have to be right. But how do you respond fast enough to attackers, to make sure you manage to break that kill chain? Well, again, you don't: you get a computer to do it for you.
So breaking the kill chain fast actually required introducing something called SOAR: security orchestration, automation and response. It's a platform, a tech stack, that enables organizations to collect data about security threats and respond to security events with little or no human assistance. Again, there are commercial platforms and open source platforms; we are tech agnostic in this talk, so we won't mention any. Of course SOAR, to do that, requires access to your infrastructure to a degree, via APIs, via service accounts, and this is also going to be the crux at the end of this talk. Right, to wrap up the boring part: the challenges associated with incident response. Incidents can get very complex, reaction time is critical, technology interoperability is a challenge, and sometimes you have to deal with limited automation. And this gets even worse when we are talking about cloud native response, because it's relatively new. There is a skills gap in incident response teams; that's inevitable, it's very fast-paced. It's really hard to get observability right, back to the point before. And then you have to deal with things like volatility, scaling and so on. And it's also very difficult for security operations to integrate properly, and in a non-disruptive fashion, with teams' practices such as infrastructure provisioning and DevOps pipelines. However, let's look on the bright side, I guess. So how can responding to incidents on a cloud native platform actually help us, right? How can cloud native technology be a bonus? So first of all, obviously these platforms are more advanced, you know, they have a bunch of advanced capabilities; they're just at a high level of abstraction. So they restart workloads, they auto-scale workloads, they do all the cool, you know, Kubernetes stuff that we know about.
So we've just got a lot more levers to pull. In terms of automation, you know, they're automatable, they're automated: these things do a lot for us already out of the box, they're designed to be extended and to be automated, they've got nice APIs, they're sort of ergonomic to develop against. And then GitOps is actually, you know, probably not exclusively a cloud native pattern, but definitely goes hand-in-hand, and this gives us a whole bunch of tools in our toolkit that are really useful when we're responding to incidents. So if we do our response through GitOps, then we've got an audit trail of all the things we did, right? I mean, you know, a Git repo is a Merkle tree; everybody loves a blockchain, so there's that. But it's also sort of deterministic, it's reproducible, it's declarative; it's just a nicer way of working. And it also can be an advantage when it comes to the sort of privileged operations, right? So your scripts or your users don't need highly privileged access. All they need to do is commit into Git, and then you've got the audit log, you've got all the gating that you might need, any rigor that applies, and then, you know, operators in the cluster will pick up CRDs pertaining to exactly what they need to do, and they'll have exactly the minimal set of permissions to do them. So we can, you know, split things out, and we can understand the operations in a much more cloud native way. So, the CNCF landscape is so big now, it's fractal, right?
But for this talk we're going to be focusing mostly on Kubernetes and Istio; I think this is a pretty standard platform these days. So, you know, the benefits that Kubernetes gives us when we're building these platforms, or responding to incidents, are, as I say: the out-of-the-box behavior around rescheduling and recovering applications; that support for custom operators and for extensibility; the, not technically a Kubernetes thing, but obviously, you know, GitOps is a big deal these days; and then more tactical things like support for hardened runtimes, in a first-class way with the RuntimeClass resource; support for ephemeral debug containers; and then some more alpha stuff that's coming down the road, all of which we'll look at, like the container checkpointing stuff. And then what Istio brings us is sort of layer 7 networking, right? So we're lifting our application's pod-to-pod communications into this layer 7 network, where all of your application layer protocols are, you know, parsed and understood. So we can have full logging of any metadata, of any bodies, because, you know, the proxy can parse and understand what it's seeing on the wire, and that gives us fine-grained control of the traffic, right? So rather than an older layer 4 firewall being programmed on, you know, five-tuples, source and destination IP, source and destination port, we can instead put in firewall rules based on HTTP methods and headers and paths and all those kinds of things. And because we've got the sidecar, and this only does apply to the sidecar model, not ambient mode, because we've got a sidecar alongside every workload, then you've got policy enforcement at every hop, right?
So you've got that policy enforcement point, as the security folks would call it, at every hop between applications. So this led us to thinking: right, if we're on a cloud native platform, what does that change about the way we might respond to incidents? So we looked at the NIST framework and we've actually tweaked it in a couple of ways, and we've got a kind of new proposal, copyright 2023, a little twist, a little remix of what, you know, cloud native incident response might look like. And this is still a proposal; we're going to walk you through an incident response using this. So, the couple of changes we've made. We've managed to add a Contain step at step two, so we'll come on to that, but that basically says that we can actually get in and start proactively trying to stop the attack. Remember, this is before the analysis; this is even before we've confirmed that there is an attack. We've just got some alarms going off that something suspicious is happening. So normally you wouldn't roll response at this point, because response tends to be very disruptive and very slow; so you normally wouldn't do anything until you've confirmed the attack. But with the cloud native technology we can do some quick and non-disruptive things to try to contain it, so I'll talk about that. And then this also makes the recovery, as I've alluded to already, makes the recovery in step three optional. Hence, I couldn't quite fit "optional" on, but we've got the brackets around it. This makes it optional because if we can do this response without disrupting service, then there is no recovery to do. It's not always possible, but if we can do that, it may look like a little thing, but this is actually a really big win. Quick shout-out to something coming in May from our long-term ControlPlane friend Abdullah Garcia at JPMorgan Chase. He will release a threat library called Kubernetes for SOC, again to help security professionals and infrastructure teams to get the observability right, as it was fused with content from field experience, you know, distilled from ControlPlane's internal threat library. Thanks. So that's step one, preparation. Like Francesco said, you've got your people, your process, your technology all in the right place. This slide belies a lot of work, right, especially as we move to a cloud native platform and upskill our teams; but it's, you know, assuming that's done. So what would detection look like in a sort of cloud native incident response? So we've still got people, you know, sat in the SOC, looking at the SIEM, looking at all that information. A lot of that information now is going to come from Envoy, because, as I say, it's a sidecar to every workload. So it's going to be analyzing all the traffic, sending all the traffic logs into the SIEM, and it really is going to see everything that happens on the wire. So it's going to detect those anomalies, send them to the SIEM, and then, as I say, you know, this thing requires a human step, but this is going to be flagging sort of unknown and suspicious behavior. We're still going to have a bunch of the traditional sensors: we're still going to have traditional firewalls; maybe we're still going to have, like, EDR, essentially like intrusion detection on your host; you might have, you know, NDR, your classic traffic sniffer. There's a whole market now, right, of XDR products. So there's still all your classic sensors, but we're now also getting things like CloudTrail logs, like VPC flow logs, and like Envoy traffic analysis. So all of this comes into the SIEM, a human eyeballs it, and if they think something's up, they escalate, you know, to the SOAR, and they start running those workbooks, those scripted responses. So, containment. This is our sort of first new step. So this is about, as it says, buying time; about preventative containment. So for the purposes of this walkthrough, I should have drawn a picture, but imagine we have a deployment of ten pods, right, and
one of those pods has raised the alarm. So one of those pods is doing something suspicious, or is receiving suspicious traffic, and then there's another nine in this deployment. And the reason that's important is because, you know, they're almost certainly on the end of the same traffic route, right? So if I hit example.com/foo from the internet, my requests are getting routed to this deployment of ten pods. You can do more complicated things in Kube, but let's pretend it's hitting my deployment of ten, and one of them has raised the alarm. But I know any future requests, you know, are going to start hitting the other ones pretty quickly, because the load balancer is just going to be spreading those requests out. So what do we want to do to that suspect pod, straight away, right: suspicious activity, but no attack confirmed? Well, the first thing we want to do is we want to freeze the orchestration. We want to stop it getting deleted or replaced, essentially; so we want to stop it getting scaled down, we want to stop it getting updated, because we're going to want to go in and do some forensic analysis of this pod. So the very first thing to do is just tell the orchestrator to leave it alone. You know, it's ours now; it's under investigation. And then the next thing we want to do is block the east-west traffic. So, if it has been compromised, we want to stop that attack from moving laterally. We want to stop anybody from pivoting through this service into another one, right? So we want to stop it being able to call anything else. But notice that I haven't said blocking north-south traffic, right? So we think maybe an attack is in progress; we actually want to let it continue. You know, the payload may come in again.
It may be multi-stage; it might be like a bug chain; there might be multiple stages to it. There's a whole lot of reasons why we might want to see what's continuing to come in from this attacker on the internet. And actually, if it has already been popped, we want to see what's going out. So it's probably going to reach out to its C2 network, it's probably going to download the next stage of the malware, maybe it's even going to start exfiltrating our data. All of that is stuff we want to see. So implicit in this is: we actually want to leave that north-south traffic alone, but we want to block east-west so that people can't pivot through this. But notice, that effectively does take this thing out of service, right? It's not going to do anything useful for the user, because it can't call your other microservices. But when we've taken it out of the deployment, it's going to have been replaced by Kubernetes, so our service level stays the same. Right, so how do we actually do this in a cloud native environment? You know, I've covered most of it. The easiest way to freeze the orchestration is to just remove that pod from the deployment; I'm literally showing the command here. You just want to change the label that the deployment is matching. You change it to something else on the running pod, not in the deployment template, but on that one running pod, change the label. In this case I'm changing it from whatever it used to be, foo, to foo-isolated. The deployment controller will then leave it alone. And, as I say, first big cloud native win: it's actually not disruptive, because the deployment controller will come replace it. It'll say: oh, I had ten pods, now I've only got nine, and it'll bring a tenth back into life, so that service level will be maintained. I did say I'd pronounce kubectl in every possible way this talk, so when you hear your preferred pronunciation, give a cheer. So this would be kube-control patch pod; I'm too old to say it like that. Right, so then the other thing we said we'd do for this suspect pod is to block that east-west traffic. There are a few options for this; I had a slide that explained them all, but we ran out of time. So the one I chose was using the Istio service mesh, using the AuthorizationPolicy resource, essentially the layer seven firewall. There's a couple of things you've got to do, because of the way Istio models things. So this is blocking any traffic into the pod. This is stopping anything from calling this pod that we think might have been popped. You do actually still want to do this, right, because it could be acting maliciously; it could be poisoning data; it could be doing nasty stuff. So you want to block any calls into it; that's actually quite easy. You also want to block any calls out of it; you want to stop it calling anything else. This is how you stop people pivoting through it, mostly. That's not so easy with Istio, because of the way their API models the world. I won't bore you with the details, but you basically end up needing to do something like this, to tell everything else not to accept requests from it. So in turn, in this proactive containment, what do we want to do to those other nine workloads, right? We think that deployment's under attack. We think they're probably going to get malicious payloads soon, but we're not sure if they're even malicious; we're not sure if they've even been compromised. But, you know, why not respawn them in a hardened container runtime? Because, another cloud native win, this isn't disruptive to our level of service, right?
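A sketch of those two isolation steps, orphaning the pod and fencing off its east-west traffic. The pod name, labels, namespace and service account here are assumptions for illustration, not the speakers' actual demo values:

```
# 1. Orphan the suspect pod: change the label the Deployment's selector
#    matches, so the controller stops managing (and deleting/replacing) it.
kubectl -n prod label pod myapp-6d4cf56db6-x7k2p --overwrite app=myapp-isolated

# 2. Block east-west traffic with Istio AuthorizationPolicy.
kubectl apply -f - <<'EOF'
# Deny all calls INTO the suspect pod (an empty rule matches everything).
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: isolate-suspect-ingress
  namespace: prod
spec:
  selector:
    matchLabels:
      app: myapp-isolated   # now matches only the orphaned pod
  action: DENY
  rules:
  - {}
---
# Deny calls OUT of it. AuthorizationPolicy is enforced server-side, so
# instead we tell every other workload in the namespace to reject requests
# whose mTLS identity is the suspect pod's service account. Caveat: this
# also matches any other pod sharing that service account, which is one
# reason per-workload identities matter.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: isolate-suspect-egress
  namespace: prod
spec:
  action: DENY
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/prod/sa/myapp"]
EOF
```

Note that neither policy touches the north-south route, so the inbound attack traffic keeps flowing and can still be observed.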
So, gVisor, or Kata Containers, or something. We maybe don't run everything under these all the time, because there is a penalty to the performance of the app, there is a penalty to the CPU cycles in the runtime; so it's going to cost us more dollars and maybe add a little bit of latency. So we don't want to use it all the time, but if we think we're under attack, then why not? So why not just respawn everything in this hardened runtime? The way to do this, for example using gVisor, is that you would kubectl patch the deployment, so the pod template in the deployment, to change the runtimeClassName. This assumes you've got gVisor installed and set up. But again, not disruptive, if you've already got that stuff set up, and if you have tested that your app runs on, especially, gVisor, because it's a bit weird; you can just go do this. So that's blocking the east-west traffic, if you think your pod may have been popped: blocking the east-west traffic stops it attacking other services, other microservices, on the same level. And the hardened runtime stops it basically breaking out; stops container breakouts, essentially, right? Stops you pivoting onto the host, getting persistence on the host, in the BIOS, the really nasty stuff that you do not want to be dealing with. So this is about preventing that. So then we can come on to analysis: getting a bunch of information.
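To recap that respawn step as a command: a one-line patch of the pod template, assuming a RuntimeClass named "gvisor" is already installed and the app has been tested on it (the deployment name is a placeholder):

```
# Changing the pod template triggers a rolling update, so the nine healthy
# replicas are gradually replaced by gVisor-sandboxed ones with no loss of
# service. The orphaned suspect pod no longer matches the selector, so it
# is left untouched for forensics.
kubectl -n prod patch deployment myapp --type=merge \
  -p '{"spec":{"template":{"spec":{"runtimeClassName":"gvisor"}}}}'
```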
So with that suspect pod, we want to verbosely log the north-south traffic. Remember, we said we'd leave the north-south traffic flowing, but we want to crank the logging right up, because we're going to let the attack continue, but we're trying to gather that indicator of compromise, to know what our attack payload looks like, so we can block it in the future. And, as I say, if it has been owned, we want to gather the C2 addresses and the kind of stuff they're exfiltrating. So we want to verbosely log the north-south traffic. We then actually want to, pretty quickly, checkpoint the container, because everything else we're going to do, like looking at it with forensic tools, might be a bit invasive, right? It might leave a trail, because, you know, we're hands-on keyboard; we're like in the movies, hacking, red team, blue team, hands on three keyboards. So we're being pretty quick; we might make a bit of a mess. So actually, probably the first thing we want to do is take a checkpoint, so we've got a clean image, so we can analyze it more slowly later. And actually, if this is a serious attack and the authorities get involved, they're going to want this before we've got our fingers all over it. So checkpointing is super important. And then we do want to just get in there with forensic tools and do some old-style looking around. So how would we implement the verbose network traffic logging? Again, there are a lot of options; they all kind of suck a bit. The one I ended up on: Envoy does not want to log every single header and the request and response bodies, because that's not what it's for, and that's very slow. Envoy just doesn't want to do it. The only way to make Envoy do it is to inject some custom code, but luckily we can do that now with the Wasm plugins; they're really well supported. So there's a bit of config to do.
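For a rough idea of the shape of that config: an Istio WasmPlugin resource that loads a body-logging filter into the suspect pod's sidecar. The OCI image URL is a placeholder (the speakers' plugin is unpublished), and the names are assumptions:

```
kubectl apply -f - <<'EOF'
apiVersion: extensions.istio.io/v1alpha1
kind: WasmPlugin
metadata:
  name: body-logger
  namespace: prod
spec:
  selector:
    matchLabels:
      app: myapp-isolated   # scope the expensive logging to the suspect pod
  # Placeholder image: you would build or source your own body-logging filter.
  url: oci://registry.example.com/istio/body-logger:latest
  phase: AUTHN              # run early in the filter chain, before auth
EOF
```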
This is the main resource; it basically tells Istio to use a Wasm plugin, a body-logger Wasm blob that we've made available. This thing, as far as I could tell, didn't exist, so I went and wrote it. This is a little snippet. So this is the function, you know, Rust, memory safety, security; this is the function that's going to log the request body. This thing also logs response bodies and all the request headers and all the response headers. So, as I say, that didn't exist, so I've written it. I won't tell you the URL because it's not finished, but it does work; I proved the point. How do we implement those container checkpoints? Okay, well, this is supported, but it's super alpha. So Linux has had this Checkpoint/Restore In Userspace (CRIU) thing for a while now; apparently it works fairly well, but support from Kubernetes is still very limited. So there's a KEP out there about introducing this thing. CRI-O supports it; containerd doesn't, they're still discussing it on a PR. There's just this imperative API you have to hit on the runtime itself; kube doesn't even expose it yet. So: very alpha, but it's coming. The tech underneath, the CRIU thing at the Linux level, seems to work. There's a bunch of references for anybody who wants to look into it. And then how do we do the forensics? How do we bring all these tools to bear? Well, the newish ephemeral debug containers are a really good way of doing that. So we can kube-cuddle... yeah, okay, we can kube-cuddle, kubectl debug, to attach one of these ephemeral debug containers. I'm using the busybox image here.
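A sketch of attaching such a debug container, targeting the app container rather than the sidecar (pod and container names are assumptions):

```
# --target shares the process namespace of the named container, so the
# debug shell can see the app container's processes and, via /proc,
# its root filesystem. busybox keeps the footprint small.
kubectl -n prod debug -it myapp-6d4cf56db6-x7k2p \
  --image=busybox \
  --target=myapp \
  -- sh

# Inside the debug shell, for example:
#   ps aux                              # the app's processes are visible
#   cat /proc/1/root/tmp/suspicious.sh  # poke at the app's filesystem
```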
You can use Kali or whatever. And, importantly, you want to share the... because we've got two containers in this pod, right? Because we've got the main app container and the Envoy sidecar. You actually need to target the main app container, so that you share its process namespace, and then you can do things like, you know, see smoking guns of malware; you can see, like, the malware script that's running. So that was analysis of the suspect pod. There's a lot we want to do to that, because we think it's been owned; it's acting really suspiciously. What do we do with the other nine pods? Not a lot. But we probably, again, want to crank up the logging on their network traffic. You know, we haven't really done anything for them, except wrap them in gVisor just in case. But again, as I say, we think, you know, the load balancer is going to send the exploit to one of the other ones next, and maybe they'll get owned, and maybe they'll start calling out to C2. So we just want to crank up the logging on them as well, and, you know, the implementation for that is exactly the same as before: it's your Wasm filter. So we can now confirm, right? We've got all this information, all this super detailed information, so a human in the SOC can go and confirm a true positive. We know we're under attack. We've got the indicator of compromise. We know how to tell when something's compromised, and tell when it's being attacked. So what do we go do about it? So we're now into containment, and remember, this is the NIST definition of containment. So this means stopping the attack going further, including, you know, stopping it happening again on other pods; so blocking it, which is breaking the kill chain. That's the... yeah, with explosions, and Michael Bay. So how are we going to do that?
Well, we're going to reconfigure the firewall and the WAF. Fairly standard stuff, but your firewall, which used to be a layer-4 thing blocking by IPs, is now Envoy. You've got this policy enforcement point everywhere; your firewall is Envoy. It's a layer-7 firewall, so I can block requests based on HTTP attributes like path and method. For something like a Log4Shell attack that's sufficient, because that JNDI string is in a header, so I can just match the header; I can tell Envoy to do that. If that's not sufficient, if we need to start matching bodies, then you're going to need a full WAF. The reference one, I guess, is Coraza, which is an Envoy plugin: basically a ModSecurity re-implementation that understands ModSecurity rules. This is how you block a body; this is how you do stateful tracking of HTTP request chains and so on.

And as I say, policy enforcement points are everywhere, so if you've got one Java app that's been owned using Log4Shell, and you think the attacker is going to try to pivot onto another Java app using the same attack, we can block that too. We can block those attacks east-west, because it's not like we're just on a trusted subnet anymore, where once you're inside the core router there's no enforcement. We've got policy enforcement points everywhere.

Then we need to eradicate. We need to go and identify the pods that have definitely been compromised and get rid of them. Remember, we've orphaned them from the deployment, so that's actually not too bad: the attack is blocked, and we know it can't reoccur. We just need to find all the places that have been popped and shut them down, basically. And you probably want to restart the remaining workloads just in case, because you knew you were under attack and you knew the attack was successful, so you probably want to recycle them.
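Going back a step, the layer-7 block described above could be expressed as an Istio AuthorizationPolicy. This is a sketch: the workload label, path, and header pattern are all hypothetical, and note that Istio's condition values only support exact, prefix, and suffix matching, so full regex or body matching really does need a WAF like Coraza:

```yaml
# Hypothetical deny policy: block the vulnerable endpoint and any request
# whose User-Agent starts with a JNDI lookup string.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: block-log4shell
  namespace: default
spec:
  selector:
    matchLabels:
      app: java-app                       # assumed workload label
  action: DENY
  rules:
  - to:
    - operation:
        paths: ["/vulnerable-endpoint"]   # assumed path
  - when:
    - key: request.headers[user-agent]
      values: ["${jndi:*"]                # prefix match only; regex/body needs a WAF
```

Because this is enforced at every sidecar, the same policy shape blocks the attack east-west between workloads, not just at the edge.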
So how do we delete pods? Well, that's really simple. This slide doesn't seem like it's saying much, but because Kubernetes is so automated, such a high level of abstraction, and because we've already removed this pod from its deployment, nothing is going to look after it. We can just kubectl delete pod (and like any CLI, there's man kubectl if you need it).

Then you might optionally want to recover, and ideally, being cloud native, that's not needed. What did we do? We removed the infested pods from the deployment; they got replaced. We stopped the rest of the deployment getting popped by blocking things really quickly and by running it under gVisor. And even when we changed the runtime class of those other nine pods, that's a change to the pod template in the deployment, so it's done as a rolling update: your pod disruption budgets are honoured, your minimum scale is honoured. So hopefully we did all of this with no disruption to service, within all of their codified SLOs like pod disruption budgets. But maybe you do need to get something back up, so it's worth having recovery on your checklist just in case, even if it's hopefully not needed.

Right, so what's the implementation for that? We need to restart them all. But we changed their runtime class, and we actually had to change their service account to make the firewall work. So what if we just put those values back? You're going to get another rolling restart. Nice side effect: everything gets restarted. I'll just quickly go through the last little bit.
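The eradicate-and-recycle steps above boil down to a couple of commands; the pod and deployment names here are illustrative:

```shell
# The quarantined pod was already labelled out of its Deployment, so nothing
# will recreate it; deleting it is the whole eradication step.
kubectl delete pod suspect-pod-abc123        # example pod name

# Recycle the remaining replicas. Reverting the pod template (runtime class,
# service account) triggers a rolling update that honours PodDisruptionBudgets;
# an explicit restart does the same:
kubectl rollout restart deployment/java-app  # example deployment name
```

Both commands assume a live cluster, so this is a sketch of the shape of the step rather than a runnable snippet.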
So I've said what happens and why, but I haven't really said how. Maybe you were thinking a human was going to be doing this, typing really fast. Maybe you thought it was ChatGPT (that says a lot about whether you're under 25). Maybe you thought it was a SOAR runbook. But actually SOAR runbooks aren't ideal, because that market hasn't kept up; that tech is pretty old. None of them really have first-class support for calling the Kubernetes API and that kind of stuff, so you'd probably end up writing custom plugins that are just curl commands, and you'd need to give your SOAR cluster-admin tokens, which is some horribly privileged access. That's not good.

So how can we automate this in a cloud native way? I wrote, again as a proof of concept and definitely not finished, this little response program. It generates the YAML and issues the commands that we would need to do this stuff. A lot of what I showed you was declarative: you do it by deploying a resource, like the AuthorizationPolicy. A lot of it was imperative: kubectl delete, kubectl debug. (A side note, if there are any Kubernetes maintainers in the room: it would be nice if more things were declarative. Why can't I make a debug container with a DebugContainer resource, please, and then remove it by removing the resource? And why isn't a container checkpoint a resource, just like a VolumeSnapshot is?) But anyway: some steps are imperative, some are declarative. Imagine I wrote a program that generated the declarative YAML and issued the imperative commands.
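As a rough sketch of that idea (not the speaker's actual tool), a one-shot response script could render the declarative YAML and print the imperative commands for a given deployment and suspect pod. Every name, and the policy shape, is made up for illustration:

```shell
# Sketch of a one-shot response CLI: emit declarative YAML plus imperative
# kubectl commands for a deployment under attack and a suspect pod.
respond() {
  ns=$1; deploy=$2; pod=$3
  # Declarative part: a quarantine policy for the workload (illustrative shape).
  cat <<EOF
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: quarantine-${deploy}
  namespace: ${ns}
spec:
  selector:
    matchLabels:
      app: ${deploy}
  action: DENY
EOF
  # Imperative part: orphan the suspect pod for forensics, then delete it.
  echo "kubectl -n ${ns} label pod ${pod} app-"
  echo "kubectl -n ${ns} delete pod ${pod}"
}

respond default java-app java-app-7d9f-x2k4q
```

A real version would apply the YAML and run the commands rather than just printing them once, which is exactly the gap the operator form described next fills.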
Well, I did. Here it is, and all it needs is the name of the deployment that we think is under attack and the name of, in this case, the one pod that we think has been popped. All I'm showing here is it emitting the YAML; I haven't implemented every step that I talked about, but I proved the point by implementing some of it. So we can do that.

But that CLI is kind of one-shot. You get a bunch of YAML on standard out and you've got to do something with it, and it's going to issue those commands once; if you get a 409, it's not going to try again. So wouldn't it be better if that was an operator, a long-lived thing that can re-issue those commands in a loop and make sure they work, and can create, delete and modify the YAML in the cluster? The obvious way to do this is an operator. So the input that the CLI took, the deployment name and the pod name, are now fields in a CRD. All the SOAR has to do, all the human has to do, is make the CRD saying "this deployment is under attack, and I know that because this pod is showing suspicious behaviour", and commit it to git. It works with your existing CI pipeline; it fits into your existing policy and authorization structure; you've got an audit log, from the git history, of what we did to try to respond to this incident. And your SOAR doesn't need any access beyond the ability to do a git commit, or maybe even just raise a git PR that some security specialist reviews and merges. When you do that, the CRDs end up in the cluster through Flux and get picked up by the operator version of the response tool (it's got two binaries: one CLI, one operator). The operator does the imperative commands on a retry loop, and it patches in the declarative resources. It's a bit of a whirlwind, but I think that's really
what we wanted to say. I think so, yeah: an introduction, a security 101, what that looked like in traditional infrastructure, what it might look like in cloud native infrastructure, how you would do these things in cloud native infrastructure, and then actual implementation steps to prove that it all works. Did I miss anything? Not at all. Cool, that's good.

And a quick shout-out to a book written by our Control Plane CEO: Hacking Kubernetes. Scan and download, please. It's a goldmine of information on how to secure, but also break, Kubernetes clusters. It really is: if you've got a shell in a pod, what's all the recon, what are all the vectors you can take to escalate privilege, move sideways, move up, move down? It's actually really, really good, and I don't even work there. Well, thank you very much, folks, for showing up. I don't think we have time for questions, but we'll stay here for the rest of the day, so come on