Aha, welcome, thank you so much for coming. So meet Hero. Hero is application source code on a developer's laptop. Hero longs to be a real application, running in production, serving users, and we're going to help Hero on their journey. Our job is to help Hero navigate hundreds of CNCF projects, choose which ones to use, and integrate them with one another so Hero can live their dream. I'm Whitney, this is Viktor, and together we host a streaming show called You Choose on Viktor's DevOps Toolkit channel. In our show, each episode is a system design choice, and we gather all the relevant CNCF technologies that can do that thing. We get experts from each technology, and we give our experts only five minutes to tell us what their technology does. Then at the end of the episode we ask the community to vote, and whichever technology gets chosen is the one we implement in our ongoing demo. That's it, right?

So based on choices you've already made in the show, right now in the demo environment we have a cluster — an AWS EKS cluster defined with Crossplane resources. We also have Argo CD doing GitOps for us, our application is already deployed, and we have Contour as a way for our end users to access our application. So wait, it sounds like Hero already is in production, serving users. What are we doing here? It's over, we did it. Yeah, good job. The thing is, there's no security in our production cluster, so Hero's in danger, and so are the users in the system, so we need to save the day.

This is where we need your help. Please get out your phones and scan this QR code, because we're going to do a live vote today. We're going to go through four system design choices, and then Viktor's going to build a demo based on what you choose. We're going to add cluster-level policies, we're going to do runtime policies, we're going to manage secrets, and we're going to secure pod-to-pod communication. So let's do this thing. I love how Viktor hasn't said a word yet. This is great.

All right, so first up we're going to do admission controller policy, or cluster-level policy. Our choices are Kyverno, OPA Gatekeeper, and Kubewarden. But first of all, why do we need policy? Well, as things stand, our applications could become bloated and use way too many resources, or they might run outdated versions, or they might pull from untrusted container registries. So we need to add rules to our cluster that will help prevent this, and a bajillion other things that we don't want to happen. A policy is a rule that defines expectations about what is and isn't allowed to happen to objects in our system. It's important to know that it's organization-specific. So we could do things like require resource limits, or declare which versions we want to use, or prevent our containers from pulling from untrusted registries — and a bajillion other things.

So how does it work? When we make a new rule, it adds configuration to the Kube API in the form of an admission controller webhook. Basically, if you try to make a change to the cluster that falls under the jurisdiction of this rule, the Kube API asks for permission from the admission controller. The admission controller is a piece of software that's running, and if it's a validating admission controller, it answers yes or no. If it's a mutating admission controller, it might change something about the resource. Either way, it returns a response, and then the Kube API acts in accordance with the policy.
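To make that concrete: what all of these tools register under the hood is an admission webhook. Here's a minimal sketch of a webhook registration — the names and service here are hypothetical, not what Kyverno or the others actually generate:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-policy-webhook    # hypothetical name
webhooks:
  - name: validate.example.dev    # hypothetical
    clientConfig:
      service:
        name: policy-controller   # the admission controller's Service (assumed)
        namespace: policy-system
        path: /validate
    rules:                        # which API requests get sent to the webhook
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments"]
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail           # reject the request if the webhook is down
```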
So let's talk about our tools. The first one is Kyverno. Kyverno is an admission controller that's built specifically for Kubernetes, so you can write policy in YAML — kind of. It's really, let's say, a Kyverno JSON query language, but it's meant to look and feel like YAML, with the idea that people who work in Kubernetes can reasonably author policy and reason about it well. There are a ton of other bells and whistles — it's a mature technology — but those aren't differentiators, so I'm going to move on.

Next up, we have OPA Gatekeeper. OPA stands for Open Policy Agent, and you can use OPA with lots of different stuff all over your estate, not just cloud and not just Kubernetes. The Gatekeeper part pulls OPA into Kubernetes, so you can use it there. It has one big benefit and one big drawback, and those two happen to be the same thing: Rego. With OPA Gatekeeper you write your policy in Rego, and then you can use it across your entire system — your policies are reusable. The drawback is that Rego is notoriously difficult to work with.

So maybe, just maybe, you don't want to write policy in YAML and you don't want to write policy in Rego. What should you do? Maybe you want to consider Kubewarden. With Kubewarden, you can write your policy in any language you choose, and that policy gets compiled into a Wasm module. The Wasm module is an OCI artifact. You can even write your policy in Rego and put it in a Wasm module, if you really want to make your life hard for no reason. And then you run that Wasm module as part of the Kubewarden admission controller.

So now's the moment: which do you choose? Please? Uh-oh. That's how much people like security. Oh, this is exciting. I just want to remind y'all that this is not a popularity contest. It's not a competition. All it is is: what do you want to learn more about? So if there's one you already know, don't vote for that one, you know? It seems like there's a consensus. Okay, shall we close it? Yeah, we're going to close it — with Kyverno. Let's see it.

Okay, now I'm allowed to talk, obviously. Okay, cool. So I'm going to copy some policy that I prepared. I will show you the policy, but I want to copy it first, mostly because it will take a bit of time until Argo CD picks it up. So, git commit -m whatever, and push, push — there we go. Now, the thing I pushed is here: policies, Kyverno, right? Three policies, to be more specific. What can we do with deployments? This one says, hey, this policy applies to creating and updating deployments, it applies to the namespace production, and the number of replicas needs to be greater than two. A very simple policy. And then we have another one for databases: hey, you can create databases — in this case using Crossplane compositions or composite claims — and they need to be small, medium, or large. And if you try to create databases specifically in production, then they need to be medium or large; they cannot be small. It's a very simple policy, nothing special, just to demonstrate how all that works.
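That replica policy is roughly this shape — a sketch of a Kyverno ClusterPolicy with assumed names, not the exact file from the repo:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-replicas          # assumed name
spec:
  validationFailureAction: Enforce
  rules:
    - name: minimum-replicas
      match:
        any:
          - resources:
              kinds: ["Deployment"]
              namespaces: ["production"]
              operations: ["CREATE", "UPDATE"]
      validate:
        message: "Deployments in production need more than two replicas."
        pattern:
          spec:
            replicas: ">2"       # Kyverno patterns support numeric operators
```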
Now, while I was talking, hopefully Argo CD picked up the cluster policies — Argo CD... probably, yes. It picked them up from Git, deployed them to my management cluster, and you can see here I have those three policies that I explained before. What I'm going to do next is the most important part: I'm going to copy cncf-demo ytt, which is my application, defined as Carvel ytt. And if you're asking, why Carvel ytt? That's because in a previous talk — nobody knows when — people chose that one. So I'm copying my application, and I'm going to edit and commit and push. There we go, and now Argo CD will pick it up and we can watch it.

So let's say kubectl --namespace production get deployments, for example. No, that's the wrong thing — it's deployments, there we go. Now, let me just make sure that Argo CD picks this up and creates something. It might work, it might not work — my measure of success is that it doesn't fail at the very beginning. What might be happening is that the port forwarding to Argo CD... let's see — oh, okay, yes, it's coming back up. There's my application, cool. So let's just try again. Anyways, my application has been deployed, but the important part is that if I do kubectl --namespace production get all, you will see, hopefully, that there is a pod — but not from the application — there is a service, there is a StatefulSet — not from the application — and the deployment is missing. Why is it missing? Because of the policy. Easy.

So let's see what's going on: kubectl --namespace production — it's very hard to type while this thing is sticking in my eyes — describe deployment. No, there is no deployment. Yeah, sorry for that. There is no deployment because Kyverno prevented it from being created. So what I'm going to do next is increase the number of replicas of my application. Can you move the prompt to the top of the screen? Yes, I can. Thank you. Here we go. Kyverno prevented it from running. And what? It's still on the bottom. What's at the bottom? The prompt — people in the back can't see it. Oh, that thing, okay, okay. You're too demanding. And there we go. Okay. Is it enough? Beautiful. Are you happy now? Yes, thank you very much. By the way, they called her grumpy in the last talk. This is proof.

Okay, let's fix it. I don't have enough time to chat with Whitney. Vim — I'm going to fix it. In the values prod file I'm going to change something. What am I going to change? I forgot, I forgot. Yes. What? Yeah, replicas, exactly. Replicas. No, I forgot what value I put. What did I do? Two, they say. Two. Yeah, but I'm not sure whether that's the value I put. Let's say it's that one, and let's say that I might be completely wrong here, but I know I forgot something. Okay, that's fine. Maybe it will work, maybe it will not. Git add, there we go, push. Now it will take a while to adjust, so we can go to the next choice and come back to this later. We'll be on pins and needles. And you want slides now? Yes, please. Okay, I'm going to get your slides. There are your slides. We tried to move the prompt. You're inspiring all kinds of confidence.

Runtime policies is what we're talking about next. We're going to talk about Falco and KubeArmor. But first, what are runtime policies? Do you completely trust everything that's running in your cluster? Do you trust the internal applications? Do you trust third-party tools? Do you trust your dependencies?
Do you trust every single process that's running in your whole cluster? Well, how do you know if something suspicious is going on? That's where runtime policy comes in. With runtime policy, we're looking at what's happening at the kernel level of each host machine — we're looking at kernel events and watching for suspicious behavior. We're looking for unknown unknowns at runtime. So with runtime policy, you monitor your application, define your expected behavior, and then you make a runtime policy that does something when an anomaly is detected. What that something is depends on the technology. So let's jump in.

First up, we have Falco. Falco is a cloud native threat detector. Falco has a few different sources: you can get kernel events from the kernel module, or you can get them from the eBPF probe. But there are also plugins, so you can pull in things like kubectl logs or cloud provider logs — there are a bunch of them, thirteen or so. That's Falco. What happens when an anomaly occurs is that an alert is triggered, and that alert is really just text with some sort of priority associated with it. Then you can forward that alert onward: you can use Falcosidekick to forward alerts to compatible destinations — there are like fifty or sixty, like email or Telegram or whatever. And then you can use Falco Talon to do something when an alert occurs. That something happens after the alert, and it might be killing a container or gathering extra information.

And then we have KubeArmor. KubeArmor still does the monitoring, and it still does the alerting, but it also does attack prevention. So this time, when an attack is attempted, it fails — it never happens at all. KubeArmor also uses eBPF to hook into the Linux kernel in a safe and performant way, and it also uses LSMs, or Linux Security Modules, which are basically access control for the Linux kernel. That's the mechanism it uses to prevent attacks. It's a younger project — Falco is graduated. And it's time to vote.

Okay, can I finish my previous section while they're voting? No, because we want to... yeah, fine. KubeArmor. I see that you folks are trying hard with moving, but not... And they're not mutually exclusive — I'm pretty sure you can forward KubeArmor alerts to Falcosidekick, for example. Just FYI, it's not a competition. Yeah, all right. You're up.

Okay, I'm not going to implement that yet. I'm going to go back to where I was before, because I forgot a couple of things. First of all, the database size should be medium, because the policy said that it must be medium in production. And the more important thing that would prevent me from doing anything is that I forgot to convert from Carvel ytt to pure YAML before putting it into Git. You know what? I'm getting lazy. I'm going to copy it — I have it somewhere. Look at this. This is a Git repo that you have access to also. Yeah, there we go. Okay. And now I can git push and add and commit and push, in whatever order — it doesn't matter anymore. There we go. And now if I go back to Argo CD and refresh it — is it working? Yes: synchronize, refresh. Argo CD is still thinking that it needs to do what it was doing before, so I'm going to stop it and then start it again. And there we go: the deployment was created and the database was created. Ba-ba-da-da. Cool. Now, what was chosen? KubeArmor. What was chosen? KubeArmor. KubeArmor, okay. Let's do it.
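For reference, here's roughly what rules look like in each tool. First, a Falco rule — a hedged sketch in the style of the default ruleset (the `container` macro comes from that ruleset), not something from this demo:

```yaml
- rule: Shell spawned in container
  desc: A shell was started inside a container (possible intrusion)
  condition: container and proc.name in (bash, sh)
  output: "Shell in container (user=%user.name container=%container.name)"
  priority: WARNING
```

And the KubeArmor policy Viktor is about to apply has this general shape — an allowlist, where the name, label, and path below are assumptions rather than the repo's actual file:

```yaml
apiVersion: security.kubearmor.com/v1
kind: KubeArmorPolicy
metadata:
  name: allow-only-known-binaries   # assumed name
  namespace: production
spec:
  selector:
    matchLabels:
      app: schemahero               # assumed label
  process:
    matchPaths:
      - path: /manager              # the only binary allowed to run (assumed path)
  action: Allow                     # with Allow rules, anything unmatched is denied
```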
What first? kubectl, okay. So what I'm going to do is try to do something I shouldn't be able to do in a container of one of my applications. So: kubectl --namespace production exec — and that something is the cncf-demo controller, which is actually Schema Hero now. The reason I'm using Schema Hero instead of my application is that my application is based on a scratch image, and there is nothing you can do to it anyway. But I'm going to say, hey, I want to list all the files in this container. This should not be allowed, because if somebody malicious comes and lists those files, they can do stuff to them — this should not be possible.

So to fix that, I'm going to annotate the namespace production with kubearmor-file-posture equals block, and --overwrite. What, over... why don't you correct me? Are you French? You don't know English, is that the thing? Overwrite. Okay, and now the last thing I need: I'm going to copy the policies for KubeArmor. There we go — KubeArmor, into infrastructure... run, doesn't matter, here we go — and I'm going to push it to Git: commit and push — no, add, commit, push. There we go. This should deploy KubeArmor, and then I will be able to apply the rules. You can see the policy that I just applied — here it is, that's what I pushed to Git. Long story short, this essentially says: this is the only thing you can execute in that container. Anything else? Not allowed. That's the short version of all that.

Now if I say kubectl --namespace production get kubearmorpolicies — there is no policy, there is no policy, there is no policy... infra, because Argo CD did not yet sync infra, I said. There we go. It hasn't picked it up yet. It will, probably, eventually. Is it there? There we go, there is a policy. No, not yet — wait, wait, wait. Now I'm going to execute the same command as before, from my history. Remember this command? It listed all the files. ... That's not the reason to clap. Come on. I don't know what I did wrong. I probably annotated it badly — kubearmor, or armor-file-posh... that sounds correct... block. You know what, imagine that it worked. Let's move on — imagine that it worked. In the real world it works. When these things are not sticking in my nose, it works. It's important to note that it's Viktor's failing and not KubeArmor's failing, I think. Exactly. All right, you want to keep trying? It still works, doesn't it? Okay, okay, go for it. You can think it through while I talk about secrets management.

So for secrets management, we have three tools: External Secrets Operator, Secrets Store CSI Driver, and SOPS. Our application, and applications like ours, have a lot of different confidential information they need to manage. Most of this can be done with a vault. A vault is a technology that stores secrets safely, and it has an API to access them. That's the bare minimum; a vault, whichever one it is, is probably also doing secrets rotation and helping with remediation. But the vault itself is not what we're talking about. What we're concerned with is: how do we get our secrets into Kubernetes?

So first we're going to talk about how External Secrets Operator does that. What it does is connect to the vault's API, get the secret, and write it as a Kubernetes Secret — and it also manages the life cycle of that Kubernetes Secret. Then our application can get it, and ba-ba-da-da! Our application has its secrets.
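An ExternalSecret resource looks roughly like this — a sketch with assumed names, pointing at AWS Secrets Manager the way the demo does:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-password                # assumed name
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager      # a SecretStore defined separately (assumed)
    kind: ClusterSecretStore
  target:
    name: db-password              # the Kubernetes Secret the operator creates
  data:
    - secretKey: password          # key inside the Kubernetes Secret
      remoteRef:
        key: cncf-demo-db-password # name of the secret in the external vault (assumed)
```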
External Secrets Operator also has two rad alpha features: it can take Kubernetes Secrets and push them to the vault, and it can also generate Kubernetes Secrets.

Next up we have Secrets Store CSI Driver. As you can maybe guess from the name, it uses CSI, the Container Storage Interface. The idea here is that they think Kubernetes Secrets aren't very safe — they're basically just base64-encoded and stored in etcd — so they want a secrets solution that doesn't use Kubernetes Secrets at all. The first thing Secrets Store CSI Driver does is mount a temporary file system into the pod. Then it reaches into the vault, gets the secret, and writes the secret to that volume. And then our application can access the secret as part of its file system. Da-da-da!

And then the next one is SOPS. This is short for Secrets OPerationS, but if you call it "secrets operations," no one will know what you're talking about. SOPS is just a CLI that can encrypt and decrypt files. So here we are defining our Kubernetes Secret, and we can use the SOPS CLI to encrypt the secret parts. There's also a decryption key associated with it. How that works in our workflow is that our vault now stores the decryption key — it's not storing our secret anymore. Because the secret is encrypted, we can commit it to Git. So we commit it to Git, Argo CD picks up the change, and then Argo CD sees it has an encrypted file. Now you have to teach Argo CD what to do: we use a configuration management plugin, which calls the SOPS CLI, which gets the decryption key and decrypts the file. Then it can be applied to the Kube API, we have a Kubernetes Secret — or we can access it — and da-da-da-da-da-da! And that's SOPS.

So now you get to vote, please. It got really quiet in here. They're all concentrating on voting. All right, it seems pretty definitive: Secrets Store CSI Driver is what they've chosen. S-S-C-S-I-D, cool!
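Since the CSI driver won but won't fit the demo (you'll hear why in a second), here's roughly what using it would look like — a sketch with a Vault-style provider and hypothetical names; the provider-specific parameters vary by backend:

```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: db-password               # hypothetical
  namespace: production
spec:
  provider: vault                 # provider plugins exist for Vault, AWS, Azure, GCP...
  parameters:                     # provider-specific (assumed values)
    roleName: cncf-demo
    objects: |
      - objectName: db-password
        secretPath: secret/data/cncf-demo
        secretKey: password
---
# The pod then mounts the secrets as a CSI volume, something like:
# volumes:
#   - name: secrets
#     csi:
#       driver: secrets-store.csi.k8s.io
#       readOnly: true
#       volumeAttributes:
#         secretProviderClass: db-password
```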
Okay, I'm not going to do it, though. And the reason is very simple: it's actually great, but it works only with your own applications, and I'm planning to put in the secrets for my database, which is used by Crossplane — and any third-party application expects Kubernetes Secrets, so volumes will not work. Great project; it doesn't work for me today. So I'm going to choose myself what I'm going to use, and I'm going to use External Secrets. This is the first time I'm doing that, meaning that I am pro-choice — but not today.

So if I take a look at the secret — cncf-demo-db-password — and output it as YAML, this is the secret I already have in my cluster. You can see the password, and if I pipe that password through base64 --decode, you can see... what is wrong? What did I do wrong? The equals sign at the end. Okay, okay, cool, cool, cool. There we go — that's my password. Now, the problem with this password is that it is hard-coded in a manifest that I applied before I started this talk. This is unacceptable, because if you go to my GitHub repository, you will find out what the password is. We shouldn't be doing that. So instead, what we have is this: I'm going to go to my application, and here — there we go — this is my application right now, and somewhere in there is that secret. There it is. Unacceptable; anybody can see it. So I'm going to open that values prod file that I used before, unsuccessfully, and say that my database should not be insecure — insecure is now false — and I'm going to execute ytt again to generate the manifest, which this time might actually work. Who knows? There we go, and I'm going to commit and push it and do all the things.

And now if I do kubectl --namespace production get externalsecrets.external-secrets.io — it's probably not yet there, but I will refresh my application in Argo CD. Up — there we go. This one — come on, come on, come on. Uh-huh, and now — there we go, there is an ExternalSecret. What this did is go to Secrets Manager in AWS and retrieve the secret for me. And if I do get secrets, you will see that there is a secret, my-db-password, and if I output it just as before — and this time remember to copy everything — echo -n... This is the secret from my Secrets Manager. It worked this time. Two out of three. Let's go with me. Let's do it — one more.

So next we're going to secure our pod-to-pod communication. Our choices for this are Istio, Linkerd, and Cilium. There are three aspects that concern us in securing pod-to-pod communication. The first one is identity: we need each of our pods to have an identity, and we need them to be able to verify each other's identity. The second one is encryption: we need to encrypt the connections between the pods. And finally, we need policy: pods that shouldn't be able to talk to one another — don't let them.

Istio and Linkerd both do this in a similar way — in a service mesh way. So here's our cluster, flattened so I can illustrate how this works. Basically, Istio and Linkerd inject an extra container into every pod that's running, and those containers act as a proxy for all the requests going to and from the pod. Then Istio or Linkerd acts as a control plane that a human can use, with lots of custom resources, to do the complex work of configuring how all those proxies talk to one another. All the proxies together are called the data plane.

So how does that relate to our three things? Istio and Linkerd use mTLS, which does the identity and the encryption all in one step, with digital certificates — X.509 certificates. And then you can use policies that you create with the tools to restrict access between pods. As for what differentiates them: Istio uses Envoy as its data plane, which is very powerful and very complex, and Istio is a super full-featured tool — if you can do it with a service mesh, you can do it with Istio. Linkerd can do most things, almost all of the things, but it really focuses on operational simplicity and performance. Linkerd uses a custom-written Rust proxy as its data plane, and the Linkerd people will tell you that this proxy uses only 10% of the resources of the Envoy data plane that Istio uses. But if you ask the Istio people, they'll say, ah, that used to be true a few years ago, maybe, but these days it's about the same. I don't know who to believe, so I'm just presenting both sides.
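For a sense of what mesh policy looks like: enforcing mTLS for a whole namespace in Istio is a resource this small — a sketch, assuming Istio is installed and sidecars are injected:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT    # only mutual-TLS traffic between sidecars is accepted
```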
Then we have Cilium, and Cilium is a whole different beast. Cilium is a CNI implementation — CNI stands for Container Network Interface, and it's a specification for how a kubelet gives an IP address to a new pod that it's creating. So the kubelet and Cilium might have a conversation like: hey, can I have an IP address, please? And Cilium would be like, yeah, totally, here's your networking stack with an IP address. And the kubelet's like, thank you so much. And then Cilium's like, also, I've given the pod an identity using SPIFFE. And the kubelet's like, I don't know why you're telling me this; it doesn't have anything to do with me. And Cilium's like, also, I have this eBPF program that I'm running at the kernel level in the pod's network namespace, and it generates events for everything the pod does. And the kubelet's like, I think I hear the Kube API calling me — I gotta go. And so here we are: Cilium.

How does it handle the three things that make pod-to-pod communication secure? First of all, identity: we already talked about that — it does it with SPIFFE, and that also secures connections between pods. It does encryption too: you can do that with either WireGuard or IPsec. So here those two things, identity and encryption, are decoupled, where in our other use case they're together. And then you can configure it to restrict access between pods.

The thing about Cilium is that, as a CNI — well, a cluster cannot operate unless it has a CNI. So we must already have a CNI running in our cluster, and if it's not Cilium, it's not something you can just switch out; you'd probably have to take down your cluster and rebuild it. So, Viktor, do we have Cilium running already? Yeah, it's the default anyway. It's the default. So there's not really any competition for Cilium. You might not have it as your CNI if you just went with your cloud provider's defaults on a managed cluster, but if you put any thought into it at all, you'll almost certainly have Cilium running. So our choices are Istio, Linkerd, and Cilium, which is already running. But I can put Linkerd or Istio on top of it, so don't worry about that. Yes. And there are reasons to, too: Cilium is great for the three things we talked about, but in terms of all of the features that Istio and Linkerd have, it's not there yet. I think that's it. I think it's Cilium. Okay.

So, there we go. I'm going to execute a quick script that will help me set up a couple of things that are just boring to do manually — cilium.sh. And it will install... actually, Cilium is already installed, but what I did not install in advance is SPIFFE, which gives identity to my stuff, and that will probably happen soon. We can tell them a joke while we're waiting. Why did the man fall down the well? Because he couldn't see that well. Can you do a funny joke for a change? I'm... why did the monkey fall out of the tree? Because it was dead. Okay. No more jokes for me. No more jokes for me. No more. Okay, the installation of Cilium is going. Do you want to open another tab and try to troubleshoot KubeArmor, or not? No, it's not happening. Maybe — let me see. kubectl... source... there we go.

So, we do have a repo — we'll have a QR code for it later — and if you don't like the choices that your peers have made today, you can go to our repo and explore how other choices might work together, all the way from the very beginning of the story. We've been doing this for over a year, so we've gone through quite a lot of system design choices at this point. Okay. There we go, it started. Actually, it takes a bit of time, because if you install SPIFFE later, some things need to be restarted and whatnot, right?
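While that settles: the policy he's about to show is a CiliumNetworkPolicy, roughly this shape — a sketch with assumed names and labels, not the repo's exact file:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-only-sleep          # assumed name
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: cncf-demo              # the application being protected (assumed label)
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: sleep            # only pods labeled app=sleep may connect
```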
Anyways, so if I do kubectl --namespace production get ciliumendpoints — endpoints, yes — there we go. You can see that in my production cluster, the pods that were already running there got their own endpoints, and they should now be speaking with the other things in or outside the cluster — mostly — securely. What else? What else did I want to show? No, that's not what I wanted. Yes, yes, yes. Let's take a look at the policy, for example. How much time do we have? You, with the red shirt. Red shirt — two minutes. Okay, very quickly: policy. Cilium network policy, there we go. There's one policy over there that says, hey — let me actually see the policy — match labels, app sleep. Yeah: only those that have the specific label app sleep can actually access this specific application, and so on and so forth. We have no time to show you that live, because I would need to spin up a couple of other applications, but there is identity, secure communication between pods, and policies that you can define. Hero is more secure than they were before. Will you put the slides back up? They're not more secure, but okay.

We didn't cover every choice: there are other service meshes we didn't talk about, and there are a couple of other secrets solutions we didn't talk about — we just talked about what we had time for. So here's our Hero, running in a safer production cluster. There are also lots of security choices we didn't choose. This QR code is a feedback form — please give us feedback. And that one is the link to the CNCF demo. And we'll be doing our last episode of You Choose — just a recap and end-to-end demo — at booth C1 on Friday morning at 10:30, which is the Tanzu booth. Don't come — it's going to take two hours to go through it all. Come hang out with us for two hours. You must have better things to do. Okay, we're at the time limit — we are getting kicked out. No time for questions, but we'll be around, and I'll have You Choose stickers, so please do stop me and get one. Thank you so much, y'all. Appreciate you.