All right, so a little bit about me. I've been an engineer at SuperOrbital for about one and a half years. At SuperOrbital, we help accelerate our clients' cloud native engineering with tailored training and engineering services. I've been personally doing Kubernetes things since 2019, and I like freshly baked bread and video game speedruns. So that's a little bit about me. The agenda for today: we're going to show how Cilium and network policies work. We're going to have some diagrams and a little bit of YAML, sorry about that. Then a brief refresher on certificate generation for TLS. And then we're going to talk about the problem, right, the problem space and our real needs, and how we at SuperOrbital designed our solution using all of the topics we're going to cover today. Then we'll get to our takeaways and conclusions. Any code that you see here, don't worry about writing it down, because it's all going to be available in the repo that I'll show you at the end of the presentation. So just keep that in mind. That said, the goals for this presentation are to explain difficult concepts in easy-to-understand language, provide a working example that can be used by your organization, and motivate you to try Cilium for yourself and see if it fits your needs. So: Cilium and Layer 7 network policies, or how I learned to stop worrying and love eBPF. Cilium is a CNI, as most of you are probably aware, which is just an application that provides networking for pods in a Kubernetes cluster. It allows us to manage, secure, and monitor network traffic between pods in a cluster, and it's built upon eBPF, which allows for execution of code within the Linux kernel. You may have heard of other Isovalent products that use eBPF, such as Hubble, which provides network monitoring, and Tetragon, which provides security visibility.
And the reason why eBPF is so cool is that it can extend the capabilities of the kernel without having to change the kernel source code or load a kernel module, which means it's fast. eBPF also verifies the small programs you load for potential safety issues before running them, so it's also safe. That makes it customizable for whatever your specific needs are. Alongside that, the Cilium agent runs on each node as a pod in a DaemonSet, and it does most of the eBPF heavy lifting. It loads the eBPF programs and maps, reads the Cilium configuration from the cluster, and then starts handling IPAM. Cilium comes with a lot of different features, such as service load balancing (if you want to replace kube-proxy on your cluster), cluster mesh, service mesh, and network policies, which is what we're going to be talking about today. A network policy is just an object that you install on your cluster that describes how you would like to limit the network traffic that enters a workload on your cluster. Each Cilium agent will load all of the policies stored in the cluster, and it follows a whitelist model. What does that mean? If traffic does not match any of the rules in the policies, it is dropped. If you have two rules and one of them is broader and the other is narrower, then the broader rule applies. And finally, if you have two different rules that intersect, then the traffic matching the intersection of those rules will be allowed. The rules are split into an ingress section and an egress section. The ingress section applies to traffic entering the pod, and egress applies to traffic leaving the pod. And if neither of those sections is present, then the policy doesn't apply. So here, I have some arrows pointing to the YAML that I provided.
At the top, you have the name and description of the network policy, which just says: I want to control access to this prod MySQL DB that I have running on my cluster. The second arrow says this policy applies to pods, endpoints, right, that match the label prod-db-mysql. So any pod labeled prod-db-mysql will have this policy applied to it. And the third arrow indicates the ingress rules. It says that any traffic entering this pod needs to match the port and protocol to be allowed, and it also needs to come from pods that have a specific label, which in this case is employee-type: dba. So that's an example of a network policy. You'll notice that it operates at layer four, because you're specifying protocol and ports. So, I keep saying layers, layer three, layer four. Let's talk a little bit about layers. Cilium network policies can handle layer three, layer four, and layer seven rules. Layer three rules apply at the basic connectivity layer: the endpoints themselves, CIDRs, and DNS names. Think of it as: if I want to limit traffic to google.com or to a specific IP address, that's layer three. Layer four lets you specify which protocol and port you want to filter on. So: I want to filter any TCP traffic that's coming from port 443 on google.com. And then finally there are the layer seven rules, which are application aware. There's HTTP, Kafka, and DNS support for layer seven network policies in Cilium, and we're going to look at the HTTP layer seven network policy. So we're going really deep into the rabbit hole at this point. Layer seven HTTP network policies build upon all of the capabilities of layer three and layer four. So that example that I showed earlier, it does all of that, plus it adds egress and ingress rules that are HTTP aware.
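The slide YAML itself isn't reproduced in this transcript, but a policy like the one just described might look something like this. Treat it as a reconstruction: the exact label keys, values, and port are my assumptions based on the description, not the original slide.

```yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "prod-db-access"
spec:
  description: "Control access to the prod MySQL DB"
  # This policy applies to any pod carrying this label.
  endpointSelector:
    matchLabels:
      app: prod-db-mysql
  ingress:
    # Only traffic from pods labeled employee-type: dba,
    # and only on MySQL's port/protocol, is allowed in.
    - fromEndpoints:
        - matchLabels:
            employee-type: dba
      toPorts:
        - ports:
            - port: "3306"
              protocol: TCP
```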
So you can craft rules that apply to specific attributes of HTTP, such as HTTP paths, methods (GET, PUT, PATCH, DELETE), host headers, or arbitrary headers on the HTTP traffic. And on a violation, instead of dropping the traffic, it will return a 403 Forbidden back to the pod. So the capabilities of this are enormous, right? You have fine-grained control over any RESTful HTTP API, and you have regex-based rules for your paths. So you can craft rules that really target your security needs. And just to drive the point home, I'm going to show you an example of the layer seven data path through Cilium. One thing I haven't mentioned yet is that the way this works is that Cilium injects a layer seven proxy in between all of the requests that trigger that rule. Specifically, Cilium uses Envoy. When the layer seven policy is installed, Cilium starts the proxy, and all of the layer seven requests that match that rule will see their usual data path change. By default, packets enter the BPF layer and then just go to the pod. After you've installed the policy, the Cilium agent configures the BPF redirection and the Envoy proxy using the xDS API, which I won't go into. Then the traffic comes in and gets redirected to Envoy. Envoy looks at the traffic, and depending on the attributes of the traffic and the rules you have defined, it determines whether your traffic should be allowed or denied. Finally, regardless of the result, the traffic gets sent back to the BPF layer and then makes it back to the pod. And I know what you're thinking at this point: oh, this is cool, but if Envoy has all this introspection capability, what happens if my connection is encrypted? Can it still do this? Well, HTTPS, yes, you're right.
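To make the HTTP attributes concrete, here is a sketch of an ingress layer seven policy. The app names, port, paths, and header are all illustrative, they are not from the talk; what matters is the `rules.http` section, where each entry matches on method, regex path, and headers.

```yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "api-l7-policy"
spec:
  endpointSelector:
    matchLabels:
      app: my-api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              # Anyone matching the L3/L4 rules may read.
              - method: "GET"
                path: "/v1/.*"
              # Writes to items require an extra header;
              # a non-matching request gets a 403 Forbidden.
              - method: "PUT"
                path: "/v1/items/[0-9]+"
                headers:
                  - "X-Admin: true"
```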
HTTPS traffic is encrypted and therefore unreadable by Cilium by default, but we can configure Cilium to intercept TLS-encrypted connections. If we have a model where we have our own internal CA, a certificate authority, that can be used to create certificates for external destinations, then we can essentially man-in-the-middle the traffic: inspect it to determine whether it matches the rules and should be allowed or denied, before originating a new connection onward to the destination. So that allows Cilium to inspect, and even modify, any data before it reaches the destination. So: TLS and OpenSSL and certificates. Yeah, PKI stuff is kind of complicated, and I'm not an expert on it, so don't worry too much about it. I'm just going to skim the surface of what we need to know for this solution. So, creating a certificate authority. A CA is an entity that's empowered to create certificates, and a certificate is just used to provide proof that whatever you're connecting to is truly who they say they are. All computers come with a default set of CAs, which means that any certificates created by those CAs can be trusted, right? So for our solution, we're going to need to create our own CA and add it to the list of trusted CAs on each workload whose traffic we need to inspect. Here I've added some, maybe slightly wrong but close enough, OpenSSL commands that you would use to create the CA certificate. We'll then use that CA whenever we request a certificate from it. To do that, we create something called a certificate signing request, commonly known as a CSR. And the only thing we really have to worry about is that the common name, or CN, for that certificate matches the exact domain name that's used by the client when initiating connections, right?
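The slide commands aren't captured in the transcript, so here is one plausible OpenSSL sequence for the steps just described: create a CA, create a CSR whose CN matches the domain the client will dial, and sign it. File names and subjects are illustrative.

```shell
# Create a private key and self-signed certificate for our internal CA.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout ca.key -out ca.crt -days 365 \
  -subj "/CN=Internal Lab CA"

# Create a key and a certificate signing request (CSR) for the domain
# we want to intercept. The CN must match the domain the client uses.
openssl req -newkey rsa:2048 -nodes \
  -keyout github.key -out github.csr \
  -subj "/CN=github.com"

# Have our CA sign the CSR, producing the inspection certificate.
openssl x509 -req -in github.csr \
  -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out github.crt -days 365

# A client that trusts ca.crt will now accept github.crt for github.com.
openssl verify -CAfile ca.crt github.crt
```

This is exactly the trust relationship the solution relies on: once our CA is in a workload's trust store, certificates we mint for external domains look legitimate to that workload.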
And then we're gonna use our own CA to generate the certificate, and we'll somehow provide those certificates to Cilium. I think that's all of the little bits and topics we need to know before we go on to the real-world problem. At SuperOrbital, we provide training using hands-on experience with Kubernetes concepts and real clusters, right? So each student at SuperOrbital gets a workstation so that they can work on our labs. Each pair of students receives one code-server instance, and this code-server instance has full access to the outside world. And you might be saying: well, students come from all different backgrounds; they have corporate VPNs to deal with, firewalls, ISP issues; those can be annoying. And also, some of our students are pretty smart, and giving them full, unfettered internet access can be powerful, but it can also cause you to shoot yourself in the foot. And when that happens, our lab instructors need to deal with the fact that, oh, that's why our students are struggling so much: their workstation is just completely hosed. Providing a way to prevent the students from blowing up their workstations is actually something good we can do for our lab instructors, so they can get on with giving their classes. So let's create an OSHA-compliant development space: no more hazardous workstations. How do we do that? Well, we like to keep our workstations open for ingress purposes, right? The students already have to deal with VPNs, and we don't want to add any more layers of, oh, you have to go through this proxy or whatever. But we do want to limit egress. Some of you may have already figured out where we're going with this: layer seven policies. And just for demonstration purposes, I'm going to focus on access to github.com; that's the API we're going to target today. And specifically, we want our students to access only the SuperOrbital repos from their workstations.
So let's connect all of the pieces. First off, we're going to deploy cert-manager onto the cluster, and we're going to use Helm for this. Helm is basically the apt-get of Kubernetes. One thing to note is that when we use Helm to install cert-manager, we have to have the CRDs already present in the cluster. If not, you're going to have to set the flag that installs the CRDs, otherwise the installation is going to hang. cert-manager is there to let us provision certificates for our workloads in Kubernetes; we're only going to be using it for our CA certificate. There is some experimental support for CSRs, but for demonstration purposes we don't want to do any experimental stuff; we want to tread the tried-and-true path. So this is roughly how our certificates would look with cert-manager, and with the CA certs that cert-manager creates, we can now create our inspection certificates inside of the cluster. You'll notice the metadata and the spec for our CA certs are pretty simple. Next, we're going to use OpenSSL and bash to create a script that can generate our certificates. This bash script is going to do a few things. First, it gets the CA certificates that cert-manager kindly created for us; thank you very much. Then it generates our terminating certificates and deploys those to the cluster. Additionally, we have to create the originating certificates by bundling our CA with the rest of the trusted certificates, so that we can also provide that to the workload; we'll just have our bash script do that as well. Keep in mind that we're going to be publishing all of these certificates to the cluster, for Cilium to use and for our workloads to use, so the script is going to have to push them as Secrets in the cluster. And finally, we're going to want to run our bash script in the cluster.
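The cert-manager CA setup described above might look roughly like this: a self-signed Issuer that bootstraps a CA Certificate. The names, namespace, and duration are my placeholders, not the actual manifests from the repo.

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
  namespace: cilium-secrets
spec:
  # Bootstrap issuer: signs the CA certificate below with its own key.
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: inspection-ca
  namespace: cilium-secrets
spec:
  isCA: true
  commonName: inspection-ca
  # cert-manager stores the resulting key pair in this Secret,
  # which our generation script reads to sign inspection certs.
  secretName: inspection-ca-secret
  duration: 8760h  # one year
  privateKey:
    algorithm: RSA
    size: 2048
  issuerRef:
    name: selfsigned-issuer
    kind: Issuer
```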
No one likes running a bash script on their laptop, closing the laptop, and suddenly all their stuff breaks because the bash script stopped. So we're going to build our own Docker image, push it to our container registry, and deploy from this container image into the cluster. How do we do this? With a Kubernetes Job. You can see here the YAML for the Job definition. We're going to deploy a Job whose sole purpose is to generate the certificate that's signed by our CA. It runs our custom script; the image you see is a bit of a placeholder, but it does run our image, with OpenSSL available and the script mounted in it. And finally, since it's a Job that runs in our cluster, we have to provide it with the appropriate RBAC so it can only read and create the Secrets that it needs access to; we don't want to give it cluster-wide permissions. Now we deploy our network policy to the cluster. Note that the ingress section in this network policy is missing, and that's intentional. You can see here that we are matching on github.com, so any workloads targeted by this policy will only be able to access github.com on port 443. And specifically, in the last part, the HTTP rules, we only allow access to the SuperOrbital repos on github.com. And it's so simple, right? The API for a Cilium network policy doing plain HTTP control is very easy to read and understand, which is why it's the best tool for this purpose. Finally, we want to give the certificates to Cilium so it can inspect this HTTPS traffic. So on the same network policy, we set the fields for the originating TLS and the terminating TLS. They reference Secrets that are already in the cluster, which were created by our script; it's just the certificates.
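Pulling the egress policy and the TLS fields together, a sketch of what this might look like follows. The selector labels and Secret names are assumptions; the DNS rule is there because toFQDNs needs Cilium to observe DNS lookups to learn which IPs belong to github.com; and both GET and POST are allowed because git clone over HTTPS uses both.

```yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "limit-github-egress"
spec:
  endpointSelector:
    matchLabels:
      app: code-server
  # No ingress section: ingress stays unrestricted, as discussed.
  egress:
    # Allow DNS so the toFQDNs rule below can resolve github.com.
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    - toFQDNs:
        - matchName: "github.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
          # Cilium terminates the client's TLS with our inspection cert...
          terminatingTLS:
            secret:
              namespace: "cilium-secrets"
              name: "github-inspection-cert"
          # ...and originates a new TLS connection to the real github.com.
          originatingTLS:
            secret:
              namespace: "cilium-secrets"
              name: "ca-bundle"
          rules:
            http:
              # Only SuperOrbital repos; git clone needs GET and POST.
              - method: "GET"
                path: "/superorbital/.*"
              - method: "POST"
                path: "/superorbital/.*"
```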
One thing to note is that if you're installing Cilium with the Helm chart, you're going to have to set the secrets backend to k8s, as opposed to the default. That way Cilium has the necessary permissions to get Secrets from the cluster to perform its network policy filtering. Finally, we have to mount those certificates on the pod. Our code-server pods are Ubuntu based, and the default path for trusted certificates is well known in Ubuntu. So it's just a matter of getting that Secret mounted on the pods themselves, and thank goodness we're using Kubernetes, because we can use a simple combination of volumes, volume mounts, and subPaths to overwrite the default CA certificates in the container with our CA bundle, the one that contains our special CA, so that any TLS clients running inside the container can trust the inspection certificates that we're creating for Cilium. Finally, we have the results of all of this work, where I'm pretending I'm one of our students: I clone from a SuperOrbital repo and I get all the files from Git, but then I try to clone the Kubernetes repo and I get rejected. Obviously you can add more to this policy to permit more and more repos, and you can expand the set of certificates you create so that Cilium is able to intercept and apply the rules, but you get the gist of it, right? It's a very powerful tool, and the difficult part is the certificate part, which is why I decided to talk about this. There are some potential pitfalls. When you're using Cilium network policies, any interruption to the agent pod running on a node means that traffic will not route to the pod while it's restarting.
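The volumes/volumeMounts/subPath combination described above could be sketched like this. The pod name, image, and Secret name are placeholders; the mount path is the standard CA bundle location on Ubuntu-based images.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: code-server
  labels:
    app: code-server
spec:
  containers:
    - name: code-server
      image: codercom/code-server:latest  # illustrative image
      volumeMounts:
        # subPath overwrites just this one file in the container with
        # our bundle (the default CAs plus our inspection CA), so every
        # TLS client inside trusts the certs Cilium presents.
        - name: ca-bundle
          mountPath: /etc/ssl/certs/ca-certificates.crt
          subPath: ca-certificates.crt
  volumes:
    - name: ca-bundle
      secret:
        secretName: ca-bundle
```

One caveat, which comes up again in the pitfalls: files mounted via subPath do not get updated when the Secret changes, so the pod must be restarted to pick up a new bundle.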
Now, this is lessened somewhat by Cilium 1.14, which can deploy the Envoy proxy as its own DaemonSet and avoids this issue. But if you choose not to do that, then this is something you have to keep in mind: if you're upgrading Cilium from one version to another and its pods are restarting, then for a brief period the rules cannot be enforced, and therefore all traffic will be stopped. It's a good failsafe, but it can be somewhat disruptive, and how much of that you're willing to tolerate may vary with your workloads. Another pitfall you'll find with this solution is that any Secrets and ConfigMaps that are mounted as subPath volumes in the pod will not receive updates. So after you've created your pod with the subPath, you have to restart the pod if the Secret changes. For us, it's not a big issue: certificates are good for a year, and no class that we give ever exceeds a week. So these pods are going to be blown away after a week; it doesn't matter for us. But if you're going to have long-running pods, you have to make sure you have some sort of policy to restart the pods every once in a while so that they pick up updated certificates when necessary. Finally, the code in the repo that implements this solution deploys a Job for the certificate generation, when it would probably be better to have it be a CronJob. We're using a Job because it's simpler, and for this context it's fine, but a CronJob would probably be better. So what have we learned? We love Cilium. It unlocks the potential to add security in depth, and with very simple and really fast configuration. It's performant at large scale thanks to the eBPF technology, and it's 100% cloud agnostic. That's the kind of thing we teach at SuperOrbital; we love cloud-agnostic stuff. And sometimes people are kind of scared of not fully understanding Cilium's capabilities, so they go underused.
It's more than a CNI, right? And we want to make it simpler for people to learn about this so they can use it, and that's what we do at SuperOrbital. We tackle difficult problems for clients every day, and we help them implement complex solutions such as this one. So that's all I have for today. Please, please, please, I urge you to go to our repo at SuperOrbital. Huge thanks to Isovalent for their support, and for any information about SuperOrbital and what we do, just go to superorbital.io. That's it.