All right, welcome back. I hope everybody had a great lunch. My name is Thomas Graf. I'm CTO and co-founder of Isovalent, but probably more importantly, one of the creators of Cilium and a maintainer of it. Today I would like to share my thoughts on how I think we should be securing infrastructure going forward, by combining network security and runtime security together. Cilium already has a partial answer for this, because if you look at the Cilium ecosystem overall, it actually consists of multiple projects. Most of you know Cilium as the CNI, as a Kubernetes networking solution. Some of you may have seen the Tetragon talk this morning about runtime security and endpoint security, defining rules on what a process is allowed and not allowed to do. We of course also have Hubble on the network observability side, and Cilium Service Mesh to offer service mesh features. In this talk, we will focus on two of these projects: Cilium as a CNI, implementing network policy and mutual authentication, as well as Tetragon for the runtime enforcement side. For those of you who have not seen the Tetragon talk this morning: what is Tetragon? Tetragon is a project in the Cilium family that offers security observability as well as runtime enforcement. What does that mean? It means that we can run Tetragon as an agent on any Kubernetes node and it will install eBPF programs to extract telemetry from the system, such as what files are being accessed, which system calls are being made, what network calls are being made, when a container escapes from its namespace, what privileges or capabilities a process runs with, and so on, and it can export that via logs, traces, or metrics into whatever observability stack you're running. It can then furthermore take policy, where you define what processes are allowed to do.
For example, if a process attempts to read or write certain files, you can kill the process, or you can say this process shall never acquire particular capabilities, or maybe it should never run with root privileges, and so on. That's Tetragon. I'm sure most of you are also already familiar with what Cilium can do from a network policy perspective. Of course, Cilium implements Kubernetes network policy. That's what we see on the left side here. Kubernetes network policy allows us to declare intent via labels, defining what pods can talk to what other pods. It does so at the granularity of a pod. We have also, as we heard in the keynote this morning, added mutual authentication to Cilium, where we can not only enforce on the network level by either allowing or disallowing particular packets, we can now also perform a mutual authentication handshake using mTLS to mutually authenticate the identity of pods before they are allowed to talk to each other. But again, this is done at the level of a pod. So we can only reason about pods right now: a frontend pod talking to a backend pod, or a Fluentd pod talking to some other pod. The granularity is the pod. That's what both network policy and service authentication have been defined to work with. Well, the reality is that Kubernetes pods are actually quite complex, or they can be. We often think of a pod as what we see on the very left here: it's a pod, there is one container running inside, and that's it. In that world, the network policy implementation that we have today would be fairly decent, right? But in reality we have pods with multiple containers running inside and multiple volumes mounted in; a pod has a complex inner life. By the way, this picture is from the kubernetes.io documentation. If you want to dig deeper into why that is, check out the Kubernetes documentation.
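To make the Tetragon side concrete, here is a sketch of a TracingPolicy in Tetragon's kprobe-based format that kills any process opening a sensitive file. The policy name and watched path are illustrative, and exact field details may differ between Tetragon versions:

```yaml
# Sketch of a Tetragon TracingPolicy: kill any process that opens
# /etc/shadow. Policy name and watched path are illustrative.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: kill-shadow-readers
spec:
  kprobes:
  - call: "fd_install"        # kernel function run when a file descriptor is set up
    syscall: false
    args:
    - index: 0
      type: "int"
    - index: 1
      type: "file"            # lets the selector match on the file path
    selectors:
    - matchArgs:
      - index: 1
        operator: "Equal"
        values:
        - "/etc/shadow"
      matchActions:
      - action: Sigkill       # terminate the offending process
```

Applied with `kubectl apply`, Tetragon's agent on each node translates this into eBPF programs that both emit an event and kill the caller.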
This is one such example from that documentation, a very basic example of a sidecar, a logging sidecar. We have a workload running as a container in that pod, and we have a secondary container in that pod that performs the logging operation. You can see on the right side the YAML for how that's actually done. That's a very classical, typical use case, and even a basic one. So let's look at a more complex, practical example and see what Tetragon can show us about what's actually going on. Don't worry, I will digest this picture for you. We're seeing the inner life of a pod. So let's jump in and look at this. We of course see the namespace name and the pod name. So far so good. Then we see a process called init. That's the init process on the Linux node, PID 1. We then see the container runtime, which was started by the init process; in this case, that's containerd. And then you see the entire pod. This entire blue surface area is the Kubernetes pod, which is named crawler, and it has a Node.js application running inside. But you can see that there's a lot more going on in that pod. So let's look even closer. We can first of all see that the actual node app, the binary running in the pod, the actual workload, is making two network connections. It's reaching out to api.twitter.com and it's reaching out to an Elasticsearch service. But then we also see another line, which is invoking a shell, and it is using netcat. That's the nc binary that you see in there. That's a reverse shell that was installed. So somebody compromised this workload, the node application, was able to spawn a shell, and then used netcat to reach out and receive instructions back to do something inside that pod. That's a classical reverse shell attack.
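For reference, a minimal sketch of such a logging-sidecar pod, loosely following the pattern from the Kubernetes documentation. Container names, the application image, and the log path are all illustrative, not taken from the talk's slide:

```yaml
# Illustrative two-container pod: the app writes logs to a shared
# volume, and a sidecar container streams them out.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-logging
spec:
  containers:
  - name: app
    image: example.com/my-app:1.0        # hypothetical application image
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
  - name: log-sidecar
    image: busybox:1.36
    # Tails the application's log file from the shared volume
    command: ["sh", "-c", "tail -F /var/log/app/app.log"]
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
  volumes:
  - name: logs
    emptyDir: {}
```

Both containers share the pod's network identity, which is exactly the property the rest of this talk pokes at.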
And then we also see a lateral movement attack, because this reverse shell is now used, again with curl, to access the Elasticsearch service, probably to extract data from it. And that's of course allowed, because this curl process is running inside of a pod that's supposed to talk to the Elasticsearch service. If we're looking at this from a pod level, the network policy construct does not make a difference between the actual workload running in there and the curl binary running inside of that pod. They're both in the same pod. They have the same identity. They have the same IP address. On the network level, this will simply be allowed. And last but not least, we also see data exfiltration, where the attacker is using curl, probably to copy the data they received from Elasticsearch and upload it to an S3 bucket. That's a snapshot of a successful attack that was performed in a Kubernetes cluster, and it shows how complex the inner workings of an actual Kubernetes pod are, and that while using network policy is fantastic, way better than not doing anything, we can actually do better. Hence the idea that in the future we will combine the network enforcement capabilities that Cilium already has with the runtime context. This visibility was extracted with Tetragon, so Tetragon has all of this context. How can we bring those two concepts together to provide even better security? So let's see how that actually looks in practice. How could you as a platform operator, or as an application developer, actually look at this and configure this? We'll start fairly simple here by looking at the most basic example: a frontend pod talking to a backend pod. We give them both IPs, 10.0.0.1 and 10.0.0.2, and right now they're each running just one container, the actual frontend and the actual backend application.
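At pod granularity, that frontend-to-backend intent is expressed with a standard Kubernetes NetworkPolicy along these lines (the label names are illustrative):

```yaml
# Pod-level policy: only pods labeled app=frontend may reach
# pods labeled app=backend. Label names are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
```

Note that the selectors match whole pods: every container and every process inside a matching pod gets exactly the same allowance.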
At some point the application developer may decide: I want to start logging certain things into Fluentd. So they will run a Fluentd container. That's great, it's a bit of YAML and the image is running in there. Well, in reality, you're now already running a third-party image as part of your application. You haven't developed this Fluentd image, you're just using third-party software. That's fine, but it's not really trusted, right? You're now accessing the logging pod. This looks pretty innocent, but you have already created a situation where the Fluentd container can also access your backend. That's not desirable at all. Fluentd could be running whatever code, and you may be updating that dependency all the time. You have granted the Fluentd container access to the backend pod, because the frontend container obviously has to talk to the backend container. We definitely don't want this. But let's go a step further and think about more sensitive data. Let's say your backend pod has to talk to a database running outside of Kubernetes, and in order to access it, you need a certificate. So you need to fetch a certificate. A very common practice is to use an init container to pull the certificate from a certificate server so it can be used by the backend container to talk to the database. Great, but again, there are unintended consequences: the backend container can now also talk to the certificate server, and the cert-init container can also talk to the database. And the cert-init may just be a bash script. It may not be reviewed as thoroughly as other code, but it could directly access the database. We definitely don't want this. And then, when the developer decides my backend also needs to log, of course we're adding a Fluentd container.
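The init-container pattern described here looks roughly like this; the image names, certificate server URL, and paths are hypothetical:

```yaml
# Illustrative backend pod with a cert-fetching init container.
# The cert server URL and image names are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: backend
spec:
  initContainers:
  - name: cert-init
    image: example.com/cert-init:1.0
    # Fetches a client certificate before the backend starts. Note:
    # this container shares the pod's network identity, so network
    # policy cannot distinguish it from the backend container.
    command: ["sh", "-c",
      "wget -O /certs/client.pem https://certs.example.internal/issue"]
    volumeMounts:
    - name: certs
      mountPath: /certs
  containers:
  - name: backend
    image: example.com/backend:1.0
    volumeMounts:
    - name: certs
      mountPath: /certs
      readOnly: true
  volumes:
  - name: certs
    emptyDir: {}
```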
You can see the pattern: as applications on Kubernetes evolve, the complexity of pods grows and grows, and more and more containers are added into pods. I've seen constructs with more than 30 containers inside of a pod, very complex. They're all meant to do something, but very little thought has been given to actually granting least-privilege access to these containers. Of course we want to prevent this. But can we take this one step further? I mean, if we could implement all of this, that would already be a giant step forward. If we could have identity at the level of containers instead of just pods, we would be significantly more secure in general. But can we do better? Can we use the full potential of Tetragon? Let's look at that. Let's simplify our example, remove the logging part, and look a little closer at what's actually going on inside of the backend container, because there's an actual binary running inside of that container. I've used /usr/bin/backend in this example. That's the actual binary stored in the container that's being executed. Well, what happens if that backend binary gets compromised? An attacker might be able to spawn a shell. This /bin/sh shell binary will now run as a child process of the backend binary. Obviously, we want to avoid that this shell can access the database or the cert server. So what we want to shoot for is not only enforcement at the container level, but to actually encode binary names inside of the policies, so that we can restrict which specific binaries can access which specific destination identities. Looking at it from a container identity level only will not be sufficient. We need to really lock down, and also figure out how we can understand when a container itself gets compromised.
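To make that vision concrete, here is a purely hypothetical sketch of what a combined, binary-aware policy could look like. This is not an implemented API: the `matchBinaries` egress field in particular is invented for illustration, layered on top of today's CiliumNetworkPolicy shape:

```yaml
# HYPOTHETICAL policy sketch, not an implemented API: only the
# /usr/bin/backend binary in the backend pod may reach the database,
# so a spawned /bin/sh child process would be denied.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-binary-to-database
spec:
  endpointSelector:
    matchLabels:
      app: backend
  egress:
  - toFQDNs:
    - matchName: db.example.internal   # hypothetical database hostname
    matchBinaries:                     # invented field for illustration
    - "/usr/bin/backend"
```

The point of the sketch is the shape of the idea: network destinations keep being selected the way Cilium does today, while Tetragon's process context narrows who inside the pod may use the allowance.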
Very similarly, the attacker may spawn a reverse shell. Very common is using netcat, and nc would also run as a child process. Obviously we want to disallow any connectivity from the frontend to that reverse shell, even though of course the frontend container needs connectivity to the backend container for normal operation. So that's our vision: to combine Cilium's network policy enforcement with Tetragon's runtime enforcement going forward, and as a combination of the two, provide meaningfully better security in your infrastructure. If you are interested in this, this work is starting right now as part of the Cilium community. If you want to have a say in how we develop this functionality, how we shape the policy, and what the main use cases are that we want to address in the beginning, come talk to us. We're here all week. We have a Cilium project booth, we have an Isovalent booth, and many of us are walking around in the hallway this week. We would love to hear from you if this is interesting to you, how you would like to use it going forward, and what specific use cases you are worried about and would like us to solve first as we implement the first version of this. If you want to learn more about Cilium and Tetragon in general, I'm sure you've seen the URLs a couple of times already, but you can find more information on Cilium at cilium.io, the main project page. You can find information on Tetragon at tetragon.io as well as the GitHub repositories, and if you want to learn more about Isovalent, feel free to visit our website. And if you want to provide feedback on this session, you can scan the QR code up there and give feedback to myself and to the CNCF.
And with that, I think we have plenty of time to answer questions or receive comments on the proposal. Thank you very much. I think you can step up to the mic if you want to ask your question, or I will repeat it for you. Yes, excellent question. So the question was: looking at the pain that SELinux induced, how are we going to avoid this? And I'm very familiar with the pain of SELinux; I was on Red Hat's kernel team for 10 years while SELinux was being developed. I think one main difference is that we're not intending to ship a default policy for all of this along with Cilium. SELinux very much tried to ship as part of a Linux distribution with a policy that says exactly what you can and cannot do. What we will do with Tetragon and Cilium is give you the ability to define policies at the granularity that you want, very similar to how network policy does this today. But there is another aspect to how we will make it simple to use. As we already do with network policy, we give tools and control at different levels of complexity. If you want to go really deep and tie things down to a binary, or maybe even binary arguments, you can do so; maybe you even want to restrict on UID and lots of additional attributes, or maybe you choose to do it just at the container level. It is you who decides how much complexity and how much work you want to take on. That's how we're currently thinking about this. But I would love to learn if you have specific ideas on what could be challenging, and to know about this before we start implementing it. Yes? Yeah, I will repeat the question. So the question is whether we're planning to implement some sort of webhook or notification, where if you observe something, we could translate it into policy; or maybe I didn't understand the question correctly.
Absolutely, okay. So the question is: can we observe what's happening right now, because we know it's good, and turn that into a policy? Yes, we can already do that separately for network policy and also for Tetragon, but not for the combined version of the two. If feasible, we will definitely also allow doing this for these combined policies. Absolutely. Yes? So the question is, given that we also control the network, can we do something to prevent locking yourself out of the box, for example? Exactly, using the commit-and-confirm model. It's currently not implemented, but it's definitely something we would be interested in exploring and learning more about. We do have policy right now for Tetragon to protect Cilium, so that we prevent anything other than Cilium components from changing Cilium infrastructure, and Tetragon can also protect itself. We're not using the commit-and-confirm model yet in any way, but that could be interesting for us. Yes. Other questions? Yeah, back there. The question is: do we have any thoughts about tampering of binaries? Yes, Tetragon can do this already. Tetragon has file integrity monitoring, where you can detect changes to files and trigger actions on them. You can, for example, calculate a digest of a binary and then only allow execution when the file digest is unchanged. So you can calculate the file digest in a staging environment or in your CI/CD pipeline, and then enforce a policy that only allows execution of the binary if it has not been tampered with. And Tetragon's capability is not limited to binaries; we can do file integrity monitoring for any file on the system. Do we have time for more questions? Any other questions? We definitely have time for questions. All right. But maybe everybody's happy. All right, there's one more question, yeah. So the question is: Cilium and Tetragon are currently low-level tools with amazing capabilities, but they require writing specific policies. Yes.
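The file-monitoring side of this answer can be sketched as a Tetragon TracingPolicy that emits an event whenever something writes to a watched binary. The hook choice, watched path, and policy name are illustrative, and field details may differ across Tetragon versions:

```yaml
# Sketch: emit an event whenever something requests write access to
# /usr/bin/backend, as a building block for tamper detection.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: detect-binary-tampering
spec:
  kprobes:
  - call: "security_file_permission"   # LSM hook checked on file access
    syscall: false
    args:
    - index: 0
      type: "file"
    - index: 1
      type: "int"                      # access mask (read vs write)
    selectors:
    - matchArgs:
      - index: 0
        operator: "Equal"
        values:
        - "/usr/bin/backend"
      - index: 1
        operator: "Equal"
        values:
        - "2"                          # MAY_WRITE
      matchActions:
      - action: Post                   # emit an event to the agent
```

Swapping the action for `Sigkill` would turn the same detection into enforcement.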
And then the question is: have you thought about getting more into application lifecycle management, or providing a better user experience for the whole system? Part of the answer is that this is part of what our company sells. If you do not want to write your own low-level policies, you can ask Isovalent to help you with that; that's part of the value we provide. But I also think automatic learning is a key aspect. We believe that manual creation of policy is something that will not scale long-term. So automatic detection of the desired state, capturing it, whether in the supply chain, the CI/CD phase, or a staging environment, and then automatically translating that into policy, is the way forward. But in general, we're trying not only to build better capability but also to make everything easier to consume. Absolutely. Do we have time for another question? Or are we running out? I think we can take one or two more questions. All right, there in the back. Can you repeat it? I couldn't fully understand it. Do we have any dashboards? Yes, we have tons of dashboards. If you want to see some, we will show some of them in the Cilium session during KubeCon. There are dashboards on network policy and on Tetragon: what's allowed, what's not allowed, violations. All of Tetragon's data can be fed into a SIEM like Splunk or Elastic and visualized there. I simply did not focus on that today, but it of course exists. We have particularly strong Grafana dashboards thanks to a great partnership with Grafana Labs, and of course all the major SIEM providers as well. All right, final question here. Are we pushing for unified standards? So yes, absolutely.
We believe that Kubernetes is the place to standardize, whether it's network policy, the upcoming Gateway API that's being developed, or mutual authentication identity with SPIFFE. So yes, we strongly believe in standardization of all the policy that we're doing. We often tend to use a custom resource first to try things out, understand what use cases are worth covering, and learn with our early users what policies should look like, and then bring that into the Kubernetes standardization process. So: standardize through early users, and then set it in stone via the Kubernetes APIs. That's how we think about this and how we'll ensure an open ecosystem. All right, very quick: running Tetragon outside of Kubernetes. Yes, you can. What you will lose is the custom resources that define the policies; you will need to use an API directly. But there is nothing Kubernetes-specific about Tetragon. If it runs in Kubernetes, it will show you pod labels and namespaces and all of that; otherwise it will show you cgroup IDs and the like. It's right now a Linux-based technology that can run on any Linux, and in the future, with eBPF on Windows, it will also run on Windows when Microsoft is ready.