Hello everyone. My name is Martin and I'll be presenting today along with Nandor about Zero Trust and workload-level identity. The title is a bit cryptic and a bit difficult, but I hope we can shed some light on what we have here. Let me start with an introduction, and I'll let Nandor introduce himself as well. My name is Martin. I'm working as a product manager for Cisco, actually in Cisco's emerging technologies group called Outshift. Previously, I was working at two different startups and I have a very heavy engineering background. I just recently moved to the product side of things, but I have a deep understanding of how this technology works under the hood. I'll start with a small introduction of our project, what we think of Zero Trust, what we think the problem is within the space, and how we are trying to solve it. Later on I will let Nandor go into the technical details and talk more about what we have implemented and what it looks like, and he will also do a short demo of the project that we're working on. But now I'll let Nandor introduce himself as well. Yeah, hi, I'm Nandor. I'm also working for Outshift by Cisco. Together with Martin, I joined Cisco through Banzai Cloud. I've been mainly a software engineer since the beginning of my professional career, which was something like 15 or 16 years ago, I don't remember exactly. I'm working mainly in the networking and security areas. At Banzai Cloud, I was developing the Bank-Vaults project, which was basically an infrastructure project for HashiCorp Vault, making it easier to run on Kubernetes and easier to use in a lot of applications. Now I've shifted gears and moved to WebAssembly- and Camblet-based projects. Let's get started. Let me start with a simple example of a hypothetical attack. This is a simplified example of how a security breach can happen. It's not a concrete example like an existing breach or a known breach.
Rather, it's a simplified example to make it easier to understand what we're aiming at. This example involves lateral movement. Lateral movement means that today, when an attacker gets into a system, they usually spend some time there. They try to move laterally within the network or within the infrastructure. In the end, the actual exfiltration of data happens some time later, after the attacker goes through some steps within that infrastructure. Let me start by drawing up this attack narrative. Usually it starts with a phishing attack. Maybe some AWS credentials were stolen. It can happen very easily to almost all of us. Maybe some employee was not paying enough attention to some email; they clicked it and entered their AWS credentials. Well, it happens even to the best of us. But usually, and this is basically where lateral movement starts, the attacker's end goal is not to get those AWS credentials. The attacker's goal is to exfiltrate some important data from the system. Maybe they don't even know beforehand what they are trying to steal there. But they use those AWS credentials to start something in that system, just to get in. Maybe they will start a VM inside a security group, if they have the necessary permissions through those credentials. Once they are inside a specific security group or within a specific VPC, they can start some kind of scan of the internal network to see what's available: what ports, what services, what they can access from that specific VM. Maybe they find some user management API that is supposed to be internal, one that is not available from outside the network but only from inside. It was meant to be accessed by other services, other microservices, especially in today's complex architectures. It is a pretty common pattern to use microservices and just have access from one of those services to an API on another.
If that API is not protected properly inside that system, then an attacker can exploit those trust relations and access the API from the VM that they've started. If they can access this API, they can start the actual data exfiltration by retrieving some kind of user information that was not supposed to be public. This was the actual end goal of the attacker, not getting those AWS credentials. It could happen because the attacker was able to do lateral movement within that internal network, like accessing the user management API from an arbitrary VM, from an arbitrary script, within that same security group. This is getting pretty common. Threats are evolving. Here are some numbers that are, I think, well known as of today. They say it takes seven months on average to identify a breach and contain it. That means an attacker can spend a lot of time in a system before they are even discovered. They also say that almost 60% of security breaches involve lateral movement, and this number is increasing and will keep increasing in the future. On the other hand, only 4% of alerts are even investigated properly. Maybe you have a system that is configured to send alerts when it detects some kind of malicious behavior or unexpected movement. Those alerts are frequent enough that they won't all be investigated properly. The question naturally arises: what can we do to prevent a breach like this? We have a lot of options. It starts with very basic things like employee awareness of phishing: just make sure your employees know not to click any suspicious links, and things like that. You can of course do a lot of other things, like the principle of least privilege for credentials. If you have AWS credentials, for example, you don't give access to all kinds of actions in every VPC. Maybe you restrict them to a security group or a subnet.
Maybe, if you have production systems as well as test systems, do not give everyone access to production systems and production VPCs, for example. That's a good principle. You can do network segmentation as well. Segment your network. Do not put everything inside one specific VPC or one specific subnet, but rather try to distribute your applications and create segmented groups for each of your microservices. Then of course you can set up fine-grained network policies, especially if you have that segmentation in place, and say, okay, this subnet cannot reach services in another subnet, and things like that. Of course you could use authentication for internal services as well. That could help with my example: if that internal user management system is protected by some kind of authentication, it is not as easy to exploit those trust relations as before. Or you can use some kind of active monitoring for anomalies. It is difficult, but there are a lot of systems that do just that. If they detect suspicious behavior within your network, they will fire off alerts. But we also have problems with these approaches. That doesn't mean you shouldn't do them; these are all good principles and things that you should do, but they have problems. For example, it doesn't really matter how aware employees are of phishing; it can happen to everyone. It even happened to me, because I was just not paying enough attention, and someone was able to use my credit card somewhere. I couldn't really believe that I would ever be a victim of a phishing attack, but it happened to me as well. Least privilege means that, okay, you can restrict access, but someone will eventually have access to those systems, and if they are the one who becomes a victim, then you have the same problem. Okay, maybe the chances are not as big as if you gave everyone privileges to all kinds of systems.
Network segmentation, again, is good, but in the end it is just limiting the scope. Maybe if someone gets into your system, they won't be able to reach all the other services, only those services that belong to the same network segment. If you're trying to set up network policies, it can be extremely complex. It will probably involve some complex architecture that you need to put in place. If you want to use authentication for internal services, it can get very hard very quickly. If you do it one by one, and you have, I don't know, hundreds of microservices, and you're trying to implement authentication for each of those services one by one, written in a lot of different programming languages and a lot of different frameworks, then it is just hard. It can consume a lot of time and resources from application development teams. And last, active monitoring for anomalies is even harder, whether you build a system like that or buy one. It won't be 100% certain to catch all those attackers. Even if alerts are sent, they won't necessarily be investigated properly, as I showed on my previous slide. So all these things have problems. In general, they are either restrictions or reactive solutions. But the question is: do we have something that is able to prevent all this lateral movement and all these things that I've listed? You might think about zero trust here. Zero trust is a heavily hyped term that everyone is talking about. So what if I implement a zero trust strategy in my system? Would that mean that, okay, now I'm good, I'm protected? It means I won't trust anyone, and I'm implementing it in all parts of my system. But do we really know what zero trust means? Is it a strategy? Is it a principle? How do I implement it? How do I execute my zero trust strategy? It is not a simple question, because effectively, zero trust is not an implementation, not even a specification.
And it cannot really be bought as a product. I cannot go to the market, buy something, and say, okay, now I've implemented it, I've put it in place, my environment is protected and zero trust is done. It doesn't work like that. Zero trust itself is just a security principle, with a corresponding strategy that goes along with it. The principle itself is very simple. It says: move from a "trust but verify" model to "never trust, always verify." Whether it's end users or workloads communicating with each other, it is always never trust and always verify. But still, this principle doesn't say anything about how I implement zero trust in my system. So why is zero trust actually a big thing, and why is it hyped today? The answer is that it's not necessarily just hype. The zero trust principle itself is a good thing, and it is driven by the increasingly complex and distributed architectures that are appearing today. If I have a simple monolith, it may not be a big problem to protect connections between my workloads. But if I have hundreds of microservices, with multiple hundreds of connections between them, it can get complex very quickly. Maybe these services are not even deployed to one cloud provider or one network. They are scattered across networks and across cloud providers. They use all kinds of mechanisms for connections, different protocols, and so on. Because of this, securing only your network perimeter is becoming obsolete. It is just not enough anymore. I cannot say, okay, I'm securing everything that comes from outside, but inside, anything can happen. That's also because of insider threats, and because of what I've said about lateral movement.
Then there is also the need for granular workload access control, just because of how complex my architecture is becoming. I need to control which workload can access which other workload, and I need to control it granularly instead of just setting some very broad rules. So zero trust is a good thing; it is definitely a thing beyond the hype as well. When we talk about zero trust, I usually distinguish two different things. The first one is probably what everyone thinks about first when someone says zero trust: Zero Trust Network Access. It is a way to secure remote access to an organization's applications, services, or data. It is something that can substitute for a VPN. Instead of granting access to a whole network with a VPN, you grant access to services and control it through clearly defined access policies. This is usually what someone thinks about when they hear zero trust. But there is another area, and that is workload-to-workload zero trust, as we usually call it. It means that when I have multiple services in one VPC, one network environment, I need to control how one workload can be accessed from another. This is the part that we will talk more about today. For that part, there are multiple solutions on the market today. The first one is micro-segmentation. This is the more common application of zero trust, usually present in more traditional networks, and it is usually network-based. It means I'm dividing the network into segments and applying security controls to each of those segments. These segments can be as small as one workload running on a machine. It certainly reduces the attack surface, but it still doesn't encrypt traffic. It doesn't work well in Kubernetes, at least not as a first-class citizen. And it can get very complex very quickly when you're setting up policies in a changing environment.
The second solution, especially in Kubernetes, is service meshes, because service meshes effectively implement a zero trust environment if you turn on mTLS connections between your services. But service meshes also have problems. One is that they only realistically work on Kubernetes, not necessarily outside of it. Service meshes often mix responsibilities between network and security teams. mTLS connections and zero trust can belong to security teams, but on the other hand, a service mesh can also be used to generate telemetry and to control your network flows, maybe do some load balancing or some A/B testing and things like that. It is not very granular. I mean, it can be enough in a lot of situations, but not in all of them. A service mesh will trust everything running in the pod behind the sidecar, so if someone is able to get into your pod, they will still be able to exploit those trust relations. And of course, it comes with the proxy hell: sidecar proxies, node-level proxies, routing traffic through these proxies. It can be a pain, and more and more people realize how complex they are, especially since they often don't need all the functionality of a service mesh. Maybe they only want to use it for mTLS, but then it comes with all the rest of the proxy hell. Third, and I'm talking a bit too much, so I'll try to be quick: third, and this is where we are trying to go, is kernel-level identity and encryption. It means that, okay, we can take the ideas from a service mesh. We should do mTLS between those services. If I can do automatic mTLS, I'm protected against the lateral movement that happens from workload to workload. But I don't need all the other things that a service mesh has. And maybe I can also use it outside of Kubernetes, because if I'm moving all of this down to the kernel level, it will work everywhere Linux is the host.
But again, even though it works at the kernel level, if I'm running on Kubernetes, I can use some kind of connectors or metadata collectors to get Kubernetes metadata and actually write policies and do access control based on that information. Compared to micro-segmentation, it also does encryption, so it's not just segmentation but also encrypted traffic. It can also be application- and network-agnostic. It doesn't really matter how your workload reaches the other workload; if it can reach it, then our kernel-level identity can work there. It can also be completely transparent to the application. If I'm implementing it in kernel space, it doesn't really matter what application I'm running on top. You won't have to recompile your application, not even restart or redeploy it. It will just work completely transparently. And this is basically the idea behind Camblet. This is a new open source project by Outshift to automate kernel-space workload identity, access control, and encryption. Now we can switch to the more technical part, and I'll let Nandor talk about Camblet a bit more. Okay, so I'm taking over the screen share and will try to present what Camblet is. As Martin mentioned, Camblet is a kernel-based solution. First of all, I'll start with an architecture diagram, which hopefully will clarify how it works with the Linux kernel. The Linux kernel is a hard topic, but we would like to attack it in a way that users can consume as easily as possible. For example, we have a very nice installer, a one-liner, that installs the main components of Camblet and starts a default installation which can be configured easily. Camblet consists of an agent, as Martin mentioned, which is basically a user-space component, and a kernel-based component, the Camblet driver, which you can see at the bottom on the Linux node.
And what we do here is: the driver does the heavy work and the agent does the control work, like distributing policies and collecting metadata about processes, while the driver does the data plane work: encryption, authentication, and policy enforcement, with authentication done through TLS certificates. I will come back to this slide later, because it can clear things up. Okay, the core of Camblet is in the kernel: the complete TLS handshake happens in kernel space. All private keys are generated after the installation and they never leave kernel space. We are using standard TLS, so no outside magic besides industry-standard TLS, and we use the kTLS component of the kernel, which is basically kernel TLS. It does the heavy lifting of encryption on every Linux host in a very efficient way, even involving network cards, if the hardware is supported by the Linux kernel, to offload TLS encryption as much as possible, so heavy CPU work can be offloaded to a network card. Unencrypted traffic never leaves kernel space, so user-space components don't have to use TLS at the user-space level, and you don't have to configure certificates, because the TLS encryption and authentication happen down in kernel space. We send the encrypted traffic over to the next host, and the next host's kernel driver will decrypt it and send it to the user-space process, in a very transparent but non-attackable way, to your next application. And if your application already uses TLS, we have basically a pass-through, so we don't re-encrypt the TLS traffic again; we just do the policy enforcement and the handshake process down in kernel space. Okay, so this socket, which is basically a plain TCP socket, gets a kernel-level identity, and only those processes that you have described with policies get one, and no others.
The kernel module can't work alone, so as I mentioned on the first architecture diagram, we need a user-space component, which is basically the agent. It sits on the host in user space and communicates continuously with our kernel driver in a secure way to collect metadata about the processes that you are trying to target with Camblet. For example, we have connectors for Docker, we have connectors for Kubernetes, and first of all, we have connectors for plain processes that are running on the bare host, so you don't have to use orchestration environments; it's not a must, but we support them. As I mentioned, since we are running a kernel driver there, it works with orchestration environments that run on the Linux kernel, Nomad, Kubernetes, you name it, but we also support plain processes. And it's fully transparent: you don't have to rebuild, restart, or redeploy your application, since we extend the TCP protocol inside the kernel to give a TLS connection to your TCP socket. Okay, how do we achieve identity and access policies? Again, we are not inventing something new; we are piggybacking on SPIFFE IDs, so we bake SPIFFE IDs into the certificates of your applications. Each and every application has its own SPIFFE ID. One application can have multiple identities, so multiple SPIFFE IDs, through different policies that you describe. It can be that your application talks to GitHub with one identity and talks to your backend application with another identity. You can describe that easily in our YAML-based policies in Camblet, and all those SPIFFE IDs are present in the certificates. Identities are defined through metadata selectors, so it's very easy to create a group of specific applications with these policies that you describe in Camblet, as I will show you later on.
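To make the shape of this concrete, here is a rough sketch of what such a policy could look like: a metadata selector picking the workload, a SPIFFE-based identity, and an allow list of peer identities. The field names and the trust domain below are assumptions for illustration, not the exact Camblet schema; the real format is in the Camblet documentation.

```yaml
# Hypothetical policy sketch; field names and trust domain are illustrative.
- selectors:
    # Match workloads by collected metadata, e.g. the process name on the host.
    - process:name: my-backend
  certificate:
    # The workload identity baked into the certificate as a SPIFFE ID.
    workloadID: my-backend
  connection:
    mtls: STRICT
    # Only peers presenting one of these SPIFFE IDs may connect.
    allowedSPIFFEIDs:
      - spiffe://example.org/frontend
```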
And you can include metadata from environment-specific elements like Kubernetes labels, Docker labels, and things like that. Okay, we need service discovery in Camblet to make it possible to describe the applications that your client applications would like to connect to, so basically the server applications. Since we are at the kernel level, we only know IP addresses, so somehow we need to translate IP addresses back to application names or host names, as you wish. People understand host names better and machines understand IP addresses better, so we need a mapping. This provides some DNS-type functionality in kernel space. It defines which workloads are part of the system, but you don't need to define SPIFFE IDs here; this is only for resolving IP addresses. Currently it is the user's responsibility to describe the system, but automatic connectors, for example for Docker Swarm and Kubernetes, are in the making, because Docker and Kubernetes already provide all the information that we need; we just need to pull that information down. Since Camblet is a young project, we are currently trying to find the easiest way to do that, so it's not implemented yet, but it's in the works. How do I use Camblet? As I mentioned, it's pretty easy to install. We have a website where you can find the one-liner and install it. You need to install Camblet on each and every host that you would like to involve in the zero trust data plane that Camblet defines through the policies. The installer will install a kernel module, compile it on the target machine directly, and sign it with a DKMS key. So you have a fully open source kernel module running on your machine; you can check the code and install it transparently.
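To give a feel for the IP-to-name mapping just described, a service discovery file might look roughly like the following. The keys here are assumptions for illustration only; the real format is in the Camblet documentation.

```yaml
# Hypothetical service discovery sketch; keys are illustrative, not the
# exact Camblet format. Maps an address and port seen in the kernel back
# to a human-readable label that policies can refer to.
- addresses:
    - address: 10.0.0.12
      port: 8000
  labels:
    app:label: nginx-backend
```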
The installation also installs the Camblet agent, which runs in user space, one instance on each and every node. After installation, you need to write and distribute your policies, which I mentioned and will show you in a short demo; they are just plain YAML files. You also need to distribute your service discovery files for now, but later on, when we have the Kubernetes operator and connector that will do this automatically, that won't be a task anymore. Okay, so it's demo time. I will go to my terminal. Here I have a terminal which consists of four windows, and I'll show different things in parallel. First of all, this is an Ubuntu machine, nothing special. The only special thing is that Camblet is already installed on this machine. I will show two different scenarios here: one running on the bare host, and the other running in Kubernetes. On the bare host, I will run an Nginx server, a plain Nginx installation, which returns index.html on the root, and I will hit it with curl to get the content. Okay, first of all, I will run ngrep, a network grep, so you can see what is happening. I will start Nginx. Okay. And I will show you my simple policy here. I have a policy file already prepared, because we only have a short demo time, and it is all commented out. That means that currently Camblet is installed, but it's not active: since we have no selectors defined, it will not intercept any traffic. Okay, I will hit the Nginx server that I'm running, and in the ngrep output, you should see that it goes through in plain text. Yeah, I just got the response from curl localhost:8000, and this is the plain HTML document: Welcome to nginx. As you can see, the whole traffic went in plain text. The curl client opened the Nginx port and executed a plain text HTTP request.
And the Nginx server on port 8000 returned the index.html file to the curl client in plain text. Okay, let's do some magic and try to encrypt this traffic. What I'm doing is uncommenting the selector for Nginx, and let's see what happens. What should happen here is that Camblet has already taken over the Nginx socket, and if I try to HTTP GET it again, it won't work. Why is that? Because Nginx is now on mTLS, but curl isn't, since it isn't described by our policy. What we would like is for them both to communicate over mTLS together. I can show you. Yeah, as you can see, if I open the Nginx port with openssl as a client, we already got a handshake, but since we couldn't provide any certificates, and Nginx now wants mTLS, as Martin described earlier, the connection closes. Okay, so let's go to the policy file and create a selector for curl as well. As you can see, the selector is very simple; the syntax is in the documentation. I have a process name, which is curl, and Camblet will find all processes called curl and provide them with mTLS. We have a workload ID for curl, simply called curl. We have strict mTLS mode and the allowed SPIFFE IDs, for curl and also for Nginx. Both of them have allowed SPIFFE IDs, which means that the SPIFFE ID check is bidirectional: Nginx checks that it is curl that wants to communicate with it, and curl checks that it is communicating with Nginx. Let's see if it works with curl and ngrep. Yeah, we got back the index.html, and as you can see in ngrep, the whole traffic is mumbo jumbo: it's encrypted now with mTLS. But as you can see, I used simple plain localhost. Let me make this clear: this is not HTTPS, but simple HTTP. Behind the scenes, Camblet transformed it to HTTPS through TLS, and we got back the file. In the policy file, we can do other interesting things.
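The curl-and-Nginx policy described above, with its bidirectional SPIFFE ID checks, can be sketched roughly as follows. The field names and the trust domain are assumptions, since the recording did not capture the exact file; treat this as an illustration of the shape, not the literal demo policy.

```yaml
# Hypothetical reconstruction of the demo policy; field names and the
# trust domain (example.org) are illustrative assumptions.
- selectors:
    - process:name: nginx          # Camblet matches every process named nginx
  certificate:
    workloadID: nginx              # identity baked into the certificate
  connection:
    mtls: STRICT
    allowedSPIFFEIDs:
      - spiffe://example.org/curl  # only curl may connect to nginx
- selectors:
    - process:name: curl
  certificate:
    workloadID: curl
  connection:
    mtls: STRICT
    allowedSPIFFEIDs:
      - spiffe://example.org/nginx # curl verifies it is talking to nginx
```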
For example, in the upper part, I can show you that it also works in Kubernetes. I have a Vault pod and an Alpine pod, and I will connect to the Vault pod from the Alpine pod with the same curl that I was using here. Okay, I'm changing some things in the policy file. Since I have two new components, I need to uncomment the last two components that I saved there. This one is a selector for Alpine: it will find the curl process in my Alpine container, give it a workload ID called kubernetes-curl, and do a strict mTLS handshake with the allowed SPIFFE ID of the Kubernetes Vault workload. The other component is Vault, in the Vault container in the Vault pod, having a certificate with the Kubernetes Vault workload ID as its SPIFFE ID; the connection is still strict mTLS, and the SPIFFE ID it allows is the curl one. Okay, let's save this policy. If I go to the Alpine pod, you will see I have a curl there, and let's try to check what Vault returns. Oops, I'm having some issues. Okay, I'm back. Now it's there. Okay, so as you can see, I was again hitting a plain HTTP endpoint with curl. Oops, sorry, I typed it wrong and it's not working. Okay, with http, it's working again. So as you can see, I hit a plain HTTP endpoint from an Alpine container to a Vault container, which by default doesn't have any certificates set up by itself. As you can see, it's a Vault instance, a dev Vault instance, listening with the basic default config, and it doesn't have any kind of certificates set up. But ngrep still shows encrypted traffic, since Camblet's selectors and provisioning of TLS certificates also work in Kubernetes. And I think that basically was my demo, and also my slides. I suggest you check out Camblet on GitHub, cisco-open/camblet, and also the camblet.io site, where you can find the installer and the documentation.
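The Kubernetes part of the demo policy can be sketched the same way, this time selecting on Kubernetes metadata that the agent's connector collects. The selector keys, workload IDs, and trust domain below are assumptions for illustration only.

```yaml
# Hypothetical sketch of the Kubernetes demo policy; selector keys and
# the trust domain are illustrative assumptions, not the recorded file.
- selectors:
    - process:name: curl
      k8s:pod:name: alpine         # assumed Kubernetes metadata selector
  certificate:
    workloadID: kubernetes-curl
  connection:
    mtls: STRICT
    allowedSPIFFEIDs:
      - spiffe://example.org/kubernetes-vault
- selectors:
    - k8s:container:name: vault
  certificate:
    workloadID: kubernetes-vault
  connection:
    mtls: STRICT
    allowedSPIFFEIDs:
      - spiffe://example.org/kubernetes-curl
```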
And if you have any questions, please ask them now. That was it. Okay, in the Q&A, I can see one question: is Camblet open source? Yeah, it's open source. As I mentioned, it's on GitHub, cisco-open/camblet. Next question: what is the trust assumption here? Is the host running Camblet trusted to not be compromised, and is there no privilege escalation by a container, for example? Yes. I think it's a very interesting question. Since we are running in kernel space, yes, we assume the kernel can't be compromised. It's also an upfront assumption that, since we are open source, you have the chance to verify that we are not injecting any kind of suspicious code into your kernel. The other thing is that you basically have to know what kind of processes are running on your kernel. So it's a good time to browse through all your VMs and check what processes you are running. A lot of infra teams know, but a lot of infra teams don't know how many interesting processes are there, from unattended upgrades, for example, on Ubuntu; a lot of people don't know whether those are running or not. Camblet can give you a good idea of what processes you are running, since the logs that Camblet emits cover all TCP connections going in and out of your system. So maybe a good feature we could build later on is a network map based on Camblet logs. Next question: can you share this demo code? "It's very cool and I want to test it locally and show it to my team." Yes. The demo code, the policies, are not yet on GitHub, but I will share it; I will put a link in the README of the repo. Otherwise, the whole thing is open source. Is Cisco contributing to Camblet? Yes, we are working at Cisco and we built Camblet, so the answer is yes. Is this a TCP-specific solution, or is it also applicable to QUIC, UDP, or Unix domain socket communication?
Currently, it is TCP-specific. We already know how to make it UDP-compatible as well. Since QUIC uses UDP heavily, we will probably need to test it there and find a solution for that part of the question too, so we'll make it QUIC-compatible. UDP support is in the making; currently, it is TCP-specific. Yeah, so I will share the code as quickly as I can on the Camblet page. But in the documentation, you can find cool examples like this, and also in the samples in the agent code, so I suggest you go through those in the meantime. I think that was all. Yeah. Thank you, everyone; just to add one little sentence. So yes, Camblet is implemented within Cisco. We've started the project and it is actually just getting started, so it is in its early phases. You can expect some bugs and some interesting behaviors, maybe, hopefully not. But you can support us by going on GitHub and giving us a star, trying it out locally, running it in your dev environment, asking questions, opening an issue on GitHub, maybe even fixing some issues and submitting a PR. We are happy to receive any kind of contributions, because we just want to get this project started, and it would help us a lot. Otherwise, thank you. Thank you. I don't know, maybe Roy can answer this question, but you can always contact us on GitHub; that may be the easiest way to do it. I think that is live. We have a team@camblet.io email address as well. But Roy, if you want to answer it. Yes. Yeah. Hi. Thanks, everybody. I was actually a fly on the wall; I'm in the marketing group at Outshift. To your question: yes, you can reach out to us as a follow-up. If I know who asked, and you have a specific question, we can always reach out to you, Alfred. So I think if you're able to share your email ID, we can always follow up with the specific question and the person who can answer it. I hope that addresses the question, Alfred. Glad to reach out to you. Thanks again. Yeah.
In the README of the repo, you can find links to Slack, I think, and also email addresses. But we will double-check and put everything into the README, with the examples as well. All right. Well, thank you so much, Martin and Nandor, for your time today, and thank you everyone for joining us. As a reminder, this recording will be on the Linux Foundation's YouTube page later today. We hope you join us for future webinars. Have a wonderful day. Thank you.