Good afternoon, my name is Duffie Cooley. I'm the Field CTO at Isovalent. Today we're going to be conducting a workshop on getting familiar with security observability using eBPF and Cilium Tetragon.

And my name is Raphael Pinson. I'm a solutions architect, also at Isovalent.

Awesome. So to get started, I'm going to take you through a presentation describing the concepts and what we're going to be covering in the lab today, and then I'm going to turn it over to Raphael to drive the lab. We're all going to be able to go through a hands-on lab together, so I hope you brought your laptops. You should only need internet access, nothing special. We're going to be using Instruqt to host the labs, and that's how we'll go about that part of it.

So our agenda for today: a Cilium and eBPF introduction, where we'll talk a little bit about what we've been building at Isovalent as part of the open-source Cilium project. We're going to talk a little bit about Tetragon, then we're going to jump into some examples, and then we're going to go into the lab.

So first, the open-source projects that we work on at Isovalent are Cilium and eBPF. We actually have a lot of kernel maintainers working at Isovalent, working directly on eBPF in the kernel — pretty exciting — and Isovalent is the company behind it. We also provide a product called Cilium Enterprise. I'm not going to spend a lot of time talking about that today, but if you'd like to know more, there's definitely a booth that will be happy to tell you all about it.

So, eBPF — who out here has heard of eBPF? I like seeing the number of hands go up higher and higher every time. It's great, love to see it. eBPF basically makes the Linux kernel programmable in a secure and efficient way. One of the analogies here is to say that eBPF is to the kernel what JavaScript is to the browser.

Another way to think about this — I'm probably drifting away from the microphone, I'm sorry about that, I move around a lot when I talk — another way to think about it is to consider the Linux kernel as an API. When I want to open a file, when I want to open a socket, whenever I want to do anything else like that, I'm actually making an API call. eBPF gives me the ability to instrument any API call you can make to the Linux kernel. And when I instrument that API call, there are a lot of things I can do with it at that point: I can modify the inputs, I can determine whether or not I want that API call to succeed or fail. There's a ton of capability there.

And eBPF is designed so that if I write an eBPF program to be applied to the kernel — the reason we say it is secure and efficient is that the program has to go through a verifier to determine that it will not actually crash the kernel, that it's not an endless loop. The verifier does a static code analysis before allowing the program to be injected into the kernel, to make sure the program is safe. Anybody who's written eBPF code — a couple of you, nice — probably has a colorful four-letter word for the verifier, because it can be very difficult to work with sometimes. But it does a very important job of making sure that the code we inject into the kernel is safe. And it's efficient because it's basically running at native kernel speeds.

These are some of the attachment points, and in this particular demonstration we're going to be playing with a couple of them — the file descriptor and block device pieces. We're going to be showing some of that in our demo. We're also going to be talking about sockets, opening new connections to URLs, and those sorts of things. And when we get into the security observability piece, we're going to show you how we instrument those things and how the events that surface from those actions can be made relevant in a security observability context.

So that's a high-level overview of what eBPF is and what we've been doing with it. One of the first things we thought about at Isovalent — we had Thomas Graf, who was very invested in eBPF in the kernel space, we had Daniel Borkmann, we had a few other really incredible founding engineers focused on eBPF in the Linux kernel. A lot of them already had some experience with networking; Thomas and I met when he was working on the Open vSwitch project. Anybody heard of the Open vSwitch project? That kind of dates me — I don't think it's aged terribly well, but it was an incredible experience in trying to develop what was effectively the first platform that defined software-defined networking.
So a lot of the core of our company, the founding engineers, come from that sort of mindset, and they started thinking: well, can we apply eBPF to this? eBPF was originally built to be a netfilter replacement — something that helped manipulate packets much in the same way you might think of iptables. And then it was extended, because now we actually have this new capability in the Linux kernel. eBPF has been extended to be able to do a lot more than that; it's able to do things at the application layer, all kinds of good stuff.

So Cilium was the first effort we put together, and this is a container networking interface (CNI) for a Kubernetes cluster, where we run a Cilium agent on every node. Just as a regular CNI operates, when a new pod gets created on a kubelet, the kubelet makes a call to the CNI, and in Cilium's case we actually generate an eBPF program for that pod and connect it, using the TC layer, to the network namespace that pod has been created in. Then we use TC hooks and these sorts of tools in eBPF to manipulate traffic and enforce things like network policy. So if you've written Kubernetes network policy before — saying that this application can talk to these other applications in this other namespace — how many people have written network policy? So there are more people in the room that have heard of eBPF than there are people who have written network policy.
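As a quick reference for anyone who hasn't written one — this example isn't from the slides, and the names in it are illustrative — a basic Kubernetes NetworkPolicy of the kind being described looks like this:

```yaml
# Hypothetical example: allow pods labeled app=frontend to reach
# pods labeled app=backend on TCP port 8080 in the "demo" namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```

With Cilium as the CNI, a policy like this is compiled down into the eBPF programs attached to each pod's network interface, rather than into iptables rules.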
That's awesome. So with this, we can actually implement network policy in eBPF. For every pod that comes up, every time you make a change to network policy, every time network policy is applied, we're able to rewrite or extend the information that we pass to this eBPF program and implement it directly in eBPF in the Linux kernel, to allow or deny traffic to its destination, or even traffic incoming to the application itself — which is pretty cool.

Another thing we can do there: inside of a Kubernetes cluster there's another component called kube-proxy, and kube-proxy is sort of the internal load balancer for Kubernetes. When your application is trying to access a cluster IP — a service inside the same cluster — the way that works right now for most clusters out there is that kube-proxy will, on the creation of the service, implement that service abstraction in iptables. So when a packet leaves your application destined for that cluster IP, iptables will pick it up, make a routing decision about which endpoint to send the traffic to, NAT the traffic, and then send it off to the endpoint that was chosen.

In our case, with Cilium, we can replace kube-proxy and, again, do all of that in the eBPF program that we've written for every pod. So when we see a socket connection happen at the eBPF layer — you've made that API call, I want to curl another service within the cluster, perhaps in another namespace — as soon as we see that socket operation, we can determine: okay, what's the destination IP? Is that destination a service IP?
If it's a service IP, I need to go look at the healthy endpoints for that service IP and make a routing decision about where to send the traffic. We're doing all of this in eBPF.

And then finally, one of the big takeaways — all of you that have heard of eBPF have probably heard of it in the context of observability, right? The idea that you could use eBPF to get more information about what's happening. If you look at things like Pixie, if you look at things like our own Tetragon or Hubble observability, there's a ton of context that eBPF can give you, because it has a view of what's happening at the kernel layer. And that's a first-class piece of Cilium and a first-class piece of pretty much everything we've written, whether it's Cilium the CNI or Tetragon, more in the application space. We'll get to it in a minute and talk a little bit more about it, but observability is a first-class use case in everything we've built.

So this slide talks a little bit more about the entire product suite, or the way we think about the entire project in general. We think about Cilium — the Cilium CNI piece here at the bottom. That is basically what I've been talking about before, when I was giving you a high-level overview, but you can see that it has a lot of advanced capabilities. It can do network policy at layer 7; we can also do network policy on FQDNs — again, all implemented in eBPF rather than iptables. We can enable encryption on the underlying nodes, using IPsec or WireGuard. With load balancing, we can replace the Kubernetes load-balancing mechanism, which I spoke about before, and we also have other, more advanced load-balancing mechanisms, like actually running Cilium as an external load balancer that can make use of things like Maglev — a mechanism by which you can have multiple load balancers that all understand the right path back to an already-connected service, and that sort of stuff. Or you can do DSR (direct server return).

And then in the networking space we can do things like flatten the network between multiple Kubernetes clusters, so that you can create things like a global service and have that global service be served by backends in multiple clusters. Pretty cool stuff. And obviously we work in all of the different cloud environments, and in those cloud environments we've generally been chosen either as a very important partner or as the default, depending on where you're looking. In AWS, for example, Cilium OSS is the default CNI for EKS Anywhere. If you go to Azure, we just announced, at the last KubeCon, a very large partnership with them, and when you're spinning up an AKS cluster today, you can actually select open-source Cilium as your CNI — or you can select Cilium Enterprise if there's more value for you in that way with Azure. We support Alibaba Cloud. We have an OpenShift operator for both OSS and Enterprise, same thing for VMware, and for Google Cloud — Cilium is the underlying technology behind Google's Dataplane V2.

And then over here on the right is a new project, one of our newer projects that we announced — I think it was at the last KubeCon — Tetragon. Tetragon really introduces a completely different implementation, and I'm just going to focus on that for a moment, and then we'll talk a little bit more about some of the detail here. As you've already kind of put together, this stack on the left is really more about networking, and then Cilium Hubble is the observability suite: where do we send our metrics? How do we handle tracing?
How can I understand the connectivity between my applications in a visual way? We're also starting to implement many of the use cases of service mesh in Cilium as features, which we call Cilium Service Mesh. So we're implementing things like ingress in the form of Gateway API; we're implementing things like authentication — mTLS is coming soon. It's not here yet, but I'm really excited to see it. Raphael and I have seen it live in our all-hands; it was pretty exciting.

But over here on the right, this red box is Tetragon, and it's a new project that instruments a completely different part of the Linux kernel. Instead of working at the network namespace and doing things like network-level enforcement, we're actually instrumenting a different place — basically the front door of the Linux kernel itself — so that we can see every system call that goes through and make a decision. We can enable you to configure Tetragon to watch for anything that happens in the Linux kernel and make a decision about what to do with it: whether to send an event, for example. Perhaps I want to understand any time any process, anywhere in the cluster, tries to make a setuid system call. setuid is a system call you would use to make an executable runnable by somebody, perhaps, in a different privilege set than you have, right?
So if you wanted to make, I don't know, sudo — something that could be used by any user — you might use setuid to change the permissions of that binary so that anybody could actually execute it. In that context, Tetragon can be configured so that it's watching for the setuid system call, and if it sees any process anywhere make it, it will gather a bunch of information about that process — whether the call was successful or not — and, by default, it will emit an event whenever it sees these things, giving you a lot of context about what happened. Where was this process? Was it inside of a container? Was that container part of a pod? Was that pod in a namespace? Which cluster was it a part of? All of the relevant questions that you would need to really put together a story of where this happened, when this happened, and whether it worked.

You can imagine how this is relevant, because in security observability, some of the use cases we're targeting go like this: you come to work on Monday, something terrible has happened over the weekend, and your manager comes to you and says, "I need to know everything about what happened on Saturday. It seems as though someone exfiltrated a bunch of data from my environment; it was headed for this weird URL. Can you show me everything you know about it?" In most Kubernetes clusters this is going to be really tough, because you're going to be in a position where the best thing you can say is, "Can you do it again?"
"And then I could see if I can find it." You know, you can go look at the audit logs, but what was the IP address of that pod at the time? Those IP addresses are ephemeral. You can see the number of problems those things represent. With Tetragon you get an event stream, and all of those events can be sent to your Splunk, to a SIEM, etc. And because of the way we gather context around all of those events, they become a rich body of data that can be used to answer these particular questions.

So these are some of the adopters of Cilium and Tetragon. We've been working with lots and lots of customers, both open source and under enterprise agreements. If you are an adopter of Cilium, or if you're using Tetragon, I do request that you go to our GitHub — github.com/cilium — where you're going to find a users document. Go ahead and put your name in there and say that you're using it, whether it's open source or not. That would be tremendous. It's all part of our effort to get this project to graduation. So if you're a company that is making use of Cilium, go put your name on the users list. Thank you very much.

So, Tetragon. As I said, it is the newest open-source project in Cilium. It's eBPF-based, which means it's high performance and zero modifications are required to the application itself, and it hooks into kernel functions after parameters are copied. As I said before, when we surface an event that we have gathered about what's happened inside the Linux kernel, we enrich that event with information that makes it easy to understand contextually: when, who, how, why it happened, and whether it was successful. Now, one thing this slide mentions that I haven't talked about is that the action of emitting an event is just one of the actions available to you, right?
You can also do things like kill the process. So if I saw somebody making a setuid call and I wanted to make that just not possible in any process in the entirety of the cluster — whether running inside of a container or on the underlying host OS — I could implement a Tetragon tracing policy that would block the setuid system call across the board, everywhere. If you want to know more about Tetragon, you can also go to our GitHub: that's github.com/cilium/tetragon.

With Cilium Tetragon, eBPF makes it dynamic. For that example I talked about with setuid, one of the other ways to solve that particular problem is an older Linux technology called seccomp, or secure computing. Actually, I'm curious: how many people have heard of seccomp? Nice. So you understand that seccomp is something you can implement, but you have to do it before the process starts, because you have to associate the process with the seccomp policy before the process starts; you can't dynamically change it. So if you wanted to add more to the seccomp policy, or modify it in some way, you'd have to restart the process to pick up the change to the policy that you've written. But in eBPF, that's not the case.
This is an eBPF program that I'm going to insert into the Linux kernel and associate with particular calls, and I can do that dynamically. That means that if I wanted to continuously iterate on the set of system calls that I wanted to block or change, I could just iterate on it directly. Or take the case of the setuid call: this particular call is required by most container runtimes to be able to create the container itself, so if you blocked it globally on a Kubernetes cluster, you'd have a really bad day. But if you were using something like Cilium Tetragon to make that policy, you could apply the policy after the pods had been created — which means the setuid system calls that are needed by the container runtime have already taken place — and now you're saying: from this point, where the application has started, I want to apply this policy, and from here on no setuid calls can run from within these containers.

As we talked about before, it protects pre-existing processes, uses kernel knowledge to hook into sufficiently stable functions, and can handle multiple coordinated eBPF programs. So you can actually take action — this is where, for example, when we see an event that we want to trigger on, we decide what actions we want to have happen. And there is also some in-kernel event filtering, as you're going to see here in just a little while when we get to the labs. When you're looking at the amount of event data that we can produce, it can be a lot. There is a ton of event data that we can produce, and the question often comes up: how can I limit how much information I'm actually going to get, so that I only target those things that I care about and send those to Splunk, instead of sending everything? So we do have in-kernel event filtering, and we also have user-space filtering: once the event has come down and we're propagating it into the stream that we're going to send to whatever collection system you're using, you can also filter it at that point.

So I like this slide, because it really hits on something I always like to say, which is: context is everything — context is king. If you don't have context about these things, it becomes very difficult to actually understand them; if you can't measure it, you can't improve it, etc. So in this case, the example is that you would write policy so that when malicious behavior is detected, you can make sure you get an alert that is actionable. You can send that alert to logs, you can have action taken on that alert, etc. But the important piece is that it's not just telling you, "hey, somebody made this system call." It's telling you: this process, running inside of this pod, running inside of this namespace, running inside of this cluster, made this system call at this time, and it was successful. There's a lot of information you can gather here.

And it's not just system calls. For example, we actually just recently implemented file integrity monitoring. The first time we see a file get touched, we take a checksum of that file and store it in an eBPF map. The next time we see that file get touched, if the checksum is different, then we understand that it's different, and we can alert you on the fact that the file's integrity has changed. How does it all work, though?
So Tetragon runs much the same way that Cilium does, in the form of a DaemonSet, and just like Cilium it can also run on virtual machines or other external entities directly. Tetragon does not require that Cilium is also deployed; you can run Tetragon on its own, and in fact we're seeing a number of customers trying to solve the security observability use case doing exactly that, because it can give them a lot more context, a lot more data about what's happening in their runtime, without having to change the underlying CNI to accomplish it. Because it runs as a DaemonSet, it is going to instrument the Linux kernel on every node in your cluster, or every machine that you apply a Tetragon agent to. And through the CRDs in your Kubernetes cluster, you can configure Tetragon so that it will handle all of the events that you're trying to look for.

Some examples of the things you can key on are here in this graphic: process executions, syscall execution, file access. We can look for interesting patterns in TCP. We can look for namespace escapes. We can look for privilege escalations — has somebody done a sudo inside of a container, or has a process tried to change out of the PID namespace it was started in into another PID namespace, etc.? We can understand data access. We can also expose metrics: because we're able to instrument the Linux kernel directly, we can actually look at the data going by and expose metrics for HTTP, for DNS, and for TLS. So one of the other questions that people ask in a security observability use case is something like: what TLS ciphers, or cipher suites, are in use?
This is frequently a compliance check, or compliance control, that you have to satisfy in your infrastructure: none of the workloads in your cluster can use TLS cipher suites of this particular type. How would you audit that today? It would be pretty difficult — I know some of you have probably faced that challenge. But with this, it would actually be relatively easy, because we have that context already and we're producing it at the user-space level.

So some of the things that you can ask questions about: network traffic. We have layer 7 parsers, so we can actually look at the network socket layer and say, this application tried to make a curl command, and that curl command tried to connect to this DNS name — google.com, or whatever — and we can show you what the resolved name was. We have DNS parsers to make sure that we understand, at the time of that execution, what the resolved IP address was, and whether they were trying to do a bypass of DNS, if you're doing any of that stuff. We can also instrument file and I/O activities: any time somebody touches a file — you can protect whole directories or just specific files — any running executables and whether new ones have been spawned at a later date, and obviously system calls, changing privileges, and namespace boundaries as well.

Now, some examples of how we implement this. This is an example of a tracing policy, and you're going to be playing with this directly here in just a minute. In this example, the tracing policy is looking for that setuid system call specifically, and when we see it, we're just going to emit an event on it.

And this is an example of how we might query that data in Splunk. This example looks at all of the event data, looks for the binary /bin/sh, looks at when that process was started, and presents the data relative to time: here, we're looking for anything that happened more than five minutes after the initial process inside of that container started.

This one is using the Tetragon CLI. The Tetragon CLI can parse those events and make them a little bit easier to understand. In this example, we're saying: if you have written the policy that says anybody trying to write to /root/.ssh/authorized_keys should just have their process killed right away, we emit an event around it. And here are some example events that you would see in that case — well, actually, that doesn't quite line up with the text, but you get the idea. In this case it's saying: if somebody actually writes to /etc/shadow, instead of allowing that write, I want to kill the process. So you might still see the number of bytes that would go into the file, but the file will not actually be written, because the process will be killed before the write can happen.

We can also monitor and prevent capability abuse. So if somebody's using setns to move back and forth between different namespaces within the same Linux kernel, this is something we can detect, because it is all within our scope inside of the Linux kernel itself.

And that is the introduction. So now we're going to go into the lab. Before we jump over to Raphael's laptop here, this is the link to the lab: isogo.to/kccnceu-tetragon. Take a picture, write it down. This is where we're going.
This is where we're going to spend the rest of the time this afternoon. Again, that's isogo.to, slash, KubeCon CloudNativeCon EU, dash, tetragon. Once Raphael takes over, I'm going to come around and make sure that everybody understands how to get there, but this is the lab that we're going to start up. When you go to that URL, you're going to see the labs; the first one is the open-source one, so I want you to click on that one, and it will take a couple of minutes to start up. While we're waiting for that to happen, Raphael will tell you a little bit about what the lab is going to cover, and then we'll jump into it. Don't raise your hand if you haven't taken a picture of the URL, or if you don't know what the URL is. Sweet — okay, that means that everybody has what they need here and I can move on. Awesome.

Thanks, Duffie. Let's switch laptops. Oh, one more thing I wanted to point out: there are standing microphones in the aisles. If at any time you have a question during the lab, or if there's anything you want more clarity on, I'm going to be walking around trying to answer your questions, but there are a lot of you, so if there's anything you would like to ask, you can come to one of the standing microphones and ask the question, and we'll be able to answer it here.

So, hi everyone. You should be getting to this page if your access to the internet is working fine. We're going to go through this. These are Instruqt labs — they're based on a platform called Instruqt that actually uses VMs in the cloud; in this case we'll be using VMs in Google Cloud, but you don't need to have an account.
You don't need to care about this; it will all be in your web browser. There are three labs listed here. The one we're going to concentrate on at this point is the first one, which is based on Tetragon open source, and it will show Tetragon from a CLI perspective. I will go slowly, explain things, and take the time to follow along with you on this lab. But if you want to go faster, if you want to run ahead, you can totally do that, and you can do the next labs too. Actually, we have about 20 labs in total that are available at this address here: isovalent.com/labs. You don't need the resource library — it's just a redirection — so, isovalent.com/labs, and you have about 20 labs there that talk about Cilium, Hubble, Tetragon, lots of things. All these labs are entirely free. The only reason we provide the invitation links is so you don't need to go through the marketing — you know, giving your name and address. For the three here, you won't have to give your name and address. If you want to take the other labs and you have questions about them as we go — if it doesn't disrupt the whole thing — feel free to do that as well; that's perfectly fine.

So let's get started with this first lab. I'll start it right away and talk about it a bit. This is actually the first lab that we made with Tetragon. As Duffie explained, Tetragon is the newest of the open-source projects within the Cilium project, and it was released during KubeCon in Valencia last year. But it actually has a history: if you do check out the enterprise-based labs that are listed afterwards, they may actually be older than Tetragon itself, because Tetragon was part of Isovalent's enterprise tooling before it was open-sourced.
So what was open-sourced was part of the enterprise offering before. You can have a look at the explanations here while this is loading. It usually takes about one to two minutes to start the VM. Under the hood, this is starting a new VM for you — you'll have your own VM, and you can do whatever you want with it. If you want to trash it, crash it, just look around, whatever — I personally don't care what you do with the VM. But don't ask questions about why it's not working if you've actually trashed the whole VM, right?

So, Duffie talked about the need for security observability and how Cilium Tetragon solves this. As he said, the idea is that we can plug into a lot of the APIs in the kernel — a lot of the syscalls, the kprobes, uprobes, and so on — using Tetragon, attach to events in the kernel, and derive information from these events, or even react to them.

There's a book as well, and it so happens that I think this book is available at the Isovalent booth, and one of the authors, Natalia, is here — I don't know if Jed is around, but Natalia is here, and she can actually sign the book if you want. Just saying. This is a book that's related to Tetragon, and this lab is actually taken from this book, which was released last year.

All right, let's get started. I think if you clicked at roughly the same time as me, your lab should have started already, so you should have a green button at the lower right corner. You click that green button and you'll have this beautiful interface. This is the Instruqt interface. Here we have the instructions on the right. The instructions are foldable.
There are foldable sections, so here these are the instructions for the first section. If you need to, you can resize the column on the right. If you want to see the slides we had before, you can see them if you click the button at the top of the column here. And here we have the tabs: there's one tab for the terminal, where we're going to do things, and we also added a tab for feedback, which is essentially a Google form. If you have problems and you want to report them without asking, you can use that; or else you can raise your hand or pick up a mic and ask a question. Don't hesitate if you have questions during the session; this is perfectly fine.

So let's have a look at what we have already: kubectl get node. I hope this is big enough; let's see if I can zoom a bit. This interface is not happy to zoom, anyway. Is it big enough for everyone? Can you see? I mean, the screen is quite big. No, it's not? Oh. But you have the same thing on your laptop. Is that better? Yes? Great, thank you. In the back as well? Yes, great. Yeah, you have it, stress free.

So what do we have here? You are in a VM, you have your own VM, and there is a cluster that is started. It's a kind cluster, and at the moment it only has one node. Well, you're going to work with one node; there's no point in demoing this with several nodes. In production you would be using several nodes, but here we only have a one-node cluster with a control plane, and that's enough for our demo. And what we're going to do is install Cilium... install Tetragon, sorry, not Cilium. We're not going to install Cilium.
This is actually a cluster that doesn't use Cilium, and as you can see the cluster is marked as ready: the node is marked as ready, and we didn't install Cilium. That means a CNI is ready, and it actually uses the default CNI in kind, which is not Cilium. So we're going to add the Helm repository for Cilium, which contains the Helm charts for Cilium, for Hubble, and for Tetragon, and then install Tetragon: updating the Helm repo and then installing cilium/tetragon with the option tetragon.exportDenyList set to empty. All right, nothing special here, right? We're just installing using Helm; it's pretty simple.

So Tetragon runs as a DaemonSet, like Duffy was explaining, just like Cilium, which means you'll have a Tetragon pod on every node; only one here in this case, because we only have one node. If you had several nodes, you'd have one pod on each node. And the reason for this is that, just like Cilium configures the node for networking reasons, injects eBPF programs and configures eBPF maps for routing and network policy decisions, or to proxy services, Tetragon will inject eBPF programs and manipulate eBPF maps for security observability and enforcement reasons.

So let's see if Tetragon is actually rolled out. Yes, in my case Tetragon is rolled out, the pod is started. You can have a look here: it's deployed by default in kube-system... no, it's not by default, actually, I specified it. It's deployed in the kube-system namespace, and I can see here (I'll close this a little bit) that I have indeed a Tetragon pod that is running. Yeah, great. Perfect. Yes indeed, thank you.

So Tetragon has been deployed. It might take a little bit longer for you, I don't know, but normally it's fine; the networking is cloud networking anyway, so it should be fine. And now we're going to install the Tetragon CLI on our machine. This is optional, right?
But what that will give you is this. I put it here in /usr/local/bin: tetragon, and this is a binary, a Go binary, that we'll be using to nicely format the JSON logs that we get from Tetragon. It will give us some nice colors and emojis so we understand better what's happening. Again, this is totally optional, and if you want you could just extract the JSON, send it to your favorite SIEM platform, and do stuff with it there.

Next: now that we have Tetragon, Tetragon itself is not going to do much by default, so we'll need to add a tracing policy. Like Duffy was showing before, a tracing policy is essentially a custom resource. So when we installed Tetragon, it added custom resource definitions. We can actually have a look at this; it's not in the instructions, but you see, if I list the custom resource definitions on my cluster, I now have tracing policies and namespaced tracing policies (the namespaced ones are pretty recent in Tetragon), and they allow you to define either global tracing policies or namespaced tracing policies. Here we're going to use global tracing policies.

Sorry, may I ask a question, or should I ask somebody else? No, that's fine. Yes, yes: there are tracing policies, and it says that there are three of them, but if you open the file, I think there are only two; where is the third one? All right, let's get to this. Yeah, thank you. So let's have a look at the file, indeed. These tracing policies are giving instructions to configure Tetragon so that it will create eBPF programs to trace what's happening on our node. So let's have a look at this one; it's called tcp-connect. Oh yeah, that might be an issue. I think there used to be three and we reduced it to two, so I might need to change this.
Thank you. So here we have a tracing policy that we call network-connection. We're going to deploy it. Again, by default tracing policies are not namespaced, they're global, so they won't be linked to a specific namespace; they will apply to everything in our cluster. And we have two kprobes (the k stands for kernel: kernel probes), so we will attach to kprobe-type events in the kernel, and specifically to calls linked to tcp_connect and tcp_close. And we will map the arguments that come out of these events: the first argument is of type socket here, and the second one is of type socket as well, and this is to help Tetragon know how to map these arguments to the JSON that will result.

Right, so let's apply this first one. There you go; it shouldn't take long. Here I can actually list my tracing policies, and you see I have one that was applied six seconds ago, and obviously if I look at it, I will see its content in YAML. Fine, great.

And the second tracing policy that we will apply is called sys-write; it's a Kubernetes manifest, and again it will use kprobes, and we'll have three rules here. The first rule is on fd_install (file descriptor install) calls: we're looking for calls to fd_install where the argument matches /etc/kubernetes/manifests. So we're looking for actions that affect this /etc/kubernetes/manifests file, or directory in this case. Right, the second one is sys_close, and the third one is sys_write: closing the file descriptor or writing to the file. So we're going to apply this. Again, I just copy here and paste; you can easily copy and paste, right? When you have instructions on the right, if you just left-click, it will copy to your copy buffer.
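The two policies just described might look roughly like the sketch below. This is not the lab's literal files: the policy names, argument types, and the selector operator are assumptions modeled on Tetragon's documented TracingPolicy format, and the second policy is abridged (the real one chains sys_write and sys_close to the tracked file descriptor in the same style).

```yaml
# Sketch of the two TracingPolicies described in the talk
# (names and exact field values are assumptions, not the lab's files).
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: network-connection       # hooks TCP connection open/close
spec:
  kprobes:
  - call: "tcp_connect"
    syscall: false
    args:
    - index: 0
      type: "sock"               # first argument: the socket
  - call: "tcp_close"
    syscall: false
    args:
    - index: 0
      type: "sock"
---
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: sys-write                # watches the static pod manifest directory
spec:
  kprobes:
  - call: "fd_install"           # fires when a file descriptor is installed
    syscall: false
    args:
    - index: 0
      type: "int"
    - index: 1
      type: "file"
    selectors:
    - matchArgs:
      - index: 1
        operator: "Prefix"       # only paths under the manifests directory
        values:
        - "/etc/kubernetes/manifests"
  # The lab's version adds two more kprobes here, on sys_write and
  # sys_close, following the file descriptor tracked by fd_install.
```

Applying either file with kubectl is what loads the corresponding eBPF programs into the kernel on each node where Tetragon runs.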
So just apply this. And now we're going to look at the logs that are produced. There's a container in the Tetragon pod that is called export-stdout, and the logs of this container will contain the Tetragon logs in JSON format. I can show you without piping into tetragon observe, actually, so you'll see exactly what it looks like. And here we're following, so you see I get JSON logs. This is great; not very readable, maybe. So we're piping this into tetragon observe, which is the local command-line tool that you downloaded earlier, and it will analyze this and show us nice little icons and colors so that we don't lose our minds looking at JSON.

Right, so as you see, there are a lot of logs coming out of this. Some logs are linked to files being opened or written, others to the tcp_connect events that we added in the first rule. So we have logs coming from the two tracing policies that we added. All right, everybody's fine with this? Anyone having an issue with this first step? No? You're good. Let's continue: check. If you followed the different steps properly, the check should pass fine.

Next, what we're going to do (this is taken from the book on security observability based on Tetragon, like I said) is look at detecting a container escape using Tetragon. The idea is that there's a pod running in privileged mode, and it allowed an attacker to actually escape the container, and we're going to see how we could detect this. What the attacker will do is enter that privileged pod, from there gain access to the node, and from the node create a static pod
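To get a feel for why the raw export needs a formatter, here is a hedged sketch: the JSON below is a hand-written, heavily abridged imitation of a Tetragon process_exec event (field names modeled on real events, values invented), and the grep is just a crude stand-in for what tetragon observe does far better.

```shell
# Fake, abridged imitation of Tetragon's JSON export (values invented).
cat <<'EOF' > /tmp/fake-tetragon.log
{"process_exec":{"process":{"binary":"/bin/bash","pod":{"namespace":"default","name":"privileged-pod"}}}}
{"process_exec":{"process":{"binary":"/usr/bin/ls","pod":{"namespace":"default","name":"privileged-pod"}}}}
EOF
# Crude extraction of just the binary names, to show why a formatter helps:
grep -o '"binary":"[^"]*"' /tmp/fake-tetragon.log
```

In the lab, the same JSON stream instead gets piped straight into the CLI, which renders one compact, colored line per event.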
manifest in /etc/kubernetes/manifests, so that they have a permanent foothold on the machine, and then from that pod do whatever they want, run scripts or whatever. So we're going to see how we can detect these events, and we could potentially alert on this or even kill the processes. As you saw, we don't really talk about killing in this lab, but it's not too hard: it would just be a matter of changing the action at the bottom of the tracing policy so that instead of logging we would kill the process, right?

So now we have two terminals in this one, so pay attention when we change terminals in the instructions. What we're going to do here is start by checking events in the Tetragon logs related to a pod called privileged-pod. You see here, when you run tetragon observe you can actually filter by pod name; if you're familiar with Hubble, this is the same logic, we have the same kind of option here. Right now nothing's happening: this privileged pod does not exist. But we're launching this in the first terminal so that when it happens, we see it right away.

Okay, so let's switch to terminal two in the tabs up there, and in terminal two we're going to apply this privileged-pod.yaml. Let's have a look at it. You see, this is a pretty simple pod. In production it most likely wouldn't be a bare pod like this; it might have been deployed by a Deployment, you know, whatever. For some reason there is this pod running, and it so happens that it has a securityContext set to privileged: true, which is what's going to enable the issue in our case, right? So let's apply this privileged-pod.yaml. Then we'll check if the pod has started; it might take a little while. You see, it took five seconds, and now my pod is running. So, you know, check that your pod is actually running. Now, if we go back to terminal one... it's taking a little while, so let's wait a little bit.
It buffers the events, so that might be the reason. Let's see. Let's start it again. Nice, this has started. All right, I have my two tracing policies, this is fine. This should be fine. Everything's fine. Are you seeing something on your side? The namespace is wrong? Where? No, that should be fine. Really? Hmm. It shouldn't have to be restarted. All right, I'll have to check this. Let's restart Tetragon just in case, in kube-system. Let's see: it's restarting, and obviously now we see it. Okay, so if you have that issue, I'll post the command here; you can restart Tetragon. It shouldn't happen, I'll have to check why. So: kubectl -n kube-system rollout restart daemonset tetragon, which will restart it, or just kill the pod; there's only one pod.

Right. So now when we check, we actually see the process being started: there's an nginx process that is started in this default/privileged-pod context. Great, or almost great, but fine, let's continue. So let's go back to terminal two. We have our pod that was started, and we're going to exec into that pod and start a shell, a /bin/bash shell, into that pod.

Right, what happens is, when we go back to terminal one, we can see what's being executed here. So we see bash being executed, we see that it's executed in the default/privileged-pod context, and then in there we see everything that is linked to opening and closing files as bash is executed. So we see ld.so
being opened, all the libraries that are loaded, the TTY being opened, and so on. And there's actually an ls, because I typed ls (I wasn't supposed to do this, but I typed ls), so we see ls and all the libraries being opened and closed to make ls work.

Right, now in terminal two we're going to take advantage of the fact that this pod is privileged by using nsenter: nsenter -t 1 -a bash. As you may know, this will allow us to bypass the isolation of this container and actually access the underlying node. Yeah, so what we're doing is we're actually moving into all of the namespaces associated with PID 1. And because this is a privileged pod that actually has access to the host PID namespace, the first PID in this one is probably an init system, right? So this could be systemd. When we do an nsenter like this, we're basically saying: I want to be inside the network namespace, I want to be inside the file system namespace, I want to be inside the PID namespace, all of the namespaces; I want to move directly into the group of namespaces that PID 1 on the underlying host is in.

Thanks, Duffy, for the advanced explanation. So yeah, thanks to this we get access to the host namespaces, which means we've kind of escaped from our container, right? We essentially broke out of the container. If we look at terminal one again... oops... we can see here, sorry, in default/privileged-pod, we can see the nsenter. Where is it? I see the ls, I don't see the nsenter. This looks weird; I hope I'm not seeing bugs here. I think it's because it's standard out; it's a bit of a ring buffer.
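While we wait for the buffer to flush: what nsenter -t 1 -a actually joins can be seen directly in /proc. A small sketch (nothing here is from the lab; it just inspects the current shell's own namespaces on any Linux box):

```shell
# Every process lists its namespace memberships as symlinks in /proc/<pid>/ns/.
# `nsenter -t 1 -a` joins all of the namespaces that /proc/1/ns/* point to,
# which is why it only "escapes" anything when PID 1 is the host's init.
readlink /proc/$$/ns/pid    # e.g. pid:[4026531836]
ls /proc/$$/ns/             # cgroup ipc mnt net pid ... (exact list varies by kernel)
```

Comparing these inode numbers between a process in a container and PID 1 is a quick way to check whether two processes share a namespace.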
It's a bit of a ring buffer So if you see a bunch of yeah, yeah, it might just take a while because of the ring buffer Yeah, there's 99 if you exit and you go back and you do ns enter again, then you should see the event go by again Yeah Yeah, there So now I have the ls Looks like the tracing policy we have Yeah, the the events might take a while because there's there's a ring buffer. So It needs to have enough events before it gets exported So we'll see the ns enter eventually. I think it gets flushed on a regular basis though I think we may be missing a tracing policy That's exactly See the ls. I don't see an s center Oh, there I have the ns enter here, right? So ns enter here and then from the ns enter, uh, you can actually see that it actually executes bash afterwards and then Uses big groups bash geo colors and so on so we can see what's being executed In this context Right. So now that we're running bash and we're pretty much root on the host actually We're actually on on the host. How could we see this? Yeah, we're we're on the can't Can't control plane host you see here the the node name. This is not inside the the container inside the pod This is what this is inside the container, but it's the container that works as a host and a kind cluster, right? So now that we're running on the on the host, which is a container in the context of kind The next step will be to maintain a foothold there So what we want to do is create a static pod that will always be running And we'll give us access Whenever we want as an attacker, right? We're being the bad guys in this case So Let's get the the tetragon logs again from the export std out container Do a tetragon observe and we'll filter on privilege pod or hack latest or continuity runtime or prox self xa There was in a way to have a single common clean common to get this So we're just grabbing in this case. 
So we don't need to run a command every time: we see all the events we're interested in, and if we didn't grep, there would be tons of events. So now let's again get into our container, into our privileged pod; we do the nsenter again so that we access the host namespaces. And now we're going to go into /etc/kubernetes/manifests, which obviously means that we're on the host, because this directory does not exist in the pod we accessed originally. And in there you see all the static pods that are there: the kube-apiserver, that's because we're on the control plane node in this case.

And we're going to create a new pod, which is called hack-latest. As you can see, this is just a normal pod. It's called hack-latest, it has access to the host network, and its namespace does not exist. This is a little bit of a hack here, which will make it invisible to kubectl in our case, because the namespace actually does not exist. This is a very interesting one. Also, I'm curious: how many people are familiar with /etc/kubernetes/manifests? Are you aware of that directory? Okay, that directory exists as part of the kubeadm project, and kubeadm is a tool that's part of the Kubernetes project that actually helps you get from basically a bunch of Docker hosts to a Kubernetes cluster.
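The hack-latest pod just described might look roughly like the sketch below. The image, namespace name, and command are invented for illustration; the two details that matter from the talk are hostNetwork: true and a namespace that doesn't exist on the cluster.

```yaml
# Hypothetical /etc/kubernetes/manifests/hack-latest.yaml: a static pod the
# kubelet will always run, placed in a namespace that does not exist so the
# API server rejects its mirror pod and kubectl never lists it.
apiVersion: v1
kind: Pod
metadata:
  name: hack-latest
  namespace: doesnt-exist        # no such namespace: invisible to kubectl
spec:
  hostNetwork: true              # share the node's network namespace
  containers:
  - name: hack-latest
    image: attacker/hack:latest  # assumed image name, for illustration only
    command: ["sleep", "infinity"]
```

Dropped into the kubelet's static pod directory, a manifest like this needs no API server interaction at all to start running.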
It's a provisioning tool, and the way that kubeadm provisions Kubernetes clusters is that it uses static manifests to host the API server, the control plane, the controller manager, and etcd on the set of hosts, using these static pod manifests. And it's a pretty interesting thing: static pod manifests are owned and operated by the kubelet; they live and die with the kubelet. As long as the manifests are in that predefined directory, the kubelet will start all of them and manage their lifecycle directly.

You can see these pods: if you were to do kubectl get pods inside of the kube-system namespace, you'd be able to see the pods in there. But if you were to delete one of them, that would have no effect on the pod itself, because it's a static pod. It's only managed by the kubelet; it's not managed by the API. The API only reports what it sees about those pods; that's why it's a static pod. If you wanted to stop one of those static pods, you would have to go to the file system where that static pod is defined and delete it or move it out of that directory. That would be the one way to stop that particular pod; you can't stop it with kubectl, and you can't restart it with kubectl.

So what we're doing here is defining yet another static pod manifest, this hack-latest one, and we're putting it in a namespace that doesn't exist. Because it's a static pod manifest, it's owned and operated by that kubelet. The kubelet is going to try to register that pod with the API server: I've created a new pod, it's called hack-latest, and it's in some namespace that doesn't exist. And so it will get the same error that you would get if you tried to apply a manifest in a namespace that doesn't exist. The API server will tell the kubelet:
"there's no such namespace," and then the kubelet will say okay, and it'll just keep going and keep managing this pod in silence, forever. Fun times.

Yeah. So because of this, we now have a pod that is actually running on the machine, right? We can't see it with kubectl because, like Duffy said, if I look here, kubectl won't list it, since it's not in a namespace that is known. So it's invisible to the Kubernetes administrator, but it's running. All right. We can see it if we use crictl, because kind is not using Docker, it's using a CRI runtime. So if we use crictl ps, we can actually see this pod running on the machine. And if we were to remove it, the kubelet would recreate it; every time we restart the node, for example, this pod will start again.

So now we have a permanent foothold: if we've deployed something in that image that lets us access the node again, that's it. As an attacker, I can come back whenever I want, and there's a very high chance that the Kubernetes administrators won't see that pod running. It might take a few seconds, but by now I think it should have started. So again, if you look at all the pods, you won't see it.
Now let's look at Tetragon, and in Tetragon we have some interesting events. Let's scroll back up a bit. So that's what I executed first when I got access to the privileged pod: I executed nsenter. And then you can see that I actually... what is this... let's go down a bit. So bash was executed here, there was a cat that was executed, and here, you remember, I did a cat with a heredoc and redirected it to the file. So this is what happened here: this file was written, /etc/kubernetes/manifests/hack-latest.yaml. So Tetragon actually saw it, and you see here that we're not in the context of the privileged pod anymore; we're in the context of the kind-control-plane node. And then we see a lot of things executing that are linked to containerd, and then the pod is starting: we get the init, we get runc, and so on and so forth. So everything that is linked to that pod starting can be seen and could be traced with Tetragon. If all these events had been exported to your favorite SIEM, you could see in the history what actually happened and at what time. Here we're only seeing a nicely formatted view, but you actually get the timestamp, you get everything, in the JSON events.

All right, so let's click next and continue. Now that we have access to this static pod, we could do something in there. So let's again look at the Tetragon logs; we're going to grep for privileged-pod or curl or python. Again, you could organize things around these events in your SIEM to actually detect things and search for events. So in terminal two, we're going to again access...

Sorry, is it possible also to detect, as kind is running a CRI... is it possible to detect with containerd, for example, if it's running an invisible pod? Yeah, so that's why we were seeing here kind, containerd actually... Oh, you mean on the host? For example, we are running GCP and GKE, so we are based on containerd. Do you think it is possible, having access to the host
and listing all the pods on each host, of course, but... Well, on the host, if you list all the pods, you actually see this; this is what we were seeing here. So here, if I'm on the host and I use crictl, I can actually see this hack-latest pod. Yeah, exactly. What I mean is that here we are running kind with a CRI... Yeah, so if you were running something else, if you were running Docker or whatever, you could see it: you could see the container started with Docker. It's just that kubectl won't see it, because it's running in a namespace that doesn't exist, and the way kubectl functions, when you list the pods for all namespaces, is that it will first get the list of all the namespaces and then request the pods for each of these namespaces. And because this pod is in a namespace that doesn't exist, you will never see it. Did I answer your question? More or less.

All right, let's get the container ID for that pod, for that container here. So we just executed crictl ps to get the container ID, same as you could do with Docker, for example, right? And then with this container ID we're going to exec into this container. So we're now inside the static pod that we created earlier, right?
So this is a static pod that potentially we could leave running, that will restart even if we restart the node, and that will be invisible to the administrators if they're not properly monitoring the cluster for this kind of event. And then from this we could typically run some curl to execute a Python script, you know; I'm doing some activity that shouldn't happen, and it would be really hard for the administrators to detect, because they don't even know that this pod is running. But if I look here in Tetragon, by the time this actually comes through, I can see python here; you will actually see the calls being made, and you can see that it's done in the context of the kind control plane. In fact, you should see the connect event as well, which will come in a while... again, it's the buffering thing, because the Python script makes a call to the outside. So normally... and it's executed, actually, it's not running anymore, so we should see the connect events coming in. I'll wait a little bit.

Do some more curl? Try sending a curl. Yeah, you should see the curl. It may not be part of your grep. Are you talking to me, Duffy? Yeah, it may not be part of the grep; that's maybe why you're missing it. I didn't hear what you said. The challenge is that the event you're looking for may not be in the grep match. Oh, really? Oh, you're right, curl. Yeah. Yeah, because it doesn't match... no, it should match, it should match curl. I think it's the buffering issue again. Yeah, let's see. All right, it worked last time I tried it. Great. I don't know.
That's wrong. This problem would probably not happen in your environments, because you'd be sending all of this event data to a Splunk or to a SIEM or something like that, and you wouldn't be dropping these events. In this example, we're actually sending all of this data to standard out on the container, and then we're trying to grep for that event in the standard out, which is basically only buffered in the container's standard out itself, which is still truncated. And this is what we mean by "we may have missed the event": we're trying to catch it in time, and it didn't come through fast enough. All right, well, sorry about that. Are you seeing it yourselves? Is someone seeing it? Yeah, okay, someone saw it. Oh, here it comes again; this was a buffering issue. So yeah, there you go: you can actually see the curl, so we could detect that thing, and we see it's in the context of the control plane itself.

All right, we're getting toward the end. What's interesting here is that we get this metadata as well, and there's more metadata that we could get from the JSON output. And really, the one big benefit of eBPF here, as well as in the context of Cilium and Hubble, is that this metadata can be added directly in the context of the kernel: when the eBPF program processes the events that it sees, it can directly enrich these events with the metadata, because it has access to it through eBPF maps.

So the next step is detection itself. You can have a look at the book; like I said, Natalia is around from time to time and she can actually sign the book if you get a hold of it. Yeah, questions on that lab specifically? If you have questions, come on up to the microphones, or I can come to you. Otherwise, thank you for spending your time with us this afternoon. It was such a pleasant surprise seeing so many people show up for this event.
So thank you so much. We do have some Cilium and Tetragon stickers up in the front; if you'd like one, come and get one. And again, feel free to take the other labs on your own, and give us feedback. Oh, and one thing that you mentioned earlier, just so people know where to find it: if you go to the Cilium project, github.com/cilium/cilium, there's the USERS.md file. So if you're already using Cilium and you want to help the project, one easy way to help is to let us know how you use Cilium, what you do with it. The more companies actually list themselves there, the more it helps the project to graduate.