Hey, thank you everyone. It's very nice to see so many people just before lunch, so thank you for coming. Let's just jump right in. We're talking today about the past, present, and future of Tetragon: what was the first production use case, what was the main security challenge, who were the first and then later-stage customers, what lessons we learned, and where we are heading. My name is Natalia Ivanko, I'm a product manager for runtime security at Isovalent, and here with me is John, Tetragon lead, Cilium maintainer, and principal engineer at Isovalent.

So how many of you have heard about Tetragon? Raise your hands. All right, pretty cool, that's quite a lot of people. I will still do a quick intro in a couple of slides.

Tetragon is basically an eBPF-based runtime security observability and enforcement agent. It can run on top of any Linux operating system: in Kubernetes and cloud-native environments it's a DaemonSet, and on bare-metal or VMs it's a systemd-managed binary or a container. It uses eBPF to provide security observability and runtime enforcement, and, really important, it's transparent: no changes are required to your application.

We use eBPF to provide the visibility, and as you can see in the picture, we have visibility into process execution, system call activity, L3/L4 connections, data access, file access, Linux namespace changes, and capability changes as well. It's also important that all the extensive filtering and aggregation happen in the kernel, so all the business logic is happening in kernel. This leads to low overhead in terms of CPU and memory, and John will talk about that later.

All those observability events are exported to a file, a JSON file, and you can integrate it with many SIEM systems. I listed some at the top, like Splunk, Elasticsearch, Sumo Logic; you could export it to S3 or similar blob storage, and then you can also use
Grafana as an integration.

Okay, so why is it so powerful? It provides truly synchronous BPF monitoring, filtering, and enforcement, completely with eBPF in the kernel. There is also the overall efficiency of eBPF compared to, for example, existing user-space tools or kernel modules. We also do Kubernetes identity awareness with BPF, so you can create Kubernetes-identity-aware runtime security policies. We can monitor everything that happens in the Linux kernel, and we do eBPF-based inline enforcement in the kernel rather than, for example, out of band.

So that was a quick intro on Tetragon; let's take a step back and look at the history of Tetragon. I will start from 2016. That was the year the Cilium project launched as a CNI, providing eBPF-based connectivity, observability, and security to the cloud-native world. In 2018 Cilium reached 1.0, making the project available for general production use. Then in 2020 the first lines of Tetragon were written, but not under the Tetragon name: this was initially part of the Cilium enterprise code base, and we had a code name for it, Hubble FGS, where FGS stands for fine guidance sensors.

The first feature set was actually requested by a customer, a complex data analytics company. I will talk about them later, but basically they wanted observability into what's happening in their Kubernetes environment: they wanted to trace all the executables, all the egress connections leaving the cluster, and every network socket.

Then in 2021 the first KubeCon talk about Tetragon happened, which also covered the open source functionality, and since then we have been at every KubeCon EU and NA presenting what we have been working on. So we had talks about how to create least-privilege profiles,
audit eBPF programs and maps, look for shells and the latest CVEs, and detect sensitive data patterns, for example with kubectl. We kept developing Tetragon, still under Cilium enterprise. We added L3/L4 connectivity visibility, L7 attributes like HTTP, TLS, and DNS, and Linux capability and namespace changes, and then we got customers. These were our first set of customers, the early adopters; I will talk about them later as well.

Then in 2022 we decided to actually open source the project, so we renamed it to Tetragon and made it open source. The GitHub stars that you see on the left side, that's what happened in 2022, in Valencia. The project gained a lot of traction, and that was the time when the first blog post about Tetragon came out, and when the first security audit report on Tetragon came out. If you want a copy of the audit report, come to our booth; we will be handing out reports there.

In 2022 our open source contributors, our users, and our engineering team also grew, because the initial set of code was written by a very, very small group of people. And then we got the second set of customers, who were more mid-stage adopters.

And here we are in 2023: Tetragon just reached 1.0, making the project available for general production use. You can read the blog post about it.
There is a blog post and a press release out about it, and on the website we have four main use cases: execution monitoring, file access monitoring, network monitoring, and policy enforcement. We also have a set of observability policies that you can just plug into your system and try out.

So, a few GitHub statistics. This is just pulled from GitHub at the 1.0 release, and I think a couple of things really stand out. Even before the 1.0 release we had 64 contributors, and what's really exciting is that there are actually more non-Isovalent contributors at this point than Isovalent contributors: we have 23 folks from Isovalent working on this at various times, and the larger group of 38 here are not Isovalent employees. That's great from the project side.

The other point here: if you look at the patches per month and the PRs, it's very active, there's a lot going on. We have 130-some-odd patches active right now, with more patches being opened, merged, and closed. So definitely come to the GitHub page; we have a bunch of good-first-issue labels and such, so you can take a look and contribute.

Cool, so what was actually the very first production use case? This was coming from a sophisticated data analytics company, and basically they needed a cloud-native, Kubernetes-identity-aware solution to replace their traditional firewalls and network monitoring tools.
And also their EDR systems. The problem here was that their network rules were based on IPs, ports, and host names, and these were not really useful in containerized environments where IPs are frequently changing and pods come and go. Their endpoint detection and response tooling suffered from the same problem: all of their alerts were based on IPs, ports, and host names. The other problem was that every time they created a new resource, it had to have this EDR agent sending telemetry to a centralized system, and if you consider a Kubernetes environment, that's not really realistic either.

So they wanted to build an EDR or network monitoring tool that was based on Kubernetes identities, like labels and namespaces; for example, they wanted to trace API call identity. And that's how they implemented it: they used Cilium for network traffic control, implementing Cilium network policies, L3/L4 and L7, in their Kubernetes infrastructure, and they used Tetragon as their observability and logging platform. They wanted visibility into what's happening in their Kubernetes environment, basically as an alternative to traditional enterprise EDRs, and their alerting and detection engineers created signatures and alerts to detect certain attack scenarios or malicious behaviors. They actually wrote a blog post; I linked it, so if you check the slides later you can see how they implemented it.

These were the very first initial Tetragon features: trace every executable, trace every egress network connection with destination name, trace every open socket, and store all this data in an S3 bucket for audit purposes. So where are they now?
They were actually the very first user of Tetragon's ARM support, and they are looking into file integrity monitoring and DNS drops as we speak, so they are still a very active user.

So who were the early customers? They all had a couple of things in common. They all had a stable backend system where they could store this data, like Splunk, ELK, or S3, or they had a pipeline integrated into their security analytics platform. They were comfortable creating and maintaining their own queries, they were comfortable customizing the filtering and aggregation, and they had an engineering team who actually operated and deployed Tetragon. I listed a couple of names and logos here, and all the use cases were actually driven by these customers.

For runtime security they wanted to monitor all the executables; for network security they wanted to monitor all the network connections, DNS names, and open network sockets. They wanted DNS troubleshooting, to figure out which process or workload was responsible for DNS drops, and file integrity monitoring, to watch access to sensitive files or directories.

The enforcement and deployment use cases were also driven by them. For enforcement they wanted Kubernetes-identity-aware security policies that allow only certain system calls from pods or namespaces, drop certain Linux capabilities, or block access to kernel host namespaces. For deployment, the ARM support and the standalone external-VM installation were also driven by them.

So here are a couple of signatures. These ones are in Splunk, but this could be Elasticsearch or Grafana; the syntax would be different, but the idea is the same. This one is basically detecting workloads and processes with sudo that
started as root, or malicious shell execution, like someone doing kubectl exec into the workload. We can also detect untrusted DNS names: if you see new DNS names that you haven't seen in the last week, that can be interesting. Or we can detect outbound connections to non-standard ports, like something connecting to a port that is not 80 and not 443; that can also be quite interesting.

So these were the first set of customers, and then we had a second set of customers later on. They also had common aspects. They didn't have a stable backend system, so no Splunk or ELK, or even if they did, they didn't have the resources to manage and store the data, or those resources were limited. They didn't have, for example, an engineering team to actually operate Tetragon, and they were not really comfortable creating and maintaining their own alerts or signatures. So they needed a security dashboard, an out-of-the-box dashboard, that we could provide for them.
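The signatures above live in Splunk, but the underlying predicates are simple checks over the exported JSON events. A minimal Python sketch of two of them; note the event shapes and field names here are illustrative stand-ins, not Tetragon's actual export schema:

```python
# Hypothetical, trimmed Tetragon-style events. Real exported events carry far
# more context (pod labels, parent process, capabilities, timestamps, ...),
# and the exact field names differ by event type and version.
events = [
    {"process_exec": {"process": {"binary": "/usr/bin/sudo", "uid": 0,
                                  "pod": {"namespace": "default", "name": "web-1"}}}},
    {"connect": {"binary": "/usr/bin/nc", "destination_port": 4444}},
]

STANDARD_PORTS = {80, 443}

def alerts(event):
    """Yield alert strings for two of the signature ideas from the talk."""
    if "process_exec" in event:
        proc = event["process_exec"]["process"]
        # sudo executed by a process that is already running as root
        if proc["binary"].endswith("/sudo") and proc.get("uid") == 0:
            yield f"sudo run as root in pod {proc['pod']['name']}"
    if "connect" in event:
        conn = event["connect"]
        # outbound connection to a non-standard port (not 80/443)
        if conn["destination_port"] not in STANDARD_PORTS:
            yield f"{conn['binary']} connected to non-standard port {conn['destination_port']}"

for ev in events:
    for alert in alerts(ev):
        print(alert)
```

In a real deployment the same logic would be a Splunk or Elasticsearch query over the ingested events rather than application code; the point is only how small these detection predicates are.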
So I put a couple of names here; there will be more coming. Let me show a couple of dashboards that we created for them.

The first is detecting Linux namespace and privilege changes. For example: show me all the Kubernetes workloads that started with higher privileges or root access, or gained them later in their life cycle; or show me all the Kubernetes workloads that started with kernel host namespace access, or gained it later in their life cycle. This can be one dashboard: on the top you can see the pods which started with higher capabilities; in the middle you can see the JSON events in a table, so you can see the source namespace, the source pod, and all the capabilities it had; and on the bottom are the pods which had access to kernel host namespaces, like network or PID, that they shouldn't have.

The second is file integrity monitoring. This is a dashboard showing sensitive file and directory access: which binary performed the operation, which application, which namespace, whether it had root access, which team was responsible for it. Another dashboard shows sensitive files, the binaries that performed the operations, and all the JSON events in a table.

The last one is around data exfiltration. We can answer questions like: which workloads send out the most egress traffic in the cluster, and is it even suspicious? If it is suspicious, which process initiated it, which team, which workload, and what was the destination?
So with this dashboard we can track the top received bytes per pod and the top outbound talkers, and then investigate further. We might find out that, for example, an Ubuntu or nginx pod installed some libraries with APT, and that's why it received so many bytes, or that there was a logging agent sending out data to Splunk. So these were the second set of customers, and with that I'll hand over to John.

Sure, yeah. So for the next piece we'll talk about some of the things we learned about writing eBPF security tools, and some opinions, or principles as we call them here, that we developed.

The first thing we found out really early, which is very powerful from the Tetragon side, is to keep most of the state your security tool needs in the kernel itself. What this allows you to do is really interesting mappings in the kernel, such as mapping all the files back to their binaries, and mapping all the binaries back to their pod or their workload ID, which is kind of a generic version of a pod. This lets you do all of that inside the kernel, which means you don't have to send data up to user space, which helps for CPU performance reasons.
That's really great, but also, for enforcement, it means you can start enforcing things in the kernel. By keeping a lot of the state your program needs inside the kernel, in eBPF, you get a big win in the ability to link different operations inside the kernel; otherwise you wouldn't be able to make these connections. Sockets are a good example, and files: if you want to know what binary is attached to what socket, and what socket is attached to what packet, you can build that entire chain, including what pod or label it has in a Kubernetes context. If you can do all that in the kernel, you're running on a really good base platform for doing interesting things.

The next thing that became apparent pretty quickly: if you try to filter all the events a kernel generates in user space, it's going to be very expensive. You're going to have to wake up user space every time you want to tell it about a socket, or every time you want to tell it about a file read; you end up waking up user space on every syscall. So, building on that first point, where we have all the context inside the kernel in BPF, in Tetragon
we also added filtering in the kernel. What this allows you to say is: I don't really care about every read, or I don't care about every socket; I only care about sockets that go outside of my cluster, for example. Maybe I trust this namespace, any connections inside my namespace are fine, I just want to know about connections outside of it. Or: most of the reads inside my home directory I don't need to monitor, because the application can read its own data; but if it's going to try to write a file into the /etc directory, or into /usr/bin or /tmp, those are things I want to alert on. So what you've done is taken the case where you have to monitor everything, pushed the filters into the kernel, and now you're looking only for things that are important, rare, or somehow significant. The value here is really quite significant, because you've removed the need to wake up user space all the time and burn lots of CPU; you can do this inline in the kernel.

And then, as a technical corollary: there's a buffer between the kernel and user space, and by pushing all of that filtering into the kernel you relieve that extra interface, which you now don't have to worry about for drops or overhead. So it's a secondary win: putting the state in the kernel and then adding filters improves the system overall.

Then you can get benchmarks like this, which are really what we shoot for in Tetragon. On the left, your left, we have the kernel build. This is basically a stress test: building a kernel on 16 or 32 cores is a fairly good stress test, because you're executing the compiler over and over again for every file in the Linux kernel, doing some operation on it, opening and closing files.
There's a lot of actual work going on. What you can see on the left is no Tetragon running, the 549.271 baseline, and then with Tetragon the overhead is really quite small, less than 2%, which is really great. If you add JSON logging on top it goes up to about 2.5%; so if you want to export everything over JSON, that's basically one percent of your CPU going to writing JSON files. But that's still a pretty good number, very minimal overhead.

On the right side, what we're showing is reporting on certain files. We picked some files that we didn't think anyone should be writing to on a regular basis. Again, for example, we don't expect users to write into /usr/sbin: nobody should be writing new executables into the bin directories of my pod, and I don't expect people to execute out of /tmp; that's usually an indication of something quite bizarre going on. What we show is that if you apply those first two principles we were talking about, put the state in the kernel and put the filtering in the kernel, the overhead is very minimal; whereas if you go to user space for everything, you get the much larger bar on the right there.
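As an illustration of the "filter in the kernel" principle, a file-monitoring policy in Tetragon's TracingPolicy format might look roughly like this. This is a sketch modeled on examples in the Tetragon docs; the hook choice and exact field names can differ across versions:

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: etc-write-monitoring
spec:
  kprobes:
    # fd_install is the in-kernel function that installs a new file
    # descriptor, so one hook covers every open() variant.
    - call: "fd_install"
      syscall: false
      args:
        - index: 0
          type: "int"
        - index: 1
          type: "file"
      selectors:
        # The filter runs in BPF: only events for files under /etc ever
        # reach user space; everything else is dropped in the kernel.
        - matchArgs:
            - index: 1
              operator: "Prefix"
              values:
                - "/etc"
```

Applied with kubectl, this is what keeps the right-hand benchmark bar small: the common case never crosses the kernel/user-space boundary.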
So these are the kinds of things we're looking at with Tetragon, from the observability and security side.

The next conclusion we came to is that you can monitor just syscalls, but there's actually a lot more interesting stuff in the kernel than syscalls. Syscalls are just the top layer that user space interfaces with. If you want to know about socket state, about the TCP state machine, about what's going on in networking and on the OS side, you really need to dig into the kernel. So Tetragon has the ability to hook almost any function in the kernel: syscalls included, but not limited to syscalls. You also get a really nice benefit from that, for a technical security reason: a lot of the time we don't hook syscalls because syscalls work on user data. You have a pointer to user data, and as a security tool you just can't trust user data in our model.

That leads into this slide. When you're building your tool, you'll be really tempted to do a lot of these things; I know we were. Uprobes are super awesome, I love uprobes.
I love you probes But as a security mechanism, they're just they're working over user data So if your security model is to not trust your users You don't trust the pod for example You really need to be careful when you're using you probes I mean for not familiar you probes let you hook user space basically instead of the kernel But remember that's that's user data the user owns the data that can work around your you probe They can change the data after you read it like I mentioned syscalls there and then the other one is the Kernel has many ways many ways to do the same thing So if you think about syscall hooking you can hook open open ads Every version of open that you can find in the syscall spec or you can hook FD install Which is the kernels version of creating a file descriptor and it's used once everywhere so This is just goes to the point that maybe hooking syscalls everywhere is not the best option Usually if you go into the kernel you can find kind of the root the root operator inside the kernel The next thing that I'll just mention that you get from this is by pushing all the station to the kernel Allowing you to hook outside of syscalls is you get a really interesting story around enforcement So rather than Reacting where you would say I observed an event sent the event up to user space my logic in user space Decides this is unacceptable and then enforces by stopping the pod or stopping the that action may have already happened It's kind of asynchronous to the system By pushing all of the state into the kernel getting below the syscalls in many cases when you need extra data You can enforce synchronously in line with the kernel and this avoids a race so you the kind of Most obvious case that somebody writes to a file Right, you don't want to enforce that after the fact after the file has been written to you Really want to win stop the right from happening same way with networking You don't want the network connect to happen and then some data to be 
exfiltrated, and then sometime in the future you stop it; you want to stop that before the data ever leaves the system. We support a couple of models around that, and we do support both models, by the way; there are definitely use cases for reactive security as well. But synchronous enforcement is really the strong suit here for writing Kubernetes-native security policies.

So, thinking about where we're heading in the future: I mentioned these policies; we put a few in for the 1.0 release, they're on the web page, and the folks on the team here are working on putting a few more up there. That's just the beginning. We actually have a lot of examples in the code itself, in the examples directory, but they're really cryptic, I would say; they're written by engineers. This is an attempt to level that up a little bit into more of a user-facing policy library. What this would allow you to do is say: you want to monitor eBPF? Here's a link explaining how to monitor eBPF, and here's a file you apply. It's meant to be very simple and quick to use, to get some basic policies in place. You see networking, alerting on outbound connections, at the bottom.
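To make the synchronous-enforcement point concrete, a policy selector can carry an action that is applied from BPF, inline with the operation, so the process is stopped before the write completes. A hedged sketch, again modeled on the enforcement examples in the Tetragon docs (the hook and the exact fields are illustrative):

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: etc-passwd-protect
spec:
  kprobes:
    - call: "fd_install"
      syscall: false
      args:
        - index: 0
          type: "int"
        - index: 1
          type: "file"
      selectors:
        - matchArgs:
            - index: 1
              operator: "Prefix"
              values:
                - "/etc/passwd"
          # The SIGKILL is delivered synchronously from the kernel, so the
          # process dies before it can use the file descriptor: no race
          # between detection and enforcement.
          matchActions:
            - action: Sigkill
```

This is the difference from reactive tools: the same selector that filters the event also enforces, with no round trip through user space.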
I think I alluded to that one earlier.

Some things other folks are doing, not necessarily myself: we've seen some folks use Tetragon for SBOMs, the idea being that instead of using strace you can use Tetragon, which gives you a lot more information than just syscalls, for example network connections, and roll all that data into an SBOM. We've also seen a few folks use Tetragon to protect the system that's building the SBOM, to make sure the SBOM's integrity is intact. A couple of interesting use cases from that side that I hadn't anticipated.

And we have a few dashboards; there will be more dashboards coming in the future. Natalia showed some of them. I think they're on the web page but probably not highlighted in the docs yet, so in the code but not in the docs.

And with that, we've covered everything for today. Here's how you can contribute: go to the GitHub page. Natalia mentioned some of the people that are using Tetragon and their use cases, and a lot of those actually come from pull requests. So even if you aren't going to commit code, if you have a use case and you think it's interesting, create a pull request. We do read them.
It also allows other people to see them and say, oh, I have that same use case too, so we like that. Of course, if you want to contribute docs, examples, and code, we would love it; there's a bunch of good-first-issue tags people can look at. We've done a lot of work to get to 1.0, but there's just a ton more stuff to do that's super interesting. So come find me if you want to chat, come to the booth, or go to the GitHub page and file a PR. And with that, thank you.

Thank you very much, and we have time for a few questions.

Not so much a question, but I want to say big props for the tool that you built. I worked with it here, and I was one of the first people that rolled this out, back when it was Hubble FGS on Cilium. Pretty interesting, pretty complex too. I have a few questions about the future. Would you consider having some sort of learning mode, where you observe the behavior of applications and use that to automatically construct a policy?

Yeah, absolutely. It's come up a handful of times. It's not in the 1.0 release, but it's definitely something we've considered adding to the roadmap, and if we get some other folks to help work on it, we would be very enthusiastic.

It makes a lot of sense, because right now it's mostly done manually by the users, right? They get a pipeline of observability events, your security expert reads those, and then they create the policy. Maybe 90% of that could be automated, right? You could make a pretty cohesive policy just automatically from that.

That's right. What you have now is kind of more like a cluster-wide policy rather than a per-application one.

Okay. Your SBOM point was interesting. Have you considered container scanning as part of your offering too? I know a lot of competitors are out there.
So maybe it's not worth it. But like, I'd add it to the container: once it comes in, you scan it, make sure it's allowed to run, right?

Yeah, so there's been some talk about this, where part of the pod deployment would also pull in the Tetragon policy, and they would come together: you do a kubectl apply, you get the pod, you get the policy. I kind of lump that in the same category as the automation. We got 1.0 out, and I think we have all the fundamentals to do these kinds of things; it's a matter of getting it on the roadmap and making it happen at this point.

All right, thank you.

Yeah, cool, thank you. Thank you both; those insights are right on track with where people are thinking in terms of what to use Tetragon for. Okay, it looks like we don't have other questions; people want to go to lunch on time.