Okay, so let's get started. My name is Duffie Cooley. I'm the field CTO at Isovalent. I'm here to talk to you a little bit today about Cilium, Grafana, and golden signals. I'm actually kind of curious about the term: do people understand what I mean by golden signals today, in troubleshooting and understanding how your applications are behaving? What do you consider to be a golden signal? What's that? Errors. Errors, right, yeah. There are a couple of competing ones. There's RED and USE, and there's a bunch of different ways to actually measure different types of applications or different types of capabilities that you're looking at. The first thing I'm gonna do is a little introduction to Cilium and eBPF, then we'll talk a little bit about observability, and then I'm gonna show you some of what we're doing with Cilium. So, up here: who here is already familiar with Cilium, the open source project? I love it. How many people are familiar with the term eBPF? Can't tell if that's actually more or less, but I figure it's about the same. So Cilium is an open source project. It's actually been donated to the CNCF. I think we're gonna try and graduate this year. We'll see how it goes, but we're seeing a lot of really great adoption. If you are a company, or represent a company, that is using Cilium, and you would like to participate in that, if you go to the Cilium repo, github.com/cilium/cilium, you'll see an adopters page. Feel free to put in a pull request saying that you have adopted Cilium within your infrastructure. That'd be a tremendous help for us. I work for Isovalent, the company behind it, and primarily we focus on eBPF and technologies around it.
Cilium is implemented in eBPF, and to give you a sort of 10,000-foot view of what eBPF is, if you haven't already thought about it in this way: eBPF basically makes the Linux kernel programmable. A good way to conceptualize that is to think of the Linux kernel as having a very large API surface, right? Those API calls can be system calls, they can be socket calls, like opening a new socket to an external endpoint. All of those things could be considered API calls. With eBPF, we can sit on the other side of that API call and intercept it, and we can pull down information about what that call is doing. If you said file open, what was the file? What file system was that file part of? If you said socket open, what was the destination address? Where is the traffic headed? We have a lot more context about what's actually happening because we're running right there in the Linux kernel. Right after that call has happened, we get notified that an event has happened, and then we can determine what we wanna do with that information. For the most part, people are using eBPF to do things like observability: instrumenting function calls or system calls in the Linux kernel, and then persisting that information down into user space so that you can see what's actually happening when your application is doing stuff against the Linux kernel. But as this diagram describes, there's actually a ton of different places inside the kernel that we can hook into, right? We can hook into any system call that has access to the file system. We can hook into network interface cards with XDP. We can hook into just a regular socket call, and all those sorts of things.
And when we hook into those things, we have the choice of just reporting what we found, providing context about what's actually happening in the Linux kernel, or we can also make changes, which is one of the things that we do with Cilium. So with Cilium, we're a CNI primarily, right? If you're using Kubernetes as your container orchestration system, you can use Cilium as the CNI piece of it. Now, a CNI is a container networking implementation, and its goal is to provide networking for all of the pods and all of the different workloads that you deploy into your Kubernetes environment. Cilium does a lot more than that since we're, again, instrumenting inside of the Linux kernel using eBPF. We can do things like replace kube-proxy. We do that by writing an eBPF program for every workload that you have on every node and determining exactly what the world should look like from the networking perspective. When we see a new network call come in, when we see that socket call come in, right, this application is trying to make a connection to a Kubernetes service. Just like you would in iptables, we can intercept that call. We can say, hey, this is going to a service. I understand the healthy endpoints behind that service. I can do NAT right there in an eBPF program running natively in the Linux kernel, determine what healthy endpoint I want to route that traffic to, and manipulate the packet before it proceeds. And then when that packet comes down into the routing layer on the underlying host, the NAT has already happened, right? So we're not using iptables for NAT. We're not using IPVS for NAT. We're able to do NAT directly in eBPF pretty well. And of course this also gives us kind of unprecedented visibility, right?
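The service translation described here (recognize that a connection targets a service, pick a healthy endpoint, rewrite the destination before routing) can be sketched in plain Python. This is only an illustrative user-space model, not Cilium's eBPF code: the `Packet` type, the `SERVICES` table, the addresses, and the `translate` function are all hypothetical names invented for the example.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int


# Hypothetical service table: (service IP, port) -> [(backend IP, port, healthy)]
SERVICES = {
    ("10.96.0.10", 80): [
        ("10.0.1.5", 8080, True),
        ("10.0.2.7", 8080, False),  # unhealthy, must never be selected
        ("10.0.3.9", 8080, True),
    ],
}


def translate(packet, services=SERVICES, choose=random.choice):
    """Rewrite the destination of service traffic to one healthy backend."""
    backends = services.get((packet.dst_ip, packet.dst_port))
    if backends is None:
        return packet  # not addressed to a service: leave untouched
    healthy = [(ip, port) for ip, port, ok in backends if ok]
    if not healthy:
        return None  # no healthy endpoint: model this as a drop
    ip, port = choose(healthy)
    # Destination NAT: by the time routing sees the packet, it already
    # points at a concrete backend.
    return replace(packet, dst_ip=ip, dst_port=port)


if __name__ == "__main__":
    print(translate(Packet("10.0.0.2", "10.96.0.10", 80)))
```

The `choose` parameter is injectable only so the selection can be made deterministic in tests; a real data plane would also track connections so that replies are translated back.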
We're able to persist down into user space, effectively, the event stream that describes what applications are connecting to what other applications, and how, and whether that was allowed or denied, and whether there was a network policy decision about it or not. So there's tons and tons of stuff that we're doing with Cilium to solve these problems. Again, anywhere you can run Kubernetes, you can run Cilium as your CNI. We have great integrations with different cloud providers, including Google Cloud. Their new Dataplane V2 is Cilium based. We're working on a great partnership with Azure: if you were to create an AKS cluster right now, you would have the option of using Cilium as the data plane. We have a similar thing with AWS. If you're using AWS's EKS Anywhere, we are the underlying CNI choice for that. So it's pretty widely adopted. And these are the areas that we've been focusing the product on in general. You've heard me talk a lot about the networking piece. There's a lot more to talk about in the networking piece. I'm not here to talk a lot about that today. But some of the pieces that I do wanna share with you are what we're doing with Hubble observability and specifically what we're doing with service mesh. If you've ever used Cilium and you've ever explored Hubble, you realize that we've provided you insight into what's actually happening at the network layer for quite a long time. And we expose that insight in the form of metrics, or in the form of the capability that lets you see what's actually happening at the application layer. But what we haven't done is expose all of those metrics and all of those things in the way that a service mesh does.
For the most part, when people think of golden signals, they think: I want to be able to instrument my application in such a way that I can understand how much traffic went through from this application, and how many of the HTTP requests from this application to some dependent service returned a 200. How many returned a 503? How many returned particular HTTP status codes? Or if we move it to the TCP state, I wanna know how many connections are going through, what's the rate or the utilization of that link? Are there any errors associated with that link? These are the golden metrics that we've been talking about, and we've been surfacing that information for quite a long time, but we haven't presented it in the way that a service mesh does. What we're doing now, and what we have shipped as part of 1.12, is the ability to see those same metrics in that way. So it makes it really great for solving the observability use cases or the challenges that we see, just by leveraging the Cilium CNI and Hubble Enterprise, or Hubble, I should say, and Tetragon. Cilium and Tetragon together can actually give you all of that data. So again, this is just a way of describing where some of the other existing mechanisms fall short when confronted with this cloud native environment. A lot of the standard existing solutions focus on a five-tuple sort of use case, where you have the source address, the source MAC address, the destination address, the destination MAC address, and the direction of the traffic or what the protocol of that traffic is. Those things are useful when you think about troubleshooting a specific link between two points, but when you're trying to understand what's actually happening in this age of distributed systems, you need a little more context than that, right?
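To make the golden-signal questions from this passage concrete (how much traffic, how many 200s, how many 503s), here is a minimal Python sketch that derives a request rate, per-status counts, and an error ratio from a list of observed HTTP flows. It is an illustration under assumptions, not how Hubble computes or exports metrics: `HttpFlow`, `golden_signals`, the workload names, and the choice to count only 5xx responses as errors are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpFlow:
    source: str       # hypothetical workload name, e.g. "default/frontend"
    destination: str  # hypothetical dependent service
    status: int       # HTTP status code of the response


def golden_signals(flows, window_seconds):
    """Summarize rate and errors for flows observed in one time window."""
    total = len(flows)
    by_status = Counter(f.status for f in flows)
    # Assumption: only server-side failures (5xx) count as errors.
    errors = sum(n for code, n in by_status.items() if code >= 500)
    return {
        "requests_per_second": total / window_seconds,
        "status_counts": dict(by_status),
        "error_ratio": errors / total if total else 0.0,
    }


if __name__ == "__main__":
    sample = [
        HttpFlow("default/frontend", "default/cart", 200),
        HttpFlow("default/frontend", "default/cart", 200),
        HttpFlow("default/frontend", "default/cart", 503),
        HttpFlow("default/frontend", "default/cart", 404),
    ]
    print(golden_signals(sample, window_seconds=2))
```

Grouping the same calculation by `(source, destination)` would give the per-dependency view that a service mesh dashboard typically shows.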
You need to understand that it was this pod, at this time, or this workload, talking to this external FQDN. This was the actual connection that was being made, and I want to have a lot more information about whether that was allowed or denied, and a lot more context about what's actually happening. And that's primarily what we've been focusing on: being able to make everything identity based, being able to give you a really good view into what's actually happening at that network layer in a way that you can easily understand and use to troubleshoot. So you've heard me mention Hubble. Hubble is a suite of tools. Hubble is a CLI that lets you investigate what's actually happening at the network layer. As an analogy: frequently when people see Hubble, they think, oh, this is like tcpdump, right? Because I'm able to see the source workload and I'm able to see traffic going from that source workload to a destination, maybe to the DNS server or maybe to another workload. This is kind of like tcpdump. What's amazing though is that this is not a view that looks at two particular points. This is a view showing you the context of what's happening on the network across the entire cluster, across all of the workloads, and you're filtering all of that information down to only those things that you are interested in troubleshooting at this time. But you have effectively tcpdump across all workloads, across all nodes, across all traffic moving back and forth between any workloads within your Kubernetes cluster. And if you were to do something like Cluster Mesh, where you can join multiple clusters together, then you could expand that domain to understand all connectivity between all of the clusters. It's kind of wild.
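The idea of one cluster-wide, identity-aware event stream that you filter down to what you care about can be sketched as follows. This is a toy Python model, not Hubble's data format or CLI: the `Flow` fields, the node and workload names, the verdict strings, and `filter_flows` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    node: str         # node where the event was observed
    source: str       # workload identity, e.g. "default/frontend"
    destination: str  # workload identity or external FQDN
    port: int
    verdict: str      # "FORWARDED" or "DROPPED" in this sketch


def filter_flows(flows, **criteria):
    """Keep only the flows whose fields equal every given criterion."""
    return [
        f for f in flows
        if all(getattr(f, field) == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    # Events from every node land in one stream; filtering is by identity,
    # not by a pair of capture points.
    stream = [
        Flow("node-a", "default/frontend", "kube-system/coredns", 53, "FORWARDED"),
        Flow("node-a", "default/frontend", "default/cart", 8080, "FORWARDED"),
        Flow("node-b", "default/cart", "api.example.com", 443, "DROPPED"),
        Flow("node-c", "default/frontend", "default/cart", 8080, "DROPPED"),
    ]
    for flow in filter_flows(stream, source="default/frontend", verdict="DROPPED"):
        print(flow)
```

The point of the model is that the filter keys are workload identities and verdicts rather than interfaces or IP pairs, which is what distinguishes this view from a per-host packet capture.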
So it's tcpdump, but, you know, galaxy-brain tcpdump, which is pretty cool. And this is the UI component of it. So instead of treating it like, effectively, a searchable event stream of what's actually happening on the network layer, we can also give you visualization tools, right? They describe what workloads are dependent on other workloads, what traffic is moving back and forth between those things, whether that traffic was allowed or denied, what the identity of that traffic is. All of that information is available to you in the UI. So moving on, this is again how we think of the world, right? This is how we fit into that puzzle. So when an application or a workload within your Kubernetes environment, or even on an external VM, makes a connect call, we're able to start immediately and determine a number of really interesting pieces of information, right? We can say this connect call has happened; do we want to allow or deny that traffic, even before it becomes a packet, right? This isn't about packets per second. This is, when we see that socket call happen, do we want to allow that traffic to egress that workload before it even hits the wire? We can make that network policy decision before it proceeds from the socket call perspective. We can also make a decision about whether this traffic is a call from one application going over localhost to another application in another container in the same pod. We could do socket-layer load balancing and allow for very fast path connections between two applications within the same pod. So you wouldn't have to traverse the TCP stack across localhost to do it, right? We would allow the connection to become established and we could just shortcut that traffic between those two processes.
Again, because we're operating at the kernel layer, pretty amazing stuff. And then finally, the observability piece can actually be extended beyond what's happened.