All right, so we are back. This time it's with little bite-sized pieces, so we're gonna be having a bunch of lightning talks. So next up, we have Zain Asgar, who is going to be talking about Pixie.

Hi, everyone. Today I'll talk about data exfiltration on the edge with Pixie. So a little bit about me: I'm Zain, and I'm the general manager and VP at New Relic working on the Pixie and Open Ecosystem teams. I was originally the co-founder and CEO of Pixie Labs, which was acquired by New Relic. And I'm also an Adjunct Professor of Computer Science at Stanford.

So before I get started, a little bit of a disclaimer. I know I'm in an audience full of security folks, and I'm not a security expert, so there are probably a lot of holes in this, but I just wanted to put that out there. And the contents of this talk are not meant to be used in production. Our goal is to demonstrate some ideas and start some discussions, not to push this towards a production use case.

So what is the data exfiltration risk? We think that data exfiltration is a huge risk, and by that we mean leaking information outside of your cluster. For example, you might be sending credit cards, social security numbers, phone numbers, and other identifiable information outside of your Kubernetes cluster, or even between your services, where it should not be happening. And ultimately, this can come back and cost you, probably in terms of money, because of the data loss and the potential loss of customer trust. So the question here is: wouldn't it be great if sensitive data leaving your cluster could be found in a transparent way? And we say this can be done with observability, mostly because Pixie is originally a performance observability tool, and we're trying to extend it to more use cases. So what is Pixie?
Probably not a lot of folks in this community are familiar with it, but we started out with the goal of performance debugging without manual instrumentation. So we do all the basic stuff, like CPU, memory, network, grab full message bodies and latencies (you're probably seeing where this is going now), and things like performance profiles.

But there are three characteristics of Pixie which I think make it very useful for doing things in the security space. The first of them is this concept of zero instrumentation. There's another session going on next door on eBPF, but Pixie basically leverages eBPF to let you monitor your applications without doing any manual instrumentation. And this level of instrumentation is pretty deep: it not only allows you to capture things like HTTP message data, but it can tell you what commands are getting executed and what's actually contained in those message bodies, and it can do all of this without having to change your application, so you know that the observability always exists. The second characteristic that we think is useful is that we have a distributed architecture, which means that we can take a look at a lot of data. Since you can deploy this on every single node, you don't have to worry about bottlenecks when inspecting the data. And the third is this concept of a scriptable interface. You can actually write scripts which can look for data loss.

So very quickly, here's a very high-level diagram of the Pixie architecture. At the highest layer, you have our APIs, UIs, and CLIs. There's a cloud system to help orchestrate all of this, but most of the heavy lifting happens down with the collector and aggregator, on the actual Kubernetes nodes. We deploy this thing called the Pixie Edge Module on every single node. There's a data collector based on eBPF that collects information across all your pods running on Kubernetes.
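To make the per-node collect, buffer, and query flow concrete, here's a toy sketch in plain Python. This is a stand-in, not Pixie's implementation: the real edge module is native code, and the `NodeTraceStore` class and record schema here are made up for illustration.

```python
from collections import deque


class NodeTraceStore:
    """Toy stand-in for a per-node trace buffer (not Pixie's actual code).

    Records land in a fixed-size ring buffer, so memory stays bounded:
    once capacity is reached, the oldest records are dropped.
    """

    def __init__(self, capacity=10000):
        self.records = deque(maxlen=capacity)

    def collect(self, record):
        # In Pixie the data comes from eBPF probes in the kernel;
        # here we just append a dict describing one traced message.
        self.records.append(record)

    def query(self, predicate):
        # Scripts scan the buffer locally, on the node itself, which is
        # why there is no central bottleneck for inspecting traffic.
        return [r for r in self.records if predicate(r)]


store = NodeTraceStore(capacity=3)
for i in range(5):
    store.collect({"pod": f"pod-{i}", "body": f"message {i}"})

# Only the 3 most recent records survive in the ring buffer.
print([r["pod"] for r in store.records])  # → ['pod-2', 'pod-3', 'pod-4']
```

Because each node keeps and queries its own buffer, inspection scales with the cluster instead of funneling all raw traffic through one choke point.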
And then we store this information in a ring buffer so we can query it later and run analysis on it. And everything inside of Pixie is scriptable with a language designed for data analysis and machine learning. It's basically valid Python, valid pandas, and you can essentially operate on data frames. I'm not gonna go into too much detail about this; there's more information available on our website and GitHub repo.

So a quick note on how observability can catch data leaks. The first thing we're gonna do is use Pixie to trace all the traffic on your Kubernetes cluster. We can actually do this for both encrypted and unencrypted traffic; there's more information about that in our documentation. The second thing we're gonna do is run a script to find messages that have sensitive information, for example credit card numbers, social security numbers, and things like email addresses. Then we're gonna filter the traffic down to things that are egressing your Kubernetes cluster. And lastly, we're gonna look at the egress of this sensitive data to see if it's actually legitimate.

So here's the demo scenario, and we're almost done with the talk. We have a Kubernetes cluster running a legitimate pod that's making SSL requests to the Stripe API. Then we have two malicious pods, one of them making HTTP requests to a Post Test Server and another malicious pod that's making HTTPS requests. And we're gonna basically try to see if we can find that in our Kubernetes cluster.

So with that, I'll switch over to a demo. I don't know why I hit refresh there, but anyways. I have my demo cluster pulled up. Pixie takes about five minutes to install, and since we don't require you to change any code, you'll immediately be able to see all the data. If you go down here, we have a list of our namespaces, and I'm running this data exfiltration demo over here. You can see that there's some legitimate Stripe egress going to some IP.
And then there are some malicious egress pods talking to some other IP. But this isn't all that useful by itself; it's just telling us there's some communication happening. I mentioned earlier that Pixie works with this concept of scripts. So we have a script over here for egress, and I can look for PII egress. This is a beta script. If I run this script, we see a ton of HTTP traffic going to various IPs. We can turn on some better DNS resolution, because Pixie traces DNS traffic, and with that we should be able to see that there's some HTTP traffic flowing to Stripe. And then this yellow bar means there's mixed traffic, between HTTP and HTTPS, flowing to PTSV2. If I go over here, we should actually be able to see an example request. So an example request is leaking the name, credit card information, and phone number. If everyone's freaking out about the data that we're looking at, we do have capabilities to obfuscate it so you can't see it in the UI. But essentially, we found the request, and we can trace it. You can go over here and see what type of protocol it is and get some more details about what's happening.

That's all I had for the demo. One thing I wanna say is that with eBPF, this is really only the beginning. There are lots of other things you can do and other capabilities that can be built in: monitoring file accesses, system calls, and process execution, and looking at what files are being accessed and which information is leaking. All of that can be built in at the eBPF layer. And with that, that's all I have. Please check out our website and GitHub for more information.

All right, there we go. The question is: what is the performance overhead of this? Yeah, so that really depends on what level of things you're scanning. For the performance overhead of actually collecting the data, our target in Pixie is under 5%.
Since we're mostly capturing the stuff at the kernel level, it's very efficient. But yeah, our target's under 5%, and we usually try to keep it under 2 to 3%. Any other questions? Okay.

Hello. So with all this sensitive data, are we just sending that all into the cloud, or can you host it on-prem and kind of keep it in-house as well? So two things. One is that you can host everything on-prem; everything is available, both the cloud side and the stuff that's deployed. And jumping back to this: we actually don't send any data to the cloud. Even when you're running scripts, all the analysis runs within your cluster anyway. And if you do view stuff in the UI, it's end-to-end encrypted, so the cloud can't actually see the data. All right, thank you.
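As a closing illustration, the detection flow described in the talk (scan message bodies for sensitive patterns, then keep only traffic that egresses the cluster) can be sketched in plain Python. Everything here is a simplifying assumption, not Pixie's actual PII script: the regexes are rough, the cluster CIDR is hypothetical, and the record schema is made up.

```python
import re
import ipaddress

# Rough patterns for the kinds of PII mentioned in the talk. Real
# detection is more careful; these will both miss and over-match.
PII_PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

# Hypothetical in-cluster range; pod/service CIDRs vary per cluster.
CLUSTER_CIDRS = [ipaddress.ip_network("10.0.0.0/8")]


def is_egress(remote_ip):
    """True if the remote address is outside the cluster's CIDRs."""
    ip = ipaddress.ip_address(remote_ip)
    return not any(ip in net for net in CLUSTER_CIDRS)


def find_pii_egress(messages):
    """messages: dicts with 'remote_addr' and 'body' keys (made-up schema)."""
    hits = []
    for msg in messages:
        if not is_egress(msg["remote_addr"]):
            continue  # keep only traffic leaving the cluster
        kinds = [k for k, pat in PII_PATTERNS.items() if pat.search(msg["body"])]
        if kinds:  # message body contains something sensitive
            hits.append((msg["remote_addr"], kinds))
    return hits


traffic = [
    {"remote_addr": "10.1.2.3", "body": "card=4111 1111 1111 1111"},    # internal
    {"remote_addr": "203.0.113.9", "body": "ssn=123-45-6789 a@b.com"},  # egress!
    {"remote_addr": "203.0.113.9", "body": "hello"},                    # benign
]
print(find_pii_egress(traffic))  # → [('203.0.113.9', ['ssn', 'email'])]
```

The last step from the talk, deciding whether a flagged egress is actually legitimate, is left to a human (or an allowlist of known-good destinations like the Stripe API).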