Hello, everybody. Thanks so much for joining us today to talk about relational observability for cloud-native security and data science. My name is Terrell, and my colleague Fred, who we call the demo guy, is going to do some demonstrations in the latter half of the talk.

A little bit about us: we're both from IBM Research. We joined about six and a half years ago, around the same time, with a passion for security. We are open-source maintainers and contributors of two projects: the one we're going to talk about today, which is the SysFlow Telemetry project, and we're also contributors and maintainers of certain projects under Falco Security. Both of these communities have Slack channels, so if anybody is ever interested, come say hello; we're happy to talk and collaborate with folks through those avenues. We're both security researchers. We do a lot of work in cloud security focused on visibility, and also in systems and software security. We actually met doing cyber deception research, which is a passion of both of ours, and with this project we do a lot of work around security engineering and data science.

So let's talk a little bit about runtime observability. What we mean by runtime observability is being able to get visibility into the workloads in our systems and clouds, and to identify what's going on and what they're doing. We can do this in various ways; the way we do it here is by monitoring how a program interacts with its system through the system call layer. Monitoring the various system calls can tell us how a process interacts with the file system, how it interacts with the network, and how it interacts with other processes. Now, the challenge of recording system calls, which we typically capture at the boundary between user space and kernel space, is that monitoring all of these calls generates a lot of data, which makes it very difficult to do any analytics on it. So you see a lot of projects out there doing rule-based detection on this type of data, because it can be impossible to store and analyze in bulk.

The goal of the SysFlow project is to enable data science on top of this type of telemetry. We do this by focusing on data abstractions: essentially taking all of this chatty data and lifting it up through a stack of behavioral telemetry abstractions that allow us to do more analytics, and store more state, the higher we lift it. Through this, we wanted to create an open stack for system security and data science.

Looking at this picture, we talk about this idea of multi-level data abstraction. We start with the system calls on the left part of your screen. As a colleague often says, doing this sort of work is like trying to push a grapefruit through a straw: you can either increase the size of the straw or shrink the grapefruit, and we're trying to shrink the grapefruit. We do this with different data abstractions. One of them, called SysFlow, is the idea of creating an object-relational format that shows the behaviors of how processes interact with the system. SysFlow, as we'll see in more detail in later slides, is still a fairly stateless format.
But we can build upon that with behavioral graphlets, which let us store more state and do a lot of different analytics, which we'll talk about later on. Some of these analytics are, for example, TTP tagging, or machine-learning-based analytics through our framework. And we can then pass things out the end of the straw, as raw telemetry or alerts, at a much lower frequency.

So, to start with the SysFlow pipeline itself: pretty much everything we're going to talk about today is open source, and you can see it by going to sysflow.io. There are about five projects right now. It starts on the left side of the screen with an agent that you deploy on your end hosts. It hooks passively into your kernel, records all of the system calls your processes and containers are making, and passes them up into user space. For this we use the Falco libs library, and that's our connection to Falco. Those very granular system calls then go to what we call the SysFlow collector, the project that transforms that data into SysFlow.

From there, we can do all sorts of things with it. We can write it out raw to S3 in chunks, or pass it into another component of the framework called the SysFlow processor. The idea behind the processor is that it's an edge analytics pipeline, designed so that people can chain up custom, multi-threaded plugins that plug right into the processor. Each plugin can do a particular analytic or aggregation and pass its output on to the next plugin. The processor comes with a set of built-in plugins, one of which is a policy engine that takes the streaming SysFlow and applies a Falco-based rule language to do enrichment or send out alerts. We also have a graph engine, which we'll talk about a little later, that builds the behavioral graphlets we mentioned. And you can create your own analytic plugin: the processor is built in Go for performance, but you can run any machine learning analytic you want on the edge.

If you don't want to do things on the edge, you can also export the data; we have support for S3 and Elastic, and right now we're working on storing these graphs inside something like Druid. There's also a set of processing APIs and SDKs that let you process SysFlow and build graphlets in Python, Go, and C++. Data is exported in Avro, so whatever your favorite language is, Rust or anything else, you can build bindings; we publish the Avro IDL for all of it. We also have an analytics notebook to show some of the things we can do, which Fred is going to walk through in Jupyter later in the talk. To give a feel for what consuming that exported telemetry looks like, here's a quick sketch in Python.
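This is a minimal sketch, not a verbatim excerpt from the project: it assumes the sf-apis Python package exposes a flattened trace reader roughly like the one below, and both the class name and the record tuple layout are assumptions that may differ from the released API.

```python
# Hedged sketch: iterating over a SysFlow trace in Python. Assumes the
# sf-apis package's flattened reader; names and tuple layout may differ
# in the released version.
from sysflow.reader import FlattenedSFReader

reader = FlattenedSFReader('trace.sf', retEntities=False)
for objtype, header, cont, pproc, proc, files, evt, flow in reader:
    # Each record carries its entity context: container, parent process,
    # process, touched files, and the event or flow itself.
    exe = proc.exe if proc else '?'
    print(objtype, exe)
```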
So first, a little bit about the SysFlow data format. If anybody is familiar with NetFlow, there's a duality there. The idea behind NetFlow is that it's insanely difficult to collect, store, and analyze packets for very large networks, just because of the amount of data generated. So NetFlow takes those packets and creates session summaries: something much smaller that allows people to do more analytics. The duality is that system calls are essentially like packets; we monitor them and summarize them, getting semantic compression into SysFlow. SysFlow captures process control flows, file interactions, and network communications, and can link those to the processes, but also to the containers and to the pods inside Kubernetes. So you can create these really nice graphical views for analysis; we'll show some of those a little later on.

So here's an example of SysFlow. SysFlow is really made up of two kinds of objects. First are entities, things like processes, containers, and files; entities in this picture are green. Then you have events and flows, which are shown here in purple. In this particular case we have, say, process 1822, which clones another process, 1823; that creates a process event. Then, if the process changes from bash to, say, ls, you'd have an exec, another process event.

We also have this idea of flows. Flows are aggregations, summaries of multiple system calls that together convey a behavior, like an interaction on the network or reading and writing a file. They summarize a whole bunch of system calls and record how much data was sent, or how many operations were part of the connection. In this particular case, process 1823 interacts with the network, so it creates a network flow to a particular endpoint. It may interact with a file: maybe this particular process is vulnerable and has been exploited, so it drops an exfil.py file that gets written, later on executed, and we can see that chain down to the execution. Then maybe it sets up some sort of command-and-control communication. That gives you an idea of the graph-like structure we'll look at later on.

This next slide gives another example, using one of the tools we have for SysFlow, called sysprint, which takes a SysFlow file and prints it in this tabular form, or in JSON. The idea here is a little example of a server communicating with a client. We get a whole bunch of information about all of the processes observed on the monitored system: for example the PID, the process ID, and the thread ID. Each line represents a flow or an event, and each event and flow has an operation flags field, roughly the fourth column, which describes what the event or flow is doing. It's actually a bitmap: for an event, the bitmap has a single bit set to indicate the operation being performed, while a flow can have several bits set, one per operation it aggregates. Here's a hedged sketch of what decoding that bitmap looks like.
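The bit assignments below are illustrative placeholders, not the real SysFlow constants (the actual definitions ship with the SysFlow libraries); the point is just that each letter code in the sysprint output is one bit in the record's operation-flags field.

```python
# Hedged sketch: decoding an operation-flags bitmap into letter codes.
# Bit positions are made up for illustration.
OPS = {
    1 << 0: 'C',  # connect / clone
    1 << 1: 'O',  # open
    1 << 2: 'R',  # read / receive
    1 << 3: 'W',  # write / send
    1 << 4: 'T',  # truncate / shutdown
    1 << 5: 'X',  # exec
}

def decode(opflags: int) -> str:
    """Return the letter code for every bit set in opflags."""
    return ''.join(c for bit, c in OPS.items() if opflags & bit)

print(decode(0b001100))  # -> 'RW', e.g. a flow with reads and writes
```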
So for example, in the first line there, you have the exec, which indicates that the server process was exec'd. It has a single timestamp, because an event is a point-in-time operation. Then we have some other things here, some file flows; the next two lines, I think, are file flows, so they have multiple operations together. The O and the C there, for example, are for opening and closing the ld.so.cache file, and there's a time range with a start time and an end time as well. Closer to the right, you can see the container ID and various summary statistics about that particular flow.

Moving down, we've got the client as well, and here is the expansion of a network flow for the client. The C, R, W, and T indicate that a connection was made (C), and there was a read (R) and a write (W) on that particular network flow; then the client shut down before the flow was closed, so we have a T for truncate. This carries all the different attributes we have: source IP, destination IP, and the summary information.

Now, SysFlow is an object-relational format, so it references different things using object IDs. The flow here points to a process; the process object has its own ID and a set of attributes around it, and it in turn points to the container it's associated with. This is one of the ways we further reduce the size of each record.

This slide shows the format itself. Again, it's an object-relational format. You can see the entities: the file entity, the process entity, and the container entity. Then we have the different types of flows and events: the process event; the process flow, which stores information like the number of threads created or exited over a time period; the file flow, which shows the files that have been read and written by the process and points to a file entity; the file event, which is a point-in-time thing, like a directory or a file being removed; and then a network flow and a network event. More recently, on the right of the screen there, we've added pods and Kubernetes events; that just came out in the most recent release of the project.

To give an idea of the reduction in data we get from SysFlow versus raw system calls: in a lot of cases we get over an order of magnitude of reduction. It really depends on the workload; databases actually compress really well, and there are certain ones, like web servers, where it's less, but overall we get over an order of magnitude. To make the object-relational idea concrete before we move on, here's a toy sketch of how records reference entities by ID.
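This is entirely illustrative, not the SysFlow schema: it just shows why storing process and container attributes once, and having flows point at them by object ID, keeps individual records small.

```python
# Toy illustration (not the SysFlow schema): entities are stored once and
# referenced by object ID, so flow records stay compact.
procs = {42: {'exe': '/usr/bin/server', 'container': 'web-1'}}
conts = {'web-1': {'image': 'httpd:2.4'}}
flow = {'proc_oid': 42, 'op': 'RW', 'rbytes': 4096, 'wbytes': 1024}

p = procs[flow['proc_oid']]   # resolve the process entity behind the flow
c = conts[p['container']]     # and the container behind the process
print(p['exe'], c['image'], flow['op'])
```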
So that covers SysFlow. Now, SysFlow does reduce the data, but it's still very stateless. So we recently released into open source this idea of a behavioral graphlet. A typical provenance graph for recording processes would create a new node for every instance of a process being created. Here, what we do instead is coalesce those instances into a single node representing one particular behavior. For example, those familiar with the Apache web server know that it typically has a root process, and that root process creates a set of worker processes, which you can see there on the far right, at the second level. These graphlets can recognize that behavioral pattern of creations and coalesce the workers into one single node, with the underlying instances recorded within that node. The same goes for the file flows and the network flows associated with that worker process, which get the same summarization. Fred is going to show this in the demo, so you'll get a really good idea of how these work in an actual use-case scenario.

One of the use cases we have for behavioral graphlets, and we'll go through this quickly, is rate limiting of system events using rate modulation. We do this, for example, for SIEMs, which can charge a lot for every event they ingest, and also to reduce the event fatigue, or alert fatigue, hitting the user. The idea is that as events stream in, we generate the behavioral graphlets and use that state information to limit the amount of repetition we send out for each process tree. We have a set of set-sketch filters that accumulate those repetitive events and release them at specific times in summary form, while new, unique events are forwarded as we see them, governed by a rate-limiting token bucket.

This can give staggering results in terms of event reduction. Here we have an example of a 12-worker-node Kubernetes cluster. On the left-hand side, you can see that with SysFlow, within 100 minutes, we're generating over 4 million events. With rate modulation, on the far right, we only forward 27,814 unique events; the other roughly 4.1 million we accumulate, merge, and summarize down to about 44,000 events. So we don't actually drop any events, but we accumulate the ones we're seeing heavily, and this lets us drastically reduce the number of events we forward, and of course the number of alerts we send out. A minimal sketch of the rate-modulation idea follows.
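This isn't the SysFlow implementation, just a small Python illustration of the mechanism described above, with a hypothetical behavior signature per event: a set filter remembers what has already been forwarded, a token bucket caps the rate of new events, and suppressed repeats are periodically flushed as summaries, so nothing is dropped.

```python
# Hedged sketch of rate modulation: forward new, unique behaviors subject
# to a token bucket; accumulate repeats and flush them as summaries.
import time
from collections import Counter

class RateModulator:
    def __init__(self, tokens_per_sec=100, flush_secs=60):
        self.seen = set()          # set filter over behavior signatures
        self.pending = Counter()   # suppressed repeats awaiting summary
        self.rate = tokens_per_sec
        self.tokens = float(tokens_per_sec)
        self.last_refill = self.last_flush = time.time()

    def admit(self, event):
        now = time.time()
        self.tokens = min(self.rate,  # refill the token bucket
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now

        out = []
        sig = (event['proc'], event['op'])  # hypothetical behavior key
        if sig not in self.seen and self.tokens >= 1:
            self.seen.add(sig)
            self.tokens -= 1
            out.append(event)               # forward new, unique events
        else:
            self.pending[sig] += 1          # accumulate repetition

        if now - self.last_flush >= self.flush_secs and self.pending:
            out += [{'summary': s, 'count': n}
                    for s, n in self.pending.items()]
            self.pending.clear()            # release merged summaries
            self.last_flush = now
        return out
```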
So I will pass it over to my colleague, Fred, who will go over another use case and show a demo.

Hey, everybody. Can you hear me? Cool. So Terrell has presented all these data abstractions we built to curtail the state explosion and event fatigue, and to help us understand what's going on at those endpoints. One question we asked during our research, when we were working as part of a DARPA project, for example, was whether we could use those abstractions to automate the understanding of the different behaviors we see in, say, a large Kubernetes cluster. That's basically the use case. And since we now have these graphs that are very semantic and very contextual, we thought: can we apply techniques borrowed from graph representation learning to see if we can identify salient behaviors, or perhaps anomalous behaviors, happening at runtime in a Kubernetes cluster?

For those familiar with the process of graph representation learning, and machine learning more generally, those learning algorithms usually don't take a graph directly: you have to embed the graphs in some sort of vector format. In doing that, you are essentially converting data from a very rich, abstract graph form into a linear vector of attributes or features; it's a dimensionality reduction you have to perform. And most of the work that has been done around graph representation learning only takes topology into consideration: it vectorizes the graph so as to capture its topological aspects. But it turns out that our system telemetry carries lots of semantics, lots of metadata, within the nodes themselves. As you saw before, those nodes capture things like user information, command line information, the number of bytes and operations in a session or on a file that has been opened, and so on. All of that is lost if you use an approach that only captures topology.

So we researched a few different approaches, and one of the good candidates we tested is the GraphHopper algorithm, which uses a modified shortest-path kernel that not only captures topology when it vectorizes the graphs, but also supports embedding node attributes. So you can take things like the number of bytes, as you see here in the second line, or the port numbers associated with network flows. You can take into consideration the return values of processes, which are actually very indicative of problems, for example when we see errors as return values. The operation flags are captured too, as are the open flags used when interacting with files. And, very importantly, the command lines of the programs that are running, which are very indicative of behavior. So our approach to embedding takes both topology and node attributes.

Once you have those vectors, each graph is embedded as a vector of attributes or features, and you can apply clustering to that vector space. There are different clustering algorithms you could use; in our case, we used an unsupervised clustering algorithm that doesn't need a priori knowledge of the number of classes, something like DBSCAN. What you end up with is each dot in this low-dimensional space representing one of the system graphs we built. And what you'd expect, as in step four here, is that because the graphs encode behaviors, similar behaviors cluster together and different behaviors fall far apart, if the embedding is good and representative enough. A hedged sketch of this reduce-and-cluster step is below.
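As a concrete sketch of steps three and four: assume each graphlet has already been embedded into a feature vector, for instance by a GraphHopper-style kernel; the file name below is hypothetical, and the PCA/DBSCAN parameters are just placeholders.

```python
# Hedged sketch: project per-graphlet embeddings onto two principal
# components and cluster without fixing the number of classes up front.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

X = np.load('graph_embeddings.npy')        # hypothetical: one row per graphlet
X2 = PCA(n_components=2).fit_transform(X)  # two components retained most info
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X2)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"{n_clusters} clusters, {list(labels).count(-1)} outliers")
```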
For the experimental setup I'm going to show here, we considered a Kubernetes cluster with 12 nodes under a heavy workload. It was actually part of a red-teaming exercise we ran in the company, where an external red team came and did pen testing against a web server running production software, and there was also an automated regression test suite, so there was actual, meaningful activity going on in this cluster. We collected a day of data: about 3.8 gigabytes of SysFlow data, converted, on the right-hand side here, into roughly 3,500 graphs. And that's the beauty of this behavioral, control-flow and data-flow coalescing we do: even for a large Kubernetes cluster, those graphs had only about 546 nodes and 628 edges on average. So they're pretty small graphs for the kind of behaviors we're monitoring. We applied principal component analysis to the embedding we created, and we noticed that with just two components we were able to capture and retain a lot of the information, so we went with that, and then we clustered those graphs.

The interesting thing is that for this Kubernetes cluster, we recognized three big clusters. I didn't mention it before, but each of those graphs is annotated with the TTP tags that come from SysFlow; SysFlow has the ability to tag individual behaviors with TTPs. So we can sample those dots, which correspond to graphs, and associate them with low-severity or high-severity TTPs. What you notice on the top right side of this plot is a cluster of the high-severity graph behaviors, which corresponds to the actual pen-testing attacks we were observing. We sampled some of those graphs and confirmed that they corresponded to the attacks our penetration testers were executing in this environment. And when you sample a bit on the bottom left, those corresponded to things like the infrastructure services running on that Kubernetes cluster: the kubelet and the different services Kubernetes provides. So in general, what we observed is that this is useful for understanding very large environments: when you want to see where the salient behaviors are and what clusters together, this provides good insights. So that's one application of machine learning on top of these abstractions.

I also have a live demo, where I'll show you how those graphs are constructed and the kind of contextual profiling you can perform with them. I've put up the link and a QR code for the demo, which you can access in Google Colab, where we have this notebook hosted. So I'm going to go over here; I'm running it as a reveal.js-style slideshow, but it's all running in Jupyter. The quick summary of this demo: first, a very quick overview of some of the CLI tools for SysFlow, and then we'll actually do provenance tracking using those graphs.

As Terrell mentioned, the SysFlow collector creates this relational data abstraction, and at the core of the collector there's an open-source library we've released called libsysflow. It allows you to create your own SysFlow collector: if you don't want to use our open-source collector, you can integrate a SysFlow consumer directly into your own projects using the libsysflow library. And the interesting thing about this library is that it's actually very easy to use.
We put a lot of effort into making the APIs consumable. That layer is in C++, of course. But essentially, all you need to do is instantiate the SysFlow consumer and the driver and then run. There are a number of configurations you can pass, but in general the default options work for a vanilla system. That said, there are a number of parameters you can set for custom systems: if you use different container runtimes, if you're collecting Kubernetes information, and things like that.

So back here, I already talked about the specification. And there is a CLI, a command-line interface, that you can use to read SysFlow if you want to debug or see what's going on in your data. For example, you can type sysprint, which is the name of the CLI, followed by a SysFlow trace we previously captured. You can see here the different information you get: the T column corresponds to the record type, so a process event, a file flow, a network flow, and so on. This is the command name of the program, server or client; there are no arguments passed to the program in this case, and so on. And if you want to know all the attributes captured in SysFlow, a quick way to find out is to call sysprint -l, which gives you a list of all the attributes you can read from a SysFlow trace. There are process attributes, as you can see here, including the user ID, the command line, whether it's an interactive TTY or not, and the environment variables, which are very useful, as we'll see in the demo; also file attributes and metadata, network metadata, Kubernetes metadata, and so on.

So back here: how do we build the provenance graphs we've been talking about? One thing before we start: as I mentioned before, SysFlow supports TTP tagging, basically enrichment, or decoration, of SysFlow records and graphs. Here is an example declarative rule: you give it a name, and then you have a condition, which is the main part of the rule. The condition is a logical predicate that gets matched, and if it matches, the tag is added to that node or record. The first example here is a Shellshock attack against an Apache web server running in CGI mode with a vulnerable Bash. For those who remember, Shellshock was a very nefarious vulnerability because it allowed attackers to gain remote control of web servers operating under CGI, the Common Gateway Interface. And all you really needed to do to exploit Shellshock was craft an HTTP request that injected a little payload like this, an empty function definition followed by the command you want to execute on the remote server, by setting the HTTP User-Agent header. So it was very easy to pull off.

Here, we have run a Shellshock attack against a server monitored by SysFlow. And just to show you how simple our APIs are to use: all you need to do is construct this graphlet, using our Python APIs, along the lines of the sketch below.
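The snippet is our paraphrase of that call, assuming the sf-apis Python package exposes the graphlet class as sysflow.graphlet.Graphlet; the file paths are hypothetical, and the keyword names may differ in the released API.

```python
# Hedged sketch: build a TTP-annotated provenance graphlet from a trace.
from sysflow.graphlet import Graphlet

trace = 'data/shellshock.sf'            # hypothetical trace of the attack
ttps = 'ttps.yaml'                      # hypothetical TTP rule definitions
apache = Graphlet(trace, defs=[ttps])   # construct the graphlet, tagging TTPs
apache.view()                           # render the annotated behavior graph
```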
So you pass the graphlet the data (we can also read directly from a database, but here we pass a trace), and we also pass in the definitions, a YAML file that contains the TTP rules I was just talking about. Once you construct this graph, you can call into view, passing different attributes, and you get this graph object here. So let me zoom out; let me see if I can do that.

So, again, what Terrell was showing before: you have the Apache web server operating in CGI mode. Here you have the exec that happens from the vulnerable CGI script: essentially, the attacker was able to inject an HTTP variable and turn this process into an interactive shell. And you see here that you get the TTP labels for this kind of behavior, execution hijacking. These nodes encode lots of data, lots of information, such as the user, the user ID, the group ID, the TTY, and things like that. And eventually, what you can see by following this graph is that the attacker, through that interactive, reverse shell, cats /etc/passwd, which contains an enumeration of users. So you can actually see the file flow, which tells us that /etc/passwd was accessed.

All right, so that's one thing: an interesting way of quickly understanding what's going on in that system. But we can do more. As I mentioned, this will also capture the environment variables. So we can look at one of the nodes, the node where that exec calls into bash, and expose just two attributes of that node: the executable name and the environment variables. And what you see here is the CGI binary being injected with that malicious payload, the exact payload the attacker used. You see the User-Agent with that empty function, and then the /bin/bash -i that is being piped into a TCP socket back to the remote server the attacker controls. So that's essentially the simple payload that was injected, and this gives you visibility all the way down to that payload.

Other things you can do, of course, include graph traversal. You can enumerate kill chains just by traversing the graph. If you call apache.ttps(), where apache is the name of the graph I just constructed, you get a partial ordering of the different TTPs that happened: from the abuse of the native API, the exec API, to the hijacking of execution, to the account discovery. So you start adding semantics to those behaviors by connecting to MITRE, which maintains this ontology of known tactics, techniques, and procedures. This is useful as well: without any analysis, I can just build the graph from any SysFlow data, call into the TTPs, and get an enumeration of potentially malicious behaviors. You can recover the data behind those TTPs as well, like I'm showing here; these are the specific records associated with those tags, which is very useful for needle-in-the-haystack searches. A hedged sketch of this kind of enumeration is below.
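The method names here follow the demo narration (ttps and mitigations on the graph object); treat them as assumptions about the released sf-apis rather than documented signatures.

```python
# Hedged sketch: enumerate the (partially ordered) TTPs observed in the
# graphlet built above, then ask for candidate mitigations.
for ttp in apache.ttps():
    print(ttp)   # e.g. exec API abuse -> execution hijack -> account discovery

print(apache.mitigations())  # assumption: maps the tags to MITRE mitigations
```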
Given a graph that has been annotated with TTPs, you can also call into the mitigations function, which connects to MITRE via its CTI knowledge base and gives you potential mitigations you can deploy against that particular set of TTPs. And you can expand those by calling into associated mitigations, which maps out what types of mitigations you could deploy in your environment to mitigate those different types of attacks. This all borrows from the knowledge base that MITRE maintains. You can also connect to MITRE D3FEND, which gives you other types of mitigations.

Now, a larger use case against a Kubernetes cluster. I'll try to go through it quickly, but essentially it's a supply chain attack where a user was duped into installing a piece of malware. This client has access to a production server where you have your Kubernetes cluster and your applications running, and the attack dupes the client into installing a package.json that is malicious. It gets pushed to that server and deployed via CI/CD, and we have SysFlow monitoring the server. The attack eventually leads to an exfiltration via the Twitter API, and we're going to analyze the behaviors on this server through the lens of SysFlow monitoring.

First things first: that's the data, in this directory here, and again, you can run all of this at your leisure by accessing the Colab environment. This is the TTP YAML, the same TTP YAML I used before, which is also publicly available. You construct the graph, and when you call that TTP function, the first thing you want to peek at is which potentially known TTPs this data contains, and it actually contains a lot of them. So there's something interesting going on in the server. Bear in mind this is a data collection spanning many minutes, but we can quickly tell what's going on without digging into the underlying data, and that's useful.

Here, we grab the first TTP and ask for the data backing that TTP, and that's the data. Let's take a quick look: it's a process event, an execution of a process. I don't care about the PIDs right now, but I care about the process being executed, and it's an scp process. So there's a remote copy happening in a Kubernetes environment; that's of interest. I want to know what this remote copy program is doing: what is it copying, and to where? It's copying something into this folder, and I want to know what. How do I do that? I can create another graph using the same data, but now I have an indicator of compromise: I know that scp was executed, so I can use this little expression language to express that. I can say, hey, give me the graph that corresponds to the process whose command line contains this scp program. A sketch of that projection is below.
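A hedged sketch of that projection; the predicate syntax of the expression language is paraphrased from the narration, not quoted from the documentation, and the file paths are hypothetical.

```python
# Hedged sketch: rebuild a graphlet from the same trace, projected onto
# the process whose command line contains the IOC.
ioc = Graphlet('data/supplychain.sf',
               expr="process.cmdline contains scp",
               defs=['ttps.yaml'])
ioc.view()   # shows the scp node, its parent, and the resources it touches
```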
And lo and behold, you get that node: there is this process being executed, and you also know its parent. That's the nice thing about provenance tracking: you can quickly relate processes to their parents and to the resources they interact with. So here you know that it's being invoked by sshd, but most importantly, let me show you here at the bottom: this file is of interest. It's dumping, copying in, this package.json file. And for those who know, package.json is the package descriptor that usually goes with Node.js. So now, through my domain knowledge, I want to know: is there any process descending from node on this machine? That's my second indicator of compromise. I build another graph from the same data, but now projecting over this query here. And again, you see that most of the TTPs associated with this Kubernetes node are associated with behaviors descending from Node.js. So we're getting closer to narrowing down where those TTPs are coming from.

So let's move forward. Again, these are the records underlying, backing, those TTPs, and they are being spawned by node, and you have wgets going on there. I'm going to skip over this one; you can trace the behaviors that are going on: you have a wget going on there, you have a chmod, you have lots of things going on. And here's the thing I want to show you. If I build the graph using that IOC, asking for all processes descending from node whose PID is in the set of PIDs associated with that particular behavior, you get a graph like this. On the top, you have your node application, which is benign; it's the production application running on that Kubernetes host, containerized. But eventually, what you get is the attack: the CVE allows for remote code execution, so you get this shell that is spawned by node, which calls into a netcat that pipes back into a reverse shell to the remote machine controlled by the attacker.

And from there on, you have a chain of events that calls into bash, and the interesting thing is that this is all temporally ordered. So, for example, here at the bottom you have a wget: the attacker is downloading a script called tweet onto that machine, or rather that container. Then it chmods that script to be executable; that's the chmod going on here. After that, it actually executes the tweet script, and you see that by the exec here. And then it removes it; it has to erase its traces. So that's the chain of events you can tell just by building this graph, and now you know what the attack kill chain looks like.

Now you also want to know the impact: has it actually moved data around, and things like that? That's the other thing you can do. Because this is all relational, it captures not only process control flows but also interactions with the network and the file system. So you can say: now that I know tweet was executed, let me look at this indicator of behavioral compromise, the process containing tweet in its name, and show me all the file flows and network flows with non-zero traffic associated with tweet. A hedged sketch of that query is below.
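Again, the predicate syntax and field names here are illustrative paraphrases of the narration, not the documented expression language.

```python
# Hedged sketch: same trace, projected onto the suspicious process and its
# non-empty file and network flows.
impact = Graphlet('data/supplychain.sf',
                  expr="process.name contains tweet and flow.rbytes > 0",
                  defs=['ttps.yaml'])
impact.view()   # tweet's file reads plus its network traffic
```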
And here you get the tweet process. Again, I didn't change the data; it's all the same data, I'm just projecting over it in different ways. And here I see the tweet process reading a lot of things from the file system and also doing a lot of things on the network. We're getting close to the end here, so let me show you something very interesting. I'm projecting onto this node here, you see the node ID of the graph, and I'm saying, hey, give me a plot over time of ingress and egress traffic for the network flow. And lo and behold, what you see here is a lot of command-and-control-looking traffic: information coming back and forth from that tweet script, but also lots of information being written out, sent out to the network, by tweet. So something is being moved out of the cluster. That's interesting. And how is it being moved out of the cluster? That's the other thing we can do with this: you can switch views and build flow diagrams using the same data. And this is an interesting thing about capturing lots of metadata: you can see that from 165, which is the IP of our object storage, the dark green flows are the flows containing lots of data on the move. We can see the flow going from there, passing through our Node.js container, and flowing out to those 104.x addresses, which belong to the Twitter API. So you see that there was indeed a flow of information, that exfiltration, going on through the network.

So that's pretty much what we have to say here. I think we can pause for questions.

Have you considered Rust instead of C++ to write it?

We have; I think it's probably on the backlog. There are advantages to going with Rust, especially in terms of simplifying some of the code, so at some point we'll probably look at it, but we want to get more features in before going back. The stack we build upon today, Falco libs, is also all written in C++, but as the kernel tooling modernizes, we'll consider it.

Second question: taking the SysFlow framework and the graphlets and everything, is it possible to use different types of records from different domains, like application observability, for instance using these graphs to visualize stack traces and trace transactions? Is that possible? Have you thought about it?

Yes. In principle, anything that has control and data flow could work: the same algorithms we use to build those graphs from system calls, from SysFlow, we could apply to other domains.

Is that an Apache license?
It's Apache 2.0.

Cool, thank you.

The processor itself, we didn't get into it much in this talk, but it's designed to let you write different drivers, so you can take in different data sources and combine them. It comes with SysFlow right now, but you could do other types of things; one of the things we're playing around with right now is enriching SysFlow with other data sources, so you could do that also. You're not bound to SysFlow as your input; you can add other drivers for other data sources in that telemetry pipeline.

That's all the time we have for questions this session. Thank you. You're welcome.

We also have limited editions: these are the remaining survivors, the limited first edition of our little logo, so one, two, three, four, five. And like we said, we have a Slack for the SysFlow community; if you go to sysflow.io you'll see that. People who like to use Falco, we're also there, so if you want to come say hello or have questions: we're both in the Kubernetes workspace in Slack, where you'll find us if you go to the Falco channels, and there's also a SysFlow community workspace in Slack. Reach out to us on both projects. All right.