Let's wait a minute or two, and then I'll get going. How's everybody doing? Thanks for joining me, I appreciate it. Who here knows what morphology is in general? No? Cool. Okay. Well, that's the thing I'm trying to use as the example for how we should start to think about these things. I talk about what it is at the beginning of the presentation, so it's good that nobody's familiar, because then it's not going to be repetitive. And also you won't know if I'm right or wrong on that topic. So you trust me. That's dangerous.
All right, we'll go ahead and get started. I'm Jared Atkinson. I'm the Chief Strategist at SpecterOps, and this is the malware morphology workshop. Let me make sure I can actually change slides. So, just a little bit of my background: I was in the U.S. Air Force, which is kind of where I got my start. Hey guys. And this is a picture of the ops center that I worked in. It's a secret facility, but they brought in some people to take pictures of it to show it off in the Air Force newsletter or whatever, and so I was able to find it online and share it. I actually wrote this ASCII stuff up here, and spent a lot of time making everything pretty with those boxes.
But I want to start off with a story that illustrates why I think this type of analysis is really important. When I was in the Air Force, we were doing this exercise called Cyber Flag, and it's in Las Vegas. Basically, the cyber divisions of all the different U.S. military services come together and work on this exercise.
And the idea is that they bring in the NSA red team and the Air Force red team and all these big military red teams, and they attack the network, and our responsibility was to be threat hunters and find them. One day the white cell, which are kind of like the referees of the exercise, I guess you could say, came to us and said, "Hey, the red team has a golden ticket. What are you guys going to do about it?" This was in 2014. I don't know if you're familiar with golden tickets, but the golden ticket was an attack that came out at Black Hat in 2014. So it was right after it had come out, and we're like, "Oh, we're going to go find that golden ticket, and we're going to take care of it." And then they left. And we all looked at each other, and we're like, "Does anybody know what a golden ticket is?" So this is where we were at, right? We had the golden ticket.
And that was the moment we realized that you can't detect what you don't understand. Because where do you even start if you don't know what a golden ticket is? You have no chance. And there are levels of understanding: the more thoroughly you understand something, the higher the ceiling is on your ability to actually deal with it. So that breeds the next question: how do I start to understand something so that I have the maximal capability of actually detecting it? That's what we're going to go into in this workshop.
So we're going to do an introduction to morphology, like three slides, just to talk about it. Morphology is, generally speaking, the study of the form and structure of things. The place where you're going to see it most frequently used is in biology, so the structure of organisms. Anatomy, for instance. We're all probably familiar with anatomy.
That's an example of a subdiscipline of morphology. And that's how we start to say things are similar or different. So you might look at the size of an organism's brain and say that organisms that have similar-sized brains might be similar in how they operate. Or you might look at the fact that we're bipedal, we have two legs, and a dog has four legs. That would be a use of morphology. But it's also used in linguistics: the study of the structure of words is morphology too. And the idea here is that, generally speaking, morphology can be used to identify similarities between things. So two organisms, chimpanzees and humans for instance, that both walk on two legs might have some sort of similarity in what they call the phylogenetic tree, their evolutionary background, as opposed to things that live underwater, because they don't have lungs in the same way. They may not be very closely related. So you could say that we're more different from whales, for instance, than we are from chimpanzees, and you could tell that just by looking at the two different organisms. Okay.
So what we're trying to do is figure out how we can look at malware samples morphologically, to then evaluate the similarity between the samples. The idea is that the more morphologically different two samples are, the more differently they are going to behave when they're executed. Okay.
All right. So biologically, you're probably familiar with this picture. It's called the Linnaean taxonomy, and that's how we organize species into groups. So for instance, we're Homo sapiens. And dogs, well, this is actually wrong. Dogs are a species called Canis familiaris. And then you have wolves, which are Canis lupus.
So the way that they start to organize these things together is at the species level. But then, for instance, dogs and wolves belong to the same genus. The genus is one level higher in being abstracted, and it includes both dogs and wolves, because they're very similar. If you go a level higher, you have dogs, wolves, and foxes. And if you go a level higher, then we're talking about carnivores. Carnivores are going to be dogs, wolves, foxes, and lions, but lions are cats, so they don't belong to the dog category. And then you have mammals, and I didn't know this, but apparently the big distinction of mammals is that they feed on milk as infants. Who knew that? I mean, I knew that that happened. I just didn't know that that was the main thing that classified them. So rats would be included. And then you have animals that have a backbone, so vertebrates, and that would include birds, for instance, and so on and so forth, all the way up. The idea is that the lower down in the structure you can group two things together, the more similar they are. So for instance, you could say a dog and a wolf are more similar than a dog and a lion, because the dog and the wolf share the same genus, and the dog and the lion only share the same order.
All right. And then you have linguistics, and I'm not even going to attempt to pronounce the French version. I speak Italian, so I know how to say that one. But you have this word herba in Latin, and that was the root that influenced the development of many different languages. All the Romance languages are based on Latin. You all probably know that. But every child language of Latin had a slightly different way that it took that word and implemented it, right?
So, for instance, Italian is erba. They just dropped the H; the H became silent. And in French, from what I understand, the H is silent as well, but you didn't get rid of it, you just kept it around. And then the E is silent, I guess, I don't know. Yeah. Adam, is it Adam? Okay.
So the idea is that you can start to look at this, and it's actually pretty cool, because in linguistics they're actually going the opposite direction. They're saying, let's look at all the Romance languages and see how they say certain words, in this case grass, and then they're able to link that back to what the root word was in the parent language, Latin. But then you can go even further back and start to reconstruct dead languages or lost languages. There's a language called Proto-Indo-European, which is the language that Latin and German and Russian and a bunch of other languages came from. Nobody has any written record of it. Nobody speaks it. Nobody even knows for certain that it existed. But they know what certain words are, based on certain rules and certain trends that they can observe in the existing languages.
One really common trend in languages is that the H sound tends to drop off over time. You see this in every single one of these languages: the H dropped off. So you have erba in Italian, and you have erva. Bs and Vs have similar sounds. B is like a "buh," and then you have a V, and they're essentially the same consonant; the way that you push out the air is just different. So the idea is that you can actually start to analyze and say which of these words is most similar to the parent. For instance, erba in Italian is the most similar to Latin herba.
But then, for instance, Romanian, which according to this is iarbă, but I don't speak Romanian, so don't judge me, is the most different. It's the one that changed the most from the source. Now you could then go to other languages. In German, you have Gras, and that's similar to English grass. And in Norwegian, you have gress. And you're like, okay, those are all obviously similar to each other, but they seem different from herba, the Latin version. So where did that come from? And then you have Korean, for instance, and I just Googled this, so jandi, I guess, and that would be even more different, because it's very unlikely that it shares the same higher-order root. So the idea here is that in biology and linguistics, it's very well established that you can use morphology to start getting an idea of how things work.
Now, the interesting thing, like I said, is that you have this language, Proto-Indo-European, and both German and Latin derive from Proto-Indo-European. Olaf, how do you say grass in Dutch? Gras, yeah, okay. There we go. And what you find is that all of these words come from, and I have no idea how to pronounce this, but "grass," maybe. That's the base word that Latin then changed to herba in some miraculous fashion, and in German they changed it to Gras. So both those words, even though they appear different on the surface, have the same parent. And so you can say that at a high level they are relatively similar, as opposed to, say, the Korean version, which is going to be very different and is not related to Proto-Indo-European. Okay.
So what does that mean for us? Well, there are two different types of taxonomy. We had that Linnaean taxonomy, this one here. That is what we call a scientific taxonomy, and it was built based on morphological analysis.
So you would look at the structure of these organisms. Some of it is just visual, but then you also have more scientific types of analysis, and you can develop taxonomies based on that. But you also have what they call folk taxonomies, which are taxonomies that are essentially cultural, or old school, from before science was really a thing. These would be based on obvious tendencies of things. You could just look at an organism and classify it with things that appear to be similar, without doing really scientific analysis.
So for instance, one question that might arise is bats. If you were just to look at a bat, it appears to be more similar to birds than to mammals. It wasn't until people started analyzing bats scientifically that they realized, okay, we're looking at its habits, we're looking at whether its young come in eggs or whether they're birthed in different ways. Or, I don't know, apparently bats feed their young milk, maybe. But there were different things that they started to look at in bats, and they're like, okay, even though it appears to be a bird, it's not actually a bird, it's more like a mammal. Similar thing for whales. You might say that a whale is more similar to a fish than to a mammal, but whales are also mammals.
What you find is that the overlap between folk taxonomies and scientific taxonomies is actually quite large. The problem is that on the edges is where you start to get divergence. And that's an interesting thing. One of the ways that this becomes relevant is that I think MITRE ATT&CK, for instance, in the way that they organize different things into techniques, is more similar to a folk taxonomy. Right.
And so what you're going to find is that most of the time it's going to be accurate; the classification of things and the groupings of things are going to be very accurate. But sometimes you're going to find, hey, that doesn't really make that much sense, or I don't know how to fit this thing into the existing taxonomy. That's because it's a folk taxonomy. It's not fully a scientific taxonomy.
All right. So there are some fundamental questions that we have to ask. Imagine that you're an attacker, and you're trying to laterally move from one machine to another machine. There are a bunch of different techniques you could choose, and maybe you choose to laterally move through a Windows service. Once you make that decision, there are a bunch of different tools. There's a tool called sc.exe, and you could create a service on a remote machine that way. You could use PowerShell's New-Service cmdlet. You could use a third-party tool called SharpSC to do that. And one important question you could be asking is, how similar are all of those different tools? The attacker is asking the question, do any of these give me an advantage over the defenders if I were to use one versus the others? Because from the attacker's perspective, they all create a service, they all get lateral movement. But maybe one does it in a certain way that's going to make it advantageous. So the question is, how similar are they?
Then there's this idea of techniques, in the MITRE ATT&CK sense. Service creation for lateral movement would be an example of a technique. Credential dumping from LSASS would be a technique. But then there are tools, so sc.exe, SharpSC. And there's this question: just like we have species, and we have the class, right?
So imagine that the class is mammal and the species is Canis lupus. Are there layers of the taxonomy that exist between the tool and the technique that we don't know about? And the idea that I'm trying to present is that the answer is yes.
And then, are there degrees of similarity? Who here, if you wanted to look at two different binaries on a computer, two different files, how would you determine that they're the same? Does anybody have an opinion on that? A hash, right. And a hash is absolute similarity. If you change one bit, then they're not the same anymore. But when you're talking about things that are different, could you say that there are things that are more different than others? Imagine I changed one bit; that's still pretty similar. But if I changed a gigabyte, that would be very different. And a hash doesn't allow you to evaluate the relative similarity, if that makes sense. It's only absolute similarity. So you answered the question right before the next slide, which is perfect.
So then the question is, how do we start to evaluate whether things are slightly different or very different? Because if something's very different, then we don't have to worry about that. But if it's a very small change, then it's probably relevant for us to be aware of it. And so we started doing things like piecewise or fuzzy hashing. This is where you hash sections of a file and then you compare them. If a lot of the sections end up being the same, then we can say, ah, that's mostly the same thing. You also have imp hashing; we'll talk about the import table and portable executable files later on.
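To make the contrast between absolute and relative similarity concrete, here is a minimal sketch. It assumes fixed-size blocks, which is a simplification: real fuzzy-hashing tools choose block boundaries from the content so that an inserted byte does not shift every later block. The function names and the 64-byte block size are illustrative assumptions, not something shown in the workshop.

```python
import hashlib


def whole_file_hash(data: bytes) -> str:
    """Absolute similarity: any single-bit change yields a different digest."""
    return hashlib.sha256(data).hexdigest()


def piecewise_hashes(data: bytes, block_size: int = 64) -> list:
    """Hash each fixed-size section of the input separately."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def piecewise_similarity(a: bytes, b: bytes, block_size: int = 64) -> float:
    """Fraction of aligned sections whose hashes match (0.0 to 1.0)."""
    blocks_a = piecewise_hashes(a, block_size)
    blocks_b = piecewise_hashes(b, block_size)
    longest = max(len(blocks_a), len(blocks_b))
    if longest == 0:
        return 1.0  # two empty inputs are trivially identical
    matches = sum(1 for x, y in zip(blocks_a, blocks_b) if x == y)
    return matches / longest


# A 1024-byte stand-in for a file: sixteen 64-byte sections.
original = bytes(range(256)) * 4

# Flip a single bit in the first byte.
one_bit_changed = bytearray(original)
one_bit_changed[0] ^= 0x01
one_bit_changed = bytes(one_bit_changed)

# Rewrite everything by reversing the byte order.
heavily_changed = bytes(reversed(original))

same_digest = whole_file_hash(original) == whole_file_hash(one_bit_changed)
small_change_score = piecewise_similarity(original, one_bit_changed)
large_change_score = piecewise_similarity(original, heavily_changed)

print(same_digest, small_change_score, large_change_score)
```

The whole-file digest only says "not the same" for both modified inputs, while the piecewise score separates the one-bit change (most sections still match) from the complete rewrite (no sections match).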
The idea with imp hashing is that binary files on Windows have dependencies on functions that the operating system makes available, and if two binaries have the same dependencies, then they're probably doing similar things. So you could look at the import table and hash the contents of the import table and say, if these things are the same, then these two files are basically the same. Now, those are two of the ways to start answering the question of just how different these things are. But there are more questions.
I never do a presentation without having a little philosophy built in, and so this is Plato's theory of forms. Who's familiar with Plato's cave? Yep. So the allegory of the cave was representative of this theory of forms, generally speaking. And there's a way we could talk about this in the context of object-oriented programming. So who here's a programmer? Anybody? I know, Simon, you said you're a developer. Okay, cool. Well, we'll talk about it in both ways.
So imagine that everything that you see in the world is what Plato would refer to as a particular. This is a table, but it's a unique instance of a table. That's a table. And we would say that this table, or maybe these two tables that are connected right here, are the same table, because maybe the vendor sells them under the same SKU or something like that. So they're the same table, but they're not literally the same table. The fact that there are two of them tells us they're not literally the same. And this table is slightly different from that table, because it's wider than this table.
But for whatever reason, we're able to classify them under the same category, table, even though they're different, and tables come in all kinds of shapes and sizes. At the same time, we're always able to look at one and immediately say, that's a table. So there's this weird thing that allows us to group things together. Plato said that these are particulars, and the perfect concept of a table is the form. That's this kind of middle level. So, for instance, there would be a form for a house, a form for a horse, a form for a rose. The forms are these perfect representations. This would be similar, in object-oriented programming, if you care, to a class. You define a class, and then you create instances based on that class, and the instances would be the particulars. But then there are even higher levels of forms, like beauty and justice. They call these the transcendentals: beauty, justice, good, and so on and so forth. And all the forms are imperfect representations of those higher forms of beauty and justice and good and all that kind of stuff.
So one of the things that I want you to constantly be thinking about is, when you're interacting with tools, like, oh, here's a tool, Mimikatz, right? Mimikatz is a particular. Credential dumping from LSASS is the form. So the technique is the form, and the tool or the malware sample is the particular. That's a frame of reference that we should keep. And just because two particulars implement the same form doesn't mean that they implement the form in the same way. If you're talking about tables, you have rectangular tables and you have circular tables, and they implement the same form of a table, but maybe they're not interchangeable in the same way every single time.
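The class-and-instance analogy can be sketched in a few lines. This is only an illustration of the analogy, not anything from the workshop materials: every class and function name here is a made-up assumption. The abstract class stands in for the form, the objects are particulars, and the two checks show the difference between recognizing one particular shape and recognizing the form.

```python
import math
from abc import ABC, abstractmethod


class Table(ABC):
    """The 'form': what every table has in common, regardless of shape."""

    @abstractmethod
    def surface_area(self) -> float:
        """Every table has a top surface, however it is shaped."""


class RectangularTable(Table):
    """One way of implementing the form."""

    def __init__(self, width: float, depth: float) -> None:
        self.width = width
        self.depth = depth

    def surface_area(self) -> float:
        return self.width * self.depth


class RoundTable(Table):
    """A different way of implementing the same form."""

    def __init__(self, radius: float) -> None:
        self.radius = radius

    def surface_area(self) -> float:
        return math.pi * self.radius ** 2


def looks_like_rectangular_table(obj: object) -> bool:
    """A check written against one particular shape."""
    return isinstance(obj, RectangularTable)


def is_table(obj: object) -> bool:
    """A check written against the form."""
    return isinstance(obj, Table)


# Three particulars: two share a shape but differ in size, one differs in shape.
particulars = [
    RectangularTable(2.0, 1.0),
    RectangularTable(3.0, 1.0),
    RoundTable(0.75),
]

narrow_hits = sum(looks_like_rectangular_table(p) for p in particulars)
form_hits = sum(is_table(p) for p in particulars)
print(narrow_hits, form_hits)
```

Read with technique as the form and tool as the particular, the shape-specific check is the one that misses a tool that implements the same technique in a different way.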
And so maybe, if you build an AI model to visually detect tables, you might first build an AI model that only detects rectangular tables, and then you have to go back to the drawing board. So one of the problems that we need to fight is: let's not build the AI model that only detects rectangular tables, let's build an AI model that detects the form of tables, or any particular table.
Then this is kind of the last froufrou philosophical idea, but I'm happy to go into that more if people like it. There's this idea called "the map is not the territory." Is anybody familiar with that? So I'll just read it, and then I'll talk about it. Alfred Korzybski was a Polish-American scholar who wrote a paper in like 1931, and he came up with this idea called the map is not the territory. He said, "A map is not the territory it represents, but, if correct, it has a similar structure to the territory, which accounts for its usefulness." So it's impossible for a map to fully contain the territory that it covers, because there's an infinite amount of detail. In order for a map to represent the thing that it's representing in perfect detail, it would have to be as big as the territory, and then it would not be useful. So we summarize the territory in the map in a low-resolution form.
But the idea is, in order to be a map, well, for instance, I'm from California, so I'm going to use California as an example. Fresno is in between Los Angeles and San Francisco. So if you build a map, even though it's not a perfect representation of all the roads or all the territory in California, you would expect that Fresno would be roughly between Los Angeles and San Francisco. If Los Angeles was between Fresno and San Francisco, it'd be a useless map. Right.
It'd actually be not just useless, it'd probably be negatively useful. And so as we're building out this taxonomy, the taxonomy is a map. As we start to analyze tools and start to understand the form, we're building a map, and that map is always going to be an imperfect representation of the techniques that we're interested in. The question is, how do we make sure that it's useful? How do we make sure that we include enough detail to make it useful for us? I'm coming from a detection engineering perspective, but you could also come to it from a number of different perspectives in information security. So, okay.
And then there's this idea that a lot of us have as we're working through projects. You might be like, oh yeah, I've seen something that does this attack technique. We're going to talk about token theft, so: I've seen something that steals tokens before, and I think it did it kind of in this way. That's valuable knowledge to have, but it's not explicit. The idea of making a map is that you're making your understanding of something explicit. And explicit knowledge is significantly more valuable than implicit knowledge. Okay.
All right, so I don't know what all your backgrounds are, so I'm trying to give a little bit of an idea of who this might be useful to. If you're a detection engineer, generally speaking, this is going to really help you understand, as you're building detections, have I scoped out what the problem actually is well, and how do I know that I've actually built a detection that covers that problem?
In the first place, if you're a red teamer, this helps you to understand the value proposition of one tool versus another tool, and it also helps you facilitate discussions with defenders about why they may not have detected certain things. It's one thing for a red team to say, hey, you missed this. It's another thing to say, hey, you missed this because I used this variation, or this tool that did it in a certain way, that probably would have caused you to miss it for X, Y, and Z reason. If you're a CTI analyst, it helps you to measure the similarity between samples and then get more granular categorization.
And then security architects. Detection engineers are kind of like, in my opinion, junior security architects. Generally speaking, it's almost always better to create a preventative control than it is to build a detective control. If you can stop bad things from happening, that's better than knowing that bad things happened, because if you know it happened, you still have to respond to it, and there's opportunity for failure there as well. Security architects are going to be the people responsible for implementing these preventative controls. And with a lot of these things, if you can detect it, you can potentially prevent it as well, or at least you can think through whether that's possible. Okay.
So we're going to start looking at some tools. I think I've already shared the GitHub repository, but feel free to check that out. It's github.com, jaredcatkinson (Jared C. Atkinson, or "Jared Catkinson" if you like), and then malware morphology. And we're going to look at token theft. Does everybody know what token theft is? Just in case: token theft, generally speaking. If I'm operating on a machine, I'm operating as Jared, and Olaf, who's in the back, maybe that's his computer. So I'm a red teamer, and I've got access to a computer.
Olaf is the admin, right? So I'm running under the context of Jared, because Jared double-clicked on my phishing payload or whatever. If Olaf logs in, the operating system creates what they call a logon session. With every logon session there's an access token, and that represents Olaf's identity as he moves throughout the computer. What I can do is get access to Olaf's access token and impersonate it. That would allow me to begin operating in the context of Olaf's user, who might be a domain admin, or a server admin, or something that gives me access to things that I previously did not have access to as the Jared account.
So what we're going to do is look at six different samples that ostensibly do the same thing. They all steal a user's token, but they do it in slightly different ways. And we're going to go through a process which allows us to analyze them morphologically, which then allows us to compare and contrast how they're different or how they're the same. Okay.
If you wanted to do this type of analysis after the class, you can just choose any tool and run through this. There are some reasons why certain tools are going to be more complicated than others. The general rule is, the first thing you should look at is open source tools versus closed source tools. You want to look at open source, especially as you're getting going, just to build up the capability and get used to the process, because that removes one step of complexity. If you have access to the source code, you don't have to reverse engineer what's going on, so it's just going to be easier. You can just follow through. But just because you have the source code doesn't mean it's easy. I don't know.
Has anybody ever looked at the Mimikatz source code? Okay, it's a hot mess. Well, if you speak French, it might be easier, I don't know. But it's very, very difficult to follow. Actually, I think I have it pulled up, so let's just look at it. I'm just going to show us the first sample. We'll dig into this in a second, and we're not even going to look at it to figure out what's going on, but look how simple this is. It's one page, 22 lines. Easy, right? Here's Mimikatz. So you can see that there's... am I not sharing? Oh, I am sharing. Okay, I'm not sharing. Okay. Okay, so Mimikatz is significantly more complicated than it has to be. So there's easier and harder.
One thing that you look for is what I call simple versus complex tools. There are tools like Out-Minidump, tools that just do one thing and one thing only. You run it and it does the thing that you expect it to do. There are no command line parameters, you don't have to specify any options. It just does that one thing. And then there's Mimikatz, which will allow you to create a golden ticket, to dump credentials, to steal tokens, to, you know, do Kerberos, to do like 5 trillion different things. Mimikatz is going to be harder to analyze as a result of that, because there's a lot of shared code base, and there are many different files, and it's more complicated. So, generally speaking, we want open source tools that are simple. And this is just as you're building up that skill set.
And then the programming language that the code is written in is going to be important. I recommend people start with C or C++ tools. And that may actually seem kind of contradictory, because I would have thought the opposite.
So PowerShell, for instance, is a programming language I like to write things in, or a scripting language, depending on how pedantic you want to be. But the idea with PowerShell or C# or Python or Go or Rust or any of those types of languages is that they're managed, or they have a bunch of libraries that are built in. So you actually have a bunch of layers of obfuscation between what's actually happening and what the code says. In PowerShell, there's a cmdlet called Get-Process, right? Get dash Process. That does a process listing. It's like tasklist; it just tells you what processes are running on the system. Well, there's an API function that undergirds Get-Process. But in order to figure that out, you have to go to the .NET reference, you have to go to the PowerShell source code and figure out what .NET classes Get-Process is calling. Oh, it's calling the System.Diagnostics.Process class. Okay, well, let me go to the .NET source code and see what that's doing. Oh, that's calling the NtQuerySystemInformation API function. As opposed to in C, where it would just say NtQuerySystemInformation, and you're like, okay, got it. So C is always going to be the easiest place to start.
And that leads into, what does it even mean for something to be a malware sample? There's this famous Black Hat booth advertisement that was Carbon Black, and they said, "We stop Mimikatz." That's one of those things that could be interpreted in an infinite number of ways. Does Mimikatz mean the precompiled version of Mimikatz that's in the GitHub repo? Or does it mean any version of Mimikatz, no matter how it's changed? Or does it mean things that do things similar to what Mimikatz does?
Like, who knows what it means, right? And that could have been by design or it could have been through ignorance, who knows? But the idea is that it's not obvious that, because you have a tool, it does one thing. And so asking what it does is kind of a weird idea. There's these three different levels, I think, of tools. There are standalone tools, like that example, New-Service. What that does is it creates a new service, and that's the only thing it does. Then there's complex tools. So sc.exe would be an example of that: you could create a service, you could delete a service, you can modify a service, you could do all kinds of service type tasks. So there's potentially multiple things that it does. And then you have C2 platforms. So saying, what does Cobalt Strike do? It's like, well, it could do literally anything, right? And the problem is you get one file. So it's like, oh, we have a Beacon agent, right? Well, that Beacon agent is the agent for Cobalt Strike. That's actually a component of a C2 platform that is extensible, and it can do a bunch of different things. That's not the same as if you got a file for, you know, this Out-Minidump PowerShell script that only does one thing. And so you have to be cognizant of: am I looking at what the file does? Am I looking at what the module does? Am I looking at what this particular command does? We want to break it up into the atomic parts as much as possible. In the class, it's all going to be standalone tools. They're all going to do one thing and one thing only. And so "what does the tool do?" and "what does the module do?" are going to be synonymous. But that's not always the case. So it's just something to be careful of.
So I just want to give a little background before we start looking at API functions for the Windows operating system. If you don't know what API functions are, or you don't have any familiarity or experience with them, that's fine. We're going to dig into it and we're going to start slow. The idea here is that, generally speaking, when you're developing code for an operating system, you have to interact with the operating system in some way to get it to do things, like show windows, or authenticate, or create a file, so on and so forth. And so most operating systems will create what they call an API, an application programming interface, and that API is going to give you a bunch of precompiled capability to interact with the operating system in different ways, right? And it's very useful because a lot of times it hides a lot of complexity. So for instance, if I want to create a file on a computer, in my application I don't want to have to understand how the NTFS file system works. And I don't want to understand how the FAT file system works. And I don't want to have to know how to differentiate between NTFS and FAT volumes, for instance, right? That's all a bunch of stuff that is literally irrelevant to me. I want it just to work. And so what they did is they created an API function called CreateFile. I call CreateFile and it does all that magic for me behind the scenes. Okay. All right. And then there's this concept that I call the function chain, which is basically, when you create an application, you call different functions, right? And the idea is that the different functions operate in a chain. So in category theory, there's this idea called composability.
And the idea is that you have a multi-step process, but each step in the process produces an output that can be used as an input to the next step, right? So you can't just do step three. You have to do step one and step two before you do step three, because step one outputs something that's important for step two, and step two outputs something that's required for step three. And so what we can do is we can start looking at how these functions are called by applications, especially if we have the source code, and we can start to build out that chain, that relationship between step one, step two, step three, and that becomes valuable for this type of comparison. So going back to our similarity question, right? If I change one bit, that means that these things are no longer the same in a literal sense. But would you say, if I changed a thousand bits, that would be more different than if I changed one bit? No? Excluding the hash, would you think that, generally speaking, if I changed a thousand bits, it would make more of a difference to the functionality of the program than if I changed one bit? Okay, that's perfect. Yep. So that's the answer we're going for. It depends on what the bits do, right? So bits, or probably bytes is going to be more accurate for this conversation. But bytes are symbolic, right? Bytes can be representative of string data. They could be representative of opcodes, and they could tell the computer what to do. And so there are certain bytes that are going to be more significant than other bytes. Generally speaking, if I changed one byte, it would be less of an impact than if I changed a million bytes, almost certainly. But on a byte to byte comparison, it could be drastically different, depending on what that byte actually was meant for, right?
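As a minimal sketch of the point about bytes being symbolic (this is an editorial illustration, not something shown in the workshop), the Python below builds a made-up 12-byte blob: a NOP (0x90), a CALL rel32 (0xE8 plus a four-byte displacement), a RET (0xC3), and the string "hello". The blob, the displacement values, and every name in the code are assumptions chosen for illustration. Two variants each change exactly one byte, one in the string data and one in the call displacement, and a hash reports both as simply "different" without saying which change matters more.

```python
import hashlib

# Hypothetical blob: nop; call rel32 (displacement 0x10); ret; then string data.
code = bytes([0x90, 0xE8, 0x10, 0x00, 0x00, 0x00, 0xC3])
original = code + b"hello"


def change_byte(blob: bytes, index: int, value: int) -> bytes:
    """Return a copy of blob with one byte replaced."""
    patched = bytearray(blob)
    patched[index] = value
    return bytes(patched)


def differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions where two equal-length blobs differ."""
    return sum(x != y for x, y in zip(a, b))


# One byte of string data changes: "hello" becomes "hella".
string_variant = change_byte(original, len(original) - 1, ord("a"))
# One byte of the call displacement changes: the call now targets another offset.
call_variant = change_byte(original, 2, 0x20)

blobs = {
    "original": original,
    "string_variant": string_variant,
    "call_variant": call_variant,
}
digests = {name: hashlib.sha256(blob).hexdigest() for name, blob in blobs.items()}

for name, digest in digests.items():
    print(f"{name:15} {digest[:16]}...")
```

Both variants differ from the original by a single byte and both get a new hash, so the hash alone cannot express that the call change redirects control flow while the string change does not.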
And so the best example is a NOP instruction, right? Does anybody know what a NOP does? Yeah. Skip. Yeah. So it does nothing, right? And so that's as low significance as you possibly can have, as opposed to, say, a call instruction, which calls an entire subroutine. So it's two bytes or three bytes that represent a whole subset of other bytes, right? And so changing a call instruction, you could say, would be more significant than changing a NOP. Okay. And so a call instruction is essentially a function call. And the most significant version of function calls are APIs, operating system APIs. And so that's where we're going with this, right? If you change one bit, you make it different. But then, depending on which bit you change, certain bits could be more important than other bits. And so what would be the most significant change you can make from a bit or a few-byte perspective? Well, that would be function calls. And so let's go look at the function calls and see what's going on. Okay. So this is the source code I just showed you, right? This is the first sample and it's available on the GitHub. And just for time savings, we made it extremely simple. There would probably be more in real life. There would be error handling and things like that. But these blue parts are going to be function calls. So not to insult anybody's intelligence, but can people tell me the names of the functions that are being called, in order? The first one. OpenProcess. Yep. Next one. OpenProcessToken, DuplicateToken, SetThreadToken, CloseHandle, CloseHandle, CloseHandle. Right. So that's what it is. So we could build this function chain and we can say, hey, here's our function chain. You call OpenProcess and the output of OpenProcess is this hProcess variable.
And you see that that hProcess variable is used as an input to OpenProcessToken. Right. And then the output of OpenProcessToken is actually this hToken here. And you see that that's being used as an input to DuplicateToken. Right. And then DuplicateToken outputs hDuplicate. And then hDuplicate is passed in as an input parameter to SetThreadToken. Right. And so you can see that there's this relationship where you can't call SetThreadToken until you've called DuplicateToken. You can't call DuplicateToken until you've called OpenProcessToken, and so on and so forth. Right. So that's that composability thing. You can't just call these in whatever order you want. And then you see CloseHandle closes hDuplicate, CloseHandle closes hToken, and CloseHandle closes hProcess. And so there's a little bit more variability in the ordering when we get to the CloseHandle part, because you could close these in whatever order you want. This one, which is the last one, is not dependent upon SetThreadToken. It's actually only dependent upon OpenProcess. Right. And so there's a little bit of flexibility in the ordering there. Now, this is a PowerShell script. And I know that we said don't look at PowerShell, but I actually kind of cheated and simplified it a little bit so that you don't have all that redirection through .NET and PowerShell and all that kind of stuff. And so can people tell me what you see with this one? What's the first function call that you see? OpenProcess. Right. It's the same. Right. But we could say that, 100 percent, sample one and sample two are different. They would have different hashes because one's written in C and one's written in C++, or rather, in PowerShell. And so they're just literally different.
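The ordering constraints described for sample one can be sketched as data. This is a minimal model under stated assumptions, not the workshop's tooling: the `Call` record, the `first_violation` helper, and the rule that `CloseHandle` makes a handle unavailable are all hypothetical choices made for illustration. Each call lists the handles it consumes and produces, and a chain is valid only if every input was produced earlier and not yet closed.

```python
from typing import List, NamedTuple, Optional, Tuple


class Call(NamedTuple):
    name: str
    inputs: Tuple[str, ...]   # handles the call needs
    outputs: Tuple[str, ...]  # handles the call produces


# Sample one's function chain, using the variable names read out in the talk.
SAMPLE_ONE = [
    Call("OpenProcess", (), ("hProcess",)),
    Call("OpenProcessToken", ("hProcess",), ("hToken",)),
    Call("DuplicateToken", ("hToken",), ("hDuplicate",)),
    Call("SetThreadToken", ("hDuplicate",), ()),
    Call("CloseHandle", ("hDuplicate",), ()),
    Call("CloseHandle", ("hToken",), ()),
    Call("CloseHandle", ("hProcess",), ()),
]


def first_violation(chain: List[Call]) -> Optional[Tuple[int, str, List[str]]]:
    """Return (index, call name, missing handles) for the first call whose
    inputs are unavailable, or None if the whole chain is well ordered."""
    available = set()
    for index, call in enumerate(chain):
        missing = [h for h in call.inputs if h not in available]
        if missing:
            return index, call.name, missing
        if call.name == "CloseHandle":
            available.difference_update(call.inputs)  # a closed handle is gone
        available.update(call.outputs)
    return None


# Swapping DuplicateToken and SetThreadToken breaks the chain.
swapped = SAMPLE_ONE[:2] + [SAMPLE_ONE[3], SAMPLE_ONE[2]] + SAMPLE_ONE[4:]
# Reordering only the three CloseHandle calls keeps it valid.
reordered_closes = SAMPLE_ONE[:4] + list(reversed(SAMPLE_ONE[4:]))

print(first_violation(SAMPLE_ONE))
print(first_violation(swapped))
print(first_violation(reordered_closes))
```

The model deliberately ignores access rights and error handling; it only captures which call needs which earlier output.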
And you have things like, for instance, the process ID is being passed in as process ID here, but here it's called PID. Right. And then it's called, let's see, hProcess, hToken, hDupToken instead of hDuplicate. Right. So there's just some small differences, but those are not important differences. And so what we could do is we could build the function chain again. And what we find is that, as you mentioned, the function chain is exactly the same. Right. And so now what we can say is that these are literally different, but they are functionally equivalent. The idea here is that they're different tools written in different programming languages, but they call the same functions. And so what we could say is that the differences between these must be only in the bits that are least significant. The most significant bits are all the same. The least significant bits are where the changes are happening. Okay. And so this means that if, for instance, I build a detection looking for SetThreadToken, it would work on this one and it would work on this one, because they're the same. And so this is always an interesting use case, right, because it's two particulars, two instances that are different. Now, okay, these are mislabeled. This should say sample one and sample two here. But anyway, if I could detect sample one with a detection rule, but I can't detect sample two, what happened? What does that mean? Does that indicate anything to me? Yeah, better detection, right? So what I'm going for is that that generally will be an indicator that you have a signature, right, a tool-specific, an instance-specific signature.
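A minimal sketch of "literally different but functionally equivalent", under loud assumptions: the two source strings below are simplified stand-ins written for this example, not the workshop's actual samples, and `[Win32]` is a hypothetical P/Invoke wrapper type. Extracting API names with a regular expression is also just an illustrative shortcut that only works when source code is available. The hashes of the two texts differ, while the extracted function chains are identical.

```python
import hashlib
import re

# Stand-in for sample one (C). Illustrative only.
SAMPLE_ONE = """
HANDLE hProcess = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, processId);
OpenProcessToken(hProcess, TOKEN_DUPLICATE, &hToken);
DuplicateToken(hToken, SecurityImpersonation, &hDuplicate);
SetThreadToken(NULL, hDuplicate);
CloseHandle(hDuplicate);
CloseHandle(hToken);
CloseHandle(hProcess);
"""

# Stand-in for sample two (PowerShell). [Win32] is a hypothetical wrapper type.
SAMPLE_TWO = """
$hProcess = [Win32]::OpenProcess(0x0400, $false, $TargetPid)
[Win32]::OpenProcessToken($hProcess, 0x0002, [ref]$hToken)
[Win32]::DuplicateToken($hToken, 2, [ref]$hDupToken)
[Win32]::SetThreadToken([IntPtr]::Zero, $hDupToken)
[Win32]::CloseHandle($hDupToken)
[Win32]::CloseHandle($hToken)
[Win32]::CloseHandle($hProcess)
"""

API_NAMES = [
    "OpenProcessToken",
    "OpenProcess",
    "DuplicateToken",
    "SetThreadToken",
    "CloseHandle",
]
CALL_PATTERN = re.compile(r"\b(" + "|".join(API_NAMES) + r")\s*\(")


def literal_identity(source: str) -> str:
    """Hash of the text: changes when any byte changes."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def function_chain(source: str) -> list:
    """API calls in the order they appear in the source text."""
    return CALL_PATTERN.findall(source)


print("same hash: ", literal_identity(SAMPLE_ONE) == literal_identity(SAMPLE_TWO))
print("same chain:", function_chain(SAMPLE_ONE) == function_chain(SAMPLE_TWO))
print(function_chain(SAMPLE_ONE))
```

A detection keyed on the hash sees two unrelated files; one keyed on the function chain sees a single behavior.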
And this is the easiest type of situation for you to expand your detection and say, hey, these two things are doing literally the same thing. Why can I detect one, but not the other? The most extreme example of that would be my detection depends on the hash of the file, right? So I'm not resilient to any change. The idea is we want to make our detections resilient to as large of a change as we possibly can. And this is a relatively small change, and so we should be resilient to it. So, was anybody around when PowerShell was the real hot thing in red teaming and with attackers? And then, man, when was this, 2018 maybe, Microsoft came out with a blog post called "PowerShell ♥ the Blue Team," and they came out with all the PowerShell logging and script block logging and transcript logging and all that kind of stuff. And then all the attackers would change all their tools from PowerShell to C#. And I don't know if people are familiar, but PowerShell and C# are both based on .NET. So changing your PowerShell code to C# is literally the same thing. Let's look at one. I'm going to share my entire screen for the recording, so give me a sec just to get that going here. All right. So I'm just going to show this real quick. Okay. So there's this. Yeah. Out-Minidump. Okay. So we have this tool Out-Minidump, which I talked about a second ago, and then we have SharpDump. Okay. And so Out-Minidump is this tool written by Matt Graeber, who's now at Red Canary. And what this does is it allows you to create a crash dump for any process. Right. And the reason why attackers do that is that you could crash dump LSASS. And then if you crash dump LSASS, you can then find passwords, basically, in there. Right. But Out-Minidump is written in PowerShell. Right. And so here's what it looks like.
So you have MiniDumpWriteDump. You call this function, MiniDumpWriteDump, and you execute it and you get all this stuff. Right. Okay. But then when PowerShell fell out of vogue, because now there was a lot more telemetry and a lot more visibility into what PowerShell was doing, another colleague of mine named Will Schroeder, or harmj0y, wrote this tool called SharpDump. And he literally says SharpDump is a C# port (port just means "I rewrote it in a different language") of PowerShell's Out-Minidump functionality. Right. And so if we go into the source code, Program.cs, let's see. Here it is: MiniDumpWriteDump. So there's the file stream. I didn't point that part out, but in Out-Minidump they create a file using this System.IO.FileStream class. He also does that here. So FileStream, and then they call MiniDumpWriteDump. And so if this change from PowerShell to C#, where all they did was change the language, causes our detections to fail, we should be asking serious questions about why that is, right? And we should be very critical of our detections at that point. Okay. So morphology, as we talked about, is popular with biology. That's where it came about. So what I like to do is assign each sample an animal, right? And in this case, I have an Australian shepherd at home. This is why I chose that breed of dog. So sample one is going to be represented by this very happy looking Australian shepherd. But sample two, which is literally different but functionally equivalent, is also an Australian shepherd. It's just a different one, right? One's a little bit more brown, the other one's a little bit more black, but generally speaking, they're the same dog breed. And when you think about dog breeds, for instance, you have these different levels of analysis, right.
So a dog breed: if I know that your dog is an Australian shepherd, I have a certain level of expectations of how that dog is going to act, right? Shepherds herd things. They're very energetic. They can jump high. There's certain things that you know about them. But if I just told you I have a dog, there's a different level of expectation that's going to be more generic, because "dog" represents both Chihuahuas and Great Danes, right? So there's a wide range of variation. And so I just know that dogs generally are playful. Maybe if they get angry, they bark. I have some ideas about what to expect, but it's not as specific as Australian shepherd. And then if I told you, hey, I have a mammal, you would be like, what the hell, that's the weirdest thing that anybody's ever said to me. Nobody ever talks at that level, right? But that would also be a technically accurate statement to make. But it could be a cat, it could be a brown bear, you have no idea what I have at my house. It could be kids. Yeah. Yeah. The worst of all, the most terrorizing. Okay, so, do people actually want to do labs, or is it fine to just... the labs are available. You can go through and look at them. I think it's better if we just do it as a group. And this one is particularly simple, so I don't want to be insulting. But here's a third tool, right? Sample three. So sample three is potentially doing something different. Does anybody see the difference in sample three from what you saw in samples one and two? It's also written in C. So sample one was written in C, sample two was in PowerShell, and this is again written in C. So it should be most similar to sample one, but there's some major differences. You said what? Okay, so it's not cloning the token. So that was the DuplicateToken call.
And then do you notice that here we call SetThreadToken, which is one function, and here we call ImpersonateLoggedOnUser, which is just a different function? And the problem is I have no idea how different that is, right? But the one thing that Olaf pointed out is that it's not calling DuplicateToken. And so it's like, why does ImpersonateLoggedOnUser not have to duplicate the token? Why does it just get to use a normal token? So we could build the function chain. I'm going to take a drink of water real quick. I don't know how to pause gracefully and drink water. It's a weird thing. Okay, so we have another function chain. And if we again compare (now the labels are correct) the function chains of sample one and sample two, they're the same. And that's the cool thing about things being the same at one level of analysis. So you have the literal level of analysis, which is the hash, right? And they're different at the hash level. But if we're talking at the function level, which is just a level of abstraction higher, I could ignore all the differences and I could just treat them as if they're the same. And so now I'm only using one function chain, right? This one with the smiley Australian shepherd. I'm using that to represent both samples one and two. We don't have to talk about sample two ever again, because it's the same at the functional level of analysis, right? But now what we're doing is we're saying, hey, this is sample three, and it's functionally different. We could build the function chains and see immediately that they're not the same. And so what is that, if we were to do some morphological analysis? Okay, all right. So this is where we get into this idea called the function call stack. And this is where we have to get a little more technical.
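To make "functionally different" concrete, here is a minimal comparison of the two chains. Sample one's chain is the one read out earlier in the session. For sample three, the talk only establishes that `DuplicateToken` is absent and `ImpersonateLoggedOnUser` replaces `SetThreadToken`; the two trailing `CloseHandle` calls are an assumption made for this sketch, as are the helper names.

```python
SAMPLE_ONE_CHAIN = [
    "OpenProcess",
    "OpenProcessToken",
    "DuplicateToken",
    "SetThreadToken",
    "CloseHandle",
    "CloseHandle",
    "CloseHandle",
]

# Assumed shape of sample three: the CloseHandle calls are a guess.
SAMPLE_THREE_CHAIN = [
    "OpenProcess",
    "OpenProcessToken",
    "ImpersonateLoggedOnUser",
    "CloseHandle",
    "CloseHandle",
]


def shared_prefix(a: list, b: list) -> list:
    """Leading calls that both chains make in the same order."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return a[:count]


def only_in(a: list, b: list) -> list:
    """Calls that appear in chain a but nowhere in chain b, in order."""
    return [name for name in a if name not in b]


print("shared prefix:       ", shared_prefix(SAMPLE_ONE_CHAIN, SAMPLE_THREE_CHAIN))
print("only in sample one:  ", only_in(SAMPLE_ONE_CHAIN, SAMPLE_THREE_CHAIN))
print("only in sample three:", only_in(SAMPLE_THREE_CHAIN, SAMPLE_ONE_CHAIN))
```

The comparison shows where the chains diverge, but not how much that divergence matters, which is the open question the speaker raises about `ImpersonateLoggedOnUser`.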
But the way that the function call stack works is, when I call a function, so CreateFile, in this example, there's actually this whole set of abstraction underneath, right? So I call CreateFile and then there's a bunch of calls that happen subsequently. And so here we see this is CreateFile, and CreateFile is implemented in a DLL called kernel32.dll. And what that does is it calls another function called CreateFile. So that's what this pink thing down here represents. And so I talked about import hashing earlier as just a way to try to see similarity. Every portable executable, that's an .exe file, a .sys file (so a system driver), or a .dll file, is going to have, well, it has lots of things, but two of the things that are worth highlighting are the exports table and the imports table. So I talked about those APIs, right? The exports table represents functions that one binary is making available for everybody else. This is very common in DLLs. With DLLs, the whole point is that they're shared libraries, they share functionality. And the way that they share that is through functions which they export. And so you could load a DLL into a disassembler or into portable executable file parsing tools and you would be able to look at the exports table. In this case, we're looking at kernel32.dll and we would see that CreateFileW is exported. But also, and I didn't even know that this existed, there's this LZCreateFileW that's also exported. And so that's a different way to create a file that may be similar and may be completely different. I don't know, but we could look at that and see what's going on there. But then you also have the imports table, which is basically this table that says, hey, I'm using functions that are shared by other libraries and here's where my dependencies are, right? So it's telling you about dependencies.
And so when we see this pink one right here, this is the exports table, right? If we just double clicked on CreateFileW, it would take us here. And then we start to look at it and it's like, hey, I have an import. I have a dependency on some other DLL, basically, and I'm requesting this CreateFile. So then we go to the imports table and we see, hey, we're importing that from something called api-ms-win-core-file-l1-1-0. Just rolls off the tongue, right? Very easy to say. This is something called an API set, which is essentially a mechanism that came about as Microsoft extended the different types of operating systems. So they have Server Core, but they also have HoloLens and Xbox, and there's a bunch of different operating systems that will sometimes have the same DLLs and the same architecture, but sometimes won't. What they had to do was add this extra layer of redirection so that, when an application requests something, they know where it's going to be, because the location of a certain file on HoloLens might not be the same place as on Windows, and so they needed a way to be able to do that lookup. And so there's this PowerShell module called NtObjectManager, which is at least in the workshop description as something that we're going to use, and it has a function called Get-NtApiSet.
So this was written by James Forshaw at Google Project Zero, and what that does is it allows you to specify the API set, and then it will tell you, hey, if somebody refers to that API set on this machine, it redirects to this DLL. And so it's kernelbase.dll. So we were in kernel32.dll, and now we're in kernelbase.dll. And you can't see it in the class because this thing's in the way, but we're building this vertical chain now. So we have CreateFile in kernel32, we have this API set, we have CreateFileW in kernelbase.dll, and we keep going, right? So then we look at CreateFile in kernelbase and we see that it calls this function called CreateFileInternal. And then we look at that, and that calls NtCreateFile, which we look up and it's in ntdll.dll. And then that makes something called a syscall. Does anybody know what a syscall is? A syscall is basically a transition from user mode to kernel mode, right? So here's the syscall instruction. Just like you have NOP instructions, you have mov instructions, test instructions, jump instructions, you also have syscall instructions, and that's telling the processor, I want to transition from user mode to kernel mode. But then it's like, okay, well, where do I go in kernel mode? Well, there's this lookup table, and it's based on certain numbers which are indexes in the table. And in this case it's saying, I want to go to offset 55, or the 55th entry in that table, and that happens to be a function in the kernel called NtCreateFile. So that's what's happening. What ends up happening is we transition from CreateFileW in kernel32 all the way down to this syscall called NtCreateFile, and that's in the kernel, right? And so the interesting thing is, has anybody heard of direct syscalls at all? Probably the most famous tool is a tool called Dumpert, which is
an LSASS dumping tool written by the guys at Outflank in the Netherlands. And what that does is, instead of calling the equivalent of CreateFileW from kernel32, it calls NtCreateFile. And the reason why they do that is some EDRs or some security tools are focused on looking for this. But all of these do the same thing, right? Because by calling this, you are telling the computer to call this, to call this, to call this, and so on, all the way down the stack. And so what they found is, well, we could just skip the majority of the stack and go to the very bottom and make the direct syscall, and that allows us to evade any naive EDRs or security products that are focused really high up in the stack, right? And that was really popular a year or two years ago. Most good EDR vendors have figured that out and that's not a big problem anymore, but it's still something that we have to be aware of. So that means that, as an attacker, if I wanted to create a file, I could call any of these functions, right? They're all equivalent. Okay. And so what we would do is do a lab, but I'll just, if I could find it, there we go. So I'm going to look at SetThreadToken just to give us a start on this. I'm just going to go into Windows System32 here, and SetThreadToken, just to show you how this works. So Microsoft has documentation for most API functions. Here's the documentation for SetThreadToken. It tells you this is what it gets used for, it tells you these are the parameters, this is how you would call SetThreadToken. And this means that there's a parameter called the Thread parameter and then there's a parameter called the Token parameter. And then it tells you, hey, for the Thread parameter, that's going to be a pointer to a handle to a thread, but it could be NULL. If it's NULL,
then we're just going to assign the impersonation token to the calling thread. And so when I run code, if I want to impersonate myself, then I would leave that NULL. And then the Token is going to be a handle to an impersonation token, right? Okay. And then we can scroll down to the bottom and it will tell us where that function is being implemented. So advapi32.dll. So we could go into our computer, and we're in System32, and we could look up advapi32, and we could just drag and drop that into IDA, and that's going to get parsed. And this is what we end up with. You'll see right now it's not super pretty. It's still working. One of the tricks is that you'll see this number keep getting bigger, and then it will stop and then show you what they call the graph view. So this is just the way to look at it. Remember we talked about the imports and exports tables? IDA makes that nice and pretty for us, and so we could literally just come over here, click on Exports, and now we're looking at the exports table. So just to give you an idea, for these like advapi32 or kernel32, the amount of functions that they export, it's a lot, right? There's a lot going on. And some of them have really great and creative names like SystemFunction007. You know, I wanted to make a GoldenEye reference, but I don't know how it will land, so I'm a little self-conscious. But maybe that function, you know, gives it a little bit more time before it blows up the gas tanks. You know what I mean? Brian, I hope you know that reference. Yes? Okay. Or maybe that was 006. Okay, come on, Brian. Brian, we've got to be a little faster. Brian and I used to work together, so I feel comfortable giving him crap, and he just walked in. Okay, so what we're doing is we can just search for SetThreadToken. Okay, and here we are. We just double click on it, and it looks like it's really complicated, right? So it
literally has one instruction, and it jumps into this imp, which stands for import, so it's telling us to go look at the imports table, into a function that's also called SetThreadToken, which can be confusing when you first get started. It's like, SetThreadToken calls SetThreadToken, but it's not the same one. So we go to the imports table and we can do the same thing, just search for SetThreadToken, and we see that it's implemented by this api-ms-win-core-processthreads-l1-1-2. And so we could open up PowerShell. Okay, I'm pretty good at remembering these, but sometimes it's better just to have it there so I can read it. And we're going to run James Forshaw's Get-NtApiSet, and we're going to specify the name, which is api-ms-win-core-processthreads-l1-1-2. And then there's a little trick that we'll run into later where you want to expand the Hosts property. So, did I do something wrong there? Oh, okay, gotcha, thank you. Yeah, okay, cool. So what this is telling us is that there's actually two different types of redirection. The way that we interpret this is, if the call was made from kernel32.dll, then it redirects to kernelbase.dll, but if the call was made from *.dll that's not kernel32, then it goes to kernel32. So there's actually two levels of redirection, which is a pain in the butt. And so what we could do is we could then open up kernel32.dll. There will be a function called SetThreadToken. That function will then just call the import, which is the same API set, which would then mean that we're calling from kernel32, which then would redirect us to kernelbase.dll. So that would just waste a little bit of time, but it would look exactly the same, and so we're going to jump straight to kernelbase.dll. Okay, where are you? Okay, I don't need that anymore. So what we're going to do is we're going to drag and drop again. So this is a
process that takes a little bit of time, especially when you're first working on it, but you'll find that it becomes very, very fast, and things tend to work the same way every single time. And so as you do it more and more, you'll know what to expect before you even do it. All right, so kernelbase is still processing. Again, we don't have the graph view yet. So there it is. Okay, so then we could go and look for SetThreadToken. Okay, and so there's a little bit more going on here. Not a whole lot, but the way to interpret this is it starts here, and then there's a jump, which is a branching instruction, so you could go one way or the other, depending on the result in the situation. It's going to go either... okay, I was like, oh crap, that's not good. It's going to go one way or the other, right? But eventually both branches will call this function NtSetInformationThread. And so now we could go to the imports table and we could look for NtSetInformationThread. And one thing to notice is there's NtSetInformation for lots of things: files, processes, tokens, threads, virtual memory, objects, keys, so on and so forth. But we see that it's implemented in ntdll, so we do the same thing. Let's just load the... yeah, of course, preload file. Okay, it's like, hey, we've already done this before, do you want us to do it? And I'm like, yeah, that'd be nice. And it's like, I can't, actually. Psych. I was trying to save us from having to sit here while this parses. Okay, so we could do the same thing, just without waiting. NtSetInformationThread. Okay, and we click on that and see it's not in the nice block, but it looks like it's getting ready. Come on. There we go. What we see is that there's this syscall instruction, right? So that means that we're at the point where we're calling the syscall, transitioning the processor from user mode to kernel mode, and then we're going to call D. And I think we just push H and that gives us the decimal version
of the hex value, which is the 13th function in that table. Now, these numbers are going to change from operating system to operating system. With every version of the operating system that table changes slightly, and that's actually one of the hard things about making those tools that do direct syscalls: you have to be able to handle a bunch of different operating system environments. Okay, so that kind of builds out our structure. Does anybody have any questions about that flow? Okay. So the lab walks through that in excruciating detail, literally step by step, and so I highly recommend that if this intrigues you by the end, do the lab. We're going to talk about probably 15 different functions, and in theory you could do the same thing for each of those. The lab serves as kind of a guide: it'll be specific to this one, but you just kind of replace the words and it will work. It'll take a few to get used to it, and then you should be pretty good.
So what we do is we then build out the function call stack. Remember, the function chain is this horizontal thing, and then vertically we have the function call stack. That's going from the high-level function that was called, which is this red circle, SetThreadToken from advapi32, all the way down to the syscall. And the idea again is that the attacker, when they write their malware, could call any of these and they would be interchangeable, so you can't just assume that they're going to call the top-level function. Okay, now what we have is, this is like the one I pulled out of the oven. If you were to go through all the different functions that we had interacted with in samples one and two, you would have OpenProcess, OpenProcessToken, DuplicateToken, SetThreadToken, CloseHandle, CloseHandle, CloseHandle. Now the reason
why this is interesting is because attackers can choose any of these functions. So while they chose this top row here, that's only one variation that's represented by this graphic. They could choose the direct syscall approach, which is this bottom line here, or they could choose any arbitrary combination. Turns out there's 183,750 different combinations of functions that are possible, and so if we try to represent every function chain that's possible we'd be screwed, basically. So there's an infinite number of particulars, of tools that are literally unique, but just for this one stack there's about 183,000 variations. I'm not building 183,000 different detections; that's just not practical. That's still too many. And so there's more abstraction that we could add into this, and it'll be fun.
Now the next question is, what's going on with ImpersonateLoggedOnUser? Remember, that was the one where they skipped duplicating the token and it's a different function. Well, let's kind of do the same thing. I'll try to do this one a little faster. So, ImpersonateLoggedOnUser: let's just go and look at the documentation. Again, the ImpersonateLoggedOnUser function lets the calling thread impersonate the security context of a logged-on user, and the user is represented by the token handle. If you scroll down to the bottom, we see that it's also implemented in advapi32.dll. Cool thing is we've already loaded that up, so we can just go back to it. All right, so here we go, we're going to the exports table, ImpersonateLoggedOnUser, double-click on it. Same thing: this was the same for CreateFile, it was the same for SetThreadToken, it's the same for ImpersonateLoggedOnUser. It's just a jump. They call this a stub, and the stub is
just a jump into an imported function. We go back to the import table, ImpersonateLoggedOnUser, and we see that it's implemented by api-ms-win-security-base-l1-2-0. So I did close this, but let's just... okay, so then we just do Get-NtApiSet. There we go. And we do the same thing, api-ms-win-security-base-l1-2-0, and then we'll do select, expand property, Hosts. And so we see again it's implemented in kernelbase. This time it doesn't have the two layers of redirection, it's just the one. We go back to kernelbase and we can look at the same thing, ImpersonateLoggedOnUser. Now, ImpersonateLoggedOnUser is a little bit different. One little trick that you can do in IDA is look at the graph overview. I'm just going to go to SetThreadToken real quick so you can see: here's the graph overview for SetThreadToken, and here's the one for ImpersonateLoggedOnUser. You can see that there's a little bit more going on, and that's a good gauge of the relative complexity of things. What we see is that there are multiple API function calls. We see a call to NtQueryInformationToken. We see a call to NtDuplicateToken; remember, we skipped the duplicate step, but now we see that the duplicate step is actually contained inside ImpersonateLoggedOnUser. We see NtSetInformationThread. Do we remember seeing that previously? SetThreadToken called NtSetInformationThread, and ImpersonateLoggedOnUser also calls NtSetInformationThread. And then we see NtClose, which is equivalent to CloseHandle, which we saw previously. So this is what I call a compound function. SetThreadToken is a simple function: it just has a vertical line from the high-level Win32 API function all the way down to the syscall. ImpersonateLoggedOnUser actually has branches, because there are multiple functions being called within a single function. And this happens to be a really simple version of a compound function. So again we have four function calls: NtQueryInformationToken, NtDuplicateToken, NtSetInformationThread, and NtClose. And if we were to go look at all of those, each one makes a syscall in ntdll. Just to kind of show that: we already looked at NtSetInformationThread, so let's do NtQueryInformationToken. Same thing, we see a syscall, and this one's syscall number is 33. And then we could do NtClose and we see the same thing; this one is 15. So you should see that there's kind of a pattern going on. You can start to make predictions about what you should expect to see, and hopefully when you look at it you'll be right. That's the general idea. If we go back to our slides, this is the function call stack for ImpersonateLoggedOnUser. You see that it gets all the way down to kernelbase.dll, but then it branches off and calls a bunch of different functions, and so we have numerous syscalls being made: NtQueryInformationToken, NtDuplicateToken, NtSetInformationThread, and NtClose.
So the idea that a single function can call multiple functions internally, and the idea that we had those 183,750 different variations, tell us that maybe there's a way we could represent that more abstractly. Just like we have the Australian shepherds: what if we could just call it a dog, and detect dogs instead of detecting Australian shepherds? You know, it's called a dog catcher, not an Australian shepherd catcher, I guess. And so that's this idea that I call an operation. Operations are abstract categories that basically contain
interchangeable individual function calls. So the idea is that if I call NtClose or CloseHandle and they're interchangeable, I should just ignore the difference between those two things and give them a label that covers the entire group of interchangeable functions. And so the entire stack could be represented by a single node in this graph. Now, there are certain situations where you might have two independent stacks. For instance, there's a function that we saw called DuplicateToken, and then there's another one called DuplicateTokenEx, and they don't end up calling the same thing, so they're independent stacks. But they essentially do the same thing; one just allows you to change some details about the token and the other doesn't. And so even those two individual function call stacks would probably be represented by the same operation.
Now, the way that we define an operation is: an operation is an action taken against a securable object. So that maybe introduces the need to explain what a securable object is. You all probably know securable objects even if you don't know them by that name, but a securable object is essentially anything in Windows that you can get access denied from. So who here can think of something that you've gotten access denied trying to do? There's a really, really common one, and then there are some more esoteric ones. A file, right. The file is the archetypal securable object. You say, I want person A to have read access, for instance, and I want person B to have no access, and so they would get access denied. But you have registry keys, processes, named pipes, threads, tokens, services; there's all kinds of different things that are securable objects. And for all of those things there's a certain number of actions that you
can take against them. So for instance you can create a process, you can terminate a process, you can read the memory of a process, you can write to the memory of a process; you can do all kinds of different things with the process. So in the statement "process create," process is the securable object and create is the action. In "registry key read," registry key is the securable object and read is the action. And this is actually how we talk. I don't know if people were in for the panel discussion that we did before lunch, but one of the things that I talked about was this idea of snake detection theory. The theory posits that evolution has helped to shape primates' vision such that we see the things that we need to see in order to survive. The idea is that our ancestors who didn't see the things they needed to see to survive died and didn't reproduce, over the long span of history. And so we tend to see the things that we need to see, because otherwise we would have been eaten or we would have fallen off a cliff; there's all kinds of different reasons. So then there's this idea that the way we talk about things is potentially evolutionarily adapted. When you talk about your event logs, for instance, you say I have a process create event, I have a registry key read event, I have a file delete event. That's how we talk about the event logs already. It's a natural thing that nobody really thinks about, but it could be expanded much further than the way we currently do it. And so the idea is that operations allow us to ignore the differences between, say... actually, this is an interesting thing. There's a certain notation. We talked about how you have SetThreadToken in advapi32, but you also have SetThreadToken
in kernelbase.dll, and then it's like, okay, they have the same name. There's actually a way to represent which DLL you're talking about: you write the name of the DLL, then an exclamation point, and then the name of the function. So that's the convention for being more specific about what you're talking about. But if I want to ignore the less significant differences between kernel32!OpenProcess and ntdll!NtOpenProcess, I could just say that's a process open or process access event. If I want to ignore the difference between, say, advapi32!SetThreadToken and advapi32!ImpersonateLoggedOnUser, the question is, can I do that, and what are the implications of it?
For the naming convention, at first we just kind of did the naming however we felt, and then we reverse engineered how we came up with the names: does it actually make sense, or are we just sticking a finger in the wind and hoping that it works? And we use grammar heuristics. At least the way English works is you have subject, verb, object. I think French is probably the same. Are most people French speakers? I kind of... no? Everybody? Okay. Well then, the way that we speak is subject, verb, object. So if you were to say what an event is telling you in a full sentence, you would say: the process named beacon.exe enumerated the subkeys of the registry key named HKLM\SYSTEM\CurrentControlSet\Services. That's the full sentence. But if you were to break it up, the process is the subject. What did it do? It enumerated something. And what did it enumerate? The subkeys of a registry key with a specific name. And so what you could say is, a process enumerated a registry key. So this was something
that Roberto Rodriguez, the guy who did the keynote, kind of first introduced, him and his brother. And so what we do is we take an objective point of view. When people say, "Oh, that's subjective" versus "that's objective," what people tend to do is use subjective to say "that's just your opinion, bro," and they use objective to say "you can't argue with me because this is scientific" or whatever. But what that really means is what point of view you're looking at the problem from: are you looking at it as the subject of the experiment, or as the object that the action's being taken upon? And so we're taking an objective point of view, because we're saying it's a registry key enumerate event; that's the name of the operation. The registry key is the object of the sentence, not the subject. You wouldn't say it's a "process enumerate," because what did the process enumerate? The process can enumerate tons of different things, and the thing doing the enumeration is always going to be a process, so that doesn't give me much information. Likewise: the process named powershell.exe created the file named C:\Windows\Temp\a.txt. We call that a file create, because process is the subject, created is the verb, and file is the object. We take the object, file, and we take the verb, create, and that's how we come up with the name. That's the heuristic for how we do that. Now the other problem with the subjective point of view is that there are actually numerous subjects. There's the subject which is the process, there's the subject which is the user. So a process, powershell.exe, is running as a user, Jared, which is
then running on a computer, and the computer's like Workstation A. And so simultaneously you could say that the process created a file, you could say that the user created the file, and you could say that the computer created the file. But with the object, the file that's being created, there's only one way to interpret that, and so that's why we take the objective view.
All right, so there are some more complicated cases: not every single function call stack is going to lend itself to being understood as an operation just through that grammar heuristic. And so what we're going to do is talk about a methodology for how you would go about determining what the object of the function is. On Windows there's kind of two special classes of functions. There's open functions, so there's OpenProcess, OpenService, OpenProcessToken... I can't think of any more off the top of my head, but we could literally just go look. Those are typically implemented in kernel32, which happens to be the one that we don't... we could look in kernelbase. So we go to kernelbase and we type in "open," and we see OpenEvent, OpenFileById, OpenFileMapping, OpenThread, OpenThreadToken, OpenState, OpenSemaphore, OpenProcessToken, OpenProcess. There's tons of different open functions. And those are kind of like meta functions, because they're just opening a handle to a process, or to an object; they're not actually performing an action. A lot of objects exist in kernel mode. Processes exist in the kernel, they don't exist in user mode, and so you can't just query processes from a user mode application, from a random application or malware that you're running. What you have to do is open a handle, which goes through an authentication process, or an access control process, and say, hey, I want to kill this process. And then what happens is the
computer says, okay, well, you're running as Jared, and Jared does not have the ability to terminate the process, and so we're going to give you an access denied error. That's what happens. And so if I want to do something with a securable object, I have to first open a handle, which takes me through that access control, and then I can subsequently do the action that I want to do. But you also have these create functions: CreateDirectory, Create... I don't even know what some of this stuff is. CreateEnclave, CreateFiber. So you have processes, then you have threads, then you have fibers, or like smaller threads, who knows what's going on there. CreateFile, CreateMutex, CreateNamedPipe, so on and so forth, CreateProcess. There's some trickiness with this. For instance, with CreateFile there is no OpenFile, because CreateFile both creates files and opens files. That's just some stuff that you learn over time. But generally speaking you can assign open as the action to anything that has Open in the function name, and you can assign create as the action to anything that has Create in the function name. And these are going to be where your access control happens. Securable objects all have DACLs, discretionary access control lists, and those specify who can do what with the object. The access control lists are going to be checked upon the calls to opens or creates, basically. And then once you have the handle, you can pass it to subsequent things. If you want to terminate a process, there's a function called TerminateProcess. If you want to read the memory of a process, there's ReadProcessMemory. For a token, there's GetTokenInformation if you want to know stuff about the token, and there's also DuplicateToken, which we saw called here.
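As a rough illustration of the Open/Create naming rule just described, here is a minimal Python sketch. The helper name `action_from_name` is hypothetical, and two choices are assumptions made for the example rather than anything shown in the workshop: it only looks at the start of the function name, and it ignores a leading `Nt`/`Zw` so that a native function such as `NtOpenProcess` is treated like `OpenProcess`.

```python
import re
from typing import Optional

# Hypothetical helper illustrating the naming rule from the talk.
# Assumption: only a leading "Open"/"Create" counts, and a native
# "Nt"/"Zw" prefix is ignored so NtOpenProcess matches like OpenProcess.
def action_from_name(function_name: str) -> Optional[str]:
    bare = re.sub(r"^(Nt|Zw)", "", function_name)
    for action in ("Open", "Create"):
        if bare.startswith(action):
            return action
    # Everything else (TerminateProcess, SetThreadToken, ...) is not
    # covered by this shortcut and needs a closer look.
    return None


for name in ("OpenProcess", "NtOpenProcessToken", "CreateFile", "TerminateProcess"):
    print(name, "->", action_from_name(name))
```

The sketch labels `CreateFile` as a create even though, as noted above, that function also opens existing files, so the shortcut is a starting point rather than a final answer.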
For all other functions, SetThreadToken is an example. Does anybody have an idea for what the operation for SetThreadToken might be? Okay, how about DuplicateToken? What do you think the operation name for DuplicateToken would be? That one's a little bit... token duplicate, boom, yep. So it's like a trick question because it's so straightforward: you just take the two words and transpose them, I guess. Okay, so the steps for doing this in a little bit more of a fixed, repeatable way: if the function name begins with Open, then the target object will be the object that's being opened. So if it's OpenProcess, a process will be the object. If the function only takes one parameter of type HANDLE, so you want to look for handles, then the handle usually will represent the target object. One example of that would be, let's see, TerminateProcess. So we'll just look at TerminateProcess. Good. If we come in here, remember this is the thing that tells you how to interact with it; it's the syntax. The way to read this is: this is the name of the function, TerminateProcess. This is the output type, and the output type is a BOOL, so true or false, basically. It's either going to say true, the process was terminated, or false, the process was not terminated. These are the parameters. There's two parameters. The first parameter is named hProcess and it's of the type HANDLE, and it's something that has to be passed into the function. The second one is called uExitCode and it's a UINT, which is basically an integer. There's this naming convention where h tells you that it's a handle and u tells you that it's a UINT. But this is an example of a function that only takes one
handle, and the handle happens to be a handle to a process. It says that more explicitly here: a handle to the process to be terminated. And so now we know that the object of this function, or the associated function call stack, would be a process. Okay, then there are situations where they take two or more handles. It's easy when there's only one handle, because you just say, what is that handle? Okay, that's probably the object. But what about when there's two handles? How do you choose which one? DuplicateToken is an example of this. So DuplicateToken also outputs a BOOL, but we see that there's a handle here, and this one is a little bit easy, because you'd probably get it right even if you just guessed, because they're both tokens. But this one is a handle to a token, ExistingTokenHandle, and this one is a pointer to a handle to a token, which is the duplicate, called DuplicateTokenHandle. The way that you would figure this one out is by looking at this first column here: in versus out. In means that you need to pass the value into the function; out means that the value is going to be passed out of the function as an output. And so in this case the input is going to be the object, which is a token, because the other one is actually the result, the duplicate that you get. So the idea here is that if you have two or more handles and only one is an input parameter, then the one that's the input parameter is your object type. And then of course they make it hard, where there are some that have two or more and more than one of them are inputs. SetThreadToken happens to be that example, where there are two handles being passed in, one's a thread and one's a token, so you can't just guess. And they're both
inputs, and they're actually both optional, which is kind of a pain in the butt. So how do you figure that out? One trick would be to basically say, hey, this one's first. In every one that I've looked at, the first one happens to be the object; I don't know how far that will get you. But you could also look at the syscall that's being made, NtSetInformationThread. Generally speaking, for this type there's NtSetInformation..., NtQueryInformation..., and so on, and then there's always some object. In this case it's a thread object that's being set. Set just kind of means write; there's set, get, query, and a couple of other ones, but set kind of means write. And so in this case you're writing to the thread, you're not writing to the token. We would see that the handle for the syscall, the kernel function, is a thread, and so we would figure it out that way. So there's kind of a hierarchical way to figure this stuff out. Here, ReadProcessMemory is just an example of the first type, where it only takes one handle, hProcess, so we know that it's interacting with a process object. This is DuplicateToken, the one that we just talked about. And then we have SetThreadToken, which was the third example, where we went down and looked at the native API, the syscall NtSetInformationThread, and it only took one handle, which happened to be a thread handle.
Okay, so then we have the action. We have at least a little bit of a method for figuring out what the object is, but how do we figure out the action? So these are the steps, I guess. Visit the documentation for the function in question, SetThreadToken, DuplicateToken, whatever it is. Find the description for the object's handle-type parameter; we just did
that, right, which handle do you look at. Identify any reference to required access or access rights; it'll say something like "a handle to a blank, opened with blank access." That's what you'll look for. Then you can Google the object, so token or process or registry key, plus "security and access rights," and visit the page. Find the referenced access type and then match it as closely as possible. So if it's TOKEN_DUPLICATE, you would just say duplicate would be the action. And so we'll do that real quick with DuplicateToken. Okay, so here we are on DuplicateToken. Remember, we were looking at ExistingTokenHandle here, and here we see that ExistingTokenHandle is a handle to an access token, that's our object, opened with the TOKEN_DUPLICATE access right. And so now we could go and search "token security and access rights." This is what we would get as the first result, and we could come in here, and here's all the different types of things that you can do with tokens. You can adjust the default settings, adjust the groups of the token, so you could add additional groups. You could adjust the privileges; if you've ever heard of SeDebugPrivilege, SeBackupPrivilege, all those types of things, that's all housed inside of the token. You could adjust the session ID. But you could also duplicate it, and so we would take that as our action. And this actually serves as a guide for the different types of actions that you could perform on a given object. If you have a token object, this is the set of actions that you would expect to see on tokens. We do the same thing for ReadProcessMemory. So you have ReadProcessMemory; the first parameter is hProcess, a handle to the process with the memory that's being read, and the handle must have the PROCESS_VM_READ access right. So then we could search "process security and access rights." It's the first... Microsoft
paid for the ad, so now it's the top result. And we come in here and we can see the types of things: you can create processes, you can create threads, you can duplicate handles, you can query information, you can read the virtual memory. So these are the types of actions you can take against process objects. Does anybody have any questions on that? No? Cool. So let's look at how that turns out. What I've done is I've taken our function call stacks, this is OpenProcess, OpenProcessToken, DuplicateToken, SetThreadToken, and CloseHandle, and I just added an operation bubble at the bottom of each function call stack. And then you can connect them, and what you end up with is something I call the tool graph. This is the tool graph that represents samples one and two. So remember, we have the function chain, we have the function call stack, and then we have the operation chain. And so instead of thinking about those 183,750 different variations, we're going to summarize them in one operation chain, and we're going to say, hey, there's 183,750 variations, but they're actually so similar that we could ignore the differences, assuming that we think about things at an operational level. That's like in taxonomy, you know how you have the species, genus, family, order, all the way up to kingdom, phylum, whatever? Each one of those is called a rank in a taxonomy. And so we're potentially going from the species rank up to the genus rank when we go from function chains to operation chains; we're just going one level of abstraction higher. And so here's the math for how that all works out. In this tool graph, notice that there's one, two, three, four, five functions for OpenProcess. We have to choose one function from each
of these columns if we are building a tool. And there's seven functions here, six functions here, seven functions, five functions, five functions, five functions. The way that you figure out the number of variations is simple multiplication: five times seven times six times seven times five times five times five, and you end up with 183,750. But this one operation chain describes all 183,750 different variations. So it's just a way for us to summarize or abstract away the complexity. That's what abstraction is. If I want to tell you about my dog, I say I have a dog, and you get this warm, fuzzy feeling inside or whatever, if you like dogs. I don't have to tell you every little detail about my dog in order for you to understand what it's like to have a dog. Likewise, I don't want to have to explain every single bit in a program to be able to explain what it does. I say, hey, first it opens a process, and it opens a token, then it duplicates the token, then it applies the token to the thread. And so I could explain it at a higher level of abstraction and still get the vast majority of the point across. And the idea is that if you can start to use that level of abstraction in your detections, you start to have more coverage as a result. Anybody have any questions about that? No? Okay. So we could do the same thing with ImpersonateLoggedOnUser, and what we end up with is a single function. Previously, every function that we called in the function chain corresponded one to one with an operation, but ImpersonateLoggedOnUser is a compound function, and the definition is essentially a single function that performs multiple operations. You can almost think of this as a miniature application that you're embedding inside of your application as
a developer. And so what we see, and there's actually a typo here, is we have token query, token duplicate, thread set, and handle close. And then we end up with a graph that looks like this. I'm going to skip that because it had the picture of what I'm doing, and you probably saw it already. But now what we can do is a comparison at the functional level, which is, hey, at the functional level these two things look pretty different: there are some things missing, there's a different function call. But at the operational level they look pretty similar. If we do the analysis at the operational level, we notice that they're exactly the same, except this one does a token query operation. And the interesting thing, which we're going to talk about in a second, is that the token query operation is not mandatory. Does anybody know how we know that it's not mandatory? Because this one doesn't do it. This one does all the same operations except it excludes token query, and that tells me that token query is not a mandatory step, it's just a nice-to-have. So basically what we're seeing is that these aren't exactly the same, and they're different at the functional level, but they're similar enough that they're probably the same species. Here we have a different breed of dog, basically: we have a Doberman instead of an Australian shepherd, but there's a lot of things that they have in common.
Now, the reason why I'm saying you should think about things at the operational level is because we actually see things at the operational level. I kind of alluded to this previously. This is an example of an operation chain for process injection. There's a bunch of different ways you can do process injection, but this is like
classic shellcode injection or whatever. And the way that it works, you've probably heard the functions before, right? So you have OpenProcess, VirtualAllocEx, WriteProcessMemory, CreateRemoteThread. When I was first starting out in the industry, and I wasn't a programmer at the time, I had no idea what any of those things were. Somebody was like, "Hey, if you ever see VirtualAllocEx, WriteProcessMemory, CreateRemoteThread, you're seeing process injection." And I was like, all right, and I just locked that away. And then every time I would analyze suspected malware, I would look for those words. I had no concept of composability, or what they meant, or how you could interchange them with things like NtReadVirtualMemory instead of ReadProcessMemory, or NtWriteVirtualMemory instead of WriteProcessMemory. I had no idea about that complexity. I was just looking for those. But it would have been better had that person explained it to me by saying, "Hey, if you see Process Open, Memory Allocate, Process Write, Thread Create, you're seeing process injection." Now, if you think about MDE or your EDR, right, what do you actually see? What are the events telling you? This is a perfect example, because what you find is that Microsoft Defender for Endpoint in particular happens to have a single event for each of these operations, right? So it has OpenProcessApiCall, which tells us about Process Open. It has NtAllocateVirtualMemoryRemoteApiCall, which corresponds with Memory Allocate. It has WriteProcessMemoryApiCall, which corresponds with Process Write. And it has CreateRemoteThreadApiCall, which corresponds with Thread Create. Now, they've actually named the events after the high-level function, but the logging occurs in the kernel, so it actually gets the whole call stack, basically. There are some caveats with MDE to where it does some filtering. Like, for instance, I'm not the expert on this, but OpenProcess, I think, only
applies to LSASS. Yeah, so some of them aren't going to be comprehensive. However, there is an event that corresponds with each operation in this operation chain. And so then the question is, okay, well, which one would I prioritize, right? If I could see each operation, which operation do I pick? Because you have to pick one, right? You could do a correlation and say, oh, if I see X, then I want to look for Y, but you have to pick one. Does anybody have just a gut feeling on which one of these operations you would rather look for? Create remote thread. Okay. Anybody else? Anybody have a different opinion than that? So that's the answer. There actually is a proper answer, and the reason for that is this. There's this neuroscientist in the UK named Karl Friston, and he has a theory called, what the heck is it called? It's going to come to me later. Anyway, the general idea is that every decision that you make in life is about selecting which path you should go down, right? And so imagine that you have a goal, and your goal is to have a million dollars in your bank account, right? Well, there's an infinite number of paths that you can take to pursue that goal, right? One way is you could go to the bank and rob it, right? Another one is you could buy lotto tickets every day until you win, right? And the other one is you could save a hundred dollars a week or whatever for 70 years, right? But all of those are paths that could potentially progress you towards a million dollars. Some have higher chances of succeeding; others will happen faster. So the principle is called the free energy principle, and the idea is that what organisms are trying to do in life is reduce free energy, and free energy is the element of surprise. So generally speaking, when you become surprised about something, that means
that you didn't predict it, right? And so surprise isn't necessarily bad, right? Because you have good surprises, like surprise birthdays, right? A surprise birthday just means that you thought that your significant other didn't love you, but then they proved to you that they did, because they had this surprise, right? But generally speaking, surprise is bad. And the reason for that is, has anybody read, there's a book by Simon Sinek called The Infinite Game? So there's this idea of the infinite game, right? If you think about sports, you think about what we call finite games, right? So like a basketball game: there's rules that are very constraining on what you can do, and there's a finite amount of time, and at the end of four quarters, which are 12 minutes long or whatever, the game's over, right? And whoever has the most points wins. But you could also think about basketball in the context of an infinite game, right? So it's like, well, are you playing a game that consists of one game, or are you playing a game that consists of 82 games, which is a season, right? Do you want to be the winner of this individual game, or do you want to be the winner of the entire season? And then, are you playing a game that consists of one season, or are you playing a game that consists of multiple seasons, right? And so then you start talking about dynasties in sports and things like that. And so life, for instance, is an infinite game. And the problem with surprises is that the high end of good surprises is like infinite, right? You could be infinitely surprised in a good way. But the low end of bad surprises is you're dead, right? You don't get to play the game anymore. The game ceases to exist. And so generally speaking, it's in our best interest to minimize surprise as much as possible, because surprise can either be good or bad, but bad surprises are way worse than good surprises are good, if that makes sense. And so what we try to do is we
minimize surprise. So anyway, I said all that to say this. The idea is that there's a number of paths that you can take, right? And so let's say that we're trying to see process injection, right? Well, with process injection, if you use this method, we know that we've injected when the thread is created, right? And so that's kind of like the impetus that says process injection has occurred, right? And we're dealing with uncertainty otherwise, right? And so as we start to back up, there's multiple paths that can be taken that aren't going to result in process injection. So for instance, I can open a handle to a process to terminate it. I can open a handle to a process to create a thread. I can open a handle to a process to read its memory. I can open a handle to a process to do a bunch of things, right? And so all of those other things that don't progress towards creating a thread, those are all false positives, right? Those are all paths that are not the thing that I'm actually interested in. And so the idea is, and it's actually not just about being further along in the chain, because sometimes there could be operations after kind of like the inflection point, but if you look at the Process Open, it's going to be more susceptible to false positives than the Thread Create, right? Because there's a finite number of reasons why you would create a thread. Some of them happen to be process injection; other ones happen to be multi-threaded applications, things like that. But there's a whole lot more reasons why you might open a handle to a process than there are to create a thread. And so mathematically it's better to do Thread Create. Now, there are situations to where your EDR might not have the Thread Create event, and so you have to take a less valuable or a more error-prone operation as your base operation. But generally speaking, what you want to do is you want to say, which operation gives me the
most information towards discerning that this thing happened that I'm interested in looking at? And that's what I want to start with, and then you go from there, because that eliminates the most bad paths, the most surprising paths. Okay. All right. And then this is like a rant slide, I think, but there's this thing that I call the implicit process create. So has anybody been to a presentation where people talk about data sources, and they show this graph of, like, we looked at MITRE ATT&CK, and MITRE ATT&CK references data sources, and process command-line parameters is the number one data source? Has anybody thought about, like, if I could have one event, what event would I want? And then the answer is almost always process creation, right? And the reason why people think that is they're like, oh, well, process creation applies to everything, right? Everything bad that happens, happens from a process, and so if I have process creation, I could apply that to everything, right? Here's the problem with that. If we look at this operation chain, this is again process injection, do you see a Process Create event or operation? No, right? That's weird. You could detect process injection with process create events, but there's no Process Create operation. That's because process creation in most cases is subjective, right? There's a subjective perspective of the process: the process is doing the action. Everything that happens on a computer happens through a process, and so there's always a process, but the process is the subject, not the object, right? There are other situations, and we actually have an example that we'll look at in a second, where process creation is part of the operation chain, right? And by being part of the operation chain, that means it's objective, which means there's going to be some static aspects to it. So the problem with subjective process creation is that everything is
dynamic. The attacker has full control of everything. They could change the name of the process, they could manipulate the command line to bypass your regex, they could do all kinds of different things to hide or make themselves blend in or appear different, right? But in the objective sense, there's going to be certain things that they can't manipulate, right? Now, there are cases where you want to do that subjective process detection, where, like, if somebody runs mimikatz.exe, you want to detect that regardless of what it does, right? There's nothing good that's that surprising. It may not be bad, but it probably is, and if it is, you're going to look stupid. So that's kind of the thing you've got to be careful of. But if you're doing a behavioral detection, you should be thinking about it objectively instead of subjectively. All right. So there's this idea that we have called simplification, and we kind of already touched on it a little bit. In every operation chain, there's going to be a subset of operations that are unnecessary. They're things that people could just leave out if they wanted to, right? So it's like, I don't have to do that. And one of the examples that we talked about was Token Query, right? We said, oh, how do we know that Token Query is not necessary? Well, the answer is that this one doesn't do it and this one does, so the fact that this one actually works means that it's not necessary. Okay. So what we can do with simplification is reduce combinatorial complexity. You know, we had 183,000 different variations. Well, if we remove the functions or the operations that are unnecessary, we can reduce the number of different variations that we have to worry about, right? And we can talk about things at a more abstract level. Okay. And so what we could do is we
can start to look at which operations we can exclude, right? Which ones are not necessary to actually achieve the outcome? And this goes back to that composability thing, right? One way that you might go about looking into this is, like, Token Open. Okay, what was the function that corresponded with Token Open? It was OpenProcessToken. Let me go look at that real quick. So OpenProcessToken requires a handle to a process, right? And then the question is, well, how do I get a handle to a process? Well, the way that you get a handle to a process is through OpenProcess. Okay. And so that means that it is necessary for me to call OpenProcess in order to call OpenProcessToken. And then it's like, okay, well, I need to call DuplicateToken next, right? Well, in order to call DuplicateToken, I have to have a handle to a token, which means that it's necessary for me to call OpenProcessToken in order to call DuplicateToken. But it's necessary for me to call OpenProcess in order to call OpenProcessToken. So there's these composable requirements to where all three of those things are mandatory. And then there's ultimately SetThreadToken, which is the one that I actually wanted to call, and that would be the one that we want to detect eventually, right? And that requires a handle to a token that you want to apply to your thread, which means that you have to call DuplicateToken, so on and so forth. Okay. All right. Then the other one: does anybody have an idea for what another operation that we can exclude would be? I think I alluded to it earlier. Which one? You have close handles. Yep. So in theory, you could just leave the handles open if you wanted to. I mean, legitimate programmers do that all the time on accident. And yeah, actually, if you see all the handles being closed, you know
that you have a malicious application. But generally speaking, in the short term, nothing's going to break if you leave your handles open. And so if we built a detection that was looking for handles being closed, the evasion would be to just leave them open, and then we'd never detect it. So that's just a bad idea, right? And so we can exclude that, and what we end up with is this reduced or simplified operation chain, which now is a little bit more straightforward. There's only four operations that we have to deal with. Does that make sense? Yep. And so what happens is, when we simplify samples one, two, and three, the operation chains are identical, right? And so now this is what the dog operation chain looks like, essentially, right? And so we have this idea, morphologically, right? Morphology is the structure. The structure of an application would be the bits that it has integrated into it, which then represent the functions, which represent the operations. If you were to think about us, it's like, okay, well, when we think about morphology, we think about, oh, you have two legs, you have two arms, you have a heart, you have a brain, so on and so forth. Well, your brain is made of cells, right? And so those cells go into, I don't know, components of body parts, and then components compose together to form full body parts or whatever. So it's kind of an analogous idea. Now, at the operational level, that's what a dog looks like, right? So it's Process Open, Token Open, Token Duplicate, Thread Set, right? That's a dog. And for anybody wondering, I just made that up. There's no actual reason why I chose a dog. It could be any animal, but it made the next part a little bit easier to explain. So here's a fourth sample, sample four, and sample four calls LogonUser, ImpersonateLoggedOnUser, and CloseHandle, right? So one thing
that I'm hoping you're kind of seeing is that doing all that stuff I did in IDA is kind of a pain in the butt, and it takes a lot of time. One of the feedback things when I tell people that is, like, "Well, it just takes a long time." And it's like, well, yeah, it may take a long time, but what you find is you see the same functions over and over and over, so once you've done it and kept it, you're good to go. So just to kind of show how simple that is, right, we see ImpersonateLoggedOnUser. So if I'm trying to build that graph, and maybe this is something I should have mentioned earlier, there's this application called arrows.app. If you just go on your browser and type in arrows.app, you'll come to this application. It doesn't look very fancy right now, but what you could do is add these circles and change the color of them, and you could end up constructing those graphics that I'm showing you. But one of the things that we did, just to help ourselves, to be honest, is we created this. It's in the same GitHub user as the workshop's GitHub. We started constructing these function call stacks, and then we would export them and save them, right? And so, for instance, I could go to ImpersonateLoggedOnUser.json, and it's just this long JSON file, and you could just copy it, and you come here and paste it. And for whatever reason, this thing is crooked, which annoys me, but okay, there we go. And now you have that there. And it's like, I looked at it once, and now I have it stored, and if I ever run across that again, I don't have to go through and do all the work. And then the other one that we saw, I think, was CloseHandle, right? So that's in kernel32. CloseHandle, Ctrl+C, boom, there it is right there. So what we end up with is now we have, and it has some nice little features, like if I want this to be equally spaced, then I just do that, and see the vertical line? That just tells me, "Hey, you're good," and
then you could just draw your arrow. I'm very OCD, so there's certain meanings to certain colors and things like that, and everything has to be lined up, so I spend lots and lots of time doing that. But so we have that, right? And so we're kind of just building this thing out as we go. Now we do the same thing, but we do have a function, LogonUser, which is new, right? That's something we haven't seen before. I think LogonUser is in kernel32. Oh, no. Let's see, user32? Okay, nope. I thought I'd made sure that we had all of them. Well, crap. Okay, so isn't it advapi? No. Okay, well, we don't have LogonUser, so we'll do that again for the next one. But LogonUser is just another function, right? And so we go through the same process. We're going to do it a little bit more rapidly. I think I made the cooking show joke already, but this is like the cooking show to where it's like, oh, I already have sample four in the oven, and so we'll just pull that out. But this is what the tool graph ends up looking like, right? And so we have User Logon, Token Query, Token Duplicate, Thread Set, Handle Close, Handle Close, right? So if we were to just look at the operation chain, does anybody see how this differs from what we've seen previously? No? Well, we haven't seen a Process Create yet, still. What did you say? There's no open handles, right. So we're not calling Process Open or Token Open, but instead we're calling User Logon. So anybody familiar with Cobalt Strike? Yeah, so Cobalt Strike has this feature called make_token, and what this is doing is implementing an analog of make_token. So the idea here would be, previously, when I talked about token theft, I talked about it under the context of, like, I'm on a computer already, and I see that Olaf is logged in, and so then I want to steal Olaf's token, right? But what do I do if I have a username and password and that user's not logged
in? So I have a username and password, the user's not logged in, but I need to become that user. Well, Cobalt Strike's answer to that is this thing that he called make_token. And the way that you would do that is you would take your username and password and you would log that user on yourself. You would just say, I want to log on as Olaf, right? But he's not logged on to the computer, and as you log him on, it creates a logon session and it gives you the access token for that user. And so then you just say, okay, well, now I want to impersonate this access token that I got as part of logging Olaf on, right? And so that's what's happening here. So it's the same process. The only thing that changes is how you get the token. So we call that token acquisition, or how you obtain the token. So imagine that token theft is broken into how do you obtain the token and how do you apply the token, right? And so here we see a difference in how you obtain the token, but we see the token being applied in the same way, right? And so we could do the simplification process. Token Query is out, because we previously saw that Token Query was just nice to have, and then Handle Close is the same thing, right? And so we end up with this simplified version of the operation chain, and this is where we're no longer dealing with dogs, right? We're dealing with a gray wolf now, right? So dogs and gray wolves are essentially the same animal, right? One is called Canis familiaris, that's the dog, and then there's Canis lupus, which is the wolf. And so here we have a wolf instead of a dog. It's very similar, but slightly different. Now, the reason why it's similar is because the change happened at the beginning, right? But if the change happened at the end, which is where we ideally want to be detecting things, right, I would consider changes later in the operation chain to be more significant than changes earlier in the operation chain. I don't know how that's
going to hold up over time, but that's my current approach: changing this would be more significant than changing the stuff up here. Okay. So it's like a mullet, you know, you've got the party in the front, business in the back, and so you don't want to change the business. Wait, it's the inverse mullet, I guess, right? Isn't that party in the back, business in the front? I can't even imagine what that... does anybody have Midjourney on there? Yeah, ask it for an inverse mullet and let's see what we come up with. Yeah, cool. Okay. So then we have sample five, right? And in sample five, we have some similar stuff and some differences. Does anybody see a difference in what's going on with sample five? CreateProcessWithTokenW, right? That's the difference, right? So this is exactly the same as samples one and two, except they've replaced SetThreadToken with CreateProcessWithTokenW, right? And so does anybody care to hazard a guess what the operation for CreateProcessWithTokenW might be? Create process. Boom, yep. See, I would say like 90% of the time it's very easy, and then you get really cocky, and then you run into SetThreadToken and you're like, oh crap, I don't know what that is. Okay. So now we have this function chain, and it ends up making this tool graph, right? And so we have Process Open, Token Open, Token Duplicate, Process Create, Handle Close, Handle Close, Handle Close. So again, now that I'm hopefully not gonna look like a schmuck, we can go and do this thing. Process Open, Token Open, so we'll go do that real quick. So we're in kernel32, we go to OpenProcess. This is actually a little bit harder because there's more functions being called. OpenProcessToken. Just kind of do a little bit of organizing. Then we have DuplicateToken. Okay. Then we had, what was it,
CreateProcessWithTokenW. So that is... I don't have that one either, so I do look like a schmuck. Okay. I actually prepared for the next one, so this stuff will all be relevant, so we'll keep that there. All right. So we have the same deal with the operation chain, right? In this case, the reason why you might do this is, let's say that you have your agent running, Cobalt Strike or Mythic or whatever, and you have something that you want to do, but you want to do that in a different context, right? So maybe you want to enumerate something as user X, but you still want to maintain your agent running as user Y, right? For whatever reason. What you might do is kick off that enumeration task as a separate process, but you tell that process to not run in the context of user Y; you tell it to run in the context of user X. So the way that process inheritance works is, if I run a process in the context of Jared, right, and my process creates a child process, that child process is going to run in the context of Jared by default. So that's just the default. But there's CreateProcessWithTokenW, there's also CreateProcessWithLogonW, there's a couple of different functions that allow you to specify the user context of the child process, right? And so this is the way to do that. Okay. So we do the same thing with simplification. We get rid of the Handle Closes, because nobody does that, and now we have a new operation chain, a new species. But this species changed a fundamental aspect of what was happening. So previously, in all the examples, we were impersonating inside the same process, right? We were just causing our thread to impersonate. But now we're creating a new process that impersonates, and so the detection that you would apply to this would change fundamentally. And so now we're saying, hey, we're a fox, right? So a
fox is still part of the same family, the canine family basically, but it's not a wolf or a dog, right? It's more different, right? And so if we went all the way back up to the very top with the taxonomy, you would have seen that you have the dog at the very bottom rank, then you have the dog and the wolf at the second rank, and then you had the dog, the wolf, and the fox at the next rank. So that's kind of where we're getting. Okay. And then the last sample is sample six. Does anybody see something that's different here? This one probably should be a little bit weird. You're like, why did you do that? Yeah, I mean, it has a little component of each. Pay close attention to all the function names, because one changed slightly. So this one calls CreateProcessW, not CreateProcessWithTokenW, right? And so Johnny, who is the co-presenter with me, ended up catching a fever and couldn't make it, but he literally wrote this, and I don't think we've ever seen this in the wild. So this is actually kind of neat, because we thought about how we would detect the other ones, which I'll talk about in a second, and then he was like, well, the way that I would detect the first three, right, the dogs, or the dog and the wolf, is by doing X, and the way that I would detect this other one is by doing Y. Well, this actually juxtaposes both of those and makes it so that both detections fail, so it's kind of neat, right? So what he's doing here is, instead of creating the process with a certain user context, he's creating a process with the inherited user context. So imagine you're running as Jared, you create a process as Jared, but then he's telling SetThreadToken to impersonate. And so the process is created in the normal way that you would expect a process to be created, but then it's being executed in
a different way, right? So here we see OpenProcess, OpenProcessToken, DuplicateToken, CreateProcessW, SetThreadToken, a bunch of CloseHandles, same thing. So this is the one that I'm actually prepared for, allegedly. So we go to CreateProcessW. Okay, this one's a little shorter. And I tried to get them all to be matched up as well, so that if you just grabbed them, they worked. One thing that's worth talking about is, see, this one's a different color. It's not black. The reason for that is that this is actually a compound function, but one of the things that we realized very early on is that some compound functions are extraordinarily compound, and there's lots going on, and you get to the point to where, if you can't do it all, you just don't do it. And so we kept getting stuck on certain ones, because it's like, I'm not going to do this for the whole thing. And so what we did is, this is kind of like an orange shade, and we started using that to say, hey, at this point we've only done partial analysis. Over time, we want to build that out more, but we wanted to have an indicator that said, hey, this isn't all that's happening, it's just one branch of what's going on. So then we have SetThreadToken, right? And, man, we can do the CloseHandles, because it's kind of cool how that works. Okay. So we got that, and then we do three CloseHandles, right? So this is why this became really cool. You're just like, boom, boom, boom, and you could just grab all those, move them, then you grab these two, move them. Okay. And the thing I'm trying to demonstrate here is that the hard part of this whole process is going in and doing the analysis in IDA. That's the hard part. But most of the functions that you're going to see are things that, A, you've probably seen before if you've been doing this for a little while, or B, somebody else has already seen, and so
you can rely on them to do the work for you. So we're trying to make this something that's kind of open, so that other people can contribute, or we just contribute whatever we do, and then you would be able to build these tool graphs without necessarily having to be able to fill in the middle yourself, right? And so hopefully over time that will grow. Right now, if you're doing process injection or token theft, it's going to be pretty good. The other techniques, probably not so much. Those are the ones that we've done pretty extensively. But it makes it a lot easier to just kind of build this out and get an idea for what's going on. What you're going to find as you start looking at real-world samples is that a lot of things do the same thing in the exact same way, and so they end up being functionally equivalent. So what you do is, if you look at the source code and you're like, okay, that calls the same functions, I can just ignore you now, and I just move on to the next one. And you want to keep looking at the next samples until you find one where you're like, hey, I haven't seen it done that way before, now let me dig into it and see why, right? That's what we're trying to do. This is an iterative process, right? You have to look at lots and lots of particulars and identify where there's a difference, and when there's a difference, ask: why is that different? Does that give an advantage? Does that hinder my detection from being successful? Those are the types of things that you want to be paying attention to. Okay. So, same thing, operation chain. So again, like I said, what he does here is he obtains the token, right? And then he's like, okay, I'm going to create the process that I'm interested in, but that process is going to inherit the user context of the calling process, right? So Jared made the call to CreateProcess, the new
process is going to be running as Jared. But when you create a process, the outputs, and this is kind of neat. Okay, CreateProcess. This is CreateProcessA. Does anybody know the difference between CreateProcessA and CreateProcessW? Remy? Yeah. So W stands for wide characters, so like Unicode characters. Anytime that a function takes strings, right, so this is a string, like the command line of the process that you want to execute, you have to specify the character set that will be used. And if you specify A, you're specifying ASCII characters. ASCII is technically a seven-bit encoding, but one byte is used for each character, and that's great if you speak English or most Latin languages, but if you speak Arabic or Korean, for instance, that's not going to work very well. And so you use Unicode characters, which is the wide; that's what W means. And what you find, if you start to build out those function call stacks for CreateProcessA, is that CreateProcessA essentially just calls CreateProcessW. It converts ASCII characters to Unicode characters and then calls CreateProcessW, so they're basically the same thing. Okay. And so it's kind of cool. Hey, you could call CreateProcessAsUser, or you could call CreateProcessWithLogonW. Those create processes, but maybe they do it in slightly different ways, right? But one of the things that you'll see is that one of the outputs is this process information. And so if we go all the way down here, we have the PROCESS_INFORMATION structure, and one of the things that it does is it gives you a handle to a thread, right? And that handle to a thread is the primary thread of the newly created process. So processes can have multiple threads, but they always have at least one thread, and that one thread is the primary thread. That's the thing that's doing all the major work, basically. And so when you create a process, you automatically receive a handle to a thread. And then if we go back and look
at SetThreadToken (I thought I would have a quick link), SetThreadToken: remember, previously we had been passing null to the thread parameter because we wanted to apply it to the calling thread, the thread that made the call to SetThreadToken. But what Johnny noticed, and this is why it was cool, is he's like, wait a second, I can specify a handle to any thread and it will cause that thread to impersonate the token that I'm specifying. And so he's like, what I do is I create a process; when I create that process, I get a handle to the primary thread of that process, and then post hoc I can go in and tell it to impersonate. And so the big detection that he had for CreateProcessWithLogonW, or CreateProcessWithTokenW, sorry, was, what he would do is, generally speaking, when people impersonate, you impersonate upward. So you go from like admin to SYSTEM, or you always impersonate somebody that has more privilege. 'Always' might not be the right word, but generally speaking that's the trend, right? You impersonate someone that has more privilege than you. And so the trick would be, you look for a child process that's running in a higher user context than the parent, and it's like, why the hell is that happening? Well, it's because somebody did that explicitly, and that's how you would do it. Well, in this case, the child process is going to appear to be running as the same user as the parent, but the primary thread... which, nobody has insight into threads. That's just kind of how life is at this moment, right? Because threads are messy. Threads have numbers, they don't have names, and so they're really hard for humans to work with, because it's like, do you remember thread 1,574 or whatever? It's like, no, nobody knows anything about that. And so basically he's like, I'm just going to impersonate on the thread, and then you're going to have a problem. Okay, so where was I? So we do the same thing, and now we have this. I felt really clever with this, so give me
some credit. But see this thing: does this look like a wolf or a fox? It looks like a fox, but it's a wolf, and so it's kind of like a mixture of the two, you know what I mean? I actually literally googled 'fox wolf' or something like that, and that's what came up. It's called, I don't think it's a man wolf, I think it's maned wolf, because it has this little tuft of hair on its back. When you put pictures of things in your slides, you of course have to look at their Wikipedia and read about them a little bit. And it's not literally a mix between a fox and a wolf; it just appears to be a mix. So what we end up with is kind of like a third or fourth species of token theft, based on our analysis of the samples, right? And so just to bring it all home, if we were to look at all the function chains, we would see that there's some amount of variation, and then if we were to look at all the operation chains, we see that there's a lot of overlap. Does anybody see one operation that is consistent across all examples? Token duplicate, right. So token duplicate happens across all of them. The question is, do you have telemetry for that, and how much context is that going to give you, right? Maybe it gives you a lot, I don't know. And then our kind of final view, a morphological view of these operation chains. I did some purposeful spacing to line things up so that you can see the similarity. So like we said, all of them have token duplicate. We see that there are similarities between the fox and the dogs on the early half of the chain, right? Actually, only the wolf is different in the early half of the chain, I suppose. But then what we see is that on the later half of the chain, the dogs and the wolves are similar, right? And the fox is just doing its own thing, and then the maned wolf, or whatever, is doing a little bit of a mix, a little bit of each. So then the question is, we talked
about. So there's all these questions that you have. I didn't think I'd have time for this, but we've got a little bit of time, so I can start talking about what's going on here. When you're building a detection, I find it valuable to start with one operation chain at a time and then add more and more. So you would want to detect the dog sufficiently before you try to detect the dog and the wolf at the same time, right? Because it can only get harder as you add more species. And so we talked about this idea of, hey, which one is the optimal operation? We see operations conceptually, but we can only see one at a time, and so which one would you choose? You kind of guessed, right, and you were right, and that's great, and I think most of the time that will work. But there's actually a logic to it. We talked about the different paths and all that kind of thing; there's this idea of necessity and sufficiency in logic. So you can start to say: I have a goal, and there are necessary conditions for meeting that goal, which means that in order to meet the goal, I must do this thing, right? So if you don't see X, then you know Y can't happen, if X is necessary for Y, right? The absence of X means the absence of Y; that's the logical representation. But then you have sufficiency, which is: if I see Y, then I can infer that X happened, right? The presence of Y indicates the presence of X, and that means that this can only happen if this other thing has already happened. And so when you start to think about it, the best example I have is if we were to talk about credential dumping from LSASS. So imagine the operation chain for that. I don't have a picture of it, but we could draw it, why not? Let's do that. So, credential dumping from LSASS, let's show what that might look like. Okay, so there's something like process enumerate, right? I have to enumerate processes. Then
there's process open, because I want to dump credentials from LSASS, and so I have a name for LSASS, but that's not the way that processes actually work. The way that it works is based on the process identifier, and so I have to enumerate all of the processes and then say, which one of these processes has the name LSASS, right? So I enumerate, then I open a handle to LSASS, then I have to read, sorry, I have to read the memory of LSASS. And then a lot of times what happens is that if you call that MiniDumpWriteDump function, you'll write a file, so there's like a file write; that's the dump, right? And so this is an example where just because it's later in the chain doesn't mean that it's better, right? Because you have to think about what you're trying to do. What you're trying to do is read the memory of LSASS, and so that means that this is the optimal operation. And so we start to think about it from a necessary and sufficient perspective, right? What you could say is that in order to open a handle to LSASS, you have to first enumerate all of the processes, so enumerating processes is a necessary operation for opening a handle to the process. Then you could say, okay, well, in order to read, it's necessary to open: you have to open a handle to LSASS before you can read from LSASS, right? So that's necessary. And then you could ask, is file write necessary for actually reading the memory of LSASS? The answer is no, right? You could read the memory of LSASS and just hold the contents in memory and never write it to disk. That's what Mimikatz does, for instance. But if I see an LSASS memory dump file, then I know that somebody read from LSASS, right? So I could infer that. And so the interesting conversation is that this becomes both necessary and sufficient for the thing, dumping credentials from LSASS. So what you would say is that this is necessary,
this is necessary, this is necessary and sufficient, and this is just sufficient, right? And so, for instance, no attacker is opening a handle to a process just for the sake of holding onto the handle and being like, look at me, I got the handle. They're using the handle for something, right? So it's not sufficient, but it is necessary. And so the idea is that the necessary and sufficient operation is the intersection where you've minimized both types of error, right? Detection theory is trying to identify something in the face of uncertainty, right? And uncertainty means that there's error, because sometimes you're going to be wrong, and there's two ways that you can be wrong. Sometimes you think you saw it when you didn't see it. Has anybody ever done the auditory testing where they put you in a silent room and you put on headphones? This is like a military thing, so I did it in the military a lot, but I don't know if normal people do it. So they put you in this silent room, right? And what they're doing there is they're trying to constrain the environment. Your ability to discern signal in Times Square in New York City is going to be way worse than your ability to do it in a vacuum. And so if they're trying to only test your hearing and maximize the range of your ability, they're going to put you in a room that's silent, right? So they put you in a room that's silent, they put on headphones, and then they play signals at different tones, and you have a clicker where you click the button every time you hear something. But inevitably you're going to hear something when nothing happened, right? So there are situations where the tone will play and you can't hear it, and that's indicative of hearing loss, right? But then there's also times when you overthink it, and you're like, oh, was that
something? Okay, let me click the button. And then they're like, yeah, you had like five misfires or whatever, where there's no tone and you clicked it, right? So that's the example of the two types of error. A false positive is when you click and there was no sound; a false negative is when you don't click and there was a sound, right? And what you would say is that the tone that's playing, that you're supposed to be hearing, that's signal, right? And then the hustle and bustle of Times Square, that's noise. But there's also two types of noise. There's external noise, which is Times Square, and then there's internal noise. So sometimes you just think to yourself that you might have heard something, but other times, just in the random firings in your brain, the neurons that connect to tell you that you heard something may fire arbitrarily when there was no reason for them to fire. That just happens sometimes, right? And so that would be internal noise. And so the idea is that you could have either type of error. Well, the interesting thing is, let's say that process read is optimal, but you don't have telemetry on process read for whatever reason. The question is, which way do you go, right? Do you go to sufficient or do you go to necessary? You go to necessary? You go to sufficient? Okay. Yeah, it's opposite for me, like, I was like, which way is that? Anybody else? Anybody disagree with sufficient? You can answer, because I've talked to you about this. All right, so the answer is, it depends, and the thing it depends on is which one you're more sensitive to: false positives or false negatives. Okay. And so the idea is that if you're sensitive to false positives, you err in the direction of sufficient, and the reason for that is sufficient means that the thing already happened. But
the problem with sufficient is, just because the thing happened, that doesn't mean the sufficient thing will happen, right? And so you have a higher propensity for false negatives if you err in the direction of sufficient. But if you err in the direction of necessary, you're going to have more false positives. The false positives are those paths that go somewhere else, right? But you're going to have no false negatives, because you can't do the thing without doing this, right? And so the idea is that if you're sensitive to false negatives, you err in the direction of necessary. And that's a whole discussion about all kinds of things, right, about whether you should be more sensitive to one or the other. But generally speaking, every organization, in my opinion, should at least have some answer for which way would you rather err, and under what circumstance. There's this thing called the Blackstone ratio in English common law. Blackstone was a jurist back in England, way back in the day, and when talking about the criminal justice system, he said it's better that 10 guilty people go free than one innocent person be incarcerated, right? And so what he did is he established a ratio of the ability to accept false negatives compared to false positives. He said a false positive is 10 times worse than a false negative, right? And then Benjamin Franklin, in the U.S.,
he said it's 100 times worse, right? So he, I don't know, is that doubling down? Yeah, he doubled down on it and said a false positive is 100 times worse than a false negative, right? And so the idea is that the criminal justice system can't be right all the time, because you simply don't have the evidence all the time. And so if you're going to err in one direction, which way do you err? Well, let's say you could err in the direction of, we're never going to convict somebody who's not guilty. The problem is there's a bunch of different ways that you could achieve that goal. One is you never convict anybody, and it's like, okay, problem solved. But ideally, the entire purpose of the criminal justice system is that, generally speaking, you want to be right most of the time, right? And so it's like, okay, well, how do we establish a threshold that allows us to obtain this, you know, 100-to-1 ratio? Well, that's what beyond a reasonable doubt is, right? The idea is that you have to be guilty beyond a reasonable doubt in order to be convicted; that's the theory anyway, and that's the standard. So the standard is extraordinarily high in order to achieve that. If it was like a one-to-one, there's a standard in U.S. law at least, that's the preponderance of evidence, or I think it's called preponderance of evidence. It's like a 51% standard, so it has to be more likely that you did it than not, right? And if you had a one-to-one relationship between false positives and false negatives, you would use that standard instead of beyond a reasonable doubt. And so that's something that is worthwhile for people to start thinking about: which ones are worse, false negatives or false positives? I think logically speaking, false negatives have to be considered worse, because
the reason that everybody ever gives for why they don't like false positives is because it might create a false negative. And so it's like, if the reason why you don't like... does anybody want to play a game, I guess? Okay, so you chose sufficient, so I'm going to pick on you. What do you think somebody would say if they were to say, I think false positives are worse than false negatives in the context of detection? Okay, so you're wasting time, right? Yeah, yeah, for sure. Okay, so I'm just trying to walk the dog, which I guess goes with the theme of the thing. So, okay, you said, oh, we have to look at all the alerts, that's the problem, right? And you're wasting time, that's essentially the problem, or you get alert fatigue. Yep. And then you miss stuff. And what do you call it when you miss something that you should have caught? A false negative, yeah. So nobody's ever answered that question differently. Maybe there is a better explanation, but I've never heard it. So the idea is that everybody, whether they accept it or not, thinks that false negatives in detection are worse than false positives. The problem is that the ratio of occurrence is vastly disproportionate. So it's like, you go 20 years and never see an actual... maybe not 20 years, but you don't see bad things every day, right? But you could have millions of events that are legitimate. And so then the question is, how much worse are false negatives than false positives? And there's a threshold at which you become inundated, and now you're not going to accept it. And so it's worth people thinking about that. Yeah, burden, yeah. See, I'm not a person that responds to alerts, so I probably have a little bit of a potentially unrealistic perspective, but I think of it like, you could literally calculate your alert capacity.
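The alert-capacity idea can be written down as a tiny function. This is a rough sketch, not a staffing model: the function name is hypothetical, the figures (10 analysts, 8 hours a day, 10 alerts per analyst per hour) are illustrative, and real throughput would vary by alert type.

```python
def daily_alert_capacity(analysts: int, hours_per_day: int, alerts_per_hour: int) -> int:
    """Average number of alerts the team can handle per day.

    Capacity is modeled as analysts x hours worked x alerts handled per hour,
    which assumes every alert takes about the same time to triage.
    """
    return analysts * hours_per_day * alerts_per_hour


# Illustrative figures only: 10 analysts, 8-hour days, 10 alerts per hour each.
capacity = daily_alert_capacity(analysts=10, hours_per_day=8, alerts_per_hour=10)
print(capacity)  # 800
```

Under these assumptions, the tuning question becomes which 800 alerts to send, rather than how to send as few as possible.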
Right, so it's like, let's say that alert handling capacity is something you could mathematically represent as a function, right? So it's like number of employees, number of hours worked, number of alerts per hour. And obviously it depends on the alert and all that kind of stuff, but you could average it out or something like that, right? So let's say you have, I don't know, I'm trying to think of how to make the number easy, 10 employees that work eight hours a day, right? So that means you have 80 hours, right? And then they could do 10 alerts an hour, I don't know. So you have 800 alerts that you could process a day, or something like that. So as an organization, your best solution is to give people 800 alerts, right? You want to maximize your capacity; otherwise you're paying people and you're wasting the time, right? You're wasting the capacity. Now, there are a better set of 800 alerts and a worse set of 800 alerts, and so you want to maximize the efficiency of the alerts that you're giving people. But there's a lot of people that will be like, oh, we want no alerts unless they're bad. And it's like, well, that's not in the organization's best interest, right? Because if your standard is beyond a reasonable doubt for even alerting, then you're not going to convict anybody. So my analogy, going back to beyond a reasonable doubt, is that in the criminal justice system you don't just suspect somebody is doing something and then have to go to a jury and convince them beyond a reasonable doubt, right? You have what they call an escalating standard of evidence. So at first it's reasonable suspicion to stop somebody, right? And then it's probable cause to search them, and then, you know, to indict them it's like preponderance of evidence, right? And then in order to convict them, it's beyond
a reasonable doubt. And so the idea is that a lot of organizations are holding the beyond-a-reasonable-doubt standard at the traffic stop level as opposed to at the jury trial level. And so, I don't know who saw Emilio and Remy's presentation, but my interpretation of what they were saying, essentially, is: stop treating the output of your detections like a jury trial and start treating it more like a traffic stop. That's my takeaway. I don't know if you like that analogy; I'm putting words in his mouth. All right, so we've got a couple more minutes. Okay, so here we have two chains. So we have two chains, like the rapper... sometimes you just say stuff, and stuff pops into your mind, and you're like, okay, I've got to get that out. Okay, so you have the first operation chain, which is the first three samples, basically, so the dogs, and then you have the wolf, right? And so what you could do when you're building your detection is find what the optimal thing is, and then you would ask questions like, what telemetry do I have that would help me understand that a thread was set? Maybe most EDRs aren't going to have that, and then you're like, okay, well, the next one is token duplicate, so on and so forth, right? But then you start to think, okay, well, what happens when I overlap these things? Well, in this case, when you overlap, the optimal operation is maintained, and so you're like, okay, that's a good match, right? So I can do a detection across both of these chains and I'm still using the optimal operation. But what about in this case? This is the fox, I think. Yeah, the dog and the fox. So if we overlap, now we have the early part of the chain, right? But notice that the optimal here is the thread set, and the optimal here is the process create. But then when we go and combine them, the optimal... so the idea is that in order for something to be optimal for an inter-chain detection, it has to
be something that they both share, right? And so in this case it's going to be token duplicate. But the problem is that token duplicate is the optimal for neither the first chain nor the second chain, and as you go backwards, that means you get more false positives. So the combined detection for both of these chains can only be as good as... I'm trying to think of how to say that. It will have at least as many false positives as either of the two chains, or more, right? It can't be better. And then you have a mid-chain overlap, which is something like this, which is also not great, right? And so you're going to have a similar situation where the optimal detection in both cases isn't going to work. Did I label that? No. But yeah, so that's the idea when you're trying to do multi-detection, right? So what you want to do is find something where the optimal operations are going to overlap, and that's going to maximize your reduction of error, essentially; that's what you're trying to do. Because there's this idea of irreducible error, right? So if we come back here, and this is kind of the last thing that I'll touch on, when we come back here, I say that the thing that the attacker wants to do is read the memory of LSASS, right? But just because you read the memory of LSASS doesn't mean you're malicious, right? The problem is you could get even more specific: it's not just any reading of the memory of LSASS, it's reading of the section of memory in LSASS where the credentials happen to be, right? That would be more accurate; that would reduce error even further. And then you could say, oh, well, just because you read the section of memory where the credentials happen to be doesn't mean that you are intending to do anything bad with them, and
so it's like, well, if I knew what they intended to do with it, then I could be even more accurate, right? But there's a point at which your ability to gather more information ceases to exist, right? Like, I'm not going to get the attacker's intention unless I have something that I personally don't have. So there's some limitation where you kind of draw the rest of the owl, I guess, is maybe the thing. And that's kind of like irreducible error: there's going to be some amount of error that you're not going to be able to get rid of. But the question is, how do we get to the best starting point for something like what Remy and Emilio talked about, where you now have indicators that you can use to track things down a little further? That's not the slides. Okay, so let's see, that's it, that's all I've got for everybody. I appreciate everybody's giggles at my jokes, and the attention, and answering questions. I'm happy to answer any big-picture questions or any specific questions over the next, I don't know, I think we've got 15 more minutes scheduled, or feel free to leave or whatever, but I appreciate everybody's time. Yep, thanks. Yeah, thank you. Good. I really like the... I saw that you liked it, I could tell from your reaction. So interesting, but actually I really like the reference to your brain, something like... oh yeah, yeah, thank you. Yeah, that's brilliant. I mean, he's not... thank you. Yeah, one of the things that really fascinates me: I think in infosec a lot of the time we act like we have unique problems that nobody else has ever seen, and it's like, no, no, this problem has been solved by these people. You know, the medical field has been trying to figure out how to detect things forever, and it's literally life and death for them. Yes. And so they have a little bit more pressure to actually answer that question properly. So I love the idea. Thank you. Yep.
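The necessary/sufficient reasoning from the LSASS example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Operation` class, the `pick_operation` helper, and the telemetry flags are hypothetical names invented for this sketch, and the chain is assumed to be ordered so that necessary operations come before the optimal one and merely sufficient operations come after it.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Operation:
    name: str
    necessary: bool      # the goal cannot happen without this operation
    sufficient: bool     # seeing this operation implies the goal happened
    has_telemetry: bool  # hypothetical flag: can we observe it at all?


# The whiteboard chain for dumping credentials from LSASS, in order.
# Telemetry values are made up: process read is optimal but unobservable here.
LSASS_DUMP_CHAIN = [
    Operation("process enumerate", True, False, True),
    Operation("process open", True, False, True),
    Operation("process read", True, True, False),
    Operation("file write", False, True, True),
]


def pick_operation(chain: Sequence[Operation], sensitive_to: str) -> Optional[Operation]:
    """Choose the operation to build a detection on."""
    if sensitive_to not in ("false_positives", "false_negatives"):
        raise ValueError("sensitive_to must be 'false_positives' or 'false_negatives'")
    observable = [op for op in chain if op.has_telemetry]
    # Best case: an observable operation that is both necessary and sufficient.
    for op in observable:
        if op.necessary and op.sufficient:
            return op
    if sensitive_to == "false_positives":
        # Err toward sufficient: fewer false positives, more false negatives.
        candidates = [op for op in observable if op.sufficient]
        return candidates[0] if candidates else None
    # Err toward necessary: no false negatives, more false positives.
    candidates = [op for op in observable if op.necessary]
    return candidates[-1] if candidates else None


print(pick_operation(LSASS_DUMP_CHAIN, "false_positives").name)
print(pick_operation(LSASS_DUMP_CHAIN, "false_negatives").name)
```

With the made-up telemetry above, the sketch falls back to file write when false positives hurt more and to process open when false negatives hurt more, matching the "it depends" answer from the talk.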