Okay. This is not really a new topic as far as ReDoS goes. It's an old attack. The thing that I'm bringing to this that's sort of new is automation, and some benchmarking too. What I mean by automation is that with NFA engines, to kill the performance you actually have to craft an evil string, and normally that takes a little bit of craftiness. You have to understand the expression a little bit, know what would make it perform terribly. That's the part I automated, and I have a tool that I released. There will be a demo of that at the end, so that's just to know what to expect. But I will go through regular expressions, how the DoSing works and all that, first. Before I start with that, I just want to give some shouts and some credit to some of the places that I hacked at and did my security research at; that's where I primarily did most of it. So HeatSync Labs from Phoenix, when I used to live there, and then my new life in Manhattan: NYC Resistor and Hack Manhattan is where I did a lot of the later stuff. I don't really go into depth on this kind of slide, but about me: I'm Eric. I'm not a security researcher. There's my email address. My blog is there, which has a little more detail about some of the other stuff, not just regex but some low-level assembly and machine code stuff I talk about. I'm always talking about how assembly language is too high level. And then my GitHub, and at the bottom, smaller just to fit the screen, is the specific Perl script that is the tool that I'll be talking about and demoing later. When I say I'm not a security researcher, it's kind of a funny story. I typically don't include a company name or title or anything in my bio, and when you don't do that, you notice there are a lot of people with "security researcher" after their name. I didn't put that because I don't really know what security research is. I just like to hack. So this first part is kind of the first third of the presentation.
And I'll make this part really quick because there were a lot of hands, so I'll try to race through this. This is the TL;DR of regular expressions. There's more to it, but this is 90% of what you would ever use, and for somebody who didn't know regular expressions, even learning half of this stuff would get you moving really quickly and being useful. Like I said, I'm going to race through this because of all the hands that went up; I didn't even expect that. You know what, I'm almost not even going to explain what regular expressions are. The way I look at it, it's a great way to search. It's kind of a programming language, a non-Turing-complete one, and being that it's kind of a language is why you can have performance issues. It's like when you're doing file searches: you can do star dot star, or a question mark for one character. It's that kind of syntax, but way more powerful. So first I'll talk about quantifiers. Say I was looking in a file, or any kind of data packet, and I wanted to find anywhere from five to 15 x's; the syntax of that search looks like this: the x and then the range from five to 15, x{5,15}. Then we have aliases for useful ranges that we use a lot. If we wanted zero to one of something, like the letter x, that's just a question mark after the x. A plus is one or more, so x+ would be one or more x's. And when I say one or more, I kind of say one to infinity here, but that's not completely true; there is a limit. And the star is not like the star when you're doing file searches, the glob kind of format. It means anywhere from zero to infinity. So it's kind of like one or more, but the option for none at all is there as well. There are also character classes, so you can group a certain kind of character.
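All of the quantifiers so far, checked with Python's `re` module (not the engine from the talk's demos, just a quick way to verify the matches):

```python
import re

# x{5,15}: anywhere from five to fifteen x's
assert re.search(r"x{5,15}", "xxxxx")        # five x's: matches
assert not re.search(r"x{5,15}", "xxxx")     # only four: no match

# ? is shorthand for {0,1}, + for one-or-more, * for zero-or-more
assert re.fullmatch(r"colou?r", "color")     # zero u's
assert re.fullmatch(r"colou?r", "colour")    # one u
assert re.fullmatch(r"x+", "xxx")            # one or more x's
assert re.fullmatch(r"x*", "")               # star also allows none at all
```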
Say I'm looking for threes, fives, and nines in my string, and I want to find five to eight of those kinds of characters: [359]{5,8}. Because of that, a string like that could match. There's also negation: you can set up a character class that is negated. This little caret here is the negation, and this is the class that we're negating, the comma. So what this really means is one or more not-commas, one or more characters that are not a comma. This string down here in red, this is the part of it that matches, until we hit the comma. Somehow... wow, I have no idea how that just happened. So give me a second here. Technical issues. What's up? Oh, weird, it's doing that scaled mode. Does anybody know OS X well enough to get out of the scaled mode? Hey, Joe. Derp. No, no, no, that worked. Cool, I'm back on track. So, some aliases for character classes. You can do white space, numbers, alphanumeric plus underscore. And if you capitalize it, that's the negated version of that class, so you can say not white space, or not numbers, or not alphanumeric. Then you have the dot, which is kind of like the question mark in globbing: any character except newlines, typically, unless you modify that, because you can do that too. We can also do kind of an OR statement. If we're looking for any of these three words in a string, good, bad, evil, then this string down here, this non-evil sentence, would match. The part that matches is the word evil, so it had one of those three in it. We can also group it: if we did the good, bad, evil and grouped it with parentheses, we can also say three or more of that. So any one of those three words; we have bad, good, evil, bad, four of them, and it matched all of that. And that's that. We can also anchor. What that means, for the caret (not the caret in the class, but a caret) is that we want to find the word anchor in this case.
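The class, alias, and alternation pieces above, as quick checks in Python's `re`:

```python
import re

# Character class: five to eight characters drawn from 3, 5, 9
assert re.fullmatch(r"[359]{5,8}", "3559935")

# Negated class: one or more characters that are NOT a comma
m = re.search(r"[^,]+", "match this part,not this")
assert m.group() == "match this part"

# Aliases: \s white space, \d digits, \w alphanumeric plus underscore
# (capitalized \S \D \W are the negated versions)
assert re.fullmatch(r"\d+\s\w+", "42 foo_bar")

# Alternation, then grouped with a three-or-more quantifier
assert re.search(r"good|bad|evil", "a non-evil sentence").group() == "evil"
assert re.search(r"(good|bad|evil){3,}", "badgoodevilbad")
```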
This is our regular expression, but the string has to start with the word anchor. So a string that would match is "anchor is an anchor". A string that would not match is "boat is an anchor", because it does have the word anchor, but it doesn't start with that word; it starts with the word boat. Then we can do the same kind of thing with the end anchor, and the sign for that is the dollar sign. So that's the regular expression, and a string that would match is "anchor is an anchor", because it ends with anchor. It would not match "anchor is a boat", because it has the word anchor but doesn't end with it. If we want to search a string for a character that is actually a regular expression metacharacter, we can't just do that plain, because it's going to be interpreted as a regular expression character. We have to escape it, as it's called, with a backslash. So if we're looking for three to six dollar signs, we specifically have to say: this is an escape, this is a dollar sign, not an end anchor. And this is the last regex-specific thing I'll go into, because this is kind of boring stuff if you already know regex, and then I'll get into the DoSing. There's greediness, laziness, and there's also possessiveness. It's a useful thing to know, and actually a source of confusion when things aren't matching the way you want. A good example is an HTML example, because regex is the best thing to parse HTML; I don't know if you guys have seen that epic Stack Overflow post. In this case I have a script tag, and then it says "not really", and we end the script, and then we have some text, and then we start another script tag, and we end that, and then more text.
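The anchors and the escaping, same drill in Python's `re`:

```python
import re

# ^ anchors the match to the start of the string
assert re.search(r"^anchor", "anchor is an anchor")
assert not re.search(r"^anchor", "boat is an anchor")

# $ anchors to the end of the string
assert re.search(r"anchor$", "anchor is an anchor")
assert not re.search(r"anchor$", "anchor is a boat")

# To match a literal metacharacter, escape it: three to six dollar signs
assert re.search(r"\${3,6}", "costs $$$$")
```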
So if we had a regular expression where we were attempting to find just the first script tag and everything in between, that would kind of work if we only had one script tag. But because regex by default is greedy, it tries to capture everything, so the thing it matches starts from this first script tag (whoa, it's the whole presentation; here we go) and goes all the way to the last one, the second closing script tag. If we want just the first script pair, we can use that question mark modifier after a quantifier, which is the plus in this case, and it makes it go until the next thing we're looking for, not everything until the last one. So now, ReDoS. Now it's all evil, because it's red. I'm going to take a drink of my caffeine real quick. Okay. Before we start attacking: there are more than just these two engines, but these are the most common types of regular expression engines. There are hybrids too, but there's a deterministic engine and a non-deterministic engine, and they have different kinds of problems when it comes to performance. Usually when you read up on ReDoS, NFAs are the engines most talked about, and the way to DoS them is through time. Deterministic engines are a little bit different, because timing-wise it doesn't really even matter what string it's searching; it's going to find out whether it matches or not in deterministic time. But you don't get that for free. The downside is that it can take up a lot of memory to build the state table. So again, if you're going to try to DoS an expression, you kind of want to know what kind of engine it's using, because your strategies are going to be a little bit different. And you can actually do a bit of recon to find that out, because it would be nice if they worked exactly the same as far as the output that you get, but they don't completely.
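Going back to the greedy-versus-lazy script-tag example for a second, here it is in Python's `re` as a stand-in for the slide:

```python
import re

html = "<script>not really</script>text<script></script>more"

# Greedy: .+ grabs as much as possible, so the match runs from the
# first <script> all the way to the LAST </script>
greedy = re.search(r"<script>.+</script>", html).group()
assert greedy == "<script>not really</script>text<script></script>"

# Lazy: the ? after the quantifier stops at the NEXT </script>
lazy = re.search(r"<script>.+?</script>", html).group()
assert lazy == "<script>not really</script>"
```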
Like, if you have a list of ORs, one engine will pick the first one, while another engine will pick the longest one instead. Also, the DFA doesn't support laziness, and possessiveness is handled really weirdly by one of the engines. So we can test it with a proof of concept using two different greps, and I can show some of the different results so you see what I mean. So, the longest-alternation thing. With the NFA, which is the first example here (I'll zoom in), the sample string I'm echoing out is "ab", and the grep, the search that I'm doing, is a|ab, and the match that we get is just "a". What's really happening is we're picking the first thing we see that matches the string; the first thing we see is just "a", and now we're done. We know it matches, and we don't try to get any more than that. The DFA, on the other hand: we do "ab", and it grabs the largest thing; it actually matches the full "ab". So there's a difference in what it actually matches. And laziness: this example is pretty straightforward, because the DFA doesn't even support it. We have the search string "abababa", and we're searching for an "a" and then, lazily, any amount of any character until the next "a", which explains why we get "aba". If we use a DFA, we're still trying to say the same thing, but the laziness, the question mark, doesn't work, and we actually get "abababa", because the way it works is an "a" and then any amount of anything until another "a". It just keeps on going: "b" is any amount of anything, "a" is any amount of anything, and so on. This one is really weird: with NFAs, when they find something that matches as they go along, they hold on to it and don't want to give it back; they just keep going.
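Those differences are easy to poke at in Python's `re`, which is a backtracking NFA-style engine; the comments note what a DFA-style engine (such as `grep -E`) would return instead:

```python
import re

# An NFA-style engine takes the FIRST alternative that works:
# against "ab", the pattern a|ab matches just "a".
assert re.search(r"a|ab", "ab").group() == "a"
# A POSIX/DFA-style engine would report the leftmost-LONGEST match, "ab".

# Laziness in the NFA: a.*?a stops at the nearest closing 'a'
assert re.search(r"a.*?a", "abababa").group() == "aba"
# Greedy a.*a behaves like the DFA that can't do laziness: it runs on
assert re.search(r"a.*a", "abababa").group() == "abababa"

# The hold-on-to-it behavior: a, maybe a b, maybe a bc, against "abc".
# The NFA keeps the b it already matched and settles for "ab";
# a DFA would give the b back to match the longer "abc".
assert re.search(r"ab?(bc)?", "abc").group() == "ab"
```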
So in this case we have "abc" as the search string for both, and our expression is a little bit more complicated, but to break it down: we're looking for an "a", then maybe a "b", zero or one b's, and then maybe a "bc", zero or one bc's, in both of the expressions. For one of them we get "ab", and for the other we get "abc". So it's weird why we get that, when our whole search string is "abc". But follow it with the NFA: we get our "a", and then there's zero or one b's; we find a "b", so we hold on to that. Now the last part of the string we're looking at is a "c", and that's not a "bc", so we just match on the "ab". Whereas the DFA will see the "a", and the "b" will match, but then it sees that the "bc", a longer thing, will match, so it gives up that first part to match the longer string. That's handled a little bit differently. As far as recon goes, that's a big assumption: that you actually have the ability to know what the system's expressions are, to give it strings, and to know what it's matching. Sometimes you can't do that recon. Another thing you can do, if you can give it a string and you kind of know what the expression is, is time it. If it takes the same amount of time for a lot of different strings, you're probably dealing with a DFA. If it's inconsistent on the time, taking longer for some strings than others, it's either an NFA or a hybrid. So we'll do some comparisons between the engines on how they work on the back end, and I'll start with kind of the lies, to simplify things, and then go farther under the hood as we go. The first simplified version of how it works: I look at it like a labyrinth, and how an engine would do pathfinding.
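That timing recon can be sketched as a crude heuristic. The threshold here is made up, and Python's `re` stands in for the unknown engine:

```python
import re
import time

def looks_like_dfa(match_fn, probes):
    """Crude recon: time the engine on several probe strings. A DFA's
    timings stay roughly flat; a backtracking NFA's vary wildly."""
    times = []
    for s in probes:
        t0 = time.perf_counter()
        match_fn(s)
        times.append(time.perf_counter() - t0)
    # Hypothetical threshold: call it a DFA if max is within 10x of min
    return max(times) < 10 * min(times)

# Python's re is a backtracking NFA, so an evil probe takes far longer
pat = re.compile(r"^(a+)+$")
probes = ["a" * 18, "a" * 18 + "b"]   # benign probe, then an evil one
print(looks_like_dfa(pat.match, probes))   # False: it's an NFA
```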
This is going to be the NFA example. Say the algorithm is to follow one side of the maze, and if it hits a dead end, backtrack and take another path. This is kind of what that looks like: you see it backtracks at that dead end, comes back, finds another path, backtracks again, and that's how it goes. Depending on what the maze looks like, you don't know exactly how long it will take to get through; it depends on the dead ends. Whereas the DFA goes through all the options in parallel and picks the longest one that works. So it's deterministic how fast it goes, and you get the longest match. The problem is you're doing all of that at once, so it takes more memory. So, part two. Say this is our example regular expression at the top, in blue, and at the bottom (yeah, actually the contrast is not as terrible as I thought, at least on my monitor here; red on red, that was kind of stupid) the red is the search string. With the NFA engine, it takes a look at the expression and sees how it matches against the string. So it's going: zero or more l's? No, that "p" doesn't satisfy it, no, no, there we go, that matches. Then we go on to the next part. No, that doesn't match, because we're still looking for more l's, right? And that "i" was not an "l". So we go to the next part of the expression. Is that a "t"? No. Is that a "c"? No. Is that an "i"? Yes. Then we can go to the next part there and keep looking. Is that a "t"? No. Is that a "c"? Yes. Is that an "a"? Yes. We can qualify that and keep going. That's a "t"; just keep going, you know. And then this is where we start to fail. Is it a "t"? No. A "c"? No. An "i"? No. And then we're done, and that's the match. As for how the DFA does it, I'll get into that a little bit later.
We can also flowchart that out into a state diagram, and this is what it looks like. It's kind of convoluted and messy, but this is what an actual state diagram would be for this specific example back here. We start at nothing, and at the bottom here we go: is this an "l"? But it doesn't have to be, because we can go back. And this is where we diverge into our three different patterns in our alternation list. Specifically what I meant by that is, well, back. There we go. We have our "t" group, our "ca" group, and our "i" group. Going back there: at the bottom we have the "t" right there, we have our "ca", and then we have our "i". And in green is our solved state, the it-matches state. If you really want to pop the hood, there's a way you can actually get Perl to tell you exactly how it's compiling a regular expression. Again, this is our expression at the top, and this is the programming-language side of it, how it's being compiled. At the beginning I said it's non-Turing-complete: to the best of my knowledge you can't actually have an infinite loop; in theory you shouldn't be able to. You can have an iteration that goes over and over and backtracks over and over, but you shouldn't ever really have an infinite loop. In other words, it should actually finish, which is the F in NFA and DFA: it's a finite automaton. And this is just more of that same expression. So, the DFA. It's a little bit different. This is even more of a lie about how it works, because it really doesn't work this way, but it's more string-based. It looks at the string and asks: does it match this part of the expression? It follows the string instead of the expression itself. And this is more what a DFA state diagram would look like.
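As an aside on popping the hood: Python has an analogue of Perl's compile dump. Passing re.DEBUG prints the parsed program. The expression here is a stand-in in the spirit of the slide's example, not the exact one from the talk:

```python
import re

# re.DEBUG dumps the compiled structure to stdout, roughly:
#   MAX_REPEAT 0 MAXREPEAT
#     LITERAL 108            (the l*)
#   MAX_REPEAT 1 MAXREPEAT
#     SUBPATTERN 1 ...       (the alternation group)
pat = re.compile(r"l*(t|ca|i)+", re.DEBUG)

# The compiled program still behaves normally
assert pat.search("application").group() == "licati"
```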
It has multiple ways it can match, and it picks the longest match, whereas an NFA would pick the first one and be done. And this is a more accurate way to think about it: if you're actually writing a DFA engine in a programming language, you're basically going to be setting up an array; it's a state table, and you're going to step through the states. It's a lot easier to comprehend than how an NFA engine works. So, this state diagram again. It's not arbitrary; this is still following the same example, with the "application" string and that regex. You start at state zero up here, and in state zero you're looking for: does it have an "l", "t", "c", "i", or "a"? You eventually get to that "l", and when you do, that tells you to go to the state listed in this table, which is one. So now we'd be in state one, and we'd look for any of those letters, and whatever letter it is, it instructs you which state to go to: two, three, or four. If you had a "c", it would instruct you to go to state three. And eventually you might get a character that's not there, which means you fail, or you eventually match. It's important to look at a DFA in that context, because it gives you an idea of how you can DoS the memory. Being that it's a table in memory, what ways can you make that table grow? One way is to have more of those different kinds of patterns you're looking for, which makes it grow out, or you can just have a lot more states. And I've learned that the states are the easiest way to attack it in practice. In theory either should work, but in practice having more states is the way to do it. So now let's talk about abusing DFAs, and we'll talk about abusing NFAs after this, which is more complicated. Thinking of the labyrinth, this is the way I'd abuse it, right?
You give it multiple paths, so you expand that memory out, because it has to traverse all of them at the same time. That's what I explained there, and that's an example; that's a POC. That's an expression where, without even getting to the point of giving it a string to search, just the fact that it has to load that expression up as a state table would consume a lot of memory right there. To break it down: in the first set of parentheses we're looking for 0 to 75 a's, then we group that and look for 0 to 75 of that, and that's grouped, and we look for 0 to 75 of that. So it multiplies out pretty badly. Now we'll talk about abusing NFAs (let me check the time real quick; I'm going pretty fast). So this is another labyrinth. One important thing: there is no way to solve it. You see the start up here, but there is no solution. And keep in mind it tries to go down every path it can to find the solution, and when it can't, it backtracks and tries another way, and backtracks and tries another way. If you gave it a way out, once it got out it would be done; it would have matched. So instead you make it try every single possible path, and in the end it doesn't even match, but that takes a lot of time, and that's the way to abuse an NFA. To sum it up: it tries everything until it finds a match. OWASP has a pretty easy-to-understand example of how that works (I don't know about the contrast here). You have an expression that has to start with an "a", one or more a's, then you group that and do one or more of that, and you end-anchor it, because then you can give it a different final character to make it fail, and that's what they do.
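Back to that DFA proof of concept for a second: the nested ranges multiply, which is the whole trick. A little helper (hypothetical, just for the arithmetic) makes the growth explicit:

```python
def nested(n, depth):
    """Build the nested-range POC: a{0,n} wrapped depth times."""
    pat = "a{0,%d}" % n
    for _ in range(depth - 1):
        pat = "(%s){0,%d}" % (pat, n)
    return pat

evil = nested(75, 3)
print(evil)          # ((a{0,75}){0,75}){0,75}

# A DFA pays for this at COMPILE time: the counters multiply, so the
# worst-case run of a's the pattern allows is 75 * 75 * 75
print(75 ** 3)       # 421875
```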
Their example: say you had four a's and an x. That gives it 16 possible paths it has to take to find out that it's not matching. But if you give it just a few more a's, like that example there, that's 65,000-ish different paths it has to take, which takes so much longer. Unfortunately, that's a naive assumption: that the regex engine you're using is not going to do any little bit of optimizing. You think you're going to flood the world, but in reality it's getting optimized down to just that right there. A metaphor for that, for any hardcore C people: compilers do some optimizations as well. Even if you don't know C, this is really child's play. We're not even taking any input; we're just saying, hey, if five is greater than zero, print true, otherwise print false. That's pretty simple. So when we actually compile it and look at it in a debugger, it's not actually doing a whole lot. We're just making a call here after we set up a stack frame, and we already have this "true" populated in our register there. And if we look at the hex of the program itself, we have "true" but no "false". What that really means is it's going straight to the printout; it's just printing true, because the compiler knows that for this code back here, there is never a situation where false will ever be printed. It already knows that, so why even compile it? Why even generate code for that? So we have to trick the optimization, and only just a little bit. We make a variable that's equal to five, and then we test whether that variable is greater than zero, and that's all we have to do to make the compiler unable to figure out where that's going.
When we do that simple modification, if we look at it in a debugger, we actually see it moving five into an area, and it is comparing right there with zero, and we have some conditional instructions before we actually get to the call to the printout. And if you go into memory, you see that it has both the "true" and "false" strings. So that's what we need to do for regex. I know it's kind of a tangent of an analogy, but we can do the same thing with regular expressions. In the OWASP example they were using a plus. What's similar to a plus? We can use the curly braces, the range syntax. (I should be done with memes by now, but this is our range.) It's almost like one or more, right? So, to reformat that OWASP example this way, we replace all the pluses with one-to-over-9,000: we have a{1,9001} there, then grouped, then {1,9001} of that again, and end-anchored. I'll zoom into this, but this is me benchmarking it. I didn't do a's and an x; I did a's and a b, with that expression there, and I timed it with /usr/bin/time. In that first one there, we have almost a second, and you'll notice I'm adding one more "a" each time I test the timing. So going back over to the actual timings: one second, two seconds, four seconds, almost 10 seconds, 18 seconds. You can see what's happening here: each time we add one more "a", it doubles. Then you see these two down here that took, you know, 0.02 seconds. That was the original OWASP example; we can see that it actually is getting optimized and doesn't take a long time, so you still have to trick it. Now, you might be thinking there aren't a lot of cases where you have control over the expression itself, because this is assuming you do have control and you do get to make a bad expression. It does happen, though rarely. This one's kind of naive, but it's a real scenario: you have a
server-side validation for someone trying to sign up for an account on a website. Client-side validation you can just bypass yourself, but server-side, when you're registering your username and password, it wants to make sure the username and password are not the same. You should never use regular expressions to check that, but say you did: for the username you make a really poor regular expression, and for the password you send a string that DoSes it, and now you're DoSing that on the server. So yeah, that was a rundown of the benchmarks, and I'll get a little more gnarly about that later on. Now we'll talk about the automation, the theory of what goes into it, and I'll show some cool and funny examples of that, and then I'll do a demo of the script that I wrote. For DFAs, it's kind of funny, because DFAs haven't gotten that much attention as far as the research goes, but they're actually the easiest to benchmark, and that's really all you can do. You can only benchmark them; you can't really make a string that's worse for a given expression. The expression already is either good or bad, so if you're DoSing it, you just have to be aware of what a bad expression is. And for a typical DoS or DDoS situation, instead of just loading a web page a lot, you just do a POST to that expression with a string that's arbitrary (the string doesn't matter), and now you're making it load that state table and consume memory.
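The NFA timing benchmark from a couple of slides back is easy to reproduce in miniature. Python's `re` is a backtracking engine, and as it happens it does not optimize this pattern away, so even small inputs show the doubling:

```python
import re
import time

# The curly-brace version of the OWASP pattern from the talk
pat = re.compile(r"^(a{1,9001})+$")

for n in (12, 14, 16, 18):
    evil = "a" * n + "b"     # the trailing non-'a' forces full backtracking
    t0 = time.perf_counter()
    assert pat.match(evil) is None
    print(n, round(time.perf_counter() - t0, 4))
# Each extra 'a' roughly doubles the time it takes to fail
```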
What I did to benchmark it: I used the RE2 module (I did it with Perl) and I slowly starved it of resources. I would load the state table, then use the module again but tell it it could use a little more memory, then try it again with a little more memory, and eventually I'd get errors saying it ran out of memory, and I'd capture that and record it. So that's all for DFAs; I'm going right into NFAs now. For NFAs it's more complicated: the string matters. So how do you automate that? How do you look at an expression and say "this one's not going to perform well" when you don't know what string to test it with? You can't just give it an arbitrary string and time that. You have to somehow automatically generate a string that's bad, and in my case, also generate a string that's good, because then I can compare: is the expression just generally bad for any string, or is it an expression that might be okay in your environment but that somebody can still abuse? And I keep track of all that. One way to think about crafting a really bad string, I call it a long-circuit attack, because think about programming languages: if you have a bunch of ORs in a conditional, like if A or B or C or D and so on, and your variable has A, it's not going to evaluate anything after that. It has A, and it stops. If you long-circuit it, you have a string that has everything, but the last piece doesn't match, so it fails at that last part. Another thing: if I see a quantifier, my string is going to pick as many as possible. If it's a+, I'm going to put a bunch of a's; if it's a{1,15}, I'm going to pick 15 a's. And for any alternations that I see, I pick the last alternative, so it has to look at all of them before it gets to that.
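Those three rules (max out every quantifier, pick the last alternative, negate the last mandatory lexeme) can be sketched for a toy token shape. This is nothing like the real script's parser; the token format and the negation character are made up for illustration:

```python
import re

def evil_string(tokens):
    """tokens: list of (alternatives, max_repeat) pairs, describing a
    pattern like (ab|cd|yz){1,20}g as [(["ab","cd","yz"], 20), (["g"], 1)].
    Rules from the talk: take the maximum repeat count, pick the LAST
    alternative every time, and negate the final mandatory lexeme so the
    match fails only after maximal backtracking."""
    out = []
    for alts, max_n in tokens[:-1]:
        out.append(alts[-1] * max_n)       # last alternative, max copies
    last_alts, _ = tokens[-1]
    # "Negate" the last lexeme: emit anything that isn't it
    out.append("x" if last_alts[-1] != "x" else "y")
    return "".join(out)

# (ab|cd|yz){1,20}g  ->  twenty yz's, then something that is not g
s = evil_string([(["ab", "cd", "yz"], 20), (["g"], 1)])
print(s)
# It fails, but only after the engine tries everything and backtracks
assert re.match(r"^(ab|cd|yz){1,20}g$", s) is None
```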
So here's an example. This is the expression: either an "ab" or a "cd" or a "yz", anywhere from 1 to 20 of them, and then a "g". A simple example that would match, and this is what matches the quickest: there's an "ab", that satisfied one of them, and there happens to be a "g". We're done; we match. An example of what would take a lot longer: there's a "yz", but you have to look. "ab"? No. "cd"? No. "yz"? Yes, okay, there's a "yz". Is the next one a "g"? No. Okay, well, is it an "ab"? Is it a "cd"? Is it a "yz"? So we do that 20 times, and then eventually an "a" (and we picked "a" because it's not "g"), so it still has to look through everything and backtrack, through everything and backtrack, and that's the longest it could take to solve that one. As for a complication in automating this: say our expression ended with the "g" and then a star, which means zero or more. Well, zero is an option, so, being that I have to negate that last lexeme, "yzyzyz...a" would still match, because I don't need to have a "g" at that point. It still works. So I have to find a way to negate the last lexeme that isn't optional: anything with a star, any zero-to-whatever quantifier, or even the question mark one, I can't use those. So I made a script, benchrex.pl, and I'll show you some examples of the actual output it gives, and then I'll show it working. I thought it would be cool to look at some examples to see what kind of tricks it does. The expression at the top, "hi", that's the regular expression, and my script outputs "ha": it matches the first part and negates the last part, and that's how it works. I started simple like that. So, going off from there, let's use "ha" as the expression and see what it does with that. I say "ha", and it outputs "h1"; it still negates the end. Say I did one to 15 a's and then an "h" and then a bang: well, again, it picks the most of a quantifier, so it's 15 a's and then an "h",
and then not a bang; it's an "a" instead. So it still fails, but it makes the engine do everything possible. Again: a big long alternation with a lot of alternatives, then a "d" and then a "1". It picks the last alternative, the "yz", then does the "d", but not the "1". If I do any amount of h's, one or more h's and then an "i": a lot of h's and then an "a". This one: it has seven x's and then one to ten a's, and that whole thing one to ten times, but it ends with just a "1", because it can't end with any a's. This one's more interesting; it's kind of the same thing, but we're adding an end anchor to the end of it. So it starts with the x's, it does a whole bunch of a's, and it ends with the "1", so the engine has to do a lot of backtracking to see that no, it doesn't match; it takes the most time to evaluate. So this is the script I was talking about. You give it a text file with a bunch of regular expressions, it goes through them all, it does the DFA memory testing, it generates the good and evil strings, it tests how long each one takes to run, and it can even output a CSV file, so in spreadsheet software you can sort them by best or worst or whatever. For my research, to go through big lists of expressions, I thought it would be fun to look at the Emerging Threats IDS rule set and also regexlib.com. That site was the best for debugging my script, because there were some really, really strange and terrible expressions on there that just broke my script, and some still do, because I don't actually try to validate whether something is a real, valid expression. So, to be honest, the script is still a little bit buggy, but it works pretty well. I also mention what I tested it on, because I'm not going to claim this expression universally takes 1.5 seconds; that's what it does on this machine, but it still gives you an idea. So let's look at some examples of real stuff in the wild.
This is the "most complete URL validator". If you don't need the most complete URL validator, don't use it: with a DFA, that expression takes 150 megs every time you use it. This next one is not so bad; I'm just throwing in an example of something that isn't terrible. It's supposed to validate long Windows file names, and it uses less than a meg of memory. This one is probably the worst from regexlib for the time-based, rather than DFA-based, attack. I don't even know why anyone would use an expression like this, but it's supposed to match any valid human name, like "Mr." whatever. This expression takes more than four seconds every time you use it with an evil string.

Now I get into some of the IDS rules. This one's not so bad memory-wise, but it's really bad time-wise. The actual rule is the Emerging Threats ActiveX image-check toolbar remote code execution rule. For the evil string my script generates, it takes 1.6 seconds to evaluate. Of course your beefy IDS machine is going to go a little quicker than that, but the way to think of it is: if it takes longer for your IDS to evaluate this string, or packet in this case, than it takes for me to send the packet, you've got a problem.

Oh, and by the way, last year at DEF CON I did a presentation on various things, and one of them was a script I released called 8ball that attempts to trigger every single rule on an IDS. You feed it an IDS rule set, it deconstructs all the rules and makes a packet for each, and then you send them off at a target. Those IDS rules have regex in them, so I added a "speedball" option to 8ball: now you can tell it to do ReDoS for all of those packets as well. In that case it wouldn't actually trigger the rules; it would make the IDS take as long as possible, but they'd all fail to match too, so in theory you wouldn't even see alerts.
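The evil strings behind these timings exploit the same mechanics as the alternation example from earlier in the talk. Here is that example in runnable form, as a stand-in using Python's backtracking `re` engine (the IDS rules themselves are PCRE, which behaves the same way for this case):

```python
import re

# 1 to 20 repetitions of "ab", "cd", or "yz", followed by a "g".
pattern = re.compile(r"(?:ab|cd|yz){1,20}g")

# Good string: one repetition plus the "g" -- matches immediately.
assert pattern.search("abg") is not None

# Evil string: max out the quantifier with "yz" twenty times, then end
# with something that is NOT a "g", so every repetition count has to be
# tried and backtracked before the engine admits there is no match.
evil = "yz" * 20 + "a"
assert pattern.search(evil) is None
```

The asymmetry argument is exactly this: the attacker pays the cost of sending `evil` once, while the matcher pays the full backtracking cost on every evaluation.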
And this is just to show an example, since I'm talking about the benchmarks: if you want to see what a string from this tool looks like for a real expression, that's what it generates. And that's kind of the point of automating. I read the OWASP page and thought, yeah, obviously that makes sense, but I don't think I would ever look at this expression and think, this is a terrible expression, this would take a while to evaluate with some string I haven't thought of yet. But when you automate it, I get something like this, which still doesn't look that crazy, but it still takes a while.

Then, looking at it in a DFA context, this is where I give examples of the worst I can possibly throw at it. Because 150 megs doesn't sound that bad, until you compare: when I deliberately tried to be as bad as I possibly could, building up a big state table with a simple bad expression of my own, I barely got 10 megabytes more than baseline. That's the worst I can do myself, so that one must be really bad.

And for the time-based ones, forget about it. I give a few examples, and I start with this kind of expression; it's the same expression every time, just with different strings. This string with 40 "a"s and a "b" takes about two days, and I actually tested that: it took me two days. And it really is times two per character; it doesn't trail off. So 54 "a"s would take a lifetime, and 81 "a"s would take the existence of the universe, based on 13 or 14 billion years. It will finish, it's not an infinite loop, but that doesn't mean much if it's going to take that long.
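The times-two-per-character behavior can be reproduced at a small, safe scale. The pattern below is my stand-in with the same a's-then-b shape as the slide (I'm assuming Python's backtracking `re` engine, not whatever engine the original two-day test used): on a failing input of n "a"s, the nested quantifiers force roughly 2^n ways to split the run before the engine can give up.

```python
import re
import time

# Stand-in with the same shape as the talk's example: a run of a's
# that is supposed to end in a "b".
pat = re.compile(r"(a+)+b")

def timed(s):
    t0 = time.perf_counter()
    m = pat.fullmatch(s)
    return m, time.perf_counter() - t0

ok, t_good = timed("a" * 22 + "b")  # good string: matches almost instantly
bad, t_evil = timed("a" * 22)       # evil string: no "b", catastrophic backtracking
assert ok is not None and bad is None
assert t_evil > t_good
```

At 22 "a"s this is already a visible pause; every additional "a" doubles it, which is how 40 characters turns into days.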
A bit of a sidetrack, and I'm almost done; I'll show the demo real quick and it's not too long, but I thought this was funny. I look at an expression like this up here and I'm not immediately thinking about what it's looking for, but here are the non-DoS string and the DoS string my tool makes for it. And I just think that's cool, because I don't look at that expression and think "Viagra", but this is from regexlib, and it's a regular expression that was meant to look for "Viagra", probably for spam. Automation is funny.

And then, more recently, and I didn't dig into this one a lot: Yara. You can use regular expressions in that too. The funny thing is, I wanted to see whether it's an NFA or a DFA, so I had to use my own little tricks to figure that out. It turns out it's definitely not a DFA, because it's not deterministic time-wise, but it doesn't seem to be a pure NFA either, because the time doesn't double for every "a" I add. It seems to be a hybrid, but it definitely has some NFA elements, based on the exponential increase for the strings I give it.

An actual test I did was a 100-byte file being checked for malware. Up here is the Yara rule I'm using, and I have a file with 99 "a"s and a "b", and it took about 13 hours to decide whether that file was malware based on that definition. So that's still kind of bad, but it's harder to attack, because I don't know of anybody who would use a definition like that; this is just me deliberately doing it badly. And really, in a lot of the definitions I've seen, most people don't even use expressions anyway; they use strings or hex matches and build complex logic in the condition section. And, you know, the match didn't happen. This is a screenshot of me timing it, running Yara on my rule file with that text file, dos.txt.
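The engine-fingerprinting trick described above (does the time roughly double per added character, or stay flat?) can be sketched as a probe. This is my illustration, not a reproduction of the Yara test: it uses Python's `re`, which is a backtracking engine, as the subject, and the function name `growth_ratio` is mine.

```python
import re
import time

def growth_ratio(pattern, n_small, n_large):
    # Time the same failing match at two input sizes. A backtracking
    # (NFA-style) engine roughly doubles per extra character on a
    # pathological pattern, so this ratio explodes; a DFA-style engine
    # would stay close to linear in the input length.
    pat = re.compile(pattern)
    def t(n):
        s = "a" * n
        t0 = time.perf_counter()
        pat.fullmatch(s)
        return time.perf_counter() - t0
    return t(n_large) / max(t(n_small), 1e-9)

# Six extra a's should cost far more than six characters' worth of
# linear work if the engine backtracks.
ratio = growth_ratio(r"(a+)+b", 14, 20)
assert ratio > 2
```

Running the same probe against different engines (or against Yara, as in the talk) is what separates "pure DFA", "pure NFA", and "hybrid with NFA elements".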
This is the 13 hours, 19 minutes. I'm showing the output for a similar expression, then the "a"s and the "b", and then showing that, hey, it's 100 bytes down there.

And yeah, demo time. It's a good time for me to take a drink, because it's testing the timing of all this, and it does have a timeout, so it's not just going to go crazy and take, you know, your lifetime. By default it times out at 30 seconds, and if you are getting expressions that time out, you can define a longer timeout if you want to really explore them. You can see I'm running the tool against a text file called pcre_simple.txt, and I want a CSV. That text file is back here; I'm giving it just nine expressions, some that should look familiar as bad ones, some not. The Viagra one is in there, and I also have the bad IDS one and the bad regexlib one.

And it has finished, so I'll open it up. It's tab-delimited, and I know that's kind of weird, not comma-delimited, but it was easier, because a lot of expressions have commas in them and I include the expression in the output as well; that would be annoying without a really, really good CSV module. So those are all the expressions, and we can see how long they take. This one is clearly the worst of them: 11 seconds. But you also see the four-and-a-half-second one and the one-and-a-half-second one. That's the DoS time, but I also time how it does on a good string. So that bad expression here, the one-to-eight, one-to-eight thing with the anchor: normally it takes a split second, but it can take up to 11. That's the interesting thing: it's not always bad. And I give a delta, the difference between the good and the bad, and then I give the memory assuming it was a DFA. So clearly this one down here is probably the worst.
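The timing-with-a-timeout workflow described above can be sketched like this. It is a minimal stand-in for the real Perl tool, not the tool itself: the names `timed_search` and `benchmark_tsv` are mine, it only covers the good/evil timing columns (no delta or DFA-memory column), and it kills runaway matches by running each one in a child process.

```python
import csv
import io
import multiprocessing as mp
import re
import time

def _worker(pattern, text, q):
    t0 = time.perf_counter()
    re.search(pattern, text)
    q.put(time.perf_counter() - t0)

def timed_search(pattern, text, timeout=5.0):
    # Run the match in a child process so a runaway evaluation can be
    # killed, in the spirit of the tool's default 30-second timeout.
    q = mp.Queue()
    p = mp.Process(target=_worker, args=(pattern, text, q))
    p.start()
    p.join(timeout)
    if p.is_alive():
        p.terminate()
        p.join()
        return None  # timed out
    return q.get()

def benchmark_tsv(cases, timeout=5.0):
    # cases: (pattern, good_string, evil_string) triples. The output is
    # tab-delimited because patterns routinely contain commas.
    out = io.StringIO()
    w = csv.writer(out, delimiter="\t", lineterminator="\n")
    w.writerow(["pattern", "good_time", "evil_time"])
    for pat, good, evil in cases:
        w.writerow([pat,
                    timed_search(pat, good, timeout),
                    timed_search(pat, evil, timeout)])
    return out.getvalue()
```

On platforms that use the spawn start method (Windows, recent macOS), call `benchmark_tsv` from under an `if __name__ == "__main__":` guard; on Linux's fork start method it works as-is.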
And that was the one that builds up the state table DoS. So that's all good to know, which ones are the worst for time or memory, but, and yeah, I'm almost done, it would still be interesting to know what the good and evil strings are as well, because that output didn't tell us. So I gave an option for that; it's just --strings. It doesn't actually measure the time, so you get the output instantly. So if I go up here, you get the feedback right away. So "a.+rgh": it's like "arg", you know, for the DoS string, and the simple string is just "arg". Some of them aren't really that DoS-able. Like this one here: the DoS string is "a". That's not really going to take any time. And the simple string is "b", because it's negating the whole thing. A bad one for the DFA would be this, but the DoS and simple strings are more for the NFA. And yeah, you get your Viagra there, you get the one you saw in the deck here, and then you get this strange one down here from regexlib. So that's the demo, pretty quick. And it's on my GitHub, which again, if you want the address, it's the guy down here. And it's Silver. Thank you.