Welcome to DEF CON. Let's do something crazy. Okay, that's the name of the talk. I was feeling really ballsy the first time I wrote it, and then I backed off. Here's our agenda. Here's what we're going to do. We're going to start very simple. We're going to cover some basics. So if you're really advanced and it's too easy, sit tight, because we're going to get to some cool stuff on the third bullet. By the way, I don't read the slides to you, so I kind of assume everybody reads as well as I talk. So this is the story of a project that I did, because I was really interested in it. I spent about two and a half years on it. I got really interested in DNS tunnels, and what fascinated me about them is that you can look at a DNS tunnel and immediately pick it out. If you know that there's one there, you can immediately pick it out of DNS traffic. There's no mystery to it, and yet they work amazingly well everywhere because no one can find them. So what I wanted to do was automate that ability to just look at traffic and pick out a DNS tunnel. As I said, this is the story of that project, but hopefully this talk is about a lot more than that. So first of all, when I'm talking about AI, I'm not talking about anything fantastic. I'm talking about real AI, like what you see in video games, what you see in spam filters. Spam filters are an amazing application of AI. They work tremendously well, and that's the kind of AI I'm talking about. I'm talking about getting a machine to make a sophisticated decision. So the decision in particular that we're going to try to make to pick out a DNS tunnel is classification. So to get our minds around how this AI works, we're actually going to talk about how your brain works a little bit. We're not going to talk about the internals of AI because that stuff's boring and you don't need to know it to write AI applications.
So if you see this picture and I said which is the apple, which is the orange, and which is the banana, most normal people would look at the red globe there and assign apple to that. Most people would look at the orange globe and assign orange to that. They'd look at the yellow and assign banana to that. So what did your brain do there? Here's kind of how your brain works. You thought about the real objects, like an apple. You said, well, an apple is red and it's delicious and it's sweet and it's crunchy and it's round. Then you thought, three of those don't even matter. So you threw those three away. For that classification decision, we don't care what the apple tastes like. We don't care that it's crunchy. So you discarded those and you took two traits. You took color and shape and you assigned a weight to those. You said one of those is much more important than the other. So even if you didn't realize it, that's how your brain works. You took an abstract concept and you pulled the traits and weights out of it. So we're going to talk about traits and weights a lot. So that was a really easy example, right? You only had a couple traits. But here's actually one trait, we're still talking about color, but it's much more complex. Because what if I asked you to tell me when does green end and when does blue start? That's a lot harder, right? Because everyone has a different opinion on that. So that's where thresholding comes in. A threshold would be your personal threshold of when green becomes blue, right? So the whole point of this next slide is really so I could get a naked dude up on the screen at DEF CON, but it worked. Here's a really complicated classification decision. Which one is modern man? Well, people debate this all the time and we still can't decide. Because there's hundreds or thousands of traits, right?
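To make the traits, weights, and thresholds idea concrete, here's a minimal sketch in Python (the tool itself is C; the trait names, weight values, and thresholds below are invented purely for illustration, not taken from the talk):

```python
def classify_fruit(hue, roundness):
    """Toy weighted-traits classifier: hue in [0, 1] (0 = red, 1 = yellow),
    roundness in [0, 1]. The weights encode that color matters more than
    shape for this decision; the 0.4 cutoff is the 'when does green become
    blue' style threshold."""
    w_hue, w_round = 0.8, 0.2          # color outweighs shape
    if roundness < 0.5:
        return "banana"                 # not round: shape alone decides
    score = w_hue * hue + w_round * roundness
    return "apple" if score < 0.4 else "orange"
```

A neural network does essentially this, except it picks the weights and thresholds itself during training instead of having them hard-coded.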
And everyone assigns weights to those traits differently and we still can't figure it out. Okay, so let's talk about neural nets for a minute. There are a few key terms, and if you grasp these key terms, then you're good to go. Okay, nonlinear statistical data modeling. For our purposes in this room, nonlinear statistical just means math, okay? Numbers. Data modeling: we're going to take something from real life and we're going to model it. Pretty simple, right? So we're going to model something in math. Adaptive. This is key. This is really the point of the whole IDS project: it should learn. It has this ability to learn. It makes a mistake, you can tell it that was a mistake, and it should go back and not make the same mistake again. Okay? It can be used to model complex relationships between inputs and outputs. So it's got inputs. It's got to give outputs. Our inputs are going to be DNS packets. Our outputs will be, hey, I think this might be a DNS tunnel. So don't be scared of AI, because for whatever reason, if you're going to do firewalls, you go to the bookstore and you pick up a book on firewalls and it's how to use firewalls. But if you want to use AI and you go to the bookstore and you pick up a book on AI, it's how to write your own AI code, which is not useful for anyone, right? We're not going to do that. We're going to go use someone else's AI code. We're not going to do everything from scratch. So I used FANN, the Fast Artificial Neural Network library. It's fairly popular. It's C. So at that point, all the AI stuff is pretty much done, right? What I'm going to do from that point on is pick my inputs, which are the traits. Remember we talked about traits. We're going to give them some values, right? Weights. Some traits are worth more than others. Then we're going to adapt our decisions until they give us the output that we want, right?
So basically what that means is we're going to tell this neural network: learn to do what I want. Supervised learning is actually the AI term for what we're doing there. That means that we're going to give the neural network some answers so that it can crunch its numbers and know when it's right. That's how it's going to learn. So again, we're not going to go into the nuts and bolts of artificial neural networks. That's boring. I hated it. So again, you don't need to know how they work. Go get someone else's artificial neural network package. The reason why the slides are so repetitive and verbose is because I tried to write slides that you could come back to later and learn the whole thing over again if you forget. So we're clicking along on the agenda. That's neural network basics. That's all you really need to know to write a neural network application. Okay, what's a DNS tunnel? So DNS is the domain name system. Its primary purpose is to relate domain names to IP addresses, okay? And DNS tunnels are such a problem because they're a covert channel that works anywhere. And even though this project is really about DNS tunnels, I'm hoping that audience members will see that there's a little bit more here than that. So let's walk through a real basic DNS tunnel. I don't know if anybody can see this from where you're at. But in the lower left-hand corner, we've got a host, and he makes a request: who is data.badguy.com, okay? And that request goes to his DNS server, and the DNS server says, I don't know who data.badguy.com is. Let me ask a root server. So that same request then gets passed to the root server. Root server says, I don't know either. Let me go ask badguy.com. Who's authoritative for that? So he goes to the actual authoritative badguy.com server. He sends a packet, data.badguy.com. Badguy.com takes "data" off, right? That was the actual request.
And then he can either just eat it if it's data exfiltration, or he can respond back with an IP address or a CNAME record or whatever for command and control tunnels. But as far as exfiltration goes, that's the gist of it. Everybody clear on that? Because we just explained that really fast, and if you've never heard it before, I want to make sure you get it. Okay, so the key points are these. It basically gives you a high-level way out of a network, right? We're just requesting things in the application layer, and they're magically delivered across the globe. We don't have to worry about routing. We don't really have to worry about firewalls. We don't have to worry about ports, IP addressing, none of it. It's all application layer, and it just magically happens. And then, of course, DNS requests that are not cached get routed. That seems pretty obvious. So this is just another verbose explanation of how to turn DNS requests into a tunnel. I'm not going to repeat it, unless you want me to. Okay, so now that we know what a tunnel is, here's a little takeaway. Like I said, there's a little bit more to this talk than just catching DNS tunnels. So IDS and IPS are signature-based, almost everywhere, right? And I'm not saying that's broken, because we use it and it works, but it's not enough. And it hasn't been enough for a long time, and the returns it's giving are less and less every year, right? So almost everybody I know who does a lot of IDS or IPS work ends up looking at egress traffic. They spend a lot of time looking at egress traffic. What's leaving your network, right? Where are the tools to do that? I'll take your silence to mean you can't find them either. Okay, so we spend a lot of time looking for egress command and control tunnels. We spend a lot of time looking for data exfiltration, right? And the reason for that is if someone gets into a network, they have to have a way to get out, okay?
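To make the exfiltration mechanics above concrete, here's a hedged sketch (Python rather than the talk's C; `badguy.com`, the base32 encoding, and the 8-byte chunk size are illustrative assumptions, not details from the talk) of how a tunnel client turns raw bytes into the kind of requests just described:

```python
import base64

def data_to_queries(data: bytes, domain: str = "badguy.com", chunk: int = 8):
    # Split the data into small chunks and encode each one as the
    # lowest level domain of a lookup under the attacker's zone.
    # The authoritative server for `domain` strips the label back off.
    queries = []
    for i in range(0, len(data), chunk):
        label = base64.b32encode(data[i:i + chunk]).decode().rstrip("=").lower()
        queries.append(f"{label}.{domain}")
    return queries
```

Every query is a perfectly ordinary-looking DNS lookup, which is exactly why the recursive resolvers along the way happily forward it to badguy.com's name server.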
So all we want to do, for this specific project, is block off one pathway out. Okay, so we've covered neural network basics. We've covered DNS tunnel basics. Let's do some fun stuff. So why use an artificial neural network to look for tunnels? I mean, what's the point, right? Why use AI? So I wanted to get away from packet counting, which is how most things work. Packet counting has very limited returns. It does work a little bit, but it doesn't give me what I wanted. I wanted to turn away from packets to traits, weights and thresholds. And guess what? If you're using a neural network, all I have to worry about are traits. The neural network will self-adjust the weights. It will self-adjust thresholds during the learning phase. All you have to do is come up with good traits, feed them to a neural network, give it an answer sheet so that it can learn what it's doing correctly, and it should do the rest itself. That's the adaptive ability. So again, this is just to drive home the way it should work, right? Which is, let's say that we're feeding our neural network a bunch of DNS traffic, and it's not catching a DNS tunnel, okay? Well, when we discover it later, we pull the traits out of it. We refeed that to the neural network, and we say, hey, you missed one. It should automatically learn how to catch it the next time. It should not get fooled again. And we'll demo that here at the end. And of course, it's also to get us out of signature writing. Let's leave that to vendors. It's a chore. So this slide used to be really snotty. I was really fired up and I was tearing into some vendors, and then I realized that may not be a very smart thing to do. So this slide's been neutered quite a bit. All the IDS and IPS out there aren't very good at finding DNS tunnels. It's not the vendors' fault. It's just how they work. I don't see a fix to that. And then this last bullet is just something I wanted to address in the talk.
And that is, if you do searches on the web for artificially intelligent intrusion detection, you'll find a lot of papers and lots of projects, okay? Those are mostly graduate school projects. They're terrible. They don't work. You'll find a lot of academics talking about it, but they make two big mistakes. Number one, they're too focused on packet counting. So they're packet counting and then they're passing that to a neural network. But we don't need a neural network to do packet counting, right? We can do that with a bash script or a Perl script or something, right? So why involve artificial intelligence in that? And secondly, on that matter, who cares if you get scanned? Who cares? Everybody gets scanned. The other thing that's wrong with their scheme is that they keep trying to come up with artificial intelligence schemes to look at the output of signature-based IDS. Well, again, if the signature-based IDS isn't catching what we're looking for, coming up with a new way to look at the output of the IDS box isn't gonna help, right? There is some good research, though. There's one paper in particular where they came up with a neural network scheme to catch NUSHU. It's a pretty well-known paper. One of the problems with that is they don't release the details of their scheme and they don't release their software, okay? So my scheme may be flawed, but I brought it to DEF CON and I released it with software. Okay, so step one in our DNS IDS project: you gotta frame the question correctly, okay? Don't start out trying to tackle a big problem. You solve a little problem. And the question that we wanna ask is: is this DNS request part of a tunnel, okay? That's pretty much all we care about. So if we were to take this and apply it to other exfiltration methods, let's say we wanted to look at HTTP traffic. Is this HTTP traffic being used to exfiltrate data? That one's a lot harder. We're gonna have to break that down into lots of different decisions.
So we're probably gonna have multiple neural networks making multiple decisions about the destination and the content and all that kind of stuff. That one's hard. I chose DNS because it's fairly cut and dried. There's not a whole lot we have to consider. So again, all we care about, all we're asking this neural network to decide is: is this part of a tunnel or not? So we're gonna pull traits, because we already talked about traits. These are pretty straightforward for the most part to begin with, right? We're gonna pull every domain name that goes past us. We're gonna count how many packets go to that domain. We're gonna count the average length of the packets. So far we're still just packet counting. That's not really useful. Average number of distinct characters in the lowest level domain. So let me address the lowest level domain right now. I use this term a lot because I like it and I think it's descriptive. I have been corrected on it before. I'm still going to keep using it. What I mean by lowest level domain is the piece of the domain name that is the furthest to the left. The opposite of the top level domain. I like the term lowest level domain to describe that. And there's still something missing, right? I mean, we're not doing a whole lot better than packet counting. We don't have enough. We gotta do better than that. I wouldn't bring a packet counter to DEF CON anyway, right? So, I was told there'd be no math. This is where it starts to get bad. My original theory was that words have a lower entropy than raw data, and I was going to figure out a scheme to measure this entropy. Then we were going to feed that to a neural network and let it make some decision based on this computer science equation of entropy. And I spent a long time on that. The slide exists because I think there's some value there, but I couldn't get it to work. So that's pretty much there for historical reasons. It is not in the proof of concept.
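As a sketch of the trait list above (Python rather than the tool's C, and the two-label domain grouping is a simplification I'm assuming, not the tool's real parsing), per-domain trait extraction might look like:

```python
from collections import defaultdict

def extract_traits(requests):
    """For each domain seen, compute: packet count, average request
    length, and average number of distinct characters in the lowest
    level domain (the leftmost label)."""
    by_domain = defaultdict(list)
    for name in requests:
        labels = name.rstrip(".").lower().split(".")
        lld = labels[0]
        domain = ".".join(labels[-2:])   # crude stand-in for zone handling
        by_domain[domain].append((len(name), lld))
    traits = {}
    for domain, rows in by_domain.items():
        n = len(rows)
        avg_len = sum(length for length, _ in rows) / n
        avg_distinct = sum(len(set(lld)) for _, lld in rows) / n
        traits[domain] = (n, avg_len, avg_distinct)
    return traits
```

These numbers become part of the input vector handed to the neural network for each domain.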
Okay, so this is a money slide. If we have two groups of DNS requests, like we do here, top and bottom, okay? And I told you one of those is being used to exfiltrate data from a network. Which would you think it is? I mean, you would intuitively think it's the bottom one, right? It's not the top one. Why? I mean, what is your brain doing there and why can't we get a machine to make that same decision? It shouldn't be that hard, right? So I thought about this for a long, long time, and you'll see what I came up with. By the way, why would you think that the top one is legitimate and the bottom one is not? It's rhetorical, you don't have to answer. And furthermore, if I give you a fourth domain request in the top group, okay? And it is also www.meaniepants.com. You're going to be even more sure that that's not a DNS tunnel, right? Why are you more sure? What happened in your brain? If I give you a fourth request on the bottom that is some random string .meaniepants.com, you become more sure that those are not legitimate. Why? So what I really wanted was some way to compare LLDs in the same domain to each other. I want to see the difference between the LLDs, right? Here, the first two requests are exactly the same. That's probably not being used to smuggle data. I mean, it could, right? Probably not, though. The bottom ones? Hmm, different call. They're very different. They could be used for lots of things, right? So how much is the first request like the second request, like the third request? That's what I think your brain is doing when you immediately look at that and make a decision there. And if data is moving via a DNS tunnel, you should see the LLDs change a great deal. They change from request to request. So let's stop thinking of them as LLDs, right?
Let's stop thinking of them as strings and let's start thinking of them as geometric structures, okay? So here's a two-dimensional graph of dog and cat, right? You can see that they're different and we can measure the difference between the two words, okay? Everybody with me now? We can measure how different they are. The problem with this is that we're measuring the difference between letters. There's no concept of a set. It has no concept of words, right? We're just simply measuring the difference between letters. So that wasn't good enough. I tried this, by the way. Everything I've talked about in here, I tried until I got it to work. That failed. So we need something more complex. So we're going to skip ahead a slide. We end up with this. This is what we're going to use, right? So half the room just got nauseous and the other half just got horny, and whichever half you're in says a lot about you. But don't be scared of that, because it's actually pretty straightforward once we get there. Okay. So LLDs can only have a limited number of characters anyway. So says the RFC, right? Everybody follows RFCs. Most people try. So that really breaks it down to about 36 possible values, right? A through Z, 0 through 9, and then a few oddball characters. I'm actually throwing all the oddball characters out because I don't need them. And we're actually going to do some multi-ordinal vector math. Okay. So this is simply a slide to explain how we're breaking down the LLDs' letters into numbers, okay? Because we're taking characters and we're going to do math on them, and I found it difficult to understand if you don't understand how we're turning the letters into numbers. We're basically going to normalize all the characters into a numeric scheme and then apply this, okay? So I didn't make this up. I didn't come up with this.
This is Euclidean geometry, and I stumbled across this reading a college professor's web page on statistics. What he was doing was giving surveys and then measuring the difference in responses, right? Two groups of people, and how did they respond? I was like, oh, that might work for what I want to do, because I want to measure the distance between words, right? So what we have here is actually a concept of a set. We're actually measuring the distance between sets of words or sets of letters, not just characters. So we're going to turn each character into a number, okay? We're going to subtract one string from the other. We're going to square that. We're going to sum all those up for every character in the request, and then we're going to take the square root of that. And that was hideous, right? Anybody understand that? If you meditate on that for a while it will make perfect sense. This is the textual explanation of what I just said. Again, it's the summed squared subtraction of each letter's numeric value. So we're comparing strings. This allows us to measure the distance between LLDs. In Euclidean geometry, another word for distance is similarity, or dissimilarity. So we're actually measuring how similar different DNS requests are. So let's go back to our DNS request slide for a second. Okay. So www and www2 are not the same, but they're not that different, right? Compared to the bottom three, which are radically different. We need to be able to capture that. Okay. Power of cheese. Here we go. So now we're measuring the difference between two LLDs. How different is LLD1 from LLD2? From LLD3? We're measuring the difference of these domains as they roll past, and hopefully some light bulbs are going off. So that's our list of traits. We're doing some packet counting, and then we're coming up with this numeric representation of how different each of the domain requests are from each other. We're going to train the neural network to do what we want, to find what we want.
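Here's that computation as a runnable sketch. The character-to-number mapping below is one plausible normalization (a-z to 1..26, 0-9 to 27..36); the talk doesn't specify the tool's exact scheme, and the null-padding for length mismatches matches what's described later in the Q&A:

```python
def char_val(c):
    # Normalize a-z to 1..26 and 0-9 to 27..36. Oddball characters are
    # thrown out upstream; a padding null (None) maps to 0.
    if c is None:
        return 0
    if c.isalpha():
        return ord(c.lower()) - ord("a") + 1
    if c.isdigit():
        return ord(c) - ord("0") + 27
    return 0

def lld_distance(a, b):
    # Summed squared per-character difference, square-rooted; the shorter
    # string is padded with nulls.
    n = max(len(a), len(b))
    pa = list(a) + [None] * (n - len(a))
    pb = list(b) + [None] * (n - len(b))
    return sum((char_val(x) - char_val(y)) ** 2 for x, y in zip(pa, pb)) ** 0.5
```

Identical LLDs give a distance of zero; random-looking tunnel labels give large, jumpy distances from request to request, which is exactly the trait we want to feed the network.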
We're going to give it positive and negative examples. Then we're going to run it on real data and see what it finds. The beauty of using a neural network is what happens with a false negative, and IDS always has false negatives, right? We can data mine our traits out, add them to our configuration file, let the artificial neural network know that it missed one, and say, relearn this so that you don't give me that bad answer again. If you get real into AI, there are some concepts called overfitting and underfitting, and basically what those mean is the network's decisions get fit too tightly or too loosely to its training data. I reached that point during testing several times. And when you do that, you just have to eat some false positives. Until a better AI guy comes along, and there's lots of them out there, we're just going to eat false positives. Okay, so here's the tool: DNS Tunnel Trap 0.9. It is proof of concept only. So don't email me and tell me how it didn't save your enterprise or whatever. It's PoC only, but it is real code. It exists on the web. Right now you can go download it. I'm not speaking at a conference about something theoretical that you can't get your hands on. It does not sniff off the wire, right? It's not real time. That's because it's proof of concept. So you go out with tcpdump and you capture some big chunks of DNS traffic, and then you're going to run it on that. Incidentally, while I'm thinking about it, and we'll address this later when we talk about Heyoka: it is asymmetrical. I only care about your outgoing DNS requests. I don't pay any attention to responses. We don't need them, right? All we're doing is measuring the difference in outgoing DNS requests, along with some packet counting. It has three major functions. Find tunnels: basically, here's a PCAP file I captured, tell me if there's any tunnels in it. New data: creates a new training file entry. The training file is what the neural network learns off of.
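The learn-from-your-mistakes loop can be illustrated with the simplest possible supervised learner: a one-neuron perceptron in Python standing in for FANN. The trait values below are invented for illustration; each training-file-style entry pairs a trait vector with a 1 ("tunnel") or -1 ("not a tunnel") answer, mirroring the scheme described above:

```python
def train_perceptron(examples, epochs=200, lr=0.01):
    """Minimal supervised learning sketch (not FANN): examples are
    (trait_vector, target) pairs with target 1 or -1."""
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, t in examples:
            out = b + sum(wi * xi for wi, xi in zip(w, x))
            pred = 1 if out > 0 else -1
            if pred != t:                     # learn only from mistakes
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b
```

Retraining on a miss is just appending the new example with the correct answer and running this loop again; the weights and threshold shift until the old mistake no longer happens.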
And train, which is to train or retrain the neural network. I'm going to push the demo until the end. We have plenty of time. I'm excited. I'm talking too fast. Okay, so how does it work? It works. I didn't crunch stats. That's the first thing a lot of hardcore guys ask me. They're like, oh, what are the stats? How well does it work? I don't know. It works. Right off the bat it catches all three of those. Easy. Those are easy. And we'll talk about why. To test this, I wrote my own DNS exfiltration tool, and I tried to make it a little bit harder to catch. All it did was grab eight bytes of data, encode them, and then ship them out in a DNS request. It was exfiltration only. Didn't care about responses. And I was able to tune the neural network down without a false positive. That's on a small network, by the way, because remember, scalability is an issue here. Again, it only works on tcpdump files, and it only works on up to X domains at a time. I don't have any experience in real-time programming, if you hadn't figured that out by now. So that's why it's not real time. It doesn't sniff off the line, and it only works on X domains at a time. And that's because the programmer sucks. Okay, so I just listed the three DNS tunnels that it catches without problem. What about Heyoka? So if you don't know what Heyoka is, Heyoka was a DNS tunnel tool announced this year, I think at ShmooCon. I'm not really sure. And the authors of Heyoka did not release the tool. So I can't stand up here and say, oh, we put them head-to-head, and here's how they did. We don't know. We have to guess. The two real features of Heyoka were that they figured out how to smuggle some additional data out. I'm not sure that matters for what I'm doing. And they're spoofing source addresses to get more data out and make it harder to catch. But again, remember that I don't even care about source addresses. This tool doesn't look at that. I'm looking at actual data, right?
We're comparing the difference in data. You can spoof all the source addresses you want. So my guess is that we would still be able to catch it. That's a guess. But again, what we're doing is completely asymmetrical. We don't care about the data coming back in. So it works as well as you can tune it. And that's just the way of neural networks, right? A good example of a pretty well-tuned neural network would be Gmail's spam filter. Not bad. This is nowhere near that sophistication. I want to make sure everybody understands that. You're able to tune for super low false positives on a small network, right? Overfitting results in false negatives. That's the way of it. So you can tune it down. You just have to accept some false positives. Hopefully you can tune it better than I did. Lots of AI guys are better than me. So it does have a few weaknesses. There are some ways to defeat it. Number one, don't use the LLD, right? Because that's all I'm looking at. You're happy because you're actually attacking my programming ability more than the theory, so I don't really give you any points for that one. And the other one is you can make tons of requests to multiple domains. So you can go register thousands of domains and send each one one request, and you get a thousand packets out. And obviously there's no way for me to measure the difference in different packets to the same domain, because you never made a request more than once to the same domain. That's not much of a victory, though, because DNS tunnels are bandwidth limited and you'd be splitting it up into something even slower, I think. So this used to be just a complete jerk slide and I lost my nerve and changed that too. So the slides and the source can be found at meaniepants.com. Don't call me meaniepants. You don't need to email me to tell me things that are wrong with it. I mean, unless you fix them. So I have no idea what I'm going to do with this. I have no delusions of grandeur with it. I'm not interested in patents.
Everything I've done is public domain. You can take it, you can mess around with it, you can experiment on your own network, improve it hopefully. And then here's a list of folks who gave me some help along the way. Just because their name is here doesn't mean they endorse the project. It just means they were helpful. And it does need a complete code rewrite. Of all the projects I've ever released, it is ironic that the one whose code I am most ashamed of is the one I'm presenting at DEF CON. I don't think I'm alone in that. And then the last slide is really just: you can pull the slides online. If you saw this talk and you're like, wow, AI is not that hard and it's not beyond me, because it's not, you can go write AI applications right now. These are some good places to start. NeuroSolutions in particular puts a GUI on a neural network that allows you to see things and move them around, and you really get to understand the guts of how a neural network works. And that is available on a trial license. Fuzzy Thinking by Bart Kosko should be required reading. Weka is pretty useful too. Let's do a demo real quick, show you that the tool does what I tell you it does. So when I practiced with no audience, that was a 45-minute talk, and we just did it in 30 minutes. That's probably not too good. Okay. Do you guys see this decently well? It's just a command line tool. Sorry, there's no cool GUI. I don't really even know how to do a GUI. I've never done one. Nor do I intend to. Okay. So this is our configuration file. The first line is the number of examples we're going to give the neural network, right? We're going to delete these to start with. And we're going to train our neural network on that initial data set. So it's trained. Now I'm going to say: show me if there's any DNS tunnels in this sample of iodine, which is a DNS tunnel, captured off a small private network. example3.com might be a tunnel. It is, in fact, iodine. Okay. So that's not very impressive.
Now let's say: show me if this sample of OzymanDNS has a DNS tunnel in it. Oh, oh geez, it didn't find it. That's okay. So the next step is we're going to data mine out the correct values that we want from this example of OzymanDNS, and we're going to put that in our training file. We're going to adjust the training file a little bit. And we're going to give it a one, because that is, in fact, a true positive. Hang with me here, guys. This is boring, but hang with me. We're going to retrain the neural network. And we're going to say, okay, I just gave you the answers. Now I'm going to show you the exact same network traffic. Now can you find a DNS tunnel in it? And it finds it right away. Well, that's not very impressive, right? Because we just gave it the answers. What if I show you a second example of OzymanDNS? You should be able to find it, right? I mean, the numbers should match up fairly well. You should be able to catch it. And it still does. So that's an OzymanDNS sample it's never seen before. But you can see how it learned. It adjusted. And it can catch stuff it hasn't seen before. Okay. So what about a DNS tunnel that it's never seen before at all, right? Because we've given it two examples of what a DNS tunnel looks like. From this point, it should be able to infer some things. Can it catch dns2tcp without seeing it? Yes, it can. Right. So it's never seen it before, but it doesn't matter, because we've adjusted our weights and thresholds to the point where even stuff it's never seen before is going to trip it. Okay. Let's talk about false positives for just a minute. So I've got a file here. And we're going to tell it to go look at this file and see if there's any DNS tunnels in it. It comes up with two. It comes up with comcast.net and verizon.net. So we've looked at those, and we've determined those are, in fact, not DNS tunnels. So it made a mistake, right? False positives.
We're going to do the same thing. We're going to data mine out what we want, and we're going to stick it in our configuration file. And we're going to give it a negative one for false positives. This is all documented in the project, by the way. We're going to retrain the neural network. Not good. Live demos. There we go. That's what I wanted. That was not my fault, by the way. FANN has its own bit of weirdness. So let's run find tunnels on the false positive file. Bang, our false positives are gone. So what did it do? It readjusted its weights and thresholds until those two domains are no longer tripping them. So there's one more thing I want to address here before we go. If you go play around with the tool, sampling becomes very important. And the reason for that is because each of these is taken from a sample that was approximately as long as my attention span. I'm doing a tcpdump, and when I get bored, I hit Ctrl-C. And that's the sample that this came from. Your results may vary. You should be building your own training file when you mess around with the project, based on your own network and what you're trying to accomplish. It comes with a stock default one. Don't expect much from that, because your network is not the same. So we blazed through that way fast. Any questions? Go ahead. So, the distance equation. The distance equation is the third number in this series. You can see the tunnels often have a much higher distance. However, these false positives also had a very high distance. Did I answer your question? Right. No. So what that is, is we give our neural network all these inputs, right? And it gives us a single numeric output, which is a likelihood of how close it matches. So inside my C source code, there's a judgment call where I said, okay, anything greater than 0.4 is probably a tunnel, so output that, right? So this example of OzymanDNS, it gave a likelihood of 0.89. It's over the threshold.
You may have to adjust that as needed. Any other questions? Can somebody up here shout his question up to me? I can't hear him. Okay, so how did it learn the... Okay, so now you're talking about artificial neural network internals. Okay, I got you. So this is actually done with a straight stock default FANN configuration, right? FANN is the neural network library that I'm using, and I'm using straight defaults. At one point I got in there and messed around with those, and I discovered I really didn't make the situation better; that's why I went back to the straight stock FANN configuration. And it seems to work. Is that what you're looking for? Yes, it does. Absolutely. Because it's adjusting thresholds. That's right.

So at one point I took a lot of network dumps off of a really busy internet server, and I started intermixing the output of that with stuff off small networks, and I ended up with kind of a collage. So I really don't like to say that our neural network is learning your network, because that sounds... right? But you do end up with some strange results by intermixing those types of things. Go ahead.

His question was, can we run through the Euclidean equation really quick? By far that is like... I never imagined someone would ask that kind of question. My life is so much better now. Humanity is restored. So we glossed over that pretty fast. I don't have an example ready, but we've turned every character into a number, right? So let's say apple.comcast versus orange.comcast, and we want to know how different they are. We're going to take the numeric 'a' and the numeric 'o', and we're going to subtract the 'o' from the 'a'. We're going to square that, and then we're going to go on down the line, substituting nulls for length mismatches. We're going to square each difference, sum all of those, and take the square root of the sum.
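The character-by-character computation just described can be sketched directly: convert each character to its numeric code, pad the shorter name with nulls, subtract position by position, square, sum, and take the square root. This is a reconstruction of the scheme as described in the talk, not the project's actual C code:

```python
import math

def domain_distance(a, b):
    """Euclidean distance between two domain labels, character by character.

    The shorter string is padded with NUL characters (code 0), as the
    talk describes for length mismatches.
    """
    n = max(len(a), len(b))
    a = a.ljust(n, "\0")
    b = b.ljust(n, "\0")
    return math.sqrt(sum((ord(x) - ord(y)) ** 2 for x, y in zip(a, b)))

print(domain_distance("apple", "apple"))   # 0.0
print(domain_distance("apple", "orange"))  # large: many positions differ
```

In formula terms, for characters $a_i$ and $b_i$: $d = \sqrt{\sum_i (a_i - b_i)^2}$.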
That's the Euclidean equation for similarity. That's probably the best I can do on the spot. So... Yeah, okay. Right. So here's one of the reasons why it's not real time: you can only compare against as many domains as you've saved, right? Every time I save a domain, if I want to do a comparison against it, I have to keep it. So, in the proof of concept, I'm only comparing against the last one I saw: I'm only keeping one, and I'm comparing the one I have now to the last one I saw. It would work better if we kept more, but then we start to get into windowing issues and memory issues, and at the time, I was not as interested in that. I don't know; I haven't got that far. Thanks for asking it. Yeah, that's a good question. I'm kind of hoping that... to me, I've convinced myself this is clearly a better way to do IDS than signatures, and I'm kind of hoping that other people will ask these kinds of questions, and maybe we'll code that in the next version and it'll improve. Go ahead. You should code that. Was that a volunteer? Go ahead.

So for unequal lengths, I just substituted in nulls, and at first I was like, oh, this will never work. But it actually seemed to work, so I went with it. I don't think so. We're comparing lowest-level domains, and I worked on different schemes, and I don't think there's a lot of value in comparing domains against each other. It's mostly the same packets to the same domains; that's what we want to look at. I think that's what we want to look at. Go ahead. Are you a math guy? It's awesome. No, it's awesome. So the question was about why we're doing the comparison letter by letter. The best answer I can give you is that this equation is supposed to take into account the entire set; that's the point of the equation. So even though we're doing a letter-by-letter subtraction, the equation is supposed to take into account the set. That's an insufficient answer, I'm sorry.
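The proof of concept's "compare only against the last domain seen" behavior, and the larger window the speaker says would work better, can both be sketched with a fixed-size deque. Window size 1 reproduces the proof-of-concept behavior; a larger window trades memory for better comparisons. The distance function is the character-code Euclidean one, re-declared here so the sketch stands alone:

```python
import math
from collections import deque

def distance(a, b):
    """Character-code Euclidean distance, null-padded for length mismatches."""
    n = max(len(a), len(b))
    a, b = a.ljust(n, "\0"), b.ljust(n, "\0")
    return math.sqrt(sum((ord(x) - ord(y)) ** 2 for x, y in zip(a, b)))

def min_distances(domains, window=1):
    """For each observed domain, the smallest distance to the last
    `window` domains kept in memory. window=1 matches the proof of
    concept; larger windows are the extension discussed in the Q&A."""
    recent = deque(maxlen=window)
    out = []
    for d in domains:
        out.append(min((distance(d, r) for r in recent), default=0.0))
        recent.append(d)
    return out

print(min_distances(["aaa.example", "aab.example", "aaa.example"], window=2))
```

With `window=2`, the third domain is compared against both earlier ones and matches the first exactly, so its minimum distance is zero — exactly the kind of repeat the window-of-one version would miss.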
No, those will come out as different. We're just going to stick a null value at the end of 'apple', right? So those are actually going to be the same until it gets to the last character, and then they'll be different. But the difference will be a small amount compared to 'apple' versus 'orange'; there will be a smaller amount of difference. Did I answer that? Agreed. Agreed. I didn't think of it. Go ahead. No, I didn't take case into account. These are a lot of good questions. Good job. Oh, I'm done. They're kicking me. Thanks, guys.