Can people hear? Okay, I'll go ahead and get started. So this talk is Rack::Attack, and how to protect your app with this one weird gem. Where does Rack::Attack come from? We built it at Kickstarter. If you haven't heard of Kickstarter, it is a funding platform for creative projects. So somebody has an idea for a film, a comic book, an open source project, a gadget. They put their project up on our site. They can offer rewards for various pledge levels. Their friends, family, and strangers on the internet come and pledge money. At the deadline, if they've reached their funding goal and so have enough to do their project, that's when we process the transactions and the creators get the funds they need to do the project. To give you a sense of scale for what we do, we recently crossed over a billion dollars pledged to the site. It's over a million dollars a day, and it's gone to over 60,000 creative projects. Quick introduction. My name is Aaron Suggs. I go by ktheory on social media. I love dancing in my bear outfit. And I'm the Operations Engineer at Kickstarter. We have a very DevOps-y style workflow, so it means I end up writing a lot of Ruby code, and I love writing Ruby code. So Rack::Attack is a tool I wrote, and it's Rack middleware for blocking and throttling abusive requests. What do we mean by abusive requests? These can be things like malicious attackers trying to take down your site, or trying to crack user accounts or get sensitive information. Or it can be naively written scrapers, just people on the internet doing weird things as they are prone to do. And that's cool, but sometimes it is a lot of traffic, a lot of resources for your app to try to handle. Rack::Attack is a very elegant DSL for dealing with these sorts of things, sort of constraining their behavior so your website stays up. Rack::Attack is on GitHub at kickstarter/rack-attack. It's an open source Ruby gem.
There's a README, exactly like you'd expect. So the big wins that Kickstarter has gotten from using Rack::Attack, and the reason we developed it: we wanted to increase our performance. This is site performance. We had problems with abusive requests making our website slow because they were using up too much app server CPU or too many database resources. By constraining them, we were able to make the website faster for the most important requests, like people coming on wanting to watch videos, wanting to pledge money, not people just trying to scrape down the entire site. We also improved our availability, because sometimes there were so many of these requests that they would take down the site, or there would be some weird incident, and it hurt our availability. But the biggest win that we had was developer happiness. Dealing with these bad actors on the internet, especially if it means your site's going down or you need to scale up because somebody's doing something weird, can really interrupt a lot of developers. It can derail your product roadmap. We want to be writing cool features, and Rack::Attack was a great DSL to let us spend less time thinking about that stuff and more time doing the stuff that we like doing. So let me talk about the origin story for Rack::Attack. What happened at Kickstarter that made us realize we needed this? Let's rewind to the summer of 2012. Do-do-do-do-do-do-do-do-do-do. And this happened. This is a story in a graph. The blue line, I hope it shows up pretty well, cool, is our regular successful logins: people typing in an email and password and us being like, okay, you are logged in. It ebbs and flows throughout the day. Suddenly, one Saturday afternoon, we just get so many of these bad login requests. And for a while, we're like, what's going on? Did we deploy a feature that broke login?
No, somebody is trying to crack our user accounts. They're just guessing email addresses and passwords as fast as they can from several different IP addresses. So as the ops guy, this is sort of on my plate. I'm like, okay, well, I've got to stop this. This is bad for the site. So I wrote a pretty nasty before filter for our login action that's like, keep a counter in memcache, and if it's too many, give them an error page. And it was kind of a sucky experience, because I was changing a really critical feature of our site under duress, knowing that I needed to get it out there quickly. And it was a big change. In the pull request, I was apologetic, being like, I know this is badly tested and it's a nasty code change, but we've got to get it out fast because this event's going on. So we did that. And then in the cold light of day, I reflected a little bit and I thought, we need a more elegant way to prevent bad requests. It's not just gonna be about this login attack. It's gonna be about a whole class of problems that we might have on the site. I should say too, with that login attack, it was something that we'd always imagined, being like, oh yeah, of course we should throttle login requests. We just hadn't ever gotten around to it. It was in our ticketing system as a low-priority, someday-somebody-should-do-this thing. And having it actually happen was like, okay, now we've got to do it right now. So we realized we need this generic tool to stop bad requests. And really, there's already a great solution for this in the Ruby world, and it's Rack middleware. So now we get to the code section of the talk. Here comes some code. Get ready. This is an example of the most basic Rack middleware, just really quick for people who might not be familiar with it.
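The slide itself is not part of this transcript, so here is a minimal sketch of what a basic Rack middleware can look like. The class names PassThrough and Gatekeeper, the HELLO_APP stand-in, and the 403 response are all illustrative assumptions, not Rack::Attack's own code. The second class previews the "should we allow it?" shape that the talk walks through next.

```ruby
# A Rack app is any object that responds to #call(env) and returns
# [status, headers, body]. This lambda stands in for a Rails or Sinatra app.
HELLO_APP = lambda do |_env|
  [200, { "content-type" => "text/plain" }, ["Hello from the app\n"]]
end

# The most basic middleware: wrap the app and pass every request through.
class PassThrough
  def initialize(app)
    @app = app
  end

  def call(env)
    # Code here runs before the app sees the request.
    status, headers, body = @app.call(env)
    # Code here runs after the app has produced its response.
    [status, headers, body]
  end
end

# A blocking variant (hypothetical): only call the app when the predicate
# passed to the constructor says the request is allowed.
class Gatekeeper
  def initialize(app, &allow)
    @app = app
    @allow = allow # block taking the env hash, returning true or false
  end

  def call(env)
    if @allow.call(env)
      @app.call(env) # the expensive path: database queries, view rendering
    else
      # The cheap path: a tiny static response, the app is never invoked.
      [403, { "content-type" => "text/plain" }, ["Access denied\n"]]
    end
  end
end
```

In a Rack or Rails app, a class like this would typically be mounted with `use Gatekeeper` in config.ru or added to the middleware stack.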
So middleware is basically hugging your application, wrapping around it. You have your Rails app or your Sinatra app; that is the app in this case. And you wanna be able to do things to the request that's coming in from the client. That's the env. So every request from a client is gonna go through this call method, where you pass in the environment. The environment is, like, what page the client wants, what their cookie is, and all that information. And so the real magic of Rack middleware is it lets you do stuff here with the request: you can block it, in the case of Rack::Attack, potentially. Or you can do stuff with the response. You can log it, you can cache it, stuff like that. So this is just a great pattern for making easy architectures to do stuff with HTTP requests. In Rack::Attack's case, this is a simplified version of the Rack::Attack call method. We say, for this request, should we allow it? If so, go ahead and pass it on to your application. Your application is gonna do potentially a lot of work. Maybe it's gonna spend a couple hundred milliseconds querying the database and rendering views and stuff like that. That's the expensive work that we wanna save if this is an abusive request. So if we shouldn't allow it, then we just return this access denied, which is a very simple and fast response to render. Rack::Attack can do several hundred of these access denied responses per thread that you have running, so per Unicorn worker or per Heroku instance or something like that. That's what you get for free when you just use the Rack middleware. But we don't yet know what this should-allow method should be. That's code that you have to configure yourself: what do you wanna throttle on? So that looks like this. This is a generic throttle that you might put in an initializer to configure Rack::Attack.
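The slide is not captured here. Based on the walkthrough that follows (a throttle named "ip", a limit of 10, a period of 5 seconds, and a block returning the request's IP), the configuration probably looks roughly like this sketch. The file path is an assumption, and the snippet expects the rack-attack gem to be loaded already, as it would be in a Rails initializer.

```ruby
# config/initializers/rack_attack.rb (assumed location)

# Allow each distinct IP address up to 10 requests in every 5-second window.
Rack::Attack.throttle("ip", limit: 10, period: 5) do |req|
  # The block's return value is the discriminator: requests that return the
  # same value share one counter. Returning nil would skip the throttle.
  req.ip
end
```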
The important stuff that's going on here is we are calling the throttle class method on Rack::Attack. That's just something we expose to let you plug into the middleware. We give it a name. In this case, we name the throttle "ip". This is gonna determine how we track it, and it just has to be unique throughout your application. We're gonna give it a limit and a period. The period is how many seconds we're gonna be considering for the throttle, and the limit is your quota for how many requests you get to make during that time. So in this case, it's 10 requests every five seconds. The arithmetically inclined will notice that this is not a reduced fraction. We could say two requests every one second. The advantage of a higher multiple is that it allows a little burstiness. These periods are basically dividing time up into five-second-long buckets. So between zero seconds and five seconds after the minute, in that window, you're allowed to make up to 10 requests. By having bigger multiples and bigger windows, you can tolerate some burstiness, but the long-term average stays the same. Long-term, nobody's gonna make more requests than two every one second. Okay, so what's going on? We've got the class method, we've got the name, we have the limit and the period. And then to this block, we are passing along the request. Now, in the earlier middleware example, we called this the env, which was just the environment hash that comes from the request. The request is just a light little Rack request object wrapped around the environment that gives you instance methods to call, like .ip or .host or .path or something like that. You use these in Rails controllers too. So it's just a lightly wrapped request. And then inside the block, what the block returns is the really important part.
That's the discriminator that determines how we're gonna bucket up these throttles. So in this case, we are gonna say every distinct IP address is going to get its own throttle limit. But we could throttle by something else. We could throttle by a parameter or a host name or something like that. Or an API token. One thing to note with these discriminators is that this is returning a string, so it's always gonna be a truthy value, and truthy values enable the throttling. We are gonna throttle these requests as long as there's an IP address, and there always is. If we return nil or a falsy value, we just let the request go through and we're not gonna throttle it. I'll talk about why we might wanna do that later. But now we have this issue of throttle state. We have these counters per IP address that we need to track. And so where do we store that? A pretty elegant and simple and obvious place for us was our Rails cache. So when you use Rack::Attack by default, if you have a Rails cache, it's gonna use it. But it really works best with memcache or Redis, so I hope you're using that as your Rails cache. If you're not, there are ways that you can build your own or plug in a different cache store. The great advantage of memcache and Redis is that they have really good support for atomically incrementing counters, and that's the key feature we need behind the scenes. So now we're imagining that for every request that comes in, we need to increment a counter per IP address. How do we do that? What's the algorithm? This is the nitty-gritty of how Rack::Attack works, how it constructs that key. Remember how we divided the minute up into little buckets, depending on our period. To do that, we take the current time in seconds. We construct a key that starts with the name of our throttle, like "ip" in this case. We take the time divided by the period.
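To make the bucketing concrete, here is a small self-contained sketch of a fixed-window counter built on a key made of the throttle name, the time bucket, and the discriminator. The WindowCounter class, the colon separator, and the in-memory Hash are assumptions for illustration. The real gem keeps its counters in the cache store, and its exact key format may differ.

```ruby
# In-memory stand-in for memcache/Redis. A real store would increment
# atomically and expire old keys; this Hash does neither.
class WindowCounter
  def initialize(name:, limit:, period:)
    @name = name
    @limit = limit
    @period = period
    @counts = Hash.new(0)
  end

  # Key layout as described in the talk: name, time bucket, discriminator.
  def key_for(discriminator, now)
    bucket = now.to_i / @period # integer division: changes once per period
    "#{@name}:#{bucket}:#{discriminator}"
  end

  # Count this request, then report whether the caller is over quota.
  def throttled?(discriminator, now = Time.now)
    key = key_for(discriminator, now)
    @counts[key] += 1
    @counts[key] > @limit
  end
end

counter = WindowCounter.new(name: "ip", limit: 10, period: 5)
t = Time.at(1_000_000) # sits exactly at the start of a 5-second bucket

# Eleven requests in the same instant: a burst of ten is allowed,
# and only the eleventh is over quota.
burst = Array.new(11) { counter.throttled?("1.2.3.4", t) }

# Five seconds later the bucket number, and so the key, has changed,
# which means the count starts again from zero.
next_window = counter.throttled?("1.2.3.4", t + 5)
```

This also shows the burstiness trade-off: 10 requests per 5 seconds lets a client spend its whole quota in one second, while the long-run average still cannot exceed 2 per second.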
So this means that that middle component is going to increment every five seconds, so the key's gonna change. And then the final part is that block return value. In this case, it's the IP address of the request, but maybe it's an API token or something like that. So at the end of it, we have this key that changes every couple of seconds, every time the period rotates. And this ends up being a very efficient use of memcache or Redis. Storing all this information is gonna take like a couple of megabytes. Don't worry about the impact on your cache store in pretty much every scenario. To make even more efficient use of your cache store, we set an expiry, so that in that bucket window of, say, zero to five seconds, all those cache keys expire at five seconds. So at the same moment that the cache keys change, they also expire. Memcache or Redis just ends up reusing the same memory blocks over and over. Even though the keys are changing, you don't have as much churn in memory as you would otherwise. And so then the Rack middleware is really doing pretty simple stuff. We're saying, whatever your cache is, increment this key with this expiry. That's gonna give us back the count of how many requests have been made that match that throttle. And if it's more than our limit, we're gonna return that access denied response. So we rolled this out. We were able to have this global throttle per IP address. We started making a couple of other features. And it was about a year later that a new event put Rack::Attack to the test. A new challenger emerges in the summer of 2013. This was a script called kicksniper.py, and it revealed a pretty interesting behavior that we call reward sniping. Actually, kicksniper.py refers to it in the code as reward sniping.
And this is an interesting behavior. I told you how Kickstarter offers these rewards. They can be limited rewards, so a creator says, I'm only gonna give away a hundred of these, first come, first served. There was a pretty popular project, a video game, that was offering reward tiers like, for 50 bucks you get the silver-level package, for a hundred bucks you get the gold package, and so on, ever more deluxe and expensive packages. And they were all very much in demand, so the early reward tiers sold out super fast. Then occasionally somebody who had one of those early reward tiers would decide they're gonna splurge and upgrade, and change their pledge to a higher one. And now, for that moment, there's one available of the lower tier. So people were hitting refresh, refresh, refresh, hoping they'd notice when somebody had changed their pledge and there was one of these highly desirable lower-tier rewards available. Some enterprising Python developer says, I will make a script that does this for me. Sure enough, he writes kicksniper.py, which sits in a tight loop trying to change his pledge on our site, saying, let me get that early reward tier. Our Active Record validations were working fine, and we said, no, you can't change your pledge to that, the vast majority of the time. But eventually he got through and was able to get the reward. It was such a great success that he goes on all the forums and says, hey, everybody, just run this Python script on your laptop, and you too might luck out and get one of these highly desirable early reward tiers. So let's tell this story in a graph. This is our master database CPU over the course of a day or so. We see at the very beginning it starts off between 10 and 15%. That's my happy place. That's where I like it to be.
We have plenty of headroom for big projects to blow up on the site, as they do from time to time. And I honestly didn't really notice that it had been creeping up over the course of a day. Thursday morning, it crossed 30%, and that's where I have a CPU alert threshold. In fact, the whole dev team gets this email being like, hey, the master database CPU is pretty high, you guys should check that out. So we spend a little time. We're like, why is the database so high? Well, it looks like there are a crazy number of requests trying to change a pledge for this one project. We're able to reconstruct this backstory and see what was happening on the database CPU. We see the forum posts where everybody's like, thank you for kicksniper.py. And we're like, all right, so how are we gonna handle this? Is it really that important that people are able to try to change their pledge multiple times a second? What if they could only change their pledge every couple of seconds? There's this question of what's the fairest way to allocate the scarce resource of a reward as soon as it's available. I kind of don't care about the answer. Anybody can get it. But we're like, if we start throttling these people, it's totally fair. They're using an inordinate amount of resources, and people who are just clicking around the site are having a slower experience because our database CPU is so high. So we decided, okay, you can make a couple of requests per minute to change a pledge. It was one line of Rack::Attack code. We deploy it. The yellow vertical lines here are deploys. So you can see that right here, about an hour after we get the alert that something was going wrong, we deploy, and immediately our database CPU drops. We're pretty much back to the happy place. And so for us, that revealed the great success that we could have.
Once we figured out what was going on, it was so easy for us to write code that just solved that problem. We didn't have to think about how to optimize the edit-pledge flow, which could have been a much bigger product change and taken up a lot more developer time. It was a cut-and-dried decision: you're super confused if you're actually trying to change your pledge several times a minute. That's a bug we should fix. But really it's just these scrapers, and it's no big deal to say they can try a few times a minute. So that was a big win for Rack::Attack at Kickstarter. We feel like that cemented its value in the organization. So now I'm gonna shift gears a little bit. I'm gonna tell you pro tips, general things you can do with Rack::Attack that are probably useful for your application. Oh my gosh, I'm so glad that I got to use this GIF. This GIF is like pure condensed happiness for me. Okay, back to the code. So we talked about how to do a throttle for all IP addresses, where each IP has this quota of how many requests it can make. But in our origin story about the login attack, we wanted to be extra careful about login requests. Those are something that you would want to throttle even more strictly than many other things in your application. So this is a new throttle, and we give it a new name, logins per IP. This is saying that if you're making a POST request to the login URL, then we wanna throttle you by IP to this lower limit. And this relies on the fact we mentioned earlier, that if the block returns nil, we're not gonna do the throttle at all. So if this is not a POST to the login action, we're not gonna check memcache, we're not gonna increment any counters or do anything like that.
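A sketch of what such a login throttle might look like. The "/login" path, the limit of 5 per 20 seconds, and the StubRequest helper are assumptions for illustration; the talk does not give the exact numbers for this rule. The discriminator is written as a standalone lambda so it can be exercised without the gem, and the `defined?` guard only exists so the snippet also loads outside an app.

```ruby
# Discriminator for "logins per IP": the client IP for POSTs to the login
# path, nil (meaning: do not throttle, touch no counters) for anything else.
LOGINS_BY_IP = lambda do |req|
  req.ip if req.path == "/login" && req.post?
end

# Registration, assuming the rack-attack gem is loaded (e.g. a Rails
# initializer). Limit and period here are illustrative values.
if defined?(Rack::Attack)
  Rack::Attack.throttle("logins/ip", limit: 5, period: 20, &LOGINS_BY_IP)
end

# Hypothetical stand-in for a Rack request, used only to try the lambda out.
StubRequest = Struct.new(:ip, :path, :verb) do
  def post?
    verb == "POST"
  end
end

LOGINS_BY_IP.call(StubRequest.new("1.2.3.4", "/login", "POST"))    # the IP: throttled
LOGINS_BY_IP.call(StubRequest.new("1.2.3.4", "/projects", "GET"))  # nil: passes through
```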
We're just gonna allow this request right through. But if it is, we're gonna say each IP address gets this lower quota of how many login requests it can make. Thinking about the same problem from a different angle, you might imagine a situation where an attacker is using many different IP addresses to try to crack the password for one particular email address. Maybe it's the founder's email address or something like that. So putting on your security hat, you can ask, how am I gonna be safe from those kinds of requests? The only change here is what we're returning. Instead of the IP address, we're returning the value of the email parameter. So this is a little different way of thinking about throttles: whoever you are, if you're trying to log in with this one particular email address, you can only do it five times every 20 seconds. Those are two throttles that pretty much everybody should have on their website. If you haven't been bitten by this yet, it's probably just a matter of time. Another pretty cool Rack::Attack feature is blacklists. These are requests that you don't even wanna throttle. You're not gonna allow them at all, just access denied every time they happen. I was gonna call these blocks, but I can't call them blocks, because in Ruby that's already a different thing. Hence the term blacklists. Here's an example of a pretty handy blacklist. Say you have an admin section of your website and you wanna restrict access to it to just your one office IP address. Again, it's using the blacklist class method on Rack::Attack to configure this in the middleware. You would put this in an initializer, and you give it a name like "bad admin IP". It's different from throttles in that we don't have to pass along a limit or a period, because those don't apply to blacklists.
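The two rules just described, the per-email login throttle and the admin blacklist, might be sketched like this. The "/login" and "/admin" paths, the "email" parameter name, the OFFICE_IP placeholder, and the StubLoginRequest helper are assumptions. Also note that the talk's `blacklist` method was renamed `blocklist` in rack-attack 5.0, so the sketch uses the newer name; on the version from the talk it would be `blacklist`.

```ruby
# Throttle key: the email being tried, regardless of which IP is asking.
LOGINS_BY_EMAIL = lambda do |req|
  req.params["email"] if req.path == "/login" && req.post?
end

OFFICE_IP = "203.0.113.10" # placeholder from the documentation address range

# Block rule: truthy means "deny". Deny admin pages to everyone except
# the office IP.
BAD_ADMIN_IP = lambda do |req|
  req.path.start_with?("/admin") && req.ip != OFFICE_IP
end

# Registration, assuming the rack-attack gem is loaded.
if defined?(Rack::Attack)
  # "Five times every 20 seconds" is the figure given in the talk.
  Rack::Attack.throttle("logins/email", limit: 5, period: 20, &LOGINS_BY_EMAIL)
  # No limit or period: a block rule either matches or it does not.
  Rack::Attack.blocklist("bad admin ip", &BAD_ADMIN_IP)
end

# Hypothetical stand-in for a Rack request, used only to try the rules out.
StubLoginRequest = Struct.new(:ip, :path, :verb, :params) do
  def post?
    verb == "POST"
  end
end
```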
But it has the same logic, where if the return value of this block is truthy, we're gonna give them a very fast access denied message. If it's falsy, then we're gonna let the request through. So this is saying, if you're making a request to a URL that starts with /admin and you are not from this IP address, we're gonna just give you an access denied. This next one is something that Kickstarter uses. We call it the starve-the-trolls feature. If you're one of our banned IPs (our customer support team decides which IPs get banned), you cannot make any request that's not a GET request. Or put another way, you can only make GET requests if you're from these IP addresses. So let's think about what it's like to use a dynamic web application if you're only using GETs. You can't sign up, you can't log in, you can't post comments. We use this as a measure of last resort for people who are bad actors in our community. Any big community knows that it's inevitable to have a few rotten apples. And this has been really fast and effective: our community team can just put these IP addresses into a YAML file. They leave them there for about a week or so, and it gives that person time to cool off, where they're not gonna go around signing up for a bunch of accounts and maybe doing bad stuff or posting messages or stuff like that. I was struck when we started doing this by how simple it was in code and how much it helped our CS, our community support team. So this is another example of an area where I wouldn't have expected Rack::Attack to be very helpful, but it ended up being very helpful. Another nice-to-have Rack::Attack feature is ActiveSupport notifications. If ActiveSupport notifications are in your app, and for any Rails app they're already there, we will fire an ActiveSupport notification event every time a request gets blocked or throttled.
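A subscriber for those events might look like the sketch below. It assumes a Rails app and a recent rack-attack release, where event names end in ".rack_attack" and the request arrives in `payload[:request]`. The version current at the time of this talk used a different event name and argument shape, so check the README for the version you run. The log format is purely illustrative.

```ruby
# config/initializers/rack_attack_logging.rb (assumed location)

# Log every request that Rack::Attack blocks or throttles.
ActiveSupport::Notifications.subscribe(/rack_attack/) do |name, _start, _finish, _id, payload|
  req = payload[:request]
  Rails.logger.info(
    "[rack-attack] event=#{name} " \
    "rule=#{req.env['rack.attack.matched']} " \
    "ip=#{req.ip} path=#{req.path}"
  )
end
```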
So this means you can have a subscriber to these events that's gonna log or graph the request and stuff like that. There are examples of how to do that in the README on GitHub. So thinking about where Rack::Attack might fall in the set of tools you use to keep your site fast and reliable, it's not a silver bullet. It very much complements things like an iptables firewall, or the nginx limit_conn module to limit the number of concurrent requests per IP address, or a CDN or a web app firewall, or some hardware to keep your website fast and reliable. Keep doing those. Rack::Attack's not a silver bullet. If you have an NTP reflection DDoS attack, it's gonna overwhelm your Unicorn or Heroku processes pretty fast. You need something else. But what Rack::Attack really is good at: it's Ruby, and it knows everything about your app. Because it's in your application, you can use other logic from your app. Because it's Ruby, it's easy to test. You write integration tests for it the same way you write tests for the rest of your application. And it's easy to deploy, because it's Ruby code. I don't know how you deploy changes to a CDN or a web app firewall, but it's probably a different process than how you deploy your Ruby code. And this is something that everybody on our engineering team is comfortable doing. So that's where Rack::Attack can fit into your application security mindset. I also wanted to call out and say thank you to my many GitHub contributors. These people are really awesome. They've taken Rack::Attack and added really cool features like Allow2Ban and Fail2Ban, they've cleaned up documentation, and they've made the tests a lot better. They've added Redis support. It used to be just memcache. These people are doing fantastic things with open source.
They're from five different continents too. It feels so cool to put code out there and have people from five different continents contribute to it because they find it useful. So more like that, please. So, wrapping up: weird stuff happens on the web. It's inevitable. It's good in a lot of cases. I like that people write really innovative things, stuff that I never would have come up with. That's fantastic. So I hope the web stays weird, but I also hope that the website stays up, and Rack::Attack lets you have the best of both worlds. So that's all I had. That's Rack::Attack at Kickstarter. I'd love to answer any questions if people have them, and if you're more comfortable, hit me up on Twitter or find me after the talk.