Thanks, everybody. So today we're going to talk about high-performance Django: basically taking your site from runserver on your laptop to being able to survive the front page of Reddit. Like you said, my name is Peter Baumgartner and I'm the founder of Lincoln Loop. We're a Django web agency: we build Django sites, we help people with Django problems, and we help people learn how to scale and how to build new sites designed for large, high-scale traffic. We've been around since 2007, and in those seven years we've learned a lot about Django and how to make it run fast and handle lots of traffic. We learned a lot of those lessons the hard way, so we recently wrote a book by the same name, High Performance Django, that bundles up all those lessons and packages them in a nice little e-book, and probably a print edition later. I actually have a couple of copies here, and we can give some out to people asking questions at the end.

If you're a reader of Hacker News or any other web publication, you may have seen this claim before: "Django doesn't scale." How many people here believe that? Okay. I'm going to be a little controversial and say it's true. Django does not scale. You may say: what about Instagram and Pinterest and Disqus? All these people are using Django with massive numbers of users and massive traffic. How are they doing it? Well, I'd say they're not so much using Django as a supporting cast of players around it: their database, Postgres or MySQL; Memcached and Redis; load balancing with Nginx; maybe caching with Varnish. None of those are Django, but they're really what makes a Django site scale. You could use them just as easily with PHP.

So all those servers: how do they work, and how do you plug them all together? That's going to be the focus of the talk today. We're going to do a live demo: I'm basically going to take a site running under runserver, blast it with a lot of traffic, see how it performs, and then scale it up. I'm not going to talk at all about how to optimize your code today, and I'm not going to talk about tuning your database or how to do caching inside Django. We're going to work higher in the stack than that: how you serve your Django application, how you do load balancing, and things like that. I can't do this on my laptop because I literally need multiple servers, so if any of you are doing massive BitTorrent downloads right now, or watching cat videos or something, I'd appreciate it if you shut them off, because it's going to be a really lame talk if I can't get out to the internet.

Like I said, we're going to be throwing a lot of traffic at these servers. It may look like we're benchmarking Django, but what we're doing would be a terrible benchmark: I've set up a fake Django application with fake data in it, it's running over EC2's network (who knows what's going on there), and it's in Docker containers inside virtual machines on shared infrastructure, where the neighbors could be doing anything. So don't take the exact numbers we're going to see to heart; pay attention to the difference between each setup.

Okay. First up, we'll take a look at Django on its own. This is an EC2 instance I have set up with a Django application on it. It's an m3.xlarge, so it has four cores and about 15 GB of RAM. A decent-sized box.
It's not massive, but for what we're doing it'll work well. I also have an RDS Postgres database server, a t2.medium; I think that's two CPUs and 4 GB of RAM, probably a lot less than what you'd use on a big production site, but for our purposes it'll work well.

I'm going to use a tool called Fig, which is a way to manage Docker containers. We're going to spin up a memcached instance that will handle the cache and sessions, and then we're going to spin up our web server running runserver (a sketch of a Fig file along these lines follows below). So that looks like this: fig, pointed at our fig file, up. That creates our web container and our memcached container, and we're up and running.

I can show you what the site looks like right now. This is another box running in EC2; it's the one we're going to throw all the traffic from. The site is just an app I filled with a bunch of fake data: user profiles, where each user has a couple of foreign keys to a company and a job title, plus a profile with a decent-sized text blob in it and some links to the people they share a birthday with. So it's not a totally trivial hello-world app, but probably not as complex as what you'd be doing in production. Again, it's a demo, so what you see in the real world will be a little different.

Okay, so this is JMeter. You may be familiar with ApacheBench (ab) or Siege; they're good at blasting a specific page with lots of traffic. JMeter is like those on steroids: you can build really complex test plans and run them against your site. What's going to happen is we've got this request object, and JMeter is going to loop through everything under it. First it hits the home page of the site as an anonymous user. Next it hits what I'm calling a hot profile page: on a real site with lots and lots of pages, there's usually a small subset of pages that really get hammered, and then a long tail after that which doesn't see nearly as much traffic. We're going to simulate that, so this step picks a random profile page between 1 and 50 and hits it. Next, on 10% of our loops, we log in: we hit the admin login page, which gives us a CSRF token; we use that token and our credentials to authenticate against the server; then we POST to create a new profile and hit the home page as an authenticated user. So we're simulating a site that has some logged-in traffic where those users are actually doing something. Another 10% of loops just hit a random profile page; the database has about half a million profiles in it, so this simulates that long tail of web traffic.

So I'm all set up to hit my web server with 50 concurrent users, going through this loop 10 times, and we'll see what happens. Fire off the test, and we can watch the response times. We started off pretty well, below about 200 milliseconds, but very quickly we're getting really bad response times; we're now up to a second and a half. That's not a great response time to serve requests out of Django. Basically, we're overwhelming the server.
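To make that setup concrete, here's a minimal sketch of what a Fig file along those lines could look like. The talk doesn't show the actual file, so the service names, image, command, and ports here are assumptions:

```yaml
# fig.yml (sketch): a memcached container for cache/sessions, plus the
# Django app running under runserver. Names and ports are assumptions.
memcached:
  image: memcached

web:
  build: .
  command: python manage.py runserver 0.0.0.0:8000
  ports:
    - "8000:8000"
  links:
    - memcached
```

Running `fig up` then starts both containers, which matches what happens in the demo.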
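On the Django side, pointing the cache and sessions at that memcached container might look roughly like this; the backend paths are real Django settings, but the hostname comes from the hypothetical Fig link above:

```python
# settings.py (sketch): cache and sessions backed by the memcached
# container. The "memcached" hostname is an assumption from the sketch above.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "memcached:11211",
    }
}

# Keep sessions in the cache so they never touch Postgres
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
```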
Back on the server, htop tells the story. You can't quite see it, but those are runserver processes in Python there, and we're not really utilizing the whole server: at most maybe 30 or 40% of any of the CPUs is being used, yet requests are queuing up. That's because runserver is just a single process. As of recently it's multi-threaded, but we're still only using one process, so performance isn't great and the server is never fully utilized.

Let's go back and keep track of the data we're seeing (I'll shrink the font a little so it fits on the screen). So that's runserver with 50 concurrent connections: 28.2 requests per second, and an average response time of 1,353 milliseconds. About 1.3 seconds; that's not going to fly if you're on the front page of Reddit. A little too slow.

Now, normally you're not going to put a site into production with runserver; that's a bad idea. You're going to use a production WSGI server. We're going to use uWSGI today; you might be more familiar with Gunicorn or Apache's mod_wsgi, and really, any of them will get you by. This is just the one we prefer. So I'm going to go back and kill off our runserver container, and while that's happening I'll show you our uWSGI configuration. You can't quite see the edge of the screen there, but those are processes: instead of running one process like runserver, we're going to run six, and again we have threading on. Multi-threaded, six processes, so we should expect some better performance here (a sketch of a config along these lines follows below). The rest of the file is boilerplate you don't really need to worry about. And this is how we start off our uWSGI container: fig again.

All right, uWSGI is up and running, and we'll erase our previous test results. uWSGI should go a little faster, so we'll loop over the plan 20 times this time and start off the test. You can see performance is much better: with runserver we were pretty quickly over a second; here we're under half a second on almost all requests. Authentication takes a little longer; that's the password hashing in action, and we actually want that to take a long time, so that's good. And we're serving a ton of requests. Let's see how the processors are doing: we're utilizing a lot more of the machine (we may have just finished the test run there, but as you could see, we were hitting 90% CPU usage). And it does look like we finished, so let's see the results.

uWSGI, again with 50 concurrent users: requests per second are up to 72.4, roughly 150% better. That's a pretty good improvement. And the average response time is down to 279 milliseconds. Much better performance on the exact same server, and all we did was swap runserver for a real WSGI server.
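Here's the promised sketch of a uWSGI config along the lines described: six processes with threading enabled. The module name, port, and exact option set are assumptions, since the real file isn't legible on screen:

```ini
; uwsgi.ini (sketch): six worker processes with threads, roughly as
; described in the talk. Module name and port are assumptions.
[uwsgi]
http = 0.0.0.0:8000            ; speak HTTP directly for this test
module = myapp.wsgi:application
master = true
processes = 6                  ; vs. runserver's single process
enable-threads = true
threads = 2
```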
Now let's take that same server and, instead of throwing 50 concurrent connections at it, throw 100. This is going to take a little longer, so since we're short on time I'm going to bump it down to 10 loops, erase our old results, and fire it up again. Here we're pretty much maxing out the server: it's under a lot of load, the load average just keeps creeping up, and if we look at the response times, they're jumping up too. Last time we were hovering around half a second; now we're hovering around a second. In the real world, if you saw this on a server, you'd say we're maxing it out: if we throw much more load at it, requests are going to start timing out and we're going to start dropping them. This isn't going to get us what we need.

So, does anybody know what the next step is? What's that? That might work. Anybody else? Caching, yes, we could do caching. We've maxed this server out. We could find a slightly better-optimized WSGI server, and that might buy us a little bit. It would probably be a really good idea to look back at the application and see whether there are places we could optimize it with caching and improve the situation. But let's just throw more money at the problem: one server is not enough, so let's try two.

That looks like this: we're going to use Nginx as a load balancer and put two servers behind it. Instead of Nginx you could use something like HAProxy or Amazon ELB; there are lots of options here. So I'm going to go back to my web server, kill it off, and bring up another one. This is web2, and web2 looks just the same as the other box; they're identical. This time we're going to bring up uWSGI speaking its native uwsgi protocol; the earlier container was the one labeled uwsgi-http. Using the uwsgi protocol saves us a little overhead: instead of converting HTTP into what Nginx wants to use, then back to HTTP, then down to uWSGI, Nginx can speak uWSGI's internal protocol directly, so we should get slightly better performance.

So here's our load balancer, and this is our Nginx configuration. Nothing exciting here: some settings that are known to boost performance a little, mostly boilerplate. Here's the server we've defined: it passes back to a uwsgi cluster that gets defined when the container spins up, and this include of uwsgi_params does everything else we need (a sketch of a config like this follows below). So that looks like: fig, nginx. Okay, there's the uwsgi cluster we defined. Let's make sure our web servers are running. Okay.

Back to our test plan. We're going to loop over it 20 times, and instead of pointing at our web server, we point at the load balancer, which is called lb. I haven't recorded our other uWSGI result yet: we did run the single server with 100 concurrent users and got better throughput there, 101 requests per second, but the average response time jumped up to 750 milliseconds. We decided that was overloading the server, so let's forget about that one; it's not good performance overall.

Now I'll erase this and start the test against our load balancer. You can see we're already doing pretty close to double the requests, and the response times are way back down. That's what we'd expect: we served a certain number of requests with one server, we doubled the servers, and we're getting close to double the requests. At the same time, we're probably throwing twice as much traffic at the database, so you want to make sure your database can withstand all that extra load.
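For reference, a sketch of an Nginx config doing what was just described: an upstream uwsgi cluster plus the uwsgi_params include. The hostnames and port are assumptions:

```nginx
# nginx.conf (sketch): round-robin load balancing across two uWSGI
# backends over the binary uwsgi protocol. Names/ports are assumptions.
upstream uwsgi_cluster {
    server web1:8000;
    server web2:8000;
}

server {
    listen 80;

    location / {
        uwsgi_pass uwsgi_cluster;
        include    uwsgi_params;  # maps the HTTP request onto uwsgi vars
    }
}
```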
I think there's some funky stuff going on with the network here, hence those gaps, but hopefully we still get a decent result out of this. So that was Nginx with 100 concurrent users: 137.4 requests per second, with a 403-millisecond average response time. We came pretty close to doubling our first uWSGI setup. That response time is higher than it should be; I think we had some anomalies, and if we ran the test again I think we'd see it come in really close to the initial uWSGI instance, maybe with a little overhead, but Nginx is pretty efficient at proxying.

So that's Nginx with 100 concurrent users. What if we have 200? Let's see what happens. I'm going to erase these results and fire it back up. You can see our response times starting to go up. Let's see what the servers look like: this is web1, pretty much maxed out; this is web2, probably also pretty maxed out. Yeah. And let's take a look at what our load balancer is doing: nothing. Load balancers are super efficient; really, all you need to give Nginx is a big fat network pipe, and it can handle lots and lots of traffic on a small machine. Let's see how we're doing overall. Much like when we pushed a single uWSGI server past what it could handle, we're seeing about the same thing now; I'm guessing our average response time is going to get close to a second here. Requests per second did improve, to 166.8, so a few more requests per second at 200 users, but our response time was 805 milliseconds. We saw that with very little load we should expect response times around 200 milliseconds, so if we're at 800, we're basically overloading things: requests are sitting around waiting for server processes. Let's strike this one out as well.

Next up, we could keep adding app servers, right? If two didn't work, we could add three, and if we can't handle it with three, we could do four. But maybe we can get a little smarter here. If you keep adding app servers, you're basically pushing the problem down your stack, and having load issues on your database is not fun: you can throw hardware at that for a certain amount of time, but once you run out of hardware options, the problem gets a lot trickier. So let's get smarter.

This is going to be the last setup we benchmark. Instead of using Nginx as a load balancer, we're going to use Varnish. Varnish does the load balancing just like Nginx, but it can also do caching: when responses come back from our backend through the load balancer, Varnish can grab a copy and serve it to other users.

So, with Varnish, I'm going to bump up the number of loops, erase the previous results, and fire it up. Pretty quickly we should see the requests per second jump well above where we were before. And let's take a look at our response times... they're actually a little... oh, okay. I didn't switch over to Varnish, so that explains why we're seeing the same thing. I'm going to stop the web servers and stop Nginx. Varnish does not speak the uwsgi protocol the way Nginx does, so I'm going to start the web servers up again speaking plain HTTP (the one-line difference in the uWSGI config is sketched below). There's our first web server, and here's our second.
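The change being made here is probably just one line in the uWSGI config; a hedged sketch of the difference:

```ini
; uwsgi.ini (sketch): the difference between the two setups.
; Behind Nginx, uWSGI binds a socket speaking its binary protocol:
socket = 0.0.0.0:8000

; Varnish only speaks HTTP, so behind Varnish we bind instead:
; http = 0.0.0.0:8000
```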
And Varnish is really amazing; I don't think it gets enough love in the Django community. I don't hear a lot of people talking about using it. So let's take a look at the Varnish config and walk through it really quickly so you can see what's happening. Just like with Nginx, we define our backends when the containers spin up and include that file. Varnish uses a configuration language called VCL. It maybe looks a little like what you'd use to configure Nginx, but it's a lot different: you define these subroutines.

vcl_recv is what happens when a new request comes in. What we tell it is: if the request is for the admin URL, or the request has a cookie called sessionid, bypass the cache; we want that person to always go through to the backend so they can use the admin or log in or something like that. If they're not in one of those cases, we unset the cookies. Varnish looks at a request and basically determines whether or not it's unique; one of the ways it does that is by looking at the URL, and another is by checking the cookies. Your Google Analytics package is going to set cookies for a user, and you may have other reasons cookies get set for anonymous users, but the backend doesn't care about those, so we wipe them all out; otherwise every anonymous visitor would look unique and nothing would be cacheable.

Next up is the vcl_hit subroutine: what happens when Varnish finds something in the cache. First, we check the TTL, the time to live; if the object is basically still valid, we deliver it straight from cache. The next part defines a grace period on our cache, so there are really two timeout values. If we're within the first timeout, we just deliver. If we've passed the first timeout but are still within the grace period, we serve the stale content to that user and fetch fresh content from the backend in the background. The user doesn't have to wait for Django to return the response, and any future users get a fresh copy of that page. That's really nice, and here's what it prevents: you have one really hot page, say you're on the front page of Reddit and one page is getting hammered and hammered and hammered, and then your cache expires. Without grace, 100 users would all flood through to your backend and request that same page before the cache refreshes. That's what you might call a cache stampede, or dog-piling, and this is basically protection against it.

The last subroutine, vcl_backend_response, is where we set the grace period and the time to live on the responses coming back from the backend. Varnish also respects cache headers, which is something you can define in Django, but for our purposes we're keeping it simple and setting a five-second TTL and a five-minute grace period. In production, depending on the type of site you have, you could maybe run those much higher, but when you're on Reddit, even values this low might be enough to withstand it: it basically means that if you've got 100 concurrent users sustainedly hitting your site, only one request every five seconds goes back to your backend. That can be a huge load off your servers. (A sketch of a VCL file along these lines, and of the Django side of those cache headers, follows below.)
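Here's a rough reconstruction of a VCL file doing what the talk describes, written in Varnish 4 syntax. The backend names and ports are assumptions, and the grace logic follows the standard Varnish 4 pattern rather than the actual config from the demo:

```vcl
vcl 4.0;

import directors;

# In the demo these are generated when the containers spin up;
# the names and ports here are assumptions.
backend web1 { .host = "web1"; .port = "8000"; }
backend web2 { .host = "web2"; .port = "8000"; }

sub vcl_init {
    new cluster = directors.round_robin();
    cluster.add_backend(web1);
    cluster.add_backend(web2);
}

sub vcl_recv {
    set req.backend_hint = cluster.backend();

    # The admin and logged-in users always bypass the cache
    if (req.url ~ "^/admin" || req.http.Cookie ~ "sessionid") {
        return (pass);
    }

    # Anonymous traffic: strip cookies (analytics and friends) so
    # identical pages hash identically and stay cacheable
    unset req.http.Cookie;
}

sub vcl_hit {
    if (obj.ttl >= 0s) {
        return (deliver);          # fresh: serve straight from cache
    }
    if (obj.ttl + obj.grace > 0s) {
        return (deliver);          # stale but in grace: serve stale now,
                                   # refresh from the backend in the background
    }
    return (fetch);                # expired beyond grace: go to the backend
}

sub vcl_backend_response {
    set beresp.ttl = 5s;           # the five-second TTL from the talk
    set beresp.grace = 5m;         # the five-minute grace period
}
```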
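And since Varnish respects cache headers, the TTL could come from Django instead of being hard-coded in VCL. A minimal sketch using Django's real cache_control decorator on a hypothetical view:

```python
# views.py (sketch): emit a Cache-Control header from Django and let
# Varnish honor it instead of hard-coding beresp.ttl in VCL.
from django.http import HttpResponse
from django.views.decorators.cache import cache_control

@cache_control(public=True, max_age=5)  # mirrors the five-second TTL
def homepage(request):
    return HttpResponse("hot, cacheable content")
```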
All right, now let's spin up Varnish. Oops. I did switch those over. Okay, let's try this again: erase our previous results... there, that's what I'm expecting. If you can see there, our throughput skyrocketed: we're at 550 requests per second, and the average response time is 178 milliseconds. And this is the really interesting one to me: look at the average and median response times on our hot pages. Two milliseconds. You are never, ever going to get Django running that fast. Varnish is a cache, so it does what you'd expect, but no matter how much optimization you do in your code, your database, anything, it's never going to do this unless you have full-page caching on. This is where Varnish really shines.

That test is already done, so let's add it to our list: Varnish, 200 concurrent users, 456 requests per second, with an average response time of 203 milliseconds. Compared to Nginx running on the exact same servers, we more than doubled the requests per second we're handling, and we cut our response time in half. Exactly the same hardware; all we did was replace Nginx with Varnish.

So that's a huge difference. What can Varnish do if we throw 400 concurrent users at it? That's a lot of traffic: 400 people simultaneously hitting your site; most sites will never see that much. So we'll run that, and while it's running, let's take a look at what our servers are doing. This is web2, this is web1. The load is a little lower, you'll see, than it has been in the past: before, when we were overloading the servers, they were really just totally spiked; here they're getting used pretty heavily, but still not maxed. Looks like we might have finished already. Yep, we're done, so I'll add that to our list: Varnish at 400. Oh, wait. I think we had another one of those network blips. Yeah. That's not normal; that doesn't usually happen, and this is why this is a terrible benchmark. Usually that would be smooth, so just pretend you don't see that giant spike. Anyhow, we can also look at the Varnish box: just like Nginx, it's not even breaking a sweat. Put a bunch of RAM in your Varnish machine, make sure it's got lots of network capacity, and it'll handle a ton of traffic. It looks like it's wrapping up now. These results aren't great because of the blip, but we'll add them in; I've run this test a million times, so just take my word that normally the response time is about the same and the requests per second slightly better. So: 364.8 requests per second (I had these backwards for a second) at 476 milliseconds.

While we have this up, let's look at some other cool things about Varnish; I've got a few minutes left here. I'm going to loop over the plan 100 times instead of 30, to give me some time to show you what's happening on the servers. I'll start off the test, go back to our load balancer, and get into the Varnish container that's running there. One cool tool that ships with Varnish is varnishhist, which shows us, in real time, a histogram of the requests hitting the server. The pipes, the vertical lines, are the requests being served from the cache; the axis is in seconds, so that 1e-5 means they're coming back on the order of ten microseconds. The hash marks are the cache misses going through to our backend.
Those are getting returned in the neighborhood of a second. Varnish also has varnishtop... no, that's not what I want: varnishstat. Here you can see hit/miss ratios, the number of connections, and all that. A big performance win is basically just letting Varnish serve more of your traffic from cache, and you can track that here; it does a really good job of showing you your hit/miss ratios.

And this is the really awesome thing. What I'm going to do now is kill web2, and kill web1, and let's see what happens. Take a look at this: our hot pages are not erroring; we're still serving content on all of those pages. So you've totally screwed up: you've deployed a massive breaking change to your live servers in the middle of being on the front page of Reddit, and Varnish is still chugging along serving your content. Ten percent of your users are getting errors, and yeah, that's bad, but your important content is still up and running. You can see the error ratio shooting up on everything else, but Varnish is still serving our homepage and those 50 hot profile pages in two milliseconds. Now we can spin our web servers back up, and Varnish will reconnect to them and basically fix all of that. So use Varnish; it's really great. That's the lesson of this talk.

We did about 450 requests per second with Varnish. If you sustained that for a day, that's roughly 40 million requests. There are people who do a lot more than that in a day, but they do it on a lot more than three servers, and if you were on the front page of Reddit or something like that, this would get you by just fine. You could probably do it on a lot less; I wouldn't be surprised if you could do it on one server running Varnish and a Django application. So, pretty good results from where we started: runserver at 28 requests per second.

So like I said: Django doesn't scale; don't use runserver in production. Varnish scales; use Varnish. And that's all I have. Like I said, we wrote a book. It's called High Performance Django, and you can check it out at highperformancedjango.com. I also have a few copies here loaded on these nifty little USB keys; if anybody wants to ask questions, you can win a copy. Thank you.