My name's Joshua Ginsberg, some people call me Jag, and I'm going to talk to you today about running Django under heavy load. Everything about Django focuses on getting you to 1.0, because if you don't actually have an application out in the wild, you're not really doing anything. But once you hit 1.0 and everybody loves what you've done, you suddenly find your servers under assault. What do you do to give yourself a little headroom? I'm going to give you some ideas.

So first of all, who is Celerity, and who am I? Celerity is a business acceleration consultancy based in the DC area. We work with a lot of very large enterprises in that area. Personally, I'm the principal Python architect in charge of the Python discipline at Celerity, and I've worked for SocialCode, the Free Software Foundation, and a division of match.com, so I've had a fair amount of experience running under heavy load.

Here's what we're going to be talking about today. First, the logical problem of drinking from the fire hose. Then measurement and metrics, the difference between sledgehammer solutions and scalpel solutions, and how to analyze the risk and potential disaster of any particular choice. And then I'm going to talk about three very common bottleneck problems that people run into, and some ideas on how you might be able to solve them.

So let's start with drinking from the fire hose. Congratulations. Now it's time to panic. Ultimately, when you have more traffic than you know what to do with, you need to chug faster, and there are five logical ways you can do that. You can handle fewer requests, which might sound like the opposite of what you want, but it actually can help. You can ask fewer questions during your requests: if you're not asking your data sources as much, they don't have as much to answer. When they do answer, you want them to answer faster. You can increase concurrency, building capacity by doing things all at the same time. Or you can offload processing to your client. These all seem very obvious, but let's go into each a little bit.

Handle fewer requests: downstream caching can reduce your web head load tremendously, and downstream caching is better than upstream caching for this. If you are caching entire pages in your upstream cache, you're doing it wrong. The trick, though, is don't serve leftover salmon. What you do serve needs to be fresh, by whatever your application and business needs define as fresh, and cache invalidation is one of those really tricky problems.

Ask fewer questions during requests: this is where you optimize your database queries. You use select_related and prefetch_related, you minimize unnecessary queries, and you work with denormalized and prepared or materialized views.

Answer questions faster during your requests: that's where you use the upstream cache. The upstream cache is great for not having to ask as many questions of your database. You put on your DBA hat and do your best to optimize your indexes, analyze your query planner, and tweak your server configuration. And you don't lean on the database when you don't have to. I can't tell you how many high-load sites I've seen still storing sessions, Celery task results, and messages in the database. It turns out that those can be something like 80% of the writes you end up doing to your database.
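As a concrete illustration of getting those writes out of the database, here is a minimal settings sketch that moves sessions into the cache, messages into cookies, and Celery results into Redis. The hostnames, ports, and database numbers are placeholder assumptions, not anything from the talk.

```python
# settings.py (sketch): keep sessions, messages, and Celery task results
# out of the relational database. Locations below are placeholders.

# Sessions live in the cache instead of the django_session table.
# Use "cached_db" instead if you want a database fallback on cache miss.
SESSION_ENGINE = "django.contrib.sessions.backends.cache"

CACHES = {
    "default": {
        # On newer Django this backend is PyMemcacheCache.
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
    }
}

# Messages ride along in cookies rather than hitting a backing store.
MESSAGE_STORAGE = "django.contrib.messages.storage.cookie.CookieStorage"

# Celery task results go to Redis, not the Django ORM result backend.
CELERY_RESULT_BACKEND = "redis://127.0.0.1:6379/1"
```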
Consider multiple data stores, and not just multiple data stores but multiple kinds of data stores, because different data engines are better at answering different kinds of questions. If you can: shard, route, repeat. You can increase concurrency by throwing iron, and consequently money, at the problem. Or, if you have an application that isn't necessarily CPU bound, switching to an evented IO handler could absolutely increase your concurrency. And you can create your own legal botnet by offloading processing to your clients: push the template rendering onto your client's browser, and by putting more of the computation into that distributed computer, you take the load off your own servers. The added advantage is that you get something up in front of the user's eyes faster. When I was at match.com, we found that a 100 millisecond reduction in our page load time increased our sales 4%. Simply getting something in front of the user's eyes faster can make the site feel faster and make a happier user.

So where do we start? You need to identify your bottlenecks first, and you can't identify your bottlenecks if you don't have metrics. Prescribing a remedy requires diagnosing a particular bottleneck, because anything you're running anywhere in your ecosystem could be that bottleneck, and one bottleneck will bring your entire application to a halt. Proper diagnosis means you actually have insight, metrics, and visibility into what your application is doing and where.

So what tools? New Relic. I love New Relic. I love New Relic so much, and they're not even paying me. New Relic is awesome, and you should be using New Relic. There is no better generalized platform for getting a first look into what your application is doing, what your systems are doing, what your network is doing, where you are spending your time, and what your users are asking for. If you're starting to diagnose a bottleneck, or you want to see bottlenecks before they start, start with New Relic.

I also like to use StatsD. StatsD is an open-source Node.js daemon released by Etsy, the arts-and-crafts eBay thing. StatsD uses a UDP protocol for counters, gauges, and timers, writes that information into a round-robin database called Whisper, and then relies on a Python application called Graphite to draw pretty graphs. Graphite is awesome. You can take any one of your metrics, or several of your metrics, put them on the same graph, apply whatever statistical analysis you want, and make one of those really cool dashboards to hang up in your office so people are impressed when they come in. More importantly, you can watch your key performance indicators, because you know your application and you know the things that might be your problems at any given time, and that gives you more granularity and more application specificity than New Relic can. Additionally, Brightcove released a Python set of tools for system monitoring, CPU, memory, disk, network, all of that, called Diamond, and I can also highly recommend it.

Third, you need to be doing SQL statistics analysis. If you don't know what your database is doing, where it's spending its time, how many table scans, index scans, and row scans you're doing, you can't figure out how to help it do better. In Postgres, you need to look at the pg_stat_activity view; that has basically everything you need. And MySQL's SHOW GLOBAL STATUS does effectively the same thing. If you're using Oracle or Microsoft SQL Server, hire a DBA. It's complicated enough, and you can afford it. And I don't know why I'm even talking about SQLite.
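To make that concrete, here is a minimal sketch, assuming PostgreSQL behind Django, of pulling two of the statistics just mentioned: what's running right now, and which tables are being sequentially scanned instead of hitting an index. The helper names are hypothetical.

```python
from django.db import connection

def current_activity():
    """Queries currently executing, longest-running first."""
    with connection.cursor() as cur:
        cur.execute("""
            SELECT pid, state, now() - query_start AS runtime, query
            FROM pg_stat_activity
            WHERE state <> 'idle'
            ORDER BY runtime DESC;
        """)
        return cur.fetchall()

def sequential_scan_offenders():
    """Tables where sequential scans outnumber index scans."""
    with connection.cursor() as cur:
        cur.execute("""
            SELECT relname, seq_scan, idx_scan
            FROM pg_stat_user_tables
            WHERE seq_scan > COALESCE(idx_scan, 0)
            ORDER BY seq_scan DESC;
        """)
        return cur.fetchall()
```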
Bottom line: if you don't know what your app is doing, when, how often, how fast, or how efficiently, you're flying blind. Because, again, the bottleneck could be anything. It could be any one of your system metrics, any one of your network segments, any one of your services, or the application you yourself wrote. You need accurate, detailed, insightful metrics in order to find the bottlenecks that are slowing your system down.

Most of the solutions that can be generally prescribed are sledgehammers. Sledgehammers are broad, blunt objects that work in a variety of situations but lack any sort of fine touch. When we're talking about Django under load, unfortunately a lot of this is very, very application specific. This is where you need to be the engineer and not just the developer. Blunt objects will get you by for a little while, but you really need to know your application inside and out to know the places where you can alleviate that pressure. And you're not trying to fix all of your problems at once; you're trying to buy some time until the next problem rears its head. Because the next problem will rear its head, and this is a never-ending cycle. Everybody dreams of doing a 2.0. I have yet to find anybody who's really actually done it.

Some problems are common. Ask around among other engineers and you'll see the same kinds of problems and the same kinds of solutions being used. Search engines are fantastic for broad data scraping; making a database do n-gram matching is not exactly a great idea. Columnar databases are coming back into fashion, which is fantastic, because they're fantastic for numerical aggregation of data. Key-object stores are great for single-record access; if you find yourself going to SQL for single records all the time, you might want to look into something like that. Query optimization can give your database more headroom than you knew it was capable of having. If your domain has naturally shardable data, shardability is fantastic. And keep in mind that not all data stores need to be available for live queries. If you're running Hadoop HBase, or if you're running Neo4j, a lot of the time those databases themselves are not available for live querying by users. Instead, repeated queries create views against those data stores and save them into something more instantly accessible.

But ultimately the point is to alleviate bottlenecks. And by alleviate, we don't mean shift them somewhere else; we mean actually adding processing capacity to your site. Shifting bottlenecks is a very common problem: you can alleviate the problem in one place by stuffing it all someplace else. For example, if you horizontally scale your web heads and double their number, that's great. You also just put double the load on your database server. If you switch to evented IO, you went from having 30 gunicorn traditional threads to a thousand gevent threadlets, so suddenly you're putting 30 times the number of connections on your database. And each connection has a memory cost. You need to be prepared for that.
So what does a good solution look like? Because, again, it's very application specific. A good solution is surgical. It narrowly addresses your bottleneck in the minimally intrusive way possible. In building a new application, we always go for an 80/20 or 90/10 rule, where 10 to 20% of the effort achieves 80 to 90% of what we want. In this case, you're looking for the 99/1: the place where you can touch 1% of your code and get 99% of the bottleneck removed. You don't want to add substantial operational risk or cost. And you need to know that whatever you're rolling out, you are able to test, your developers are able to test against, you're able to migrate your data into, and you have a disaster mitigation strategy for. Because running in that sort of high-volume production environment, you have to be absolutely conservative about your operational strategy. You need to know how you're going to test, how you're going to migrate, and what happens in the worst-case scenario. And you need to know that you have a plan you can actually accomplish.

So how do we do that risk analysis and plan for that disaster mitigation? Every new system we add, a new database, a new caching layer, any sort of new system, has a cost. It's a cost because it's a new point of failure. It's a cost because all of a sudden you might be introducing new race conditions that are nearly impossible to debug. It makes for more difficult testing: your developers suddenly have to put more systems on their dev instances to make sure they're working correctly with these other systems. You have more complicated troubleshooting, because you're looking at more logs, more systems, and more interactions. And because the number of interactions between systems increases not arithmetically but multiplicatively, each new system you add has a greater and greater cost to the complexity and the risk of your application.

So, important questions to ask. How are we going to bootstrap this into production? You've got everything in a SQL database right now and you want to bring Elasticsearch online; you've got to get your data into Elasticsearch at some point. Also, while you're getting your data into Elasticsearch, you're putting more data into SQL, and you've got to get that data into Elasticsearch too. Bootstrapping any new system, any new data store, requires a lot of forethought and planning.

How do we recover from disasters? Most of you know I love Redis. I don't like letting Redis write to disk. But if Redis crashes and loses all the data it has in memory, I need to know what my plan is. I need to know that my ops guys aren't going to be woken up at 3 in the morning because suddenly the site isn't working.

How battle-hardened is the solution? If you're installing something editable off GitHub that doesn't even have a 0.1 on PyPI, you're doing it wrong. You need to go with solutions that other people have tried and tested and found to be reliable, and even better if there's a company behind them that provides commercial support. Somebody who knows the third-party systems you're using inside and out can provide customization and tuning, and can help you minimize the mean time between failures and the mean time to resolution.

How much does it cost? And I'm not even talking about licensing cost; I'm talking operational cost. You have to set it up. You have to build Chef configurations for it. You have to put monitoring in for it. You have to develop new metrics for it. You have to have your operational staff understand how to use it and build a runbook for it. You have to provision new servers for it and new network legs. Make sure you understand what the new systems you're bringing online are going to cost you.

And it deserves asking: are we just shifting the bottleneck someplace else? Are we taking the pressure off this group of systems only to put it on that other group of systems? Those are the questions you should be asking yourself before you put in place any of these bottleneck mitigation strategies.
So let's look at an example. The simplest one is to throw more iron at the problem. You can go for bigger EC2 instances, bigger databases, provisioned IOPS. You can double your number of web heads. You can do all sorts of things to throw more money at the problem. You're not introducing any new points of failure you didn't already have, and you're not really introducing any new race conditions you didn't already have. Testing is pretty uncomplicated, because you've already tested these configurations before. And it's very easy to bootstrap; most installations can auto-scale up and down as much as they need to. But it's expensive, and increasingly expensive, and nobody wants to have to justify why you suddenly went from spending $5,000 a month at AWS to $25,000 a month at AWS.

Additionally, certain kinds of solutions can have diminishing returns. Adding a read-only SQL slave is a fantastic way to take load off your master: send all of your reads to the slaves and all of your writes to the master, and everybody's happy. But with each SQL slave you add, because they all have to keep consuming the replication log, you get diminishing returns, and after a while that solution doesn't work for you anymore. (A sketch of this read routing appears below.) And lateral scaling may just be bottleneck shifting: once you widen the base and increase the number of things talking to data systems, network systems, and web services, you're increasing the load there, and you need to make sure those systems don't themselves suddenly become bottlenecks.

Another example: you want to add Elasticsearch to offload expensive SQL queries that really are clearly searches. icontains is not good for everything. But you're introducing a new point of failure and a new technology. You need to bootstrap this new data system by indexing your entire database; if you're using Haystack, that's a lot of load you suddenly have to put on your database. Index update lag could introduce new race conditions, or, if you do real-time search index updates instead, you could be slowing down a lot of the writes in your system. Testing is pretty straightforward against Elasticsearch, and it's reasonably cheap and scalable. So if these are risks you're willing to take, and you've found a way to plan around them for your specific application, it might be a good solution to mitigate some of the database load.

And the first and most common bottleneck is usually the database. Usually you're over-querying: you're asking way too many questions. You probably didn't design your indexes right, because what you thought you were going to be asking your database, you aren't actually asking. You probably didn't tune your database, because before you have problems with scaling, you don't really think about doing that. And you probably never thought about sharding, because you don't prematurely optimize. But that might be a good solution for you.
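Picking up the read-slave idea from a moment ago, here is a minimal sketch of routing reads to a replica with a Django database router. The alias names and settings wiring are assumptions for illustration, not from the talk.

```python
# settings.py (sketch):
# DATABASES = {"default": {...master...}, "replica": {...read slave...}}
# DATABASE_ROUTERS = ["myapp.routers.ReadReplicaRouter"]

import random

class ReadReplicaRouter:
    """Send reads to a replica, writes to the master (hypothetical aliases)."""

    read_aliases = ["replica"]  # add more slaves here as you grow

    def db_for_read(self, model, **hints):
        return random.choice(self.read_aliases)

    def db_for_write(self, model, **hints):
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Master and replicas hold the same data, so relations are fine.
        return True
```

Note the race condition the talk warns about still applies: a read issued immediately after a write may hit a slave that hasn't caught up yet.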
So, are you over-querying? A lot of times in our Django code we use the object-exploration features of the ORM recklessly. prefetch_related and select_related are very helpful here, and you should know the difference. I'm not really going to go into them, because I think there are three talks about them at this conference. You might be doing queries inside for loops. You might say, no, I'm not, and then figure out that you're calling a function inside a for loop that itself does queries. Or you could just be writing really bad queries.

So let's look at some examples, all based on real-world scenarios. Photo set as JSON: we grab a photo set and build a dictionary of JSON representations of all the photos in it, as well as the owner of the photo set. In the first version, we're doing three queries: one for the photo set, one for all of the related photos, and one for the associated user. But if we bring in select_related and prefetch_related, we can cut that down to two. We get all of the photos associated with the photo set already pre-cached on that query, and we get the user object via select_related, so we eliminate that query. One big pitfall to keep in mind, and this is noted in the docs: on line four over there, I still have photoset.photos.all(). You cannot filter, you cannot order, you cannot do anything that clones that queryset, because you will lose your prefetch cache, in which case you actually just made your application slower.

I see for loops all the time. For loops are very readable, and it's easy to iterate in a for loop and not think about what your queries are doing to the database. For example, in the same application, they wanted the photo sets for a user, plus the most recent photo in each to make a thumbnail. In this case, they're doing n+1 queries: one for every photo set the user has. That's ridiculous. This is a great example of where denormalization can come into play. If you were to create a new foreign key from photo set to photo, and update the reference either on photo save or with a SQL trigger, then since we're always going for the most recently created photo, we always know it's current. As soon as you grab the photo set, you already have the reference to the latest photo object, and you can grab it with select_related in only one query.

Or your queries might just suck. This is another real-world example: grabbing episodes related to a television series, ordered by title, that are airing in the next n days. It might be more readable the first way; we all like the generator syntax, the for-in-if. But it's far less efficient for the database. I can't even count how many queries are in there. Rewriting it to focus on the episode first is a far more efficient way to do a single query and get all the same data out.

Now, a special note on this one. It's a little-known fact that the top version was written for Django 1.4, so it's not their fault. The series-to-episode relation is a generic foreign key with a generic relation, and in Django 1.4, you could not query across those. In Django 1.5, we silently introduced into the query engine the ability to do multi-column joins, and the only place we ever exposed it was in the generics library. But we didn't do it all the way through: in 1.5, even though these kinds of queries used the multi-column join, you couldn't use the double-underscore join syntax to filter across them. In 1.6 you could, except not in the reverse direction. In 1.7, you can now query in both directions. So it's not really that developer's fault. But by upgrading, well, actually, we're running Django 1.6 with one patch backported, but by running Django 1.7, you can change your query around, lean on a multi-column join, and have a much more efficient query.
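Pulling the photo-set examples above together, here is a hedged reconstruction; the model and field names are hypothetical stand-ins for the slide code, not the speaker's actual source.

```python
from django.db import models

class PhotoSet(models.Model):
    owner = models.ForeignKey("auth.User", on_delete=models.CASCADE)
    # Denormalization from the thumbnail example: a direct reference to
    # the newest photo, maintained in Photo.save() or by a SQL trigger,
    # so listing a user's photo sets never costs n+1 queries.
    latest_photo = models.ForeignKey(
        "Photo", null=True, blank=True,
        related_name="+", on_delete=models.SET_NULL)

class Photo(models.Model):
    photoset = models.ForeignKey(
        PhotoSet, related_name="photos", on_delete=models.CASCADE)
    title = models.CharField(max_length=100)

def photoset_as_json_naive(photoset_id):
    # Three queries: the photo set, its photos, and the owner.
    ps = PhotoSet.objects.get(pk=photoset_id)                  # query 1
    photos = [{"title": p.title} for p in ps.photos.all()]     # query 2
    return {"owner": ps.owner.username, "photos": photos}      # query 3

def photoset_as_json(photoset_id):
    # select_related joins the owner into query 1; prefetch_related
    # batches the photos into query 2. Two queries total.
    ps = (PhotoSet.objects
          .select_related("owner")
          .prefetch_related("photos")
          .get(pk=photoset_id))
    # Pitfall: ps.photos.filter(...) or .order_by(...) would clone the
    # queryset, discard the prefetch cache, and issue a fresh query.
    photos = [{"title": p.title} for p in ps.photos.all()]     # cache hit
    return {"owner": ps.owner.username, "photos": photos}
```

With the denormalized column, the thumbnail listing becomes PhotoSet.objects.filter(owner=user).select_related("latest_photo"): one query instead of n+1.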
Are your indexes appropriate to your questions? You need to know the database you're working with. You need to understand how the database planner picks indices to use, how those indices can be used for different types of queries, which indices are going to be more effective than others, and whether you have indices you never lean on at all.

So first of all, know your database engine. As much as they all speak SQL, they all have very different quirks. For example, MySQL limits a prefix index to 700-some bytes, which might sound fine unless you have a unique constraint across several large varchars; you can exceed that very quickly. InnoDB is terrible at subtree locking during writes: when you're updating an index in InnoDB, if the node you're updating isn't a leaf, it actually locks the entire subtree against any other writes, which is why MySQL has far more deadlock contention than almost any other database engine. I might say PostgreSQL, with its infinite row limit, could absolutely crush that, but PostgreSQL doesn't have an infinite row limit. It has roughly an 8k-per-column-per-row limit, and it does some magic under the hood called a TOAST table: if you put, say, a text field or a blob field in there, it chunks it into 8k blocks and stores them as separate rows in a magical alternate table. When you query for that text field, it joins against it and recomposes the data for you on the fly. You don't always need to query against your text fields, and you can save yourself a join and a lot of processing time if you don't have to get the EXIF data every time you query a photo.

Multi-column indexes are highly underused in Django, partly because they aren't exposed in the model API as something you can easily create. What's interesting about multi-column indexes is that you can use the same index for queries that constrain against subsets of the member columns. If I have a three-column index and I want to constrain against columns 1 and 2, I can use the three-column index to do that. In MySQL, the order matters, so I can't do 1 and 3; I have to do 1 and 2. In Postgres, I can do 1 and 3, but it's not as performant as 1 and 2. So instead of putting db_index=True on 80% of the fields in your table, you might want to see which ones you're querying against together and build multi-column indexes, reducing the number of indexes you have to maintain.

I can't tell you how many times I've seen a BooleanField with db_index=True. That column has a cardinality of 2; you're not getting anything out of that index. Likewise, if you have a field with eight different values, an enumerated type, it's ridiculous to build an index on it, because the index isn't going to save you anything. More indexes mean slower writes, because you have to update every single index, and greater lock contention, because you're competing with other writes. And it's silly, because when a query is executed, the database engine only uses one index per table per query. Even if you're constraining on three columns, and all three columns have single-column indexes, you're only using one of those indexes in your query. You're not gaining anything by putting indexes on the other two fields unless you make it a multi-column index.
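The model API point has softened a little since this talk: Django 1.5 and later expose index_together in Meta. A minimal sketch, with hypothetical fields, of replacing scattershot db_index=True with one composite index:

```python
from django.db import models

class Event(models.Model):
    account_id = models.IntegerField()   # hypothetical columns that are
    created = models.DateTimeField()     # usually constrained together
    kind = models.CharField(max_length=32)

    class Meta:
        # One composite index instead of db_index=True everywhere.
        # Constraining on (account_id) or on (account_id, created) can
        # use this index; constraining on (created) alone generally
        # cannot, per the column-order rules discussed above.
        index_together = [("account_id", "created")]

# Or, on versions without Meta support, issue it by hand:
#   CREATE INDEX event_account_created ON myapp_event (account_id, created);
```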
And if you want to know which indexes you're leaning on, you need to learn how to use the query analyzer. It is honestly your best friend in making your database faster. Every single query that you run on any kind of regular basis, you should EXPLAIN it. The EXPLAIN syntax will tell you exactly how the database plans to execute that query the next time you run it. And by simply printing queryset.query in the Django shell, you will get the exact SQL that Django is about to send to your database engine.

Of course, the output is somewhat cryptic. PostgreSQL talks about different kinds of index scans. A plain index scan walks the B-tree of the index, and for every row goes and gets the associated tuple from the table. In an index-only scan, all the data the query needs is in the index, so it doesn't actually have to go to the table. And a bitmap index scan, instead of going tree, row, tree, row, tree, row, pulls everything from the tree, hashes it into a table, and then gets all of the associated tuples from the table at once. Those are good. You're leaning on your indexes; you're using index scans to get rows faster. If you're getting sequential scans, congratulations, you are scanning your entire table. You'd better hope it doesn't have a lot of rows in it. So if you're doing a WHERE constraint against an unindexed column, and that's the only constraint against that table, you're scanning the entire table to find all those rows. Joins will show up as loops: they can be nested loops, where you're looking up candidates that meet the cross-join constraint, or hash joins, where, similar to the bitmap index scan, you're building a hash table to reduce the number of lookups. There are a lot of different terms in the PostgreSQL query planner. They're all extremely well documented, and the core developers have written detailed explanations of exactly what's going on under the hood. And you can reduce the complexity of your queries just by trying different permutations.
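A minimal sketch of that workflow in the Django shell, reusing the hypothetical Photo model from the earlier sketch. Note that str(queryset.query) is an approximation of the final SQL (parameters aren't always quoted exactly as sent), so treat it as a starting point.

```python
from django.db import connection

# The SQL Django is about to send:
qs = Photo.objects.filter(photoset_id=42).order_by("-id")
print(qs.query)

# Ask PostgreSQL how it plans to execute it:
with connection.cursor() as cur:
    cur.execute("EXPLAIN ANALYZE " + str(qs.query))
    for (line,) in cur.fetchall():
        print(line)

# Look for "Index Scan", "Index Only Scan", or "Bitmap Heap Scan" paired
# with a "Bitmap Index Scan" (good), versus "Seq Scan" (whole table).
```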
You also need to tune your database for what you're doing. Nobody should be running an out-of-the-box configuration for their database; if you have not tuned your database, you're doing it wrong. But resist the temptation to simply jack up all the settings. The biggest reason is that many of those settings are per connection, and at scale you're doing 200 to 500 database connections concurrently. So if you tell it you want a join buffer of 50 megs, and you're doing 200 connections, you're going to OOM your entire database, and that's not going to be good. You need to consider which of those settings are per connection, which are per server, and which of them address the types of queries you're actually running, and then tune very carefully, incrementally, testing the impact on the performance of your database.

For larger queries, every join and sort needs memory. Both MySQL and Postgres will spill to disk: they write out temporary tables and do the sort, or the join, on disk. If you can do them in memory, it's a lot faster, but you have to know what in your query is going to force an on-disk join or sort. For example, in MySQL, if your query includes a text field, you're going to disk. Every time; it doesn't matter what you do, you're going to disk to do that join and sort. Understanding what in your database engine sends you to disk versus staying in memory can help a lot.

I know almost all of us are in the cloud, but if you do run your own iron, you'd better be battery-backing the RAID controller on your database server. Out of the box, the database won't consider a transaction complete until the physical medium of the disk has said, yep, got it. With a battery-backed RAID controller, you can tell it to move on as soon as the kernel has accepted the write, not necessarily the physical medium; if the power suddenly gets cut to the machine, the battery-backed RAID will keep the write buffer alive until everything has been flushed out of it. Also, to make your temp partition faster, use SSDs. At the match.com subsidiary, we found, if I remember right, a 20% to 30% increase in the performance of our database server just by switching to SSDs for the temp partition. And it's not that much space; you can use small ones.

Should you be sharding? Well, some data sets are naturally shardable. A perfect example would be Flickr: every user's photos, photo streams, and comments are really only ever looked at in the context of that one user, so you can put almost all of one user's data on a single database server and break the whole data set up across many servers. Django DB routers are perfect for handling this routing: you can round-robin based on some identifying information inside the record to know which shard you need to be working with. The unfortunate thing is that this means you need to switch to UUIDs as your primary keys instead of incrementing integers; otherwise you will not have unique primary keys across your multiple shards. This can cause some problems with third-party apps, because the Django foreign key machinery is basically expecting a positive integer on the other side.

If you do switch to sharding, plan for the future. If you take your one database and split it into two, what happens when you need to split it into three? You're talking about moving data between the shards and shifting the algorithm by which you assign them. Make sure that for the data you are sharding, you have a plan for when you need more shards. And finally, if you are sharding, you should never be doing cross-shard queries. Even though there are ways to do it, Pgpool-II will do it, it's not a good idea. Any data that involves cross-shard queries you should either denormalize or store in an alternate data store. For example, with Flickr, if I want to list all of my friends, that probably crosses shards; anything listing multiple users is definitely going to cross shards. So store a copy of that in an alternate data store, so that when you need to hit it, you're still only hitting one particular data store.
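Here is a minimal sketch of the UUID-plus-router idea; the shard aliases, model, and hashing scheme are assumptions for illustration, not the speaker's implementation.

```python
import uuid
from django.db import models

SHARDS = ["shard_0", "shard_1", "shard_2"]  # aliases in settings.DATABASES

def shard_for(owner_id):
    # Stable function of identifying information inside the record
    # (owner_id is a uuid.UUID). Adding a shard later changes the
    # mapping, so plan to rebalance, as the talk warns.
    return SHARDS[owner_id.int % len(SHARDS)]

class UserPhoto(models.Model):
    # UUIDs stay unique across shards, unlike auto-incrementing integers.
    id = models.UUIDField(primary_key=True, default=uuid.uuid4,
                          editable=False)
    owner_id = models.UUIDField(db_index=True)
    title = models.CharField(max_length=100)

class ShardRouter:
    """Route saves to the owner's shard; reads pick it explicitly."""

    def db_for_write(self, model, **hints):
        instance = hints.get("instance")
        if isinstance(instance, UserPhoto):
            return shard_for(instance.owner_id)
        return None  # defer to other routers / the default alias

    db_for_read = db_for_write

def photos_for(owner_id):
    # Reads name the shard explicitly, so we never fan out cross-shard.
    return UserPhoto.objects.using(shard_for(owner_id)).filter(
        owner_id=owner_id)
```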
And the third, well, second major strategy we're talking about is opening up the ORM marriage. SQL is great at what SQL is great at, and for 99.9% of all applications out there, you will never need anything better than SQL. SQL is easy to make ACID compliant, and that durability part is the really important one. It has an infinitely flexible query language capable of constructing any number of ways to look at your data, including ones you can't even think of right now. And the ORM is awesome: it's been around a long time, it's great, it's easy, it's fantastic, and we love it. And the RDBMS is space-age technology, from back when we actually sent rockets into space. We've been perfecting RDBMS technology since the 1960s, and it's extremely reliable compared to something invented in the last couple of years.

But at the same time, SQL is not great at the things SQL is not great at. SQL sucks at aggregation; columnar databases are far faster at that. SQL sucks at search operations; icontains has a very limited use. SQL sucks at single-record access, where key-object stores are far better. SQL sucks at highly dimensional data, because you're involving a whole lot of joins, duplication, and iteration, whereas a database that supports highly dimensional data out of the box, a document database, might work much better. SQL sucks at graph traversal, because you have an unknown number of joins; Neo4j and Titan are fantastic solutions if you have data that is better looked at as a graph.

But the tricky part is that you need a single source of truth. You cannot have multiple data stores without a single source of truth: the authoritative record, the thing you ultimately fall back to if there's any contention. The single source of truth is what you use to bootstrap other data stores, and it's what you use to recover from disaster. Anything that's an alternate data store needs to be considered a derived data store. It needs to be treated as cache, denormalized, dimensioned, something you can throw away and rebuild.

The hard part about multiple data stores, especially across different architectures, is race conditions. For example, if you are keeping a count in Redis of something that exists as individual rows in Postgres, by incrementing and decrementing a counter, it's very easy to hit a race condition where your counter drifts out of sync with the rows you actually have in your database. Personally, I even use Redis for locks. There's a pretty well-tested set of atomic operations you can use to do timestamp-based locks, because we're all running NTP, and oh yeah, timestamps are reliable. But Redis, as a single-threaded server with atomic transactions, is probably the easiest and cheapest way to ensure locking across a subset of your data.
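The speaker's exact recipe isn't shown, so here is a minimal sketch of the idea using redis-py's atomic primitives; a single SET with NX and EX plays the role his timestamp scheme played, and the key names are illustrative.

```python
import uuid
import redis

r = redis.StrictRedis()

# Atomic compare-and-delete, so a holder whose lock expired can't free
# a lock that someone else has since acquired.
_release = r.register_script("""
    if redis.call('get', KEYS[1]) == ARGV[1] then
        return redis.call('del', KEYS[1])
    end
    return 0
""")

def acquire(name, timeout=10):
    """Return an owner token on success, None if the lock is held.

    NX: only set if absent, so exactly one caller wins. EX: auto-expire,
    so a crashed worker can't hold the lock forever.
    """
    token = str(uuid.uuid4())
    if r.set("lock:" + name, token, nx=True, ex=timeout):
        return token
    return None

def release(name, token):
    _release(keys=["lock:" + name], args=[token])
```

redis-py also ships a ready-made Lock helper (r.lock()) built on the same primitives, which fits the talk's advice to prefer battle-hardened solutions.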
And if you do go to other data stores, don't forget that we're all engineers here, that we're all going to be hiring new engineers, and that we're all going to be teaching this to other engineers. So if you do this, model your data sanely and, as best you can, follow the example of Django, because you're going to be hiring Django people to work with these alternate data stores. Haystack is the most fantastic example of this: it mirrors the query structure of Django about as closely as it can in performing searches, so it's a very shallow learning curve to go from a Django QuerySet to a SearchQuerySet.

And the last general strategy I'm going to suggest is that 200 is OK, but 304 is awesome. A basic refresher on how downstream caching works. Responses from your origin contain three interesting headers: Last-Modified, ETag, and Cache-Control. The downstream caches make decisions about freshness based on what you returned. When clients come back to that caching server, they provide the headers If-Modified-Since, for the timestamp, and If-None-Match, for the ETag. If the cache has a copy it deems fresh, meaning the max-age hasn't expired yet and the corresponding timestamp or ETag from the request headers matches, it simply returns the cached copy. It doesn't even need to talk to your origin. But if the cache has a copy that is stale, meaning max-age has expired, it doesn't automatically go and pull a fresh copy. It actually hits your origin server again with If-Modified-Since and If-None-Match. If your origin server comes back and says 304, the cache considers its stale copy fresh again, and it serves it to the client. A quick caveat, because I was wrong about this for a good long while: in Cache-Control, you can say must-revalidate, and that doesn't do what you think it does. It does not require a caching server to revalidate every single time; max-age=0 is the only way to do that.

But this requires going beyond the Django cache middleware. The middleware is a blunt object: it saves bandwidth, but it doesn't save processing, because it has to run the entire view to figure out whether the ETag has changed; the ETag is just a hash of the content that comes back. You know your views, you know what data they access, and you know how that data is related, so you can do better. By wiring shortcuts into your views that simply check whether the data has changed since the last time the user asked, so you can return a 304, you save yourself a ton of processing time and a ton of database access.

But this is tricky, and of these solutions, this is the trickiest. Cache invalidation is hard, because it's not just keeping track of the timestamps of individual records; it's keeping track of the timestamps of every record you're going to return in a particular view. One way to do this is with Redis. You can keep the modification timestamps for particular objects in Redis, and, again, you have to be careful about updating related content, so that when a new request comes in, all you do is check the timestamps of all the required records in Redis, and whatever the maximum last-modified timestamp is, is what you compare with. If that matches the request coming in, you can just say 304 and be done. You could also keep an updated-date timestamp on each of your models and touch it across related objects, using either signal handlers or overridden save methods, to figure out the max updated timestamp so you can quickly return a 304.
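A minimal sketch of that shortcut, assuming per-object timestamps kept in Redis under hypothetical key names: compute the newest last-modified timestamp for everything the view would render, and let Django's condition decorator answer 304 before the view does any real work.

```python
import datetime
import redis
from django.views.decorators.http import condition

r = redis.StrictRedis()

def photoset_last_modified(request, photoset_id):
    """Max of the modification timestamps for every record the view needs."""
    stamps = r.mget(["ts:photoset:%s" % photoset_id,
                     "ts:photoset:%s:photos" % photoset_id])
    stamps = [float(s) for s in stamps if s is not None]
    if not stamps:
        return None  # unknown: fall through and render normally
    return datetime.datetime.utcfromtimestamp(max(stamps))

@condition(last_modified_func=photoset_last_modified)
def photoset_view(request, photoset_id):
    ...  # only runs when the client's copy is actually stale
```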
If you want to make absolutely sure that you never serve stale data back to your users, you can return your Cache-Control with max-age=0. That forces any and all upstream caches to come back and revalidate every single time.

But there are some fairly bad pitfalls here. You can't use per-user cookies to determine content, because multiple users will be getting the cached content if you're not varying on the cookie. And you could vary based on the cookie, except that if you're using any analytics or advertising platforms, they're changing the cookie every single time the user hits the site, so you're basically not using your cache at all. This makes using per-user or per-session preference data extremely hard. It makes mobile-versus-desktop detection extremely hard. It makes locale-specific data that isn't represented in the headers, like time zones, extremely hard. The workaround is that the browser needs to be the one to look at the cookies for the user's preferences, not Django. The browser can take the data out of the cookies and include it somewhere in the URL of what it accesses over XHR, in which case the cache server, since the unique data is in the URL, will vary the content on those values. On top of that, it also means you can give different cache lifetimes to the surrounding chrome, which probably won't change much since it doesn't contain user-specific data, versus the XHR requests, which carry very user-specific data and will probably be more volatile.

So, to bring the train back around to the station, we're talking about chugging faster. You can handle fewer requests by leveraging the downstream cache more smartly. You can ask fewer questions during requests by optimizing the queries you're asking. You can answer them faster by making sure you're leaning on appropriate indexes and that you've appropriately tuned your database server. You can increase concurrency, in some cases, with evented IO. And you can use JavaScript and the client's CPU to offload some of the processing and take pressure off your web servers.

So lastly, the pitch. You want to work with me. I've got eight Python engineers, and they're the eight finest Python engineers I've ever worked with. We hire rock stars, and they're all really good. And we have not just Python folks: we have a .NET shop, front-end JavaScript, graphic designers, UX, information architects, QA, and APMs, and the projects we work on are with large, influential, and often public-facing websites. So if you are interested in working for a cool consultancy in the DC area, please do feel free to write me an email. And if you've got anything else, write me anyway. Thanks.