This is a simple hack to make your Django website faster, by Jitinder Agarwal. Jit is the CTO of Slide Rule. He is a graduate of IIT Bombay and has been building complex web products for 15 years.

Am I audible? I was very impressed with the last talk, a colorful presentation. This is my first IIT talk. I'll be compelled to buy an Arduino and a Raspberry Pi now; that's a big hole in my pocket. So I'm part of the Slide Rule team. You can call me Jit. That's the Python way of renaming variables.

Let's get down to the agenda. Since Python is a python, we'll try to make it sway to our tune. I know it can't hear us, but we'll still try something. I'll probably say "cache" 200 times, maybe more. We'll discuss why websites are slow. We'll look at some basic optimizations that Django queries offer. Then we'll get down to caching: what Django offers, and what other types of caching you can use to make your website seem faster. And then we'll get to Q&A.

Why are websites slow? These are just some of the reasons. Bad code. Bad indexes, which are surprisingly common. Too much traffic: you have five servers, you're trending on Reddit, and you need ten. You may not have ample hardware, or your architecture only supports one server because you never considered distributing. Or you have a bad host. There are many other problems; these are just some of them. We'll mostly try to tackle the "too much traffic" problem in this talk.

Before that: if you have a site that gets 20,000, maybe 100,000 visits a day, you probably don't need a lot of caching or optimization. You could probably even write bad code and it would still work. I had one case where I had a Django website running on a $3.98 economy server on GoDaddy, and I decided it made sense to just generate static pages from Django. I was hosting my sites there and we could do 300,000 visits per day easily without doing much.
Because I decided to generate static pages. So yes, premature optimization is the root of all evil: if you try to optimize your code before your features are developed, that's not good.

Let's look at some SQL optimizations. Before that, can I assume all of you are familiar with Django? Who uses Django here? Oh, wow. Nice. So no converts today; you're already following Django.

The first thing is, as I said, your SQL queries might be slow. You want to find out what SQL queries are actually running, and the ORM kind of hides that from you. I read a quote somewhere saying the ORM basically fools you into believing you're coding, and you forget SQL. So here's a simple method: you add a logging configuration in your settings.py and you'll be able to see the SQL queries. Or there's an easier method: install Django Debug Toolbar, click on the SQL panel, and you'll see all the queries for a particular page.

One of the first problems we can solve is double evaluation of a big query. Say you want a count of similar products. This is a simple model with products; it has some fields I haven't shown here, I didn't have enough space. (I attended the first talk and couldn't see the code from the back, so I removed a lot of code on the fly.) The model has a count_similar_products method that returns the count; the line break on the slide is really one return statement, not two lines. You'll probably call the same method in your view, then again in your templates, and maybe the count gets printed twice in the templates. So the same query is unfortunately being evaluated multiple times. That's not good. You can fix that simply by using something called cached_property.
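The double-evaluation problem can be sketched in plain Python, no Django required. The "query" below is just a stand-in for an expensive ORM count; the class and numbers are illustrative:

```python
# Sketch of the double-evaluation problem, in plain Python.
# The "query" here stands in for an expensive ORM COUNT.

class Product:
    def __init__(self):
        self.query_runs = 0  # counts how often the "database" is hit

    @property
    def count_similar_products(self):
        # Imagine this runs a SQL COUNT every time it is accessed.
        self.query_runs += 1
        return 42

p = Product()
# The view uses it once, then the template prints it twice:
total = p.count_similar_products
print(p.count_similar_products, p.count_similar_products)
print(p.query_runs)  # 3 -- the same "query" ran three times
```

Every access re-runs the body, which is exactly what shows up as repeated identical queries in the SQL log.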
You can cache a property. The cached_property decorator goes just above the count_similar_products method, and it caches the property for you for the life of that object. Every time you call count_similar_products on a product after that, it simply returns the stored value; it does not count again. If your database changed in between, you won't see the change. But for the life of that object, for that request, for that page load, you stop running the same query again and again, which is very easy to spot if you're printing your SQL queries. That's the first thing you can do. I don't think you can really overdo it. Anytime you're following foreign keys or many-to-many relations, this works brilliantly.

Now another model: Pizza and Topping. I think I copied it directly from the Django documentation. You have toppings and a pizza, and the pizza has a simple __str__ method that also prints the topping names. Every time you print a pizza object, you redo all those queries. In fact, you do one query for the pizza object, and if it has 10 toppings, 10 more queries for the toppings. Even with a cached_property you've still done 11 queries for one pizza with 10 toppings; the cached property only ensures you don't redo them every time you print the pizza's name. What you can do instead is select the related objects: you know you're going to use the toppings later, so you fetch them up front. For any foreign key you're planning to use later, say an order and its products, I recommend doing this. It will basically do a join.
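The fix the talk describes, Django's cached_property, behaves like functools.cached_property in the standard library (Python 3.8+); a minimal sketch, reusing the illustrative counter from before:

```python
from functools import cached_property

class Product:
    def __init__(self):
        self.query_runs = 0  # counts how often the "database" is hit

    @cached_property
    def count_similar_products(self):
        # Stand-in for an expensive SQL COUNT; runs once per object.
        self.query_runs += 1
        return 42

p = Product()
first = p.count_similar_products   # computes and stores the value
second = p.count_similar_products  # returns the stored attribute
print(p.query_runs)  # 1 -- one "query" for the life of the object
```

The decorator stores the result as an instance attribute on first access, which is why a fresh object (or a changed database) starts over.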
So a join query is sent to your MySQL or Postgres, and those 11 queries, one for the pizza and 10 for the toppings, get compressed into just one query. That does not work for generic foreign keys, and it does not work for many-to-many fields. For those we have prefetch_related. When you do prefetch_related, it fetches the toppings and does the join in Python. So it runs two queries, one for the pizzas and one for all the toppings, and joins them in Python. Prefetching probably shouldn't be done when you have a huge amount of data. But if you just had a foreign key, 11 queries get reduced to one single SQL query; if you have many-to-many fields, 11 queries get reduced to just two. That's simple, basic SQL optimization that I believe everybody should be doing. It doesn't hurt much.

Now let's get to caching. We saw that repeated SQL queries can be cached with cached properties. The caching I'm going to talk about now is slightly different, but it is caching after all. What is caching? It's the process of keeping often-requested objects close. cached_property, for example, assigns an attribute when the method is called the first time; on every subsequent call, the decorator checks for that attribute, and if it exists, returns its value instead of running the queries and the method again. A real-world example is Chetan Bhagat books: any big store keeps them very close to the counter and the cashier, along with razors and other frequently requested items. I'm assuming Chetan Bhagat books are very frequently requested. Almost all computer systems use caching at many different levels. Your processors have L1, L2, L3 caches. Your rotating disks have caches. Your OS uses caching.
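What prefetch_related's "join in Python" means can be sketched without Django. The data, table names, and query strings below are illustrative; each function call stands for one SQL query:

```python
# Simulated tables: each fetch function stands for one SQL query.
PIZZAS = {1: "Margherita"}
TOPPINGS = [(1, "basil"), (1, "mozzarella"), (1, "tomato")]

queries = []  # log of "SQL queries" issued

def fetch_pizza(pk):
    queries.append(f"SELECT * FROM pizza WHERE id={pk}")
    return PIZZAS[pk]

def fetch_toppings_for(pizza_ids):
    # One batched query, like prefetch_related's second query.
    queries.append(f"SELECT * FROM topping WHERE pizza_id IN {tuple(pizza_ids)}")
    return [t for t in TOPPINGS if t[0] in pizza_ids]

# Two queries total, then the "join" happens in Python:
name = fetch_pizza(1)
by_pizza = {}
for pizza_id, topping in fetch_toppings_for([1]):
    by_pizza.setdefault(pizza_id, []).append(topping)

print(len(queries))  # 2, instead of 1 + one query per topping
print(by_pizza[1])
```

The N+1 pattern would have put one SELECT per topping into that log; batching keeps it at two regardless of topping count.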
Your stdio library uses caching. DNS, your cache servers, databases, browsers, and websites, which is what we're talking about today. Caching is essentially this: if you're doing something again and again, and it consumes resources every time, you can execute the process once, keep the result saved somewhere that is faster than executing the process, and use the saved value next time.

Let's see how Django can help you with caching. Django offers a whole lot of different caching mechanisms. You can cache the whole site using the caching middleware. You can cache views: if you know your homepage view or contact view or team view doesn't really change until you fire someone, you can cache those views. If you're not happy with caching the whole view, because every user gets a different page, you can cache individual template fragments, so that when a user loads your site again and again, the parts that aren't changing are served from cache. Then there are sessions and raw objects, and Django offers many types of caching backends. You can use Memcached or Redis with the right packages installed. You can also cache in the database: you can take a full page that took two seconds to render and cache the rendered page in the database. That is possible. I would prefer caching it in memory over the database, but yes, it's possible.

A simple way to use caching: you define a CACHES dictionary in your settings.py and say where your caches live. I'm using a Memcached example here, or you can use the database cache. There's an indentation error on this slide; these are two different examples. You can use Memcached or Redis, or the file system. I've shown two here. In the case of Memcached, you can give multiple locations to work with a Memcached cluster.

All right. Whole-site caching is kind of useless.
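The two CACHES examples from the slide might look like this. This is a settings.py sketch: the backend paths match Django's documented backends, but the server addresses and directory path are illustrative, and a real settings file would define exactly one dictionary named CACHES:

```python
# settings.py sketch: two separate CACHES examples, as on the slide.

# Example 1: Memcached, with two locations forming a cluster.
CACHES_MEMCACHED = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": ["127.0.0.1:11211", "127.0.0.1:11212"],
    }
}

# Example 2: file-system cache (slower, but needs no extra service).
CACHES_FILESYSTEM = {
    "default": {
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        "LOCATION": "/var/tmp/django_cache",
    }
}
```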
If you really need whole-site caching, you should probably, as I said, generate static pages, and that's good enough. But per-view caching is usable sometimes. For example, for this popular view, you can just say it should be cached for 600 seconds. That's in seconds. You can use a variable there, defined in your settings; we use a dictionary for this at Slide Rule. It's not very useful when the same URL should serve different pages to different users; we'll get into how you can handle that. But if everybody gets the same thing, this works very well. It also sends Expires headers and Last-Modified headers, with a max-age matching the cache value you've set, so it enables client-side caching at this level itself.

Next, template fragments. As I said, you can have a page, say your home page, where you're showing a user's cart. Everything else stays the same, assuming you're not personalizing your home page; just the user's cart changes with every refresh. So you can say: cache the part of the page that is constant for all users, and only re-render the part that is distinct for each user. You can also cache a fragment per user, so the same fragment is reused when that user refreshes. It could be used, for example, by somebody building an email client, where your number of emails stays the same unless you get hundreds of emails every second; the count could be cached within a fragment. Are you with me so far? We've defined two fragments here: one is a home page footer which stays the same for all users, and the second is a footer with a user name, which changes per user. Use them inside blocks.
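In Django the per-view case is just `@cache_page(600)` above the view; the idea can be sketched in plain Python as a time-based memoizing decorator. The function names and the 600-second TTL are illustrative:

```python
import time

def cache_view(ttl_seconds):
    """Cache a view function's response for ttl_seconds."""
    def decorator(view):
        stored = {}  # maps args -> (expiry_time, response)

        def wrapper(*args):
            now = time.monotonic()
            hit = stored.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]          # still fresh: serve cached response
            response = view(*args)     # expired or missing: re-render
            stored[args] = (now + ttl_seconds, response)
            return response
        return wrapper
    return decorator

renders = []

@cache_view(600)
def popular_view(request_path):
    renders.append(request_path)       # stands in for expensive rendering
    return f"<html>popular stuff at {request_path}</html>"

popular_view("/popular/")
popular_view("/popular/")  # served from cache; view body not re-run
print(len(renders))  # 1
```

Django's real decorator additionally varies the key on headers and sends the client-side caching headers mentioned above; this sketch only shows the server-side memoization.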
Multiple cache blocks within a single block are possible. If you use template inheritance in Django, you can define blocks, and each block can contain multiple cache blocks. And use lazy objects: if you're not using lazy querysets in your view, you're potentially evaluating the query anyway. You want to reduce queries to your database server; after hearing my talk you'll probably believe all slowness comes from the database. That's partially true. Essentially what you need is lazy objects that don't get evaluated until you hit the templates; if you're already evaluating them in your view, your template caching is more or less useless. Template caching also lets you delete cached fragments, which is very nice, because, as I said, if you have a team page and you fire someone, you need to refresh the fragment when you remove or disable that team member's profile.

Sessions, of course. For sessions you don't have to write any code. If you want to cache them, just define a session engine and a serializer, and that's good enough, assuming you've already defined a cache.

So we've done full-site caching, views, template fragments, sessions. Let's see how you can cache individual objects. You should be able to cache individual objects: the number of videos on your YouTube-like site, the number of products on your store. You don't want to do a SELECT COUNT(*) every single time somebody hits your page, right? Or the number of products in a particular category on an e-commerce site. So you can cache individual objects. You can cache any key-value pair, which is what most caching engines like Memcached or Redis support. Redis supports much more, but for now we'll stick to key-value pairs. In Python, with Django, you can do get.
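Caching sessions, as the talk says, needs no code, only settings. A sketch of the relevant settings.py lines, assuming a cache is already configured in CACHES:

```python
# settings.py sketch: store sessions in the already-configured cache.
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
SESSION_SERIALIZER = "django.contrib.sessions.serializers.JSONSerializer"

# Alternative: keep a database fallback so sessions survive a cache
# flush, at the cost of a write-through to the database.
# SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"
```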
You can do set, get, delete. You can do set_many, though not all backends may support it. And you can do increments and decrements. So for the count I was talking about, every time you add a new product to your database, you can do an increment on the cached count instead of recalculating it again and again.

Now, invalidating. As we discussed, you can set, get, and delete, or in the case of template fragments, you can compute the fragment's cache key and then delete it. That's called invalidating. Say you have a YouTube-like site and you upload a new video. You want everybody to see your homepage with the new video, and it should come first. So the template fragment you had cached for your homepage is now invalid, and you want that cached fragment deleted. Post-save signals are very handy there. You can also use periodic crons or Celery tasks to do it; I've seen people do that. But signals work very well. You can use create or delete signals to adjust counts, and post-save signals to re-cache an object, or to simply evict it so that whenever somebody loads it again, it gets cached again. So invalidating can also mean invalidate and repopulate; that's your architecture choice. But invalidation is not easy. In fact, somebody said cache invalidation is one of the two hardest things in computer science.

So that's all for Django. Django offers you caching at every single level. But you can do more. Your web servers can cache: Nginx can define a simple cache for you. Take, for example, Flipkart's category menu, which gets shown on every single page load.
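Django's low-level cache API (django.core.cache.cache) is a key-value interface; a minimal in-memory stand-in showing the calls listed above, plus signal-style invalidation on save. The signal wiring here is a simplified illustration, not Django's actual dispatcher, and all keys and values are made up:

```python
class SimpleCache:
    """Tiny in-memory stand-in for Django's cache get/set/delete/incr API."""
    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)

    def incr(self, key, delta=1):
        self._data[key] = self._data[key] + delta
        return self._data[key]

cache = SimpleCache()

# Keep a running product count instead of re-running SELECT COUNT(*):
cache.set("product_count", 100)
cache.incr("product_count")          # a product was added

# Signal-style invalidation: on "post_save", evict the stale entry.
def on_product_saved(instance_id):
    cache.delete(f"product:{instance_id}")  # next read repopulates it

cache.set("product:7", {"name": "old name"})
on_product_saved(7)
print(cache.get("product_count"), cache.get("product:7"))  # 101 None
```

In real Django you would connect on_product_saved to the post_save signal of the model, and incr/decr would hit Memcached or Redis atomically.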
If there are 10 million page views every single day, everyone is fetching that menu. They can probably, and are probably already doing it, serve it from the web server's cache; it's a simple JSON. You can put something called Varnish in front of your machines, and Varnish can cache responses. You can tell Varnish to cache for X number of minutes, and you can use a cron to visit all your pages once so Varnish has all of them in its cache. Varnish also allows you to punch holes for user personalization; we'll discuss that in the next slide. You can use CDNs, which cache as close to your customer as possible. And browsers cache: if you refresh a page, your assets are not downloaded every time. The images aren't re-downloaded, your jQuery isn't re-downloaded, unless you force a full reload. So you ask the browser to cache your images, your logo, your JavaScript, your CSS. Somebody comes to your site the first time, it takes time; from the next visit onwards, they don't download the same resources again, because the browser has cached them.

And you can cache in the database. MySQL allows a simple one-line change that says: I want 500 MB of cache built into MySQL. 16 MB is probably the default. You've probably seen it: you run a query on MySQL, it takes two seconds; you run the same query again and it returns very quickly. That's the MySQL query cache in action. MySQL can cache a lot of queries, and in some cases the MySQL query cache is sufficient for a lot of people: just run a MySQL slave instance and give the query cache a decently big size. But it doesn't work very well.
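Browser caching is driven by response headers. A sketch of the kind of headers a cached Django view, or your web server, would send; the 600-second max-age is illustrative:

```python
import time
from email.utils import formatdate

max_age = 600  # seconds, matching the view's cache timeout

# Standard HTTP caching headers, formatted per RFC date rules.
headers = {
    "Cache-Control": f"max-age={max_age}",
    "Expires": formatdate(time.time() + max_age, usegmt=True),
    "Last-Modified": formatdate(time.time(), usegmt=True),
}
# A browser holding these headers will reuse its local copy for
# max_age seconds instead of hitting the server again.
print(headers["Cache-Control"])  # max-age=600
```

Varnish and CDNs respect the same headers, which is why setting them once at the application level pays off at every layer in front of it.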
It doesn't work very well if your site gets a lot of inserts and updates, because every time a table gets updated, MySQL throws away the whole cache for it. Invalidation happens very aggressively, and that's why we still need to worry about caching on our end instead of letting MySQL handle it.

Then there are other methods. You can do edge side includes. I talked about Varnish; both Varnish and Akamai allow you to do edge side includes. Just like with template fragments, where we said a part of the page is unique for every user and should not be cached with the whole page, you can do the same thing with ESI: you say there is one full page that Akamai or Varnish should cache, and a part of the page that should be fetched from another location. That's edge side includes.

And then people said, hey, we're putting a lot of effort into this: a page gets loaded, then Akamai sends us five requests to load the different fragments. They realized they could do something similar in the browser, and that's partly how Ajax became popular. You send an HTML template, the browser sends a REST or Ajax query, and then you render your data. Your template is still cached, a lot of your data can still be cached by the browser, and just the values that change for every user are fetched by JavaScript and the page is populated. This is becoming really, really popular. Almost every website does it: they give you the first HTML really fast, then show you a loading state, and that loading is fetching the data. It lets them cache the whole template and the commonly used assets at the browser level; the server doesn't even get hit.

Now, since we're talking about making websites faster: caching can only do so much for you. For big inserts and big updates, caching doesn't work.
Caching is primarily meant for read-heavy websites. What if you have a write-heavy website, and every time you make a change you run ten different queries? Or something like Twitter, which probably writes into thousands of tables belonging to your followers when one single tweet is sent? What you can do is defer those parts. All the operations need not be done synchronously. You can say: yes, your tweet has been submitted, and we'll spread it to your followers shortly.

Confirmation emails. Somebody registers on your site. You want to connect to an SMTP server, or you're using an API which in turn connects to an SMTP server. What if the SMTP server is down? You can keep retrying, maybe two or three times, while your user waits. Instead, you can say: we'll send your email, you'll soon receive a confirmation message from us. It need not be done within the request; you can schedule the confirmation email to be sent later.

Elasticsearch or Solr updates. If you're using Solr or Elasticsearch for search, then every time a product or a description changes, you need not update the index on the fly; you can update it later.

Large report creation. I've seen sites do this: if you download a huge report, they'll actually say the report may take a couple of minutes and they'll send you an email once it's ready, assuming your store is doing well and you get hundreds of transactions. If you have ten transactions, they'll give you the report right away; most of the time they'll say they'll send it later. They understand the report is huge, and users are willing to wait, because if you just make their browser wait, it will time out. So instead, you say: we've taken the request and we're processing it.
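Deferring work like confirmation emails, which in Django is usually a Celery task, can be sketched with a plain queue and a worker thread. The email "sending" here is simulated, and the join at the end exists only so the demo can verify its result:

```python
import queue
import threading

jobs = queue.Queue()
sent = []  # record of "emails" the worker has sent

def worker():
    while True:
        address = jobs.get()
        if address is None:          # sentinel: shut the worker down
            break
        sent.append(address)         # stands in for the slow SMTP call
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def register(email):
    # The request handler returns immediately; the email goes out later.
    jobs.put(email)
    return "Registered! You'll soon receive a confirmation email."

message = register("user@example.com")
jobs.join()                          # demo only: wait for the worker
print(message)
print(sent)  # ['user@example.com']
```

The point is the shape: the request path only enqueues and responds, so a down SMTP server delays the worker, not the user.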
And we'll send you an email later. You can use Celery or gevent or any such mechanism to schedule work for the future. Just don't make the users wait.

We've talked about all this, but you need to actually try it. I'll use a good quote here: just being told these things exist won't change anything. You need to try them. Simply configure a cache and do a bit of get and set, and that will take you a long way toward being an expert.

All right, these are some of the references. A lot of this is from the Django caching documentation; when you go to the site, you'll see I used the same examples Django uses. Yes, I copy-pasted from there. They're pretty good examples for select_related and for the caching decorators. Any questions?

Q: When caching a template, there was an issue I faced. There's a form which is common to everybody, but when you cache it, the CSRF token stays the same for everybody. If you make two requests to the same page, the CSRF token has to change. So even if you cache it per user...

A: If you're talking about CSRF tokens, then you either use a very small caching window or you use Ajax. At Slide Rule we use Ajax for all our forms. We scratched our heads a lot on this one.

Q: Another thing was templates. When you include a lot of smaller templates, would you like to say a little bit about compiled templates?

A: Oh, I haven't used compiled templates in Django yet. I need to make a note; I'll learn that. Thanks. Doesn't Jinja use compiled templates? I thought Django templates are not compiled yet. They are? I'll learn that.

Q: Mostly you have spoken about SQL queries, right?
So what if I want to use NoSQL databases?

A: Why not? NoSQL databases are equally slow; they're slower than RAM. No, they're pretty good. NoSQL was probably an attempt to make databases that are pure indexes; you're mostly just doing index queries. But yes, if you're running any large queries on NoSQL, you can cache them too. (Variable not recognized. People call me Jeet as well.)

Q: In that case, is this caching mechanism still going to help? Like in your case, you have a MOOC aggregator, and a lot of things change on a daily basis. How do you deal with that kind of scenario?

A: If you have very volatile data, use something like Redis, which is an in-memory database with disk persistence. Friends who build gaming APIs, gaming leaderboards, and scoring all use Redis. It just makes sense; you don't save that in SQL. That's an architecture decision.

Q: In your experience, what is the best cloud hosting provider for Django?

A: For hosting? There's no right answer, really. I've used AWS and I'm pretty happy with their managed services. We use DigitalOcean right now and we're happy with that. I haven't used Heroku; people are very big fans of it. But it doesn't really matter a whole lot. Almost all the big ones are good.

Any further questions? Thank you so much.