Hello. Thank you. Like you said, my name is Jacinda Shelley. This is my second time speaking at DjangoCon, and I'm very excited to be here. I hope you guys have all been having a great time so far. I've been using Django for a little over four years, and I've given a couple of tutorials on the admin. I love reading and rowing, though I haven't had time to row much recently.

I found in the tutorials that I gave that performance improvements were always a topic of interest. In particular, there were a few things that are really easy to miss and really easy to change, particularly around the list view, that people always seemed to find interesting. So I decided to develop a talk around them.

The basic progression is this. You just started using the admin. You're relatively new to Django, and people have been telling you it's this killer feature. First you feel like this: I am a god. I only wrote one line of code, and I have this whole web page. Now people start using it. They start asking you to make feature changes. It's really easy. You change one line of code, and they have what they want. They say, can you show this column? Absolutely. Can you order by this column? Absolutely. All of these tiny little changes are normally somewhat difficult if you're doing them by hand, and Django has provided all of this to you for free.

But then, there's always a "but then". You haven't changed any code, but all of a sudden your users start talking to you about how this page seems to be loading really, really slowly. Your database has grown, so you might suspect that it's something to do with queries, but you're not really sure. You're new to Django, and you're not really sure how to figure out what's going on here. And now you're thinking, oh, I'm going to have to scrap the admin, and it's going to be a month or two months or three months of work to rebuild all of this functionality from scratch.
And you're starting to tell your team lead or your client or your boss how much time this is going to cost and how many features you're going to have to put off so you can rebuild the admin. And they're like, no, no. So what do we do?

In this talk, we're going to go through exactly this case by example, and we're going to use a debug tool that is very commonly used in the Django ecosystem. If you haven't heard about it yet, this talk is probably worth it just for telling you about Django Debug Toolbar.

The example that we're going to be using is a library. In this library, we have users, we have authors, and we have books. People can check out books, and that is represented by the loaned book model, which is a many-to-many relationship between a book and a user. So you can see this here. Down towards the bottom, we have our books relationship, which is many-to-many for our library user. Everything else is basically just default user fields. We have an author model, which is incredibly simple: we just have the first name and the last name of the author. We have a book model, which has a many-to-many relationship with authors, because a book can have many authors and an author can have many books. We also keep some ancillary information that's useful to a library, like the title and how much we would charge you every day if you returned your book late. Finally, we have the actual through model. So when you have a loaned book, you have a relationship between the patron who checked the book out, the book itself, when it's due, whether their fines have been paid or not, how many times it's been renewed, and things like that. The original code for this has more comments in it, but they basically just say what I told you right here, so I removed them so that I could make the slides bigger.

Next, Django Debug Toolbar. It's on GitHub. The installation is very simple: just add it to INSTALLED_APPS for local development.
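For reference, here is a minimal settings sketch for local development. It is not the speaker's file. It assumes a recent django-debug-toolbar release, which expects a middleware entry and a URL include in addition to the INSTALLED_APPS line mentioned in the talk. The app name "library" and the loopback address are placeholders.

```python
# settings.py (local development only): a sketch, not the speaker's file.
DEBUG = True

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",  # the toolbar serves its assets through staticfiles
    "library",                     # hypothetical name for the example app
    "debug_toolbar",
]

MIDDLEWARE = [
    # Recent toolbar releases need this entry, placed as early as possible.
    "debug_toolbar.middleware.DebugToolbarMiddleware",
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
]

# Only requests coming from these client addresses are shown the toolbar.
INTERNAL_IPS = ["127.0.0.1"]

# urls.py (recent releases) additionally needs:
#     path("__debug__/", include("debug_toolbar.urls"))
```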
So if you're using runserver, typically all you have to do is add this to INSTALLED_APPS and you get this nice sidebar that, when you click on each of the items, will tell you additional information. So you can see how long it's taking for the page to load and what your settings are. What we'll be concentrating on is how many queries you're running, but there are some other useful things, in particular templates. If you've ever had a case where your template isn't loading and you're not sure what directory is being looked at, this will give you that information if you click on that item.

A quick tip if your local development environment is a virtual machine: there's an INTERNAL_IPS setting that Django Debug Toolbar uses. What this is is a whitelist of client IPs that you want to show that sidebar to. For example, if you have a dev server that's accessible by all of your devs and might be public facing, you don't necessarily want the Debug Toolbar showing up. Or, on a demo server where it's safe to have the Debug Toolbar but you don't want it shown to everyone visiting, you can restrict it via INTERNAL_IPS. Also, if you're using a local VM, it sometimes wants what the VM perceives the IP of your local system to be. Generally, never use Django Debug Toolbar on a production server unless you're absolutely certain that it is not facing the world, because it exposes a crap ton of information about your system that you don't want exposed to the whole world.

So, obligatory warnings out of the way. The first thing that our users asked us to do is display a list of all of the books that are out on loan. So in here we see the loaned book; our __unicode__ definition had the name of the user who checked out the book and the title of the book itself. This is all randomly generated. Django by default will show 100 rows, and you can see here we are executing 204 SQL queries every time we load this page. This seems excessive.
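The setup behind that number can be sketched as follows. This is a reconstruction from the spoken description, not the speaker's code: names marked as assumed are guesses, and the talk's Python 2 `__unicode__` is written as `__str__` here. It assumes a project with `AUTH_USER_MODEL = "library.LibraryUser"`.

```python
# library/models.py: reconstructed from the talk; marked names are assumptions.
from django.contrib.auth.models import AbstractUser
from django.db import models


class Author(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)


class Book(models.Model):
    title = models.CharField(max_length=255)
    # related_name="works" matches the self.works.count() call later in the talk.
    authors = models.ManyToManyField(Author, related_name="works")
    daily_fine = models.DecimalField(max_digits=5, decimal_places=2)


class LibraryUser(AbstractUser):
    books = models.ManyToManyField(Book, through="LoanedBook")


class LoanedBook(models.Model):
    patron = models.ForeignKey(LibraryUser, on_delete=models.CASCADE)
    book = models.ForeignKey(Book, on_delete=models.CASCADE)
    due_date = models.DateField()                            # assumed name
    fines_paid = models.BooleanField(default=False)          # assumed name
    times_renewed = models.PositiveIntegerField(default=0)   # assumed name

    def __str__(self):
        # Touches two foreign keys. Unless they were joined into the list
        # query, each access is a separate query for every row.
        return "{} {}: {}".format(
            self.patron.first_name, self.patron.last_name, self.book.title
        )


# library/admin.py, the one-line version: admin.site.register(LoanedBook)
```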
So what can we do about that? Well, if you click on it, Django Debug Toolbar allows you to see what queries are actually being executed, and in more recent versions it helpfully includes information about which queries are being duplicated. The resolution on this is a little poor, but what's happening here is that, for every row in that list view, it's requesting the library user object as well as the book object. This has a lot of overhead, especially if you're not using a local system: there's network latency, and there's overhead to actually set up the connection. Zoomed in, you can see the specific instance happening. So book and then user, over and over and over again.

So to recap, the problem we're seeing is this. This was our __unicode__ definition. It's a join of self.patron and self.book, so the user and the book. Now, because it's in a __unicode__ method, Django can't go in to see that these are foreign keys and that it should be using select_related. So we have a couple of different solutions that we can use here.

Solution one is to use list_display and explicitly include those foreign keys, because if you use a foreign key in list_display, Django is smart enough to know to use select_related in that case. It doesn't know to use it if the foreign keys that you're accessing are inside of a callable, only if you use them in list_display.

Solution two: if we wanted to keep it in a callable and not use list_display, we can also set list_select_related to True. What this does is fetch, for all of the foreign keys on your model, all of the information that Django can reach from that model.

Solution 2b: as of Django 1.6 and later, you can be a little bit more explicit with list_select_related and specify which foreign keys you want.
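The three options might look roughly like this in the admin module. The class names are made up for illustration and a project would pick only one of them. The field names `patron` and `book` come from the talk.

```python
# library/admin.py: three alternatives side by side; register only one.
from django.contrib import admin

from .models import LoanedBook


class LoanedBookAdminListDisplay(admin.ModelAdmin):
    # Solution 1: foreign keys named in list_display are joined automatically.
    list_display = ("patron", "book")


class LoanedBookAdminSelectAll(admin.ModelAdmin):
    # Solution 2: keep the default string column and let Django follow the
    # model's foreign keys with select_related().
    list_select_related = True


class LoanedBookAdminSelectSome(admin.ModelAdmin):
    # Solution 2b (Django 1.6+): name exactly the relations the list view uses.
    list_select_related = ("patron", "book")


admin.site.register(LoanedBook, LoanedBookAdminListDisplay)
```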
So say you're using only one of those foreign keys in the list view, not both of them, and you don't want Django to just go fetch all of the foreign keys that it can possibly reach, which it will do and which can also be costly performance-wise. You can specify exactly the foreign keys that you're using in that list view.

So say we apply solution one. This is the output that we'll get. We now have two columns: one for patron, which shows the name of the user who's checked out the book, and one for book, which shows the title. We've gone from 204 queries to four queries, and more importantly we have a 20x speedup, from around 28 to 30 milliseconds to around 1.5. This is using SQLite on a local connection. If you had a real production setup where your database server is located on a different server from your app server, and you had network latency and connection overhead and Postgres, this would be even more significant. So, a 20x speedup with one line of code. All right.

A couple of notes, because we can dig a little bit deeper on this. What if we had used the second solution? If you remember, I said that you can specify which foreign keys to use. Say we left out list_display, went back to using the default __unicode__, and just specified patron. What happens is we end up with 104 queries. It knows enough to use select_related to get all the users, but you didn't tell it that you still needed the books, so for each row in the list it's executing a separate query to get the title of the book. The reason that this is important is that sometimes your requirements will change: where previously you'd only wanted to show two of the three foreign keys in your list view, your users now ask you to show the third foreign key.
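The query counts quoted so far follow a simple pattern: a fixed number of queries for the page itself, plus one query per row for every related object that is fetched lazily. A back-of-the-envelope sketch follows. The function name is made up, and the base of four is simply the count the talk reports once everything is joined.

```python
def expected_queries(rows, lazy_lookups_per_row, base_queries=4):
    """Estimate the number of queries for one admin list page.

    base_queries: queries the page runs regardless of row count
                  (four in the talk's measurements).
    lazy_lookups_per_row: related objects each row fetches on demand.
    """
    return base_queries + rows * lazy_lookups_per_row


print(expected_queries(100, 2))  # patron and book both lazy: 204
print(expected_queries(100, 1))  # patron joined, book still lazy: 104
print(expected_queries(100, 0))  # both joined via select_related: 4
```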
When that happens, don't forget to update list_select_related as well, or you might have complaints about how all of a sudden the page is loading really slowly. All right. So, the full solution for 2b: same thing, four queries.

But one thing that is interesting to note, and that is another part of Django's default behavior that a lot of beginners don't know or don't realize the implications of, is that for every query Django gets all of the fields on your model, no matter how many of those fields there are and no matter how you're doing it, whether it's through the admin or through the ORM. If you are getting a model, Django will load and cache all of the fields by default unless you tell it not to. So we're only using book title, user first name, and user last name, but we're getting all of these extra fields. Let's see what we can do about this, because in some cases you'll have fields that are very large, and Django is fetching them from the database even though you never use them. They take up a lot of space in memory, and it's very slow.

What you can do is use a custom queryset. This is one of the most powerful features of the Django admin, in my opinion, and something that I actually use relatively often. I could have alternately titled this talk "Custom Querysets for Fun and Profit". The ORM has a method called only, which allows you to specify exactly which fields from your model you want to fetch from the database. So in this case, what we're doing is overriding the get_queryset method. We're taking the queryset from the parent and then adding only with patron__first_name, patron__last_name, and book__daily_fine. Yes, book__daily_fine is a mistake that's on purpose. We'll get to that. You can modify the queryset to your heart's content as long as the method actually returns a queryset at the end, because if you forget to return a queryset you'll get a very uninformative error that says that your database is not properly configured.
I may or may not have run across that error and scratched my head for a few minutes when I was working through the examples in this talk.

There's also a corresponding method called defer that does the opposite of only. If you use defer on a queryset, what that will do is load everything except for the fields that you specify. So if you only have one really large field that you want to avoid loading because you're not using it for that queryset, you can use defer for that instead.

Now, I did make a mistake here. Previously we may have shown the daily fine, and that worked great, but I'm supposed to be showing the title. What happens if I leave this mistake in? We end up with 104 queries again. Are we sensing a pattern? What's happening here is that the queryset is doing what we want. In that first query that's listed, we're getting first name, last name, the primary keys, and daily fine, and that's it. Now, because you did not fetch the title that you needed, on every row Django goes and fetches the title, because it realizes at the last second that it needed that title. This is inefficient. It's very easy to fix: just put the correct reference in there, book__title instead of book__daily_fine. But I wanted to point that out because, oftentimes, if you're using only, you really want to check your queries afterwards to make sure that you're not using something ten lines down in the function that you forgot to include in the only earlier, because then Django will have to go and fetch that for every object in the query.

All right, so once it's fixed, we're back to four queries. It's great. We have a minor speedup on top of our previous speedup. It's about 0.3 milliseconds in this case, which isn't a lot, because we don't have any large fields and we're not saving a lot by leaving out the other fields, but if you did have large fields this would be useful. All right, let's try something else, a little bit more complicated.
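As a sketch, the corrected custom queryset for the loaned book list could look like this. The field names follow the talk. The class name and the choice to pair it with `list_select_related` are assumptions about how the slide was set up.

```python
# library/admin.py: sketch of the custom queryset with the corrected field.
from django.contrib import admin

from .models import LoanedBook


class LoanedBookAdmin(admin.ModelAdmin):
    list_select_related = ("patron", "book")

    def get_queryset(self, request):
        queryset = super().get_queryset(request)
        # Load only the columns the row's string representation uses.
        # Listing book__daily_fine instead of book__title here reproduces the
        # 104-query result: the title is then fetched separately for each row.
        return queryset.only(
            "patron__first_name",
            "patron__last_name",
            "book__title",
        )

    # The inverse approach would be queryset.defer("some_large_field"), which
    # loads everything except the named field (hypothetical field name).


admin.site.register(LoanedBook, LoanedBookAdmin)
```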
One of our users has come along and asked, well, this is great, but in the books list view I would like to see a list of all of the authors. All right, this is easy. We add a callable called authors_display. We add the authors_display callable to list_display, and because we learned our lesson from last time, we're using list_select_related and setting it to True. What could possibly go wrong?

Well, 104 queries again. The reason for this is that authors is a many-to-many field, not a foreign key. select_related is for foreign keys. prefetch_related is what you want if you're going in the other direction. You can see "duplicated 100 times", because it's getting the authors every time. So if you have a many-to-many relationship, or you're doing a reverse foreign key lookup, you want prefetch_related. Unfortunately, there is no switch for this in the Django admin the way there is for select_related, probably because it would be more difficult, but maybe that's something someone will look into someday.

Instead, another custom queryset saves the day. We do the same thing, except in this case we take our book admin, get the queryset that Django is using, and call prefetch_related with authors. And now, five queries. This is the actual query that it's running: Django computes in advance all of the primary keys that it needs to get for prefetch_related, and you're off to the races. So this is useful. You can put more information in your admin without sacrificing performance, and you can keep the admin around for longer, which means you can develop features and get your system up and running.

All right, the last example that we're going to go through is counting. So now your users have come back to you and they're like, all right, what you've done so far is great. In the list view, we want to see how many books the author has written that we have in our library. All right, we can do this.
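Looking back at the authors column, the prefetch_related override might be sketched like this. Where the talk defined `authors_display` (model or admin class) is not stated, so placing it on the admin class is an assumption, as is the exact formatting of the names.

```python
# library/admin.py: sketch of the authors column with prefetch_related.
from django.contrib import admin

from .models import Book


class BookAdmin(admin.ModelAdmin):
    list_display = ("title", "authors_display")

    def get_queryset(self, request):
        # One additional query fetches the authors for every book on the page.
        return super().get_queryset(request).prefetch_related("authors")

    def authors_display(self, obj):
        # .all() reuses the prefetched rows. Calling .filter() or .order_by()
        # here would skip that cache and query once per row again.
        return ", ".join(
            "{} {}".format(author.first_name, author.last_name)
            for author in obj.authors.all()
        )

    authors_display.short_description = "Authors"


admin.site.register(Book, BookAdmin)
```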
For the count, we add a callable that returns self.works.count(). We add book_count to list_display, and we have 104 queries again, because we have one count query for every row. This one's a little trickier, right? How do you deal with this? There's no foreign key here. There's no reverse foreign key relationship, so select_related and prefetch_related are not going to work. Are we going to have to do custom caching? What do we do here?

Well, the answer is we use Django's aggregate functionality. One of the interesting things that you can do with aggregates is annotate. What that does is take an aggregate and add it as a field, sort of, and Django knows how to deal with this. This one is probably a little bit more complicated for the beginners in the room, so I'll step through it. We still have book_count in list_display, but in our queryset we used annotate, and we said that book_count should be the name of the pseudo-field that we're using, and that it is a Count of works. So now you can actually reference that in the SQL, and Django knows how to refer to it and pull it out. Now that we have this, we are back to four queries once again. And you can see here, in the queries that are being executed, where you have that "AS book_count" that allows Django to refer to this within the query itself. So you can do everything in the database in a single query.

As a side note, in the author model where you've defined this callable, you can't remove the callable. Django kind of freaks out, even though I don't think it should. But you can set book_count.admin_order_field to be book_count, and then you can actually sort by how many books an author has written, which is pretty nifty.

So that's the end of all of the examples. A few random notes, if you have an issue and you just can't get things working fast enough and you want some really quick, hacky solutions.
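For the counting example, a sketch of the annotated queryset follows. The talk defines the `book_count` callable on the author model. This variant puts it on the admin class instead, which is a common arrangement but an assumption here, and it relies on the `works` reverse name from the talk's `self.works.count()` call.

```python
# library/admin.py: sketch of the book count column using annotate().
from django.contrib import admin
from django.db.models import Count

from .models import Author


class AuthorAdmin(admin.ModelAdmin):
    list_display = ("first_name", "last_name", "book_count")

    def get_queryset(self, request):
        # The COUNT is computed inside the list query and aliased as book_count.
        return super().get_queryset(request).annotate(book_count=Count("works"))

    def book_count(self, obj):
        # Reads the annotated value, so no per-row COUNT query is issued.
        return obj.book_count

    # Lets the column header sort by the annotation.
    book_count.admin_order_field = "book_count"


admin.site.register(Author, AuthorAdmin)
```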
As a first quick hack, you can decrease list_per_page, which will just decrease the number of rows that you have on a single page and maybe give you a little bit of extra time to figure things out before all hell breaks loose. By default Django will do 100. You could reduce that to 25, and then your queries will be decreased by 4x until you actually figure out a better way to do things.

show_full_result_count is for when you have filters. You know how it shows the count of how many objects you have in total? If you have a really large number of rows, that query in itself can be slow. show_full_result_count was introduced in 1.8, and setting it to False removes that count. But it doesn't remove the default pagination at the bottom, which also does that query, so you need to remove pagination as well if you're doing that. I don't really recommend any of these as a long-term solution. They're just quick hacks if someone is breathing down your neck and you need a way to buy some time to actually fix the problem.

And that's the end of my presentation. Thank you all for listening. I hope this was helpful. I think we have maybe a couple of minutes for questions, if anyone has any.

Okay. So you've shown us a bunch of stuff here. How much of this could be baked into Django's core? How much of it is just something you need to be aware of? Is there actually room to improve Django's core admin here to avoid some of these problems?

So that's an interesting question. I think a lot of it would have to be introspecting callables, because that's where you end up with some of the problems. I'm sure that you could build that optimization in.
I don't know what the priority of that would be compared to other features that people really want, because basically what you'd have to do is introspect __unicode__ methods to see if people are using foreign keys, and actually look at all the callables that people are putting in list_display to see if they include things that could automatically use select_related or prefetch_related. I haven't looked into that enough to know if it could be baked in. It might be possible to do something clever with that and have Django detect it automatically.

Thanks for the great talk. I've used Django Debug Toolbar to sort through the queries of a custom web page, but I never thought to use it for the admin, so it's a good application. I'm wondering if you have any advice on sorting through the list of SQL queries, because it can be expansive, like you were saying, 104 or upwards. How can you distill that down to exactly where in your code it is calling all of these queries? I've had some trouble with that, and I'm wondering if you have any advice.

So I think within the admin it's usually easier to figure out than if you're on a custom web page, because, especially with the list view, the queries are all going to be in order. If you have issues, it's generally because you're executing queries on every row, and Django Debug Toolbar will, in the most recent versions, as I mentioned, helpfully indicate which ones are being duplicated. Short of that, the way that I typically approach it is to look for patterns where queries are being executed on things that I think should already have been cached in the code. And there's probably an opportunity there for someone to write a tool to actually go through and analyze some of these queries and make recommendations specific to Django about how you could use the ORM to fix this.
But right now there's not much that you can do beyond being familiar with your models, and familiar enough with how queries are executed by the ORM, to be able to see the patterns. One thing that you can do that I found helpful is to purposefully write queries and look at the SQL that's generated, just as a learning experience, because that will help you get a sense for the relationship between the things that you're writing in Python and the number of queries that are actually generated.

If you're using a debugger, can you pause halfway through and see whether Django Debug Toolbar is counting half of the requests?

I don't know if you could do that with Django Debug Toolbar. Someone else might know. But I do know that if I'm using the debugger and I have logging turned on so that it's actually logging raw queries, then as you step through you can see which queries are being generated at each step.

So in terms of what Django could do out of the box, perhaps a query count warning in debug mode? It sees that you're doing over 50 queries and just prints out, you know, "this page is doing 300 queries".

That's a good idea. Sprints are coming up.

Also, just an obligatory note: the company I work for, Doctor On Demand, is hiring a Django developer. So if anyone's interested, get in touch. If there are no more questions, thank you all. I'm sure everyone wants to get to lunch.
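A footnote to the debugger answer above: one way to get raw queries logged is a LOGGING setting that turns on Django's SQL logger. This is a sketch of a common configuration, not something shown in the talk. The handler name is arbitrary, and Django only emits these messages while DEBUG is True.

```python
import logging.config

# settings.py fragment: route Django's SQL log to the console.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # Django writes each executed SQL statement to this logger at DEBUG
        # level, and only while settings.DEBUG is True.
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}

# Django applies settings.LOGGING on startup. The call below is only here so
# the fragment can be checked on its own, outside a project.
logging.config.dictConfig(LOGGING)
```

With this in place, stepping through a view in a debugger prints each statement as the ORM issues it, which makes per-row queries easy to spot.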