Hi! Today we're going to be talking about high-performance database queries in WordPress, and for some reason we're doing this before the coffee break. Sorry; I would not have set up the schedule that way. We're really going to geek out today. I think this is going to be a fun talk. Normally I like to shoot just above people's heads; today I'm going to aim maybe a little bit higher. I really want everyone to come out of this with some new knowledge. I also want to warn everyone that even though it's late 2018 and this is a WordCamp presentation, we're not going to talk about Gutenberg. So you've been warned, if you were hoping for a talk on that.

A little bit about me: I'm a partner at Alley. We're a full-service digital agency, and one of the WordPress.com VIP partner agencies. We're hiring; that's an important note. We're a distributed company, so you don't have to be in any one place to work for us. We build some of the biggest and most complicated WordPress sites in the world. So this is a talk on, mostly, lessons that I've learned the hard way.

Okay, jumping right in. This is from the book High Performance MySQL, Third Edition. It's an electrifying read. What makes a database query fast or slow? I really liked the way this book put it: the most basic reason a query doesn't perform well is because it's working with too much data. Let that sink in. It sounds pretty basic, but it actually explains a lot. I'm not suggesting that when somebody complains about the speed of a database query, you delete most of their database; that probably wouldn't go over well. There are other ways to get MySQL to work with less data. The book goes on to say: some queries just have to sift through a lot of data and can't be helped. That's unusual, though.
Most bad queries can be changed to access less data. So if the most basic reason a query doesn't perform well is that it's working with too much data, what is the most basic way to speed up a database query? Reduce the amount of data that the database has to read. Which brings us to our first topic: indexes. This is also from High Performance MySQL, Third Edition; I promise this is not a book report, and this is my last excerpt from the book. Just a footnote: I mean B-tree indexes specifically, which is the type of index MySQL uses most of the time.

Indexes reduce the amount of data the server has to examine. It's so simple, right? We need MySQL to examine less data, and indexes do that. They also help the server avoid sorting, avoid using temporary tables, and they turn random I/O into sequential I/O. We'll dive into all of this a little more as we go.

Indexes do have some downsides you should be aware of. Indexes significantly affect the speed of write operations; if you have many indexes on a table, a write can take 10 to 20 times as long. Most of the time that's not a big deal, because you're only doing writes for a very small segment of your users, the people who, in the WordPress world, are using the WordPress admin. Indexes also significantly affect storage space and the memory usage of the database server. And indexes aren't altogether flexible: a really good index is going to be very well targeted to specific uses. We'll look at some indexes in WordPress next and get a better sense of what that really means, but at the end of the day, you can't just index everything. If you index everything in a database table, then the index is as big as the table, and you've not reduced the amount of data MySQL has to look at to get you your information.

So next, let's look at some of the indexes in WordPress. I'm going to focus on three tables that relate to querying for posts. The first is the wp_posts table. Let's see if I can get my cursor here so that's obvious. We have the PRIMARY index, which is just on the ID field. There's the post_name index, on the post_name field; this is primarily used for querying for posts by a rewrite rule, through your pretty permalinks. There's the type_status_date index, which is a multi-column index covering post_type, post_status, post_date, and ID. That order is actually significant. This is the index most commonly used on front-end queries; basically, any time you're looking at a list of posts, you're probably going to end up using this index. There are two other indexes on the table: post_parent, for hierarchical queries, like getting the children of a page, and an index on post_author, for the author archives, which gets used in a bunch of other places too.

Next, the wp_postmeta table. There's again a PRIMARY index, on the meta_id field; we never actually use meta_id in WordPress. There's an index on post_id and an index on meta_key. Those two are super important. What's also noteworthy here is what's missing: there is no index on the meta_value column. We'll talk quite a bit more about that later.

Next, the wp_term_relationships table. The PRIMARY index is a multi-column index on object_id and term_taxonomy_id. Object ID here is post ID; technically, taxonomy terms can be related to more than just posts, if you didn't know that, so think of object_id as synonymous with post_id. This is the main index we end up using here. There's also an index on term_taxonomy_id. Term taxonomy IDs are a bit of a relic nowadays; term_taxonomy_id and term_id are the same thing.

A brief word on joins. If you've ever spent a lot of time in a database server looking at queries, you might have the idea in your head that joins are bad. Are they always bad?
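For reference, here's roughly what those indexes look like in WordPress's schema (abridged from core's schema definition; the wp_ prefix and the exact key-prefix lengths depend on your install and WordPress version):

```sql
-- wp_posts
PRIMARY KEY  (ID),
KEY post_name (post_name(191)),
KEY type_status_date (post_type, post_status, post_date, ID),
KEY post_parent (post_parent),
KEY post_author (post_author)

-- wp_postmeta
PRIMARY KEY  (meta_id),
KEY post_id (post_id),
KEY meta_key (meta_key(191))
-- note: no index on meta_value

-- wp_term_relationships
PRIMARY KEY  (object_id, term_taxonomy_id),
KEY term_taxonomy_id (term_taxonomy_id)
```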
No, they're actually not, but they certainly can be. Joins have to be well crafted: join order is important, as is proper indexing, specifically indexing on the columns that form the join condition. Furthermore, many joins in one query can cause a lot of CPU overhead. This pokes at the edge of the scope of this presentation, but under the hood, the way MySQL does joins is with nested loops. When you have a join, MySQL goes row by row on one table, and for each of those rows it loops through the other table. With each additional join, you have another set of nested loops. This can create a lot of CPU overhead. If your database server is also warming your house, I guess that would be a good thing.

So next, let's talk about very specific WordPress queries, starting with archive queries. If your home page is showing your, say, ten most recent posts, that's going to be an example of one of these; basically, any time you're looking at a list of posts in WordPress, it's some form of archive query. Unless it's a custom query, it's going to filter by post type, filter by post status, and then order by date. So remember that type_status_date index we were looking at? This is where it comes into play.

Can everyone read this okay? Cool. Diving into this query specifically: just ignore the SQL_CALC_FOUND_ROWS; we're selecting the post ID. This was cribbed from a home page. I set up a WordPress install on my laptop.
I added, not manually, I promise you, a hundred thousand posts to it, and I pulled a bunch of queries from the site as they were running. I also ran them a bunch of times and pulled out the averages of how long these queries took, and we'll dive through them.

So on the home page, this query was being run: select the post ID where the post type is "post" and the post status is either "publish" or "private", then order by the post date. "Private" is in there because I was signed in on the site; a typical visitor would only be looking at published posts. When I went to my site, which has a hundred thousand posts on it, this query took 73 milliseconds. That's fast, but at the same time it's kind of not that fast. We want database queries to take around one millisecond, or at most a singular number of milliseconds.

So why is this so slow? Everything about this query is actually crafted really well. Let's go back to it and think about the index: it was a multi-column index, type, status, date, and then ID as the last column. Every field we're looking at in this query is in the index, so MySQL doesn't even have to touch the database table itself when running this query. This is why WordPress queries for just the post ID, even though, when generating the page, we need more than the ID: we need the post title, the post name to generate the permalink, maybe the excerpt. Even though we need all of that, when WordPress runs this query it's only querying for the ID, which keeps the query running nice and fast. It then uses the IDs later to inflate the rest of each post object and get the rest of the information.

So why is this query not as fast as we'd ideally like it to be? We can do an EXPLAIN on it. If you've never done an EXPLAIN on a database query, it's pretty simple: just add the word EXPLAIN before whatever query you're trying to run, and MySQL will give you some information about what's going on. We're going to skip most of it, it's outside the scope of this talk, maybe if I had longer than 40 minutes, and jump right down to "Extra". "Using where" and "Using index" are really good things; then there's "Using filesort". That's not necessarily a bad thing, but it is a flag, and ultimately it tells us what's really going on here. MySQL also says it had to examine 49,702 rows. I promise there are 100,000 rows there; either that's really all MySQL had to look at to get things sorted, or it's guesstimating, which MySQL is pretty bad at.

So "Using filesort" means that, and I'll try to stay closer to the mic so my voice gets picked up on video, MySQL had to sort these results. When it ran this query, it got a hundred thousand results. Let's go back up to the query: at the end we're saying LIMIT 0, 10, so give me the first ten posts. We only want ten posts, but MySQL has to look at a hundred thousand. Again, it's still working entirely within the index.
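To make that concrete, here's a reconstruction of the home-page query and what its EXPLAIN showed, rebuilt from the description above; your table prefix and exact row counts will differ:

```sql
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID
FROM wp_posts
WHERE 1=1
  AND wp_posts.post_type = 'post'
  AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'private')
ORDER BY wp_posts.post_date DESC
LIMIT 0, 10;

-- EXPLAIN (just prepend the keyword to the query above) reported, roughly:
--   key:   type_status_date
--   rows:  49702   (an estimate; the table actually holds 100,000 posts)
--   Extra: Using where; Using index; Using filesort
```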
So this is pretty fast, but MySQL has to look at a hundred thousand rows, sort them all by date, take the first ten, and throw away the other 99,990. If we remove the ORDER BY and the LIMIT from that query, it takes four milliseconds. It also returns a hundred thousand results that are not sorted, but it only takes four milliseconds. That's more what we would expect to see out of a database query like this. So we know unequivocally that what's taking time here is sorting the results.

So can we have PHP sort the results instead of letting MySQL do it, if that's what's taking the majority of the time? Can we do that ourselves? I know, you didn't ask that, but I'm glad you asked. For academic purposes, I tried to do this. Now remember, if we're going to sort by date, we can't just select the ID; we have to select both the ID and the date and pull all of that down into PHP. Between the ID and the date, times a hundred thousand, that's probably one or two megabytes of data we have to sift through. We have to convert those dates into a format we can easily sort, or just sort them as strings, which works fine too, and then slice off the top ten. Just delivering that one to two megabytes over a local connection, and this is all happening on my laptop, the database server is on my laptop, the web server is on my laptop, takes longer than 73 milliseconds. So sorry we doubted you, MySQL; you're actually doing a very good job, better than we could do. All told, it took about 14 times longer to do that sorting and slicing myself.

Okay, we'll get back to archive queries a little later. Let's move on to taxonomy queries. A typical taxonomy query uses indexed columns, it uses a single join, and the join happens on indexed columns. Looking at this query, it looks very similar to the last one, except that we now have a join on the wp_term_relationships table. We're joining on the post ID matching the object_id of term relationships, and then in our WHERE clause we're looking for posts with a term_taxonomy_id of 2. All hundred thousand of my posts are in this one taxonomy term. This query takes about two hundred milliseconds to run, so quite a bit slower than the last one.

What if we were to query multiple taxonomies in one query? Now we have two joins instead of one. In the WHERE clause we have the same term_taxonomy_id IN (2) clause, and there are a hundred thousand posts in that term; then, with the other table's join, tt1, we have term_taxonomy_id IN (9), and in that term with the ID of 9 there is only one post. So we've increased the complexity of this query. The last one was 200 milliseconds; any guesses on how long this one is going to take? No guesses? It takes about a millisecond. Why is that? It's because MySQL doesn't have to do any sorting here. It doesn't really need to do any work at all. Taxonomy queries, especially when you're working with large data sets on large sites, get a bad rap. They're actually not so bad, because of the way the query is crafted: every single column we're looking at here is indexed.
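Here's a reconstruction of that two-taxonomy query, rebuilt from the description above; the real query WP_Query generates differs slightly in formatting, but the shape is the same:

```sql
SELECT wp_posts.ID
FROM wp_posts
LEFT JOIN wp_term_relationships
       ON (wp_posts.ID = wp_term_relationships.object_id)
LEFT JOIN wp_term_relationships AS tt1
       ON (wp_posts.ID = tt1.object_id)
WHERE wp_term_relationships.term_taxonomy_id IN (2)  -- ~100,000 posts in this term
  AND tt1.term_taxonomy_id IN (9)                    -- 1 post in this term
  AND wp_posts.post_type = 'post'
  AND wp_posts.post_status = 'publish'
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_date DESC
LIMIT 0, 10;
```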
We're only working with indexes, two indexes in this case. That's why the last query was 200 milliseconds, and the archive query 73 milliseconds: when MySQL had to do all that sorting, it had more data to work with, and we know that a query's speed depends on how much data MySQL has to look at and work with. In this case, because at the end of the day MySQL only has to deal with one post, one result, this query is lightning fast. I was using the application Sequel Pro to run this query, and I think the lowest number it will show is one millisecond, so it may even have been faster than that.

Oh, and by the way, with this query, just to further the point that it really all comes down to having to sort 100,000 results: if we remove the GROUP BY, the ORDER BY, and the LIMIT here, all things that cause MySQL to have to do some form of sorting, then this query takes a singular number of milliseconds. Just like the archive query, it's lightning fast once you tell MySQL: you have the data, just send it to me, you don't need to sort it, you don't need to touch it.

All right, moving on to meta queries. A typical meta query leverages indexed columns, the meta_key and the post_id, and the join happens on indexed columns. But as I mentioned before, meta_value is not indexed, at least not out of the box. Looking at a query: I'm running a custom query here where the meta query is looking at a meta_key of "city" and a meta_value of "Portland", and I added that key/value combination to 99,999 posts. This took about 410 milliseconds, so slower than the taxonomy query, about twice as long, and about five times as long as the archive query. So what happens next?
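Reconstructed from the description, that meta query looks roughly like this:

```sql
SELECT wp_posts.ID
FROM wp_posts
INNER JOIN wp_postmeta
        ON (wp_posts.ID = wp_postmeta.post_id)
WHERE wp_posts.post_type = 'post'
  AND wp_posts.post_status = 'publish'
  AND wp_postmeta.meta_key = 'city'        -- indexed
  AND wp_postmeta.meta_value = 'Portland'  -- NOT indexed
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_date DESC
LIMIT 0, 10;
```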
So, for the other post of my hundred thousand, I set the meta_key to "city" and the meta_value to "Boston". Now MySQL has only one post to work with, and we're removing the sorting problem, right? How long is that query going to take? Any guesses? 340 milliseconds. What the heck? There's only one result; why is this so slow? Why isn't it a singular number of milliseconds? Because at the end of the day, MySQL still has to work through a hundred thousand rows here before it sends anything back. meta_value is not an indexed column. MySQL pulls out all the results using the indexed columns, says "okay, I have a hundred thousand results, now let me get the meta values for all of those," and goes row by row to see whether the meta value matches. So even though, in my second case, there's only one post with the meta value of "Boston", it still has to look at and access a hundred thousand rows, because there are a hundred thousand posts where the meta key is "city".

So when it comes to meta queries, a query is fast or slow not based on the uniqueness of the result set, but on the uniqueness of the meta key, because the meta key is what's indexed. When there's only one post with the "city" key, so instead of one post matching the meta value, only one post matching the key itself, that query takes only one millisecond.

Which segues us into a really important part of database performance: planning and data architecture. Know thy data. Understand what the data is going to look like, not just today but also tomorrow, and if you're working on an open-source plugin, you have to operate assuming the worst. We'll talk about some strategies very soon for helping meta queries perform a little better.

Just briefly, I want to touch on search queries. Simply put, searches in WordPress are not scalable. If you have a very large WordPress site, I recommend you use something external to WordPress to run search, and I'll point out an option or two later on.

Okay, the fun part of the program: refactoring. The first way we can refactor our code is to add date limitations. Think about that archive query: we're using the type_status_date index, filtering by type and status and sorting by date. We figured out that what takes the most time when running that query is the sort: taking a hundred thousand posts, sorting them all by date, and then throwing away 99,990 of the results. So how do we get that result set, that hundred-thousand number, down?

Here's an example of something you might build. Say you have a widget in the sidebar of your site that shows the five latest posts. If you were doing just the basic archive query like we were looking at before, and you have a hundred thousand posts on your site, you know how long that's going to take. Your database server might be faster than my laptop, but you have a good idea that it's going to be a slower query than you really want. But think about your data: is there any chance that among those five posts there would be a post older than X number of days, or X number of months? If you have a hundred thousand posts on your site, maybe those posts go back 10 or 15 years. Is a post from 12 years ago ever going to show up there? No. So why are you making MySQL sort it? Take the query that widget is running and add a date query on it with a safe range. Let's say you can state beyond any shadow of a doubt that the oldest post that would ever appear there is four or five days old. So what if you pick a safe range, like six months, and say:
"I only want to look at the most recent six months of data." Even that, if your data spans back 10 or 12 years, is going to trim that result set considerably. Here's an example of what that might look like in code: you have your query, and the little ellipsis just indicates whatever was in the query previously. We add a date query and say "after: -6 months", which is just shorthand for "six months ago". So we're saying: only look at results from the past six months. You might turn that 73-millisecond query into a two- or three-millisecond query, because now MySQL doesn't have to look at and sort a hundred thousand results; maybe it only has to look at and sort a thousand, or even better, only a few hundred. If this were a taxonomy query or a meta query, in the end you're probably talking about the same amount of result time, three or four milliseconds, but you're coming down from 200 or 400 milliseconds to get there. This is a great option for cutting database query time on queries where, when you step back and think about it, you start to ask: why am I looking at 99% of this content when I know I'm never going to need it?

Here's a real-world example of where I've used this. This was for a news media client. I was building them a home-page curation tool, and there was a search autocomplete field where they could search for posts and curate them. For this purpose I was just using WordPress core search, and these searches were taking over 10 seconds to run; they had millions of posts in the database. I wondered how I could speed them up, thought of this, and reached out to my client: would you ever put content on the home page that's older than six months?
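As an aside, the code from the slide, reconstructed as a sketch; the comment stands in for whatever arguments the widget's query already had:

```php
$query = new WP_Query( array(
    // ... the widget's existing arguments ...
    'posts_per_page' => 5,
    'date_query'     => array(
        array(
            // Only consider posts from the last six months;
            // '-6 months' is shorthand for "six months before now".
            'after' => '-6 months',
        ),
    ),
) );
```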
They're like, no, the content on the home page is only ever going to be from within the last few days. I'm like, okay, so if I add, say, a 30-day limit, that's never going to confuse anyone, right? Nobody's ever going to search for content older than 30 days from the home page tool? No, why would they? That would be ridiculous for a news site. So I did that, and those searches were instant, one or two milliseconds. That cut it down from over 10 seconds to basically no time at all.

Next tip: add an index to meta_value. This is easier said than done, but we can lean on our friends at Automattic, who run WordPress.com VIP. They do this for the larger sites on one of their platforms. If you download the presentation, and I'll bring the URL up again later if you missed it, there's a link to their code. What they do is add an index on the meta_key plus the first hundred characters of the meta_value. Here's why you can't just add an index to meta_value: meta_value can be absurdly large. I forget what the limit is; I want to say it's something like two gigs, maybe even more. It can be massive. You can't index that; MySQL just won't let you, and if MySQL let you, you would take down your database server if you had any values anywhere near that size. But if you're running a meta query, you probably aren't going to be typing out some absurdly long value, so the first hundred characters, and that number is completely arbitrary, they say as much in their code, somebody pulled it out of a hat, is more than enough for probably every meta query you would ever need to run. Keep in mind, though, that especially on a large site, a lot of post meta gets written.
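The statement to add such an index would look something like this. This is a sketch of the approach, with a hypothetical index name; VIP's actual code, linked from the slides, is the canonical version, and as noted, the 100-character prefix is arbitrary:

```sql
-- Compound prefix index: a prefix of meta_key plus the first 100
-- characters of meta_value. Prefix lengths keep the index small
-- enough for MySQL to accept and maintain it.
ALTER TABLE wp_postmeta
  ADD INDEX `meta_key_value` (meta_key(80), meta_value(100));
```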
So this can slow down your database writes, and thus the performance in the admin; something to be aware of. Definitely test it in a staging environment before rolling it out to production and affecting people's lives in the real world.

Next refactoring tip: get creative with meta keys. So again, a tip for meta queries. Remember, when we looked at that meta query, the problem we were facing was that we had a hundred thousand posts all having the same meta key; even though the meta values differed, the problem was that they all shared the key. You can increase your uniqueness, improve your cardinality, by getting creative with your meta keys. As an example, let's think about post relationships. You have two posts and you want to store a relationship between them, like "these are related posts". The way you might have done it is with a meta key of "related_post" and a value of the related post's ID, like 12345. Instead of storing that key/value combination, you can store the key "related_post_" plus the post ID, so "related_post_12345", and then put anything you want in the value. Now you have much more uniqueness in that meta key, and your meta queries are going to be significantly faster. It's not going to work in every case, but this comes up quite a bit, and I find it can be a really creative solution for cutting down meta query time.

Next is to leverage caching. WordPress has a cache system called the Transients API, which is a persistent cache that WordPress has out of the box. The Transients API might store data in the database, or it might store data in an external object cache if you have one installed, like Memcached or Redis, but it's guaranteed to always be persistent.

Here's an example of what the code might look like to leverage the Transients API. You have a cache key, and you load the transient with that cache key. If the transient doesn't exist, if the cache is empty, then you run your query and generate the data. Notice here I'm setting the fields argument to IDs: when caching data, you don't want to cache entire objects or entire result sets. You just want to cache the minimal amount of data, the minimal amount that removes the heavy lifting. In this case, once we have the IDs, we can inflate them later using get_post(), and we don't have to worry about caching massive WP_Post objects. In this example we're storing this query for 10 minutes, but ideally you would cache indefinitely and then only clear the cache when the data actually changes. You can't always take advantage of that; sometimes the data is just a little too fluid, and you can't really surgically say whether the cache should have changed. But in cases where you can, there's an action, clean_post_cache, which fires any time WordPress knows it needs to clean a post's cache, and it passes the post ID and the post through the action. So here we might run some logic determining whether our cache should actually clear, and if so, delete our transient.

The problem with caching is that you end up with an unlucky user here and there. Somebody goes to your site, there is no cache, so they're the one who triggers the heavy query that takes a long time. If you don't want any unlucky users, one approach is to kick off an asynchronous task using WP-Cron to pre-warm that cache behind the scenes; let the server be the unlucky user who warms the cache up. Here's a very simple example of what that might look like. On save_post, we'd probably run some logic to determine whether we should actually pre-warm the cache as a result of this post saving, but then we call wp_schedule_single_event(). This is not going to schedule a recurring event.
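Pulling those patterns together, the code might look roughly like this sketch; the function name, cache key, and cron hook here are stand-ins, not names from the slides:

```php
// Fetch post IDs from the cache, generating (and caching) them on a miss.
function get_expensive_post_ids() {
    $cache_key = 'my_expensive_query_ids'; // hypothetical key
    $ids       = get_transient( $cache_key );

    if ( false === $ids ) {
        // Cache miss: run the heavy query, but only keep the IDs.
        $query = new WP_Query( array(
            'fields'         => 'ids',
            'posts_per_page' => 10,
            // ... the rest of the expensive query's arguments ...
        ) );
        $ids = $query->posts;
        set_transient( $cache_key, $ids, 10 * MINUTE_IN_SECONDS );
    }

    return $ids;
}

// Clear the cache only when relevant data actually changes.
add_action( 'clean_post_cache', function ( $post_id, $post ) {
    if ( 'post' === $post->post_type ) { // your "should this clear?" logic
        delete_transient( 'my_expensive_query_ids' );
    }
}, 10, 2 );

// Pre-warm the cache asynchronously so no visitor pays the cost.
add_action( 'save_post', function ( $post_id ) {
    // (Logic to decide whether to pre-warm would go here.)
    wp_schedule_single_event( time() + 5, 'my_warm_cache' ); // hypothetical hook
} );
add_action( 'my_warm_cache', 'get_expensive_post_ids' );
```

Later, wherever the IDs are needed, each one can be inflated with get_post() rather than caching whole WP_Post objects.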
This is this is not going to schedule a recurring event Just a single event I'm setting it to five seconds from now in other words like right after this request finishes from this And it's firing the action warm cache And then on when the crown runs when warm cache fires, then I can run my laborious query and cache the results there Since we know that meta queries are slower than taxonomy queries for the most part We can convert meta queries to taxonomy queries from time to time sometimes meta may make more sense as taxonomy terms I Think one thing that gets in a lot of people's way is the admin presentation I have a meta box with a text field that should go to post meta It doesn't have to you can save that to a taxonomy term instead you control the data so, you know know thy data and stored in the way that's going to ends for using it down the road Another trick relates to this. I like to employ is hidden taxonomy is so taxonomies that have public set to false and and or show you I said to false the users using the word for sad men have no idea that this taxonomy exists it only exists behind the scenes and Then on front end queries we can use it. We can take advantage of it But it might make sense to do that to duplicate some data that we know We can do a faster taxonomy query than we could a medic query or or something else and then Sometimes you'll just encounter situation where my sequel is not well suited for their job Search queries were one of those examples But there there are others really complex queries where you just can't help that It's going to be a medic query with a lot of rows Elastic search is a dedicated search engine, but not just for keyword searches. Don't think about like Google's text box I type in anything I want. 
Elasticsearch is also good for structured searches. Every database query we've looked at so far I would call a structured search: find all posts where the post type is "post" and the post status is such-and-such, and so on. Elasticsearch is really fast at this, and Elasticsearch and MySQL are pretty complementary software platforms. We use them side by side all the time, and we find that when MySQL can't get a job done, Elasticsearch is really good at it. Some five or six years ago I put together a couple of plugins, one called SearchPress and the other ES_WP_Query, that help you do that.

So that's all I've got, and we've got five minutes for questions. Anyone have any questions? Yes?

For the purpose of the video, the question is: what about search plugins that leverage MySQL, perhaps with keyword indexing tables, instead of having to use another service, one that you have to maintain, support, and make sure is running, and so on, like Elasticsearch? Personally, I've not used them. It's a great idea, and if it works for you, that's awesome; I recommend giving it a shot. At the end of the day, all of these things, WordPress and all the plugins, are tools in the tool belt: find the tool that works best for your particular use case and run with it. I looked at one, I want to say it was SearchWP, that does that: it creates additional tables and does keyword indexing. Again, I haven't used it myself, but at a high level it looked pretty solid. I saw another hand here.
Yes? So the question is: what about sorting by meta values that might not exist; are there performant ways to accomplish that? This is another situation of "know thy data". Sometimes it will make sense to always store a meta value on that key, whether you want it to exist or not, and then run something like a WP-CLI command, some sort of bulk operation, to backfill all the posts that might need it, so that you know there's always going to be a value there. Sometimes just an empty string, knowing it's there, is all you need.

If you know your result set is going to be relatively small, another option is to do that sorting and filtering in PHP: just return the full data set. If you're not returning a hundred thousand rows like I was, and you're instead returning, say, a hundred, it's fine to do that sorting and limiting in PHP. It'll probably be a lot easier, and probably more performant, than trying to do an additional database query. Those are probably my best suggestions; does that help? Cool. Any more questions? Yes?

[Audience] The examples you gave were benchmarked with the queries and their timings. Out in the wild, on our own projects, how do we see what queries are running?
That's a great question. I probably should have mentioned some plugins you can use to look at real-world queries. So the question is: what tools can we use to look at database queries in the wild, so that we can pull them out, benchmark them, and so on? One of my favorite WordPress plugins is Query Monitor, by John Blackbourn; that's what I used to crib these queries from my site. The Debug Bar plugin is another option that will list all your database queries. If you can connect to your database server, the command SHOW PROCESSLIST, or SHOW FULL PROCESSLIST, will show you all the queries running at a given time, and you can pretty easily identify the really bad ones. Since most queries take just a few milliseconds, you'd have to hit the command at exactly the right time to catch one of those running, but the really bad queries, the ones that take multiple seconds, are pretty easy to spot when you do a SHOW FULL PROCESSLIST. Xdebug can do profiling, which will give you a lot more information than you need, but it'll help you target database queries as well. And external services like New Relic, for instance, are immensely helpful for this. New Relic can do full-page profiling: you can set targets and say "any page that takes longer than a second, flag it," and it will do a full stack profile of what took time in that request. It'll go into which PHP functions took the most time, and it'll pull out which database queries or which external API services were slow. Any time I have a page running slow, if I have New Relic active, that's the first place I look.

We're probably about out of time, but any more questions? I'll try to squeeze one more in before they kick me out. Awesome; well, I'm here all day, and I'll be at the after-party. I'd love to answer any more questions you might have that come up throughout the day. Thank you all very much, and again, there's the URL to the presentation at the bottom.