Hey, how's everyone doing this afternoon? So you have made it through the first day of WordCamp Atlanta, 2019. Congratulate yourselves. Everyone looks mostly awake. I hope that there was coffee and soda out there to get you through this afternoon. This is going to be a little theoretical and on the shorter side, because it is warm in here.

I'm Gary Kovar. I'm a back end developer at Modern Tribe. If you've heard of Modern Tribe, you've probably heard of us because of The Events Calendar plugin. I don't know anything about The Events Calendar plugin, because that's an entire part of the company that I don't work with. About half the company is product and half is agency. So I work on the agency side. We're mostly doing WP sites, or WP sites with a lot of weird third-party things happening. We work with some big clients like Harvard and Stanford and Disney and Microsoft, and those are fun because those clients do things like say, "we have crazy unreasonable time-to-first-byte requirements," or "we have actually scoped everything," which is neat. But we also work with a lot of medium and smaller sized clients, and through the process of looking at what their requirements are, we find some neat ways to clean up their really messy data and make it work faster.

So this talk is going to be about alternative data stores in WordPress. This is a table of contents. I kind of wanted to put this out here mostly for my own recollection as far as what I hope to accomplish in this talk. There are no prerequisites here. So if you're not a hardcore dev in PHP every day, we're not going to look at a ton of code. There's definitely a version of this talk that has that. But we're just going to talk about these tools as options for speeding up your site. So we're going to talk about how WP currently stores data, just a quick look at MySQL. We're going to talk through three different categories of alternative data stores.
So custom tables and views, Elasticsearch, and then Redis and Memcached, generally caching-type stuff. And then I want to walk through three real-world examples where I've used something other than core tables, and then just some real quick implementation notes, and then we can get into questions. Also, because it is May the 4th, Star Wars Day, I have picked all Star Trek images.

So how does WP storage work? So WordPress, when it's not a multisite, when you install it, has around a dozen database tables. If you've got an older site and you upgrade, you might have links and stuff hanging around. But generally, the way they're related is you're going to have the object and the object meta. So users, user meta; post, post meta; term, term meta.

Following that logic, WordPress really has two models of data storage. I'm going to omit users, because users is a very concrete thing and pretty well mirrors how a post and post meta are set up. So posts are a thing that have very defined properties, as evidenced by the post table, right? Look at the post table: there's an ID, there's a title, there's an author, there's a date, post type, et cetera. And those are all very sensible things that you would need for a post or a custom post. With that, then, any extra attributes that we want to define generally end up in the post meta table. The other model is taxonomy terms, right, and that relationship. And the sole purpose of taxonomy terms is to provide an indexed way, a fast way, to find those posts, right?

These are super sensible defaults. So as WordPress developed, this became a nice way to say, here's a general-purpose CMS that you can extend however you need. The nice part about this is that WordPress, in addition to that, then builds in all the APIs you need. So to register a CPT, you don't write a single line of SQL. I mean, generally, you just call one function and it's a handful of strings and an array.
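Just to make that "one function, a handful of strings and an array" point concrete, here's a minimal sketch of a CPT registration; the `store` post type name is a made-up example:

```php
<?php
// Hypothetical example: registering a "store" custom post type.
// One function call, a few strings, and an array -- no SQL written.
add_action( 'init', function () {
	register_post_type( 'store', [
		'label'    => 'Stores',
		'public'   => true,
		'supports' => [ 'title', 'editor', 'thumbnail' ],
	] );
} );
```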
But there are situations, right, where those defaults don't make sense, and using alternative data stores can help us in those situations to keep sites performant. So the three, I keep saying three, because something about off-by-one errors in programming or whatever, right? We're talking about three things. MySQL, which is structured query language; it's relational, immediately consistent. Elasticsearch, which we're gonna talk about a bit more; it's really like a search database, a NoSQL thing. You just pass it documents. It's eventually consistent. Memcached and Redis, you probably think of them as caching data stores, and that's, in general, how we're gonna use them. There's a benefit to those being available, and we'll get into that as well.

So heading to the first one. When I explain how WordPress stores data to non-devs, the easiest way to explain it is: think of it like a giant Excel document with a bunch of tabs, the tabs being the different tables. People immediately understand Excel because it's nice and square. I think MySQL spooks folks when you think about, well, I've got these weird relationships and stuff happening. But if you can paint it as each table's its own thing and they're related to each other, it becomes a bit easier to digest.

So, interacting with custom tables and views: we already have the stuff to do that, right? It's built into the WPDB class. So you can write direct SQL and do whatever you need to do. So the positive for using a custom table or a view, and I'll talk about how to create those in a few more slides. The positive about those is you know that you already have the tools you need. It's built into WP core. You can work with custom tables and views. The negative is it's still MySQL. So you can still shoot yourself in the foot and write slow queries and that kind of thing. I wanna quickly differentiate between custom tables and views. I think custom tables are something we're all probably familiar with.
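For instance, reading from a custom table through the built-in WPDB class is just a couple of lines; the table and column names here are hypothetical:

```php
<?php
// Sketch: querying a custom table via $wpdb. The table name and
// columns ("store_locations", lat/lng) are assumptions for illustration.
global $wpdb;

$table = $wpdb->prefix . 'store_locations';

$rows = $wpdb->get_results(
	$wpdb->prepare(
		"SELECT post_id, lat, lng FROM {$table} WHERE lat > %f",
		30.0
	)
);
```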
Views might be something that we don't use often. Think of a view as nothing more than a virtual table combining other tables, right? So instead of writing a massive query that always has the same subquery somewhere within it, you write the subquery once, and it becomes a view. In MySQL, it's no more efficient to do it that way, but it keeps your code a bit easier to understand.

So, Elasticsearch. As evidenced by the name, it's great for search. It's a document store. So instead of passing like an array or object that has a key and a value, you give it this crazy nested JSON object and you define some attributes. And then Elasticsearch does this thing where it ingests it and sends it out to different nodes in Elasticsearch, because it generally involves more than one piece. And it's eventually consistent. Eventual consistency pops up a few times in here, and what that means is: when you put that document in there, you might ask with a search for something that that document should come back with, and you may or may not get it right away, because Elasticsearch has to replicate across all its nodes. So the node you stuck it into might not be the node that you're directed to when you ask for it. Eventually it'll show up, but it may not be an immediate thing. That's the limitation of Elasticsearch that, depending on load, you may never run into. Under high load it's something you'll definitely run into, and it limits how we use ES.

Great, so Redis and Memcached, this is the third one. I'm gonna group these two together a lot. Well, there's a couple of notable differences. The first one that really matters is Memcached has a limit of like one megabyte of data per thing it stores, and Redis is, does anybody know, I'm pretty sure it's 512 megabytes. So both of these things are key-value.
So you give it a unique key, you give it whatever string, so serialized or JSON or whatever thing you're working with. When you need it later on, you say, hey, I want the value that's at this key. It's fast because it's all in memory. So it's a great way to work with things. The other difference between Redis and Memcached: Redis is not just a key-value data system, it does some cool lists and that kind of stuff. So if you're working with a more complex data structure, Redis might be of value. Like MySQL, though, most likely you have this available with your host, one or the other, which means most likely you can use some kind of Composer package and interact with these things.

Okay, so we're now gonna get into the part where we talk about how you can actually use these things, right? So we're gonna pretend that we have a client that says, I have a geolocation requirement, and that's like, I have some stores, right? And I want to store the latitude and longitude of those stores. We're not gonna talk about how that actually happens. We're just gonna assume that we've created the CPT, we've created a meta box that captures this latitude and longitude, either from an address figured out after the fact or whatever. So the normal method we'd use to get this, oh, hold on, I duplicated my notes. Cool, so if we save it this way, right? We put latitude and longitude in post meta. The stinky part about post meta is that the meta_value column is set to LONGTEXT. So LONGTEXT, we're kind of getting in the weeds here, I said we weren't, I lied. LONGTEXT has a limit of four gigabytes of data. So none of us in this room would do this, but it's possible that there are some plugins out there that would be abusing the post meta table and slapping tons of data in there. If you're using text fields for search in MySQL, search queries are not performant, because the server has to pull each row individually and scan through the entire thing, right?
Whereas if you're using integers, or in the case of latitude and longitude, you define it as a float, it's super fast for MySQL to zero in on this. So long and short, any lookup you do in post meta on the meta_value has the potential to be slow. I'm not gonna say it is slow; it has the potential to be slow.

So this is the code. This is, if you were doing geolocation, right, the formula that you would pass in to figure out the distance in miles from the latitude and longitude that was passed to you by the user, however the user passes that in. This would give you like 20 results where the distance is less than 25 miles. So this really doesn't matter except to get in your head that you're looking at that whole subquery down there, right? Where you're doing cosine and sine and radians and all sorts of math stuff that maybe computer science folks know. I don't, I don't know. It's like magic, right? But it gives you the distance no matter what the latitude and longitude is for whatever is queried.

So if we were gonna do this with the default WordPress post meta table, you would want to figure out a way to create a query that would put that latitude and longitude in line with the ID. It's gonna be a gnarly-looking thing. You're probably gonna have a couple of left joins or a couple of subqueries, and at the end of the day you've written something that's not going to be continuously performant, especially as you continue to add things to post meta.

So the first option is we create a custom table, right? The benefit is that when you save, you can say all I need is the ID and the latitude and longitude. That becomes a subquery, and then you can pass the results of that subquery to your WP_Query post__in arg. So just three columns, ID, latitude, longitude, and it makes that query from two screens ago look a lot faster. If you haven't created a custom table before, the function you're gonna want to look at is called dbDelta.
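Here's a rough sketch of both halves of that: creating the three-column table with dbDelta(), and then running a haversine-style distance query against it instead of against post meta. The table name and column names are made up for illustration, and `$user_lat`/`$user_lng` stand in for whatever the user submitted:

```php
<?php
// Sketch: a three-column custom table via dbDelta(). Names are assumptions.
require_once ABSPATH . 'wp-admin/includes/upgrade.php';

global $wpdb;
$table = $wpdb->prefix . 'store_geo';

dbDelta( "CREATE TABLE {$table} (
	post_id BIGINT(20) UNSIGNED NOT NULL,
	lat FLOAT NOT NULL,
	lng FLOAT NOT NULL,
	PRIMARY KEY  (post_id),
	KEY lat_lng (lat, lng)
) {$wpdb->get_charset_collate()};" );

// The distance-in-miles query against the custom table. The floats are
// indexed, so MySQL can zero in fast. 3959 is roughly Earth's radius in miles.
$ids = $wpdb->get_col( $wpdb->prepare(
	"SELECT post_id,
		( 3959 * ACOS( COS( RADIANS(%f) ) * COS( RADIANS(lat) )
			* COS( RADIANS(lng) - RADIANS(%f) )
			+ SIN( RADIANS(%f) ) * SIN( RADIANS(lat) ) ) ) AS distance
	 FROM {$table}
	 HAVING distance < 25
	 ORDER BY distance
	 LIMIT 20",
	$user_lat, $user_lng, $user_lat
) );

// $ids can then be handed to WP_Query via the post__in arg.
```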
First check if the table exists. If not, you can create it, and off you go. It's that simple to have a custom table available. So in this case we would assume that on post save we're just gonna stick the latitude and longitude into the custom table.

So the next alternative, continuing that same path, is if we were to create a view. This is a bad idea, but just for the point of conversation we'll assume we did that. Let's assume we created a view, using the query from just standard post meta, that now looks like the table we defined on the previous screen. The problem here is all we've done is hide a non-performant query in a view to make ourselves feel better as devs, looking at this simple-to-read data versus what was there previously in post meta. Yeah, you could do it. There's just no good reason to.

So Elasticsearch, this is the right solution in this case. Elasticsearch has this concept of geo-points. So if you're using a plugin, let's say ElasticPress, which is kind of like the de facto standard, or if you've rolled your own way of ingesting posts into Elasticsearch, you can assign the geo-point based on latitude and longitude. And so on subsequent searches, after ES has indexed that, it's gonna be like an immediate search. ES generally responds in like a dozen milliseconds for most sensible stuff. So this is gonna be amazingly fast. It's gonna feel a bit like magic, especially if you're coming to it from a post meta lookup.

The last one, continuing, so like Redis and Memcached. There's not a good benefit to using a key-value situation here, because you're gonna end up putting so much data into Redis or Memcached, and you run the possibility of evicting data that you do need in Redis or Memcached. So I'm only including this for completeness in the four, or the three, that we were talking through.
I will say one caveat to this: if you have only a few locations and you're using this location information as a JSON object, at this point you could just use Redis as what it's built for, an object cache, or what we use it for in WordPress generally, an object cache, and just dump that string in there and pass it to the front end without having to do any kind of query lookup except for getting one value from Redis. So that's the first example.

The next one, this is the one that excites me the most, right? So the task runner. Let's assume the client comes and says, I don't know, my site's slow when people check out, right? That's a very broad, loaded, like, well, there's probably a billion things we can do. Here's one that's a simple win that I've started doing on most sites: we implement the idea of a task runner. I want you to think about when you go into WooCommerce and you check out, there's three emails that are sent, right? There's the purchase confirmation, the receipt, and then the email to the site admin. Every single one of those calls wp_mail. Generally, every single one of those is going to have to connect to an external service and send the message. So you're making three external requests that, when the person clicks check out, all have to happen before they see the next page. But for humans there's no difference, you know, if I get an email immediately when I click check out or if it shows up in my inbox five or 10 or 15 seconds later. It's not part of the request in the browser. It has not diminished my browser experience.

So the concept here is we take the wp_mail call and, instead of sending it immediately, let's drop it into a queue and then process that task later on. Processing that task later on, we'll get into generally, roughly, but, oh, I tend to jump ahead on the slides sometimes. Yeah, so storing it with an action and related arguments.
And then you can either run it with WP-Cron, system cron, or there are systems built specifically for this, things like Gearman or TaskRabbit, where you could say here are things I need to do, and it will handle that. Here's the neat thing: WordPress has this built in. So if you've used wp_schedule_single_event and passed something that's slow to that, you've been using WP-Cron as a task runner. The big gotcha with using WP as its own task runner is that WP stores all cron in one giant serialized blob. So if you have a bunch of repetitive tasks, and maybe you're importing a CSV and you break every line of the CSV into an individual task, which is one of the things I commonly use queues for, now you've just made this serialized cron thing huge. And it's the same deal in the options table: it's LONGTEXT. So we could be one of those plugins abusing the LONGTEXT column.

So the simple thing to do is say, let's create a custom table, and let's drop the tasks in there, and mark them as complete or remove them, depending on what you need to do. I think after Norcross's conversation on GDPR, maybe just delete them, right? Yeah, so this is simple: create the table, and instead of running the task immediately, put it in there, and then figure out what your async thing is that consumes those tasks and runs them.

Don't use Elasticsearch for this, because Elasticsearch is eventually consistent, and if you're sticking stuff in there that you want to process quickly, that's counterintuitive or counterproductive or counter-something. So with Redis or Memcached, you can handle this a couple of ways. For both of them, you could say let's drop a key in with some kind of unique ID for the task, and then let's create another key-value pair, whose key is the list of tasks, with a string or serialized something that says, hey, here are the tasks we need to run.
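A rough sketch of that two-key pattern, using WordPress's object cache functions (which would be backed by Memcached or Redis); the key names and task shape are made up for illustration:

```php
<?php
// Sketch of the two-key task queue. Key names ("pending_tasks") and the
// task structure are assumptions, not a real library's API.
$task_id = uniqid( 'task_', true );

// Key 1: the task itself -- an action plus its related arguments.
wp_cache_set( $task_id, [
	'action' => 'send_receipt_email',
	'args'   => [ 'order_id' => 123 ],
] );

// Key 2: the running list of pending task IDs.
$list   = wp_cache_get( 'pending_tasks' ) ?: [];
$list[] = $task_id;
wp_cache_set( 'pending_tasks', $list );

// Later, some async worker reads 'pending_tasks', fetches each task by
// ID, fires do_action( $task['action'], $task['args'] ), and removes
// the ID from the list.
```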
The flip side is, on Redis, Redis has the concept of lists, so you can actually accomplish this with just one key and value. You can add things to the list and pull them off, and Redis is super great because now you've removed one requirement on yourself, the, well, I took this thing, I need to mark that I've taken this thing. Redis says, well, you've got this thing; it's popped off the list, it's already gone, which means it's on you to return it if for whatever reason the task fails. Yeah, so that's it on task runners. That's a quick way to recover a lot of time.

The last real-world example I want to talk about is content syndication, and this is one of those things I've done, I don't know, three or four different ways, and it really comes down to what the client means when they say content syndication, right? There's a lot of things to consider, like, from an SEO perspective, duplicate content, obviously. We're not gonna dig into that here, but that's a piece of it. If you're syndicating content, and what I mean by that is taking something that's posted on one site and using it on multiple other sites, very much a hub-and-spoke model, you could easily just say on all the spoke sites, let's just consume an RSS feed or something. But let's see if there are some better options than that, that don't require us to make remote requests and whatnot.

The first one is, on a smaller, let's say it's a multisite network, on a smaller multisite network, you could, on post save on the hub, go ahead and switch_to_blog and insert it on all 15 blogs or whatever the number is. The negative here is switch_to_blog is slow. It probably doesn't matter so much in WP admin, but you run the risk of hitting timeouts pretty quickly if you're running any real quantity of sites. You could also then say, well, let's just use direct SQL then.
So I know what I inserted into this primary blog, so let's go ahead and take it, and I know the blog ID that I wanna insert it into, so I'll just change the tables and go from there. The thing we lose there is all sorts of great WP core goodness, right? Like we don't know where we are on term counts, we don't know where we are on taxonomy counts, so we need to stick that stuff in there, but then we also need to get the result and then map term to taxonomy. We need to map the post meta to the post. All that stuff that just happens automatically using WP core methods, we miss out on by doing direct SQL. So content syndication with MySQL, you can do it, but you're creating a lot of hurdles.

So this is where views really shine, and this is my favorite solution for syndication. It only works if the syndication requirement is hub and spoke, because you wanna limit the number of tables you're working with. But on your spoke sites, instead of using posts, post meta, terms, term meta, et cetera, create a view, and then hook into the query and replace the table name. And the view will be a combination of the hub site's tables and the spoke site's tables, combined. The only potential issue here is that you need to make sure you set the auto-increment on the hub site's IDs high enough that you're not gonna eventually create a conflict. But at that point you have effectively merged two sites in multisite. It is an increase of one query, but it's still very performant.

So, using Elasticsearch. This is a good solution, and I see this a lot of times when we need to syndicate stuff and we're not doing it within multisite. We need to syndicate from one WordPress site to a few others, but there's a lot of business logic in what needs to syndicate. In this case you can also get rid of the concept of hub and spoke. You can set some parameters in there so that it doesn't really matter who's hub and who's spoke. You can share content pretty easily between sites.
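As a rough sketch of what pulling syndicated content out of a shared index might look like with the official elasticsearch-php Composer package: the index name, field names, and the `syndicate_to` flag are all made-up assumptions, and the exact client namespace varies by package version.

```php
<?php
// Sketch: every site indexes its posts into a shared index; any site
// can then query it. Index and field names are hypothetical.
require 'vendor/autoload.php';

$client = Elastic\Elasticsearch\ClientBuilder::create()->build();

$results = $client->search( [
	'index' => 'syndicated-posts',
	'body'  => [
		'query' => [
			'bool' => [
				'must' => [
					[ 'match' => [ 'post_status' => 'publish' ] ],
					// The business logic lives here: e.g. only posts
					// flagged for syndication to this particular site.
					[ 'term' => [ 'syndicate_to' => 'site-b' ] ],
				],
			],
		],
	],
] );
```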
The negative to this, obviously, is if you have a bunch of sites in use and they're all sticking data into ES, ES is gonna be thrashing. So you're gonna eventually get to consistency, but on the sites that are pulling from Elasticsearch, you need to know that just because I pulled something this time, its absence next time doesn't necessarily mean it's deleted. So there's that weird consistency thing again with Elasticsearch.

Okay, so implementation notes. ElasticPress, like I said, is kind of like the de facto standard if you're working with Elasticsearch, but if you're doing something more complex, or you have more complex data models, it's not hard to work directly with ES. There are some cool Composer packages; Elasticsearch has an official one, and it's not difficult to create the array, and the package handles the logic of passing it to ES. Predis and Memcached for PHP are the other two packages you'd use to interact with Redis and Memcached.

So that is it. Well, all right, let's get to the after party, y'all.