All right, so today we're talking about anthropomorphic design, making WordPress better with jazz hands. Exactly. Actually, no, we're not. We're talking about HyperDB, MySQL, performance, and the flavors of MySQL. I'm a data engineer, which basically means a DBA who does MapReduce and is also afraid of Apache ZooKeeper. I've worked with WordPress for about five years and know more about MySQL than I care to admit. We have a long, long history.
Disclaimers. I believe that Google exists, that you can use it, that you probably don't want to read a bunch of code snippets, and that more than anything else you're probably looking to just get an idea: what are the questions you should be asking in the first place? Probably none of you are DBAs, and for that matter, I've talked to some of you earlier, and you don't want to be DBAs, and that's fine. But at the very least, I'm hoping that I can level up which questions you should be asking in the first place. That's really more of my goal today, and also to raise the level of CalVal in WordPress, because I think that's obvious. It needs it.
What is HyperDB? "It is a very advanced database class." I have no idea what that means, actually. It supports replication, partitioning, load balancing, et cetera, and "very advanced database class" should maybe make you wonder what they're talking about. If I just count the number of lines of PHP in these different plugins, HyperDB weighs in at about 1,400 lines of code, WordPress itself at 255,000, Hello Dolly at 81. If you guys know what that plugin does, I think we can all conclude that Matt Mullenweg must write really crap PHP or something, and not really. And then this one is just for reference: Yoast Google Analytics, not the full-blown Yoast plugin, but just the one that adds a Google Analytics button, I guess, to your website, and I think it does something else, like pulling metrics in. That's 16,000 lines of code.
So this is a horrible way to figure out how complicated something is. Obviously, "how many lines of code did you write?" is a terrible productivity metric, and it's a terrible way to say this is advanced or this is not. But at the very least, it should probably suggest that what's complicated about HyperDB isn't actually the code itself. And it's really not. Working with HyperDB is a lot like working with wp-config. You're going to move a couple of things here and there, specify some host names, some settings, et cetera. It's really kind of a glorified wp-config.
What does that mean for us? Why do we care? Well, it means that we need to understand MySQL. But this is a WordPress conference, so what the heck do we care about MySQL? My argument on this would be: you guys know Tom McFarlin? Did anybody go to his talk yesterday? Cool. He posted an article a little while ago about why WordPress salaries are so low. There were three opinions he walked away with. One, people really aren't familiar with what WordPress can do. I'm sure none of you have any experience with that, where you talk to somebody and they think, oh, WordPress is just a blogging platform, for example. I've heard this a hundred times. I'm sure you guys have as well. Personally, I think you can do a lot more than that, but this is certainly something that you run into. Part of it, I think, also relates to the fact that it's really easy in WordPress to be a software implementer, not a software developer, which is very, very common. And then I think we as a whole in the WordPress community have not really done a great job educating employers, customers, software implementers, et cetera, about what WordPress actually can do in the first place. So point being, yeah, maybe we should pay a little bit more attention to our database.
And to what it can do for us. If we don't want WordPress to just be a blog, or if we don't want WordPress to be this sort of mini-development, you're-not-really-a-developer kind of thing, we need to start paying attention to some of the other tools in our stack, MySQL being one of them. More on what exactly I mean by that: I think we don't need any more social media icons, guys. We need them about as much as the App Store needs fart apps. We've got enough. And the things that are really pushing the WordPress envelope are pretty much all database specific. Membership sites: Brian Krogsgard, am I pronouncing his name correctly? He was talking about membership sites yesterday. These are extremely database intensive as a rule. If you've ever done anything with e-commerce solutions in WordPress, they're extremely database intensive as a rule. BuddyPress, forum plugins, bbPress: they're all very, very heavy on your database, and you need to understand some of the fundamentals in order to really use them. And if you're gonna try to create something that pushes the envelope of what WordPress can do or is doing, odds are pretty good you're gonna run into the database before very long. In fact, I was talking to a gentleman over here earlier who had some pretty interesting ideas about how WordPress could be used with Angular or with other technologies in the stack, and he's running into questions like: how do I make this scale? How do I make this highly available? So if you wanna do really cool stuff with WordPress, you gotta pay attention to the database. That's my second argument.
HyperDB, for all practical intents and purposes, is just wp-config. I already said that, no need to say it again. And if you wanna figure out how to install it, the DigitalOcean article literally can do it better than I can in this entire presentation.
It's a really good article; you should read it if you're interested in setting it up. If you're not interested in just figuring out how to set this up, maybe you're interested in understanding what it actually is doing. This is where I'm gonna spend a bit more time. I've been sort of flying through some of this so far.
How many of us are familiar with replication in MySQL? How many are not? Let me go with that instead. Are you seriously not familiar with this? Okay, that's interesting. Partitioning: how many of us are not familiar with that concept? Okay. Failover, does that make sense? Everybody pretty comfortable? Load balancing? Okay. For the people who are familiar with all of this stuff, by the way, if you're completely bored out of your minds, feel free: there's a great talk on Roots. I always like to caveat that when I walk into a room, I have no idea of the technical background of the people who are here. So I may have completely misjudged it, and if that's the case, I will not be offended if you leave, and I think Julian's talk on Roots is probably very good. You can also throw tomatoes at me, I don't mind.
So in replication, we're gonna start with the simplest case, which is master to slave. There's this one database here that can accept reads and writes, and it's sort of your system of record. If I create a new user, if I create a new page, this is the one database that's always gonna be getting that new write. Then it'll have a slave database over here, which you typically create as a read-only database. It's designed to share the load with the master database.
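As a rough illustration of the master-to-slave split just described, here is a minimal Python sketch. It is not HyperDB's code and was not shown in the talk; the class name, the host names, and the list of write verbs are all assumptions made up for the example. Statements that change data go to the single primary (the master), and reads rotate across the read-only replicas (the slaves).

```python
import itertools


class ReadWriteRouter:
    """Send writes to one primary and spread reads across read-only replicas."""

    # Hypothetical, incomplete list of statement verbs treated as writes.
    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary for reads when no replica is configured.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def route(self, sql):
        """Return the host name that should receive this statement."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.primary
        return next(self._read_cycle)


router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.route("INSERT INTO wp_users (user_login) VALUES ('alice')"))  # db-primary
print(router.route("SELECT * FROM wp_posts WHERE ID = 1"))                 # db-replica-1
print(router.route("SELECT * FROM wp_posts WHERE ID = 2"))                 # db-replica-2
```

A real setup also has to think about replication lag (a read sent to a replica right after a write may not see that write yet), which this sketch ignores.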
The slave's job is literally just to stand there and read what's called the binlog off of the master and say, okay, I see that you did this, I see that you did this, let me make sure that I'm copying this stuff as well. Replication is really pretty easy to set up. I believe it was with MySQL 5.5, it might have been 5.4, that they moved away from statement-based replication to row-based, which is way more reliable. I've never seen replication not break, given a long enough period of time. But you can actually go a week or two without running into a lot of replication errors now, whereas before it was like every day we were hitting replication errors. You can also set up more complicated systems like master-master or master-master-slave. There are all kinds of other configurations that you could potentially do with MySQL and replication, but the general canonical rule is you're talking about master-slave or master-slave-slave, et cetera. By the way, if anyone ever says that you should be doing multi-master replication, be very cautious about that. That's not actually a very good idea most of the time, just to caveat that. And if you'd like to know why, please catch me afterwards or ask me in the question section.
Partitioning. This is one of those things that will either really save your butt or it will kill you. I've never really seen it do anything in between. There are two ways you can do it. You guys are all pretty familiar with the tables in WordPress itself, wp_users and so forth. One way you can partition is you can say, all right, I'm gonna put half of these tables over here and the other half of these tables on this other database. What could go wrong? In a physical setup, you could get a physical failure, you could have a logical failure, or you could have... So basically what you're getting at is hardware failure, right? That's one risk.
You now have two machines that could fail as opposed to just one, and literally half of your tables are running on each. That is certainly a true statement. There may be something way more obvious than this, something fundamental, I think. Yes? If I have access to multiple sites, I'm not sure... I mean, if you did it wrong, but that's not really good. I'm sorry? It'd be pretty hard to do that by... So the concern, if I understand it correctly, is that if I were to move some of these tables in a standard WordPress configuration from one database into another, literally just split the tables in half, would that somehow mess with the permissions inside of WordPress? Probably not in the way that you're thinking. I'm gonna come back to that, actually. It's looking at the wrong database to find the table. What's that? It's looking at the wrong database to find the table, and it's gonna be messed up. Well, in HyperDB, you can actually define which tables are located where, right? That's one of the things that HyperDB does for partitioning. Well, you asked what one of the concerns could possibly be. It could be looking at the wrong table. Absolutely. Guys, I need to actually take a step way back from this. Could you get connection problems? Yeah, keep going with that. Connection problems. What else? Okay. Think about latency. Should we clarify the question? I'm sorry? Maybe you should clarify the question. Yeah, absolutely. "Should I split the tables in half?" Do you mean having both? There are multiple ways, yes. There are multiple ways that you can handle partitioning. I'm probably leaving the camera frame right now. One of the ways: I'm just gonna stick with users, right? I've got wp_users, and I've also got wp_usermeta or whatever that table's called, right? Option one, maybe I put this one in database A and this one in database B. That is technically partitioning. That's one way to do it.
Another way to do it is I could say, all right, every ID that's an odd number goes to that table, and if it's even it goes to this table, right? Either way, we're still partitioning our data.
Think about latency. Have you ever seen this graph? We all know, I'm assuming, that memory's faster than disk. What about network? That's fast, right? No, it's absolutely not fast. So the odds are good that you'll partition your data in a way that basically just moves your bottleneck. You're no longer dealing with "my database is the bottleneck." You're gonna turn your network into the bottleneck. And this happens time and time and time again. It's especially common, by the way. If you were to move your wp_users tables and so forth so that all of those are contained in one database, and maybe your posts are in a different database, you might avoid a lot of those network latency issues for quite some time.
What if you did something really stupid? Like you said, every user whose last name starts with S goes to database one. If the last name starts with T, it goes to database two. If it starts with U, it goes here. V, here, right? You're literally defining your partition scheme based on observations of the data itself. What would happen? Your index would suck. Yes, your index would suck. Why is that? The data distribution is not even, right? So basically you're hashing. You're partitioning your data based on a non-consistent hash: last name. I don't know what everybody's last name is, but I'm willing to bet that not that many last names start with Z or with B. That's a very real problem. And keep in mind, if you start thinking, oh, I need to partition my data, it's not easy to get it right. It's so not easy to get it right that NoSQL exists. That said, you may need to do it.
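The uneven-distribution problem with last-name partitioning can be seen in a few lines of Python. This is only a sketch under stated assumptions: the list of names is invented, the bucket count of 4 is arbitrary, and hashing the whole value with SHA-256 is just one common alternative, not something the talk prescribes.

```python
import hashlib
from collections import Counter

# Made-up sample of last names, deliberately skewed toward a few initials.
LAST_NAMES = [
    "Smith", "Sanchez", "Stewart", "Scott", "Simmons", "Sullivan",
    "Taylor", "Thomas", "Turner", "Underwood", "Vargas", "Zimmerman",
]


def partition_by_initial(name):
    """Partition key taken straight from the data: the first letter."""
    return name[0].upper()


def partition_by_hash(name, buckets=4):
    """Partition key from a stable hash of the whole value."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


by_initial = Counter(partition_by_initial(n) for n in LAST_NAMES)
by_hash = Counter(partition_by_hash(n) for n in LAST_NAMES)

print("rows per partition, by initial:", dict(by_initial))
print("rows per partition, by hash:   ", dict(by_hash))
```

In this sample, half of the rows land in the "S" partition while "U", "V", and "Z" hold one row each. With only twelve names the hash buckets will not be perfectly even either, but their sizes no longer depend on which letters happen to be popular.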
It's entirely possible that you will need to partition your data at some point, but do not underestimate the complexity of that. It is very, very difficult to get it right.
Failover. I think it's pretty obvious what failover is, right? My database goes down. Okay, crap, read from the other one. Makes sense.
Load balancing. Now I need to be explicit here. You can load balance your database, or you can load balance your web application. To be honest with you, before I would even consider load balancing my database, I would start with load balancing my app. Again, that's not 100% across the board. I would actually evaluate the problem and see where I'm seeing latency issues in the first place, apply aggressive caching, and so forth. But nine times out of 10, I don't really need to load balance my database as much as I need to manage connections to it. You need to terminate them quickly. Maybe I just need to have my workers on the front end doing a little bit more work for me. Better caching, for example.
Performance basics. I'm just gonna ask everybody to take a look at this slide. Are you guys all familiar with these terms? Is anyone familiar with these terms? Two, three people. Okay. Yes or no? That was the point. You're familiar with the terms, so you guys know what these things mean. This, by the way, is what DBAs do all darn day. We literally stare at this and go, I wonder, if I set my buffer pool a little bit higher, what would happen? I was talking to this gentleman earlier who, on a Saturday, picks up React or Angular or something like that. DBAs, on the other hand, are the guys who read through system settings like this, and we're incredibly boring people. Performance basics, at the high level: this is the very first thing that you may wanna look at prior to calling a DBA, right?
Or it's the very first thing you should check if you do call a DBA and you're trying to figure out whether what they're telling you is actually true. If they start mentioning these things, there's a good chance it is.
innodb_buffer_pool_size. Are you guys familiar with LRU? Does that mean anything at all? Least recently used is what it stands for. The idea behind least recently used is that it's a cache. It's an in-memory cache that is going to keep evicting keys if they're not used as often. So if I've got one page on my website that everybody comes to see, like your home page, that's probably gonna be sitting in a cache somewhere, maybe a front-end cache of some kind. But if you're only doing database caching, that's gonna be in your LRU cache for sure, because everybody visits that page. Your obscure "this is my PGP key" page at the bottom of your website may not be quite as popular, and it's more likely to get evicted from your cache. The buffer pool size basically defines how much cache MySQL is allowed to use. Typically the setting that we go with is about 70 to 80% of your total RAM. That's assuming you don't have Apache sitting on this same server, by the way. If you do, you guys know what the OOM killer is, right? No? Really? Okay, the out-of-memory killer. It's this process that runs on Linux, and it says, oh geez, I'm out of memory, and it starts killing things. Pretty much without fail, it's gonna kill MySQL every single time. It's not gonna kill Apache, it's gonna kill MySQL again and again and again. So if you go and set your InnoDB buffer pool, especially if you're running Apache on the same box, 70 to 80% is a bit too aggressive, right? Put your MySQL somewhere else. Run it on its own server somewhere. Crank it up to 70 or 80% and you're fine.
innodb_buffer_pool_instances.
The idea behind this: you guys are familiar with threading, right? Multi-processing versus multi-threading, we're all familiar. The idea is that sometimes we'll run into bottlenecks with these threads, and if you just split your cache into two, you'll run into less contention between threads, if that was your bottleneck. So it's a pretty good way to speed things up under certain circumstances.
innodb_file_per_table. Guys, did you know this? Let's say you delete a bunch of transients from your WP settings. Is it WP settings or is it WP user settings? It's wp_options, isn't it? Where are the transients stored? wp_options. These are completely made-up numbers, but suppose you've got somebody who's gone crazy, they've had their website for 10 years, and all of a sudden there are, I don't know, 500,000 rows of transients. Let's say this particular table takes up 10 meg, and you go delete all those 500,000 transients. How much space did you free up on your server? None, zero, not a bit. The reason is that when you delete data from MySQL, especially in InnoDB, you're not freeing up anything on disk. The space stays allocated inside InnoDB's shared tablespace file. If you actually wanna get rid of it, you're gonna have to truncate your tables, potentially, you're gonna have to restart MySQL and rebuild its indices, and the only way that will even work for you is if you turn on file per table. So I like to tell people, go ahead and turn on innodb_file_per_table. It's on by default in 5.6. If you're, for whatever reason, using something earlier than that, like 5.5, turn it on, because otherwise you are completely out of luck when you need to save some space.
max_connections. If you ever have a DBA who says you need to up that, be very skeptical. You should probably check your PHP first.
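To make the least-recently-used idea behind the buffer pool concrete, here is a small Python sketch of a plain LRU cache. It is an illustration only: the class and the page names are invented for the example, and InnoDB's real buffer pool uses a more elaborate variant of LRU than this.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None  # cache miss: a database would now go to disk
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("home", "<home page>")
cache.put("pgp-key", "<pgp key page>")
cache.get("home")                   # the home page gets read again
cache.put("about", "<about page>")  # cache is full, so one entry is evicted
print(list(cache._items))           # ['home', 'about']
```

The popular home page survives because it was touched recently, while the rarely read PGP key page is the one that gets evicted. A bigger capacity (a bigger buffer pool) simply means fewer of those evictions.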
This is all incredibly boring, and I can look around the crowd and see that people's eyes are sort of glazing over, so I'm just gonna skip log_bin. The slow query log: we're roughly familiar with how that works. It's not turned on by default. Turn it on. And also, every couple of months maybe, set the slow query time to zero. Log things for a day or two and see what happens. And in particular, analyze the slow query log. Are you guys familiar with Percona Toolkit? Anyone? As of two hours ago. What's that? As of two hours ago. Exactly.
All of this only matters if you have a VPS or dedicated server, right? You're not gonna go on to GoDaddy, or any managed host of any kind, and start running this stuff. You're not gonna be like, I wanna use MariaDB instead. They're gonna tell you no. So I'm assuming you've got a VPS or something that you can actually shell into and install software on. If you do, and you're not using Percona Toolkit, I really don't understand why not. It's such a beautiful tool. Let me go ahead and flip ahead. This little pt-query-digest: that one statement will go and analyze all of my queries since November 2014. And it'll give me this nice little output that looks sort of like... is that actually showing up, by the way? Yeah. Cool. It's very difficult to read on this screen. You can't even read it. That's horrible, I apologize. What it's gonna do is analyze your slow queries and tell you where you're spending all your time: how many queries are being called, how often they're being called, which ones are slow, what percentage are slow, which ones are resulting in a temporary table creation, which ones are doing full table scans. Brilliant, brilliant tool that I highly recommend using.
Best practices in MySQL, and just in general: monitor everything.
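The core idea of a slow-log digest, grouping queries that differ only in their literal values and ranking the groups by total time, can be sketched in Python. This is a toy illustration and not how pt-query-digest is actually implemented; the sample queries, the timings, and both function names are invented, and real slow-log parsing is skipped entirely.

```python
import re
from collections import defaultdict

# Hypothetical (query, seconds) pairs standing in for parsed slow-log entries.
SAMPLE = [
    ("SELECT * FROM wp_posts WHERE ID = 12", 0.004),
    ("SELECT * FROM wp_posts WHERE ID = 97", 0.006),
    ("SELECT option_value FROM wp_options WHERE option_name = 'siteurl'", 0.002),
    ("SELECT * FROM wp_usermeta WHERE meta_value = 'editor'", 1.250),
    ("SELECT * FROM wp_usermeta WHERE meta_value = 'author'", 1.100),
]


def fingerprint(sql):
    """Collapse literals so queries that differ only in values group together."""
    sql = re.sub(r"'[^']*'", "?", sql)  # quoted strings
    sql = re.sub(r"\b\d+\b", "?", sql)  # bare numbers
    return re.sub(r"\s+", " ", sql).strip().lower()


def digest(entries):
    """Return (fingerprint, calls, total_seconds) rows, slowest total first."""
    totals = defaultdict(lambda: [0, 0.0])
    for sql, seconds in entries:
        row = totals[fingerprint(sql)]
        row[0] += 1
        row[1] += seconds
    rows = [(fp, calls, total) for fp, (calls, total) in totals.items()]
    return sorted(rows, key=lambda r: r[2], reverse=True)


for fp, calls, total in digest(SAMPLE):
    print(f"{total:8.3f}s  {calls:3d} calls  {fp}")
```

In this made-up sample the two wp_usermeta lookups collapse into one fingerprint that accounts for nearly all of the time, which is exactly the kind of query you would then run EXPLAIN on.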
Literally hook up New Relic, Datadog, StatsD, Graphite, anything you can think of, and keep track of what's going on. If you're old school and you wanna use something like a Nagios monitoring system, by all means, but monitor your system. Absolutely keep track of what's going on. I mention this one on the slide here. This is a pretty interesting way to keep track of metrics on MySQL. So specifically, if you guys are into StatsD and Graphite, that's a pretty interesting article.
Analyze your queries. Use mysqldumpslow. Use Percona Toolkit, the pt-query-digest. There are some plugins that do this, actually. For example, New Relic has a MySQL plugin that will analyze which queries you're spending most of your time on. Use them.
Do you guys know what the query execution plan is? It's what you get when you write EXPLAIN in front of a query, yeah. One important caveat: it's not 100% accurate. What it's gonna tell you is what MySQL thinks it's going to do. It may do something completely different. In fact, there are well-known ways to trick it. But it's a good idea to understand what this query execution plan looks like. That's sort of your first line of defense when a guy like me, a DBA, says you're writing really bad queries. This is probably what we're looking at and what we're referring to. So know how to reason about it yourself.
And then schedule DB maintenance. This should be obvious, but people fail to do it all the time.
There are a couple of different flavors of MySQL that I'm gonna sort of gloss over as well, since I can tell I'm putting everyone to sleep. DBAs do that. There's MySQL, of course. When MySQL was purchased by Oracle, the guys who built MySQL in the first place actually forked it as MariaDB. They said, all right, fine. We don't want Oracle's involvement in this. We're gonna build our own. And Automattic is, I think, an official sponsor of MariaDB. I don't exactly know if that's monetary support.
I don't know what the arrangement is, but I know that they are one of the official sponsors of Maria. And then there's Percona. Percona is very similar to MariaDB. In my personal opinion, you can fine-tune Percona a little bit more. If you're really doing database-intensive stuff, I'd probably lean towards Percona. If you're just trying to get a better MySQL out of the box, it's pretty hard to beat MariaDB.
A couple of special mentions. Again, of course, Percona Toolkit. I cannot recommend highly enough that you install this on your server and that you start using it. It's beautiful. And this is completely tangential, but Nginx: if you're not using it, I really, really don't understand why. I know there was a talk earlier about the stacks. I don't actually know who won. I was... Nginx and HHVM. That's interesting. Okay. What's that? By a very small margin. Okay. How much tuning was involved in that? Nginx was stock, and Apache had MaxRequestWorkers too. Okay. So very, very good. Gotcha. Was that yours stock? Yeah. Okay. I'm not as familiar with HHVM, but I definitely would not use Apache straight out of the box. I've never had a lot of luck with it, and especially if you've got MySQL and Apache sharing the same server, I've seen a lot of DBs go down because of the OOM killer. I would really strongly recommend something with asynchronous workers, which I believe HHVM does have. I don't actually recall. Talk to the stacks guys. It's asynchronous now. Nginx has had that for a long time. Fair enough.
And then another thing. I mean, this is a boring topic. Obviously, who wants to be a DBA? These are the guys that you stick in the basement. You forget that they exist. Yeah, exactly. And now you see why. But at the same time, I think you really should, as a WordPress developer, get outside of that WordPress bubble a little bit.
Go figure out what the DBAs are thinking about. Go think about what those Ruby guys are saying as developers. There are some brilliant, brilliant WordPress developers, and several of them are actually here today. But it's also very easy to be a very mediocre developer in WordPress. It makes a lot of things very easy. I would recommend that you get outside of that bubble. Learn about the things that other people are talking about, in Ruby, in Python. A great example of taking other ideas and applying them to WordPress is Roots, which, I keep saying, Julian is speaking about right now. They took the twelve-factor app. Are you guys familiar with that? They basically figured out how that could be applied to WordPress. Try to do some of those sorts of things. Even if it's boring, learn about how databases work. You'll be a better developer, and I think you'll make WordPress a lot better for it. With that being said, I appreciate your time, and thank you for putting up with all this boring stuff. If there are any questions, I'm happy to answer them. And if there are any tomatoes to be thrown, I'll stand somewhere where I'm a little bit easier to hit. Exactly.
So are there any questions? Yes? So when you were talking about the partitioning, what would be the recommended way? So the question is, I was talking about partitioning, and what would I recommend? I'm not aware of any hard and fast rules. In my experience, there are certain problems that are very easy to solve with partitioning. In fact, GitHub has one. GitHub runs one of the largest MySQL clusters in existence; it's in the top 20, I believe. And their data kind of lends itself to partitioning. I can put all of these users over here and all of these users over here.
And as long as each one of these servers has a sort of complete view of the universe, they don't have to deal with any of this network latency crap. It's a pretty nice problem to deal with. If you're dealing with that problem, where you can treat each server as its own little miniature universe that has all of the data it needs to answer whatever queries are coming to it, partitioning is fine. If that's not the problem that you're solving, I actually recommend that you don't even try. Just move to a NoSQL solution at that point. An example of this: I work in advertising analytics, and one of the things that we have to answer is, who's advertising where? I've got publishers, networks, and advertisers, and I need to know which combinations are occurring and where. I can't really partition those data and answer that question, because I need these advertisers to know about these publishers and so forth. If your data looks like that, you've got to go to NoSQL. There's just no way around it.
I was confused by the term partitioning. I've always heard federation, which is a good concept. But with WordPress, isn't it true that unless you're kind of massive, you're probably not gonna have to worry about partitioning WordPress? Absolutely. And if you get to that point, you've probably got enough money to hire a DBA who understands it. Absolutely. So the question was, are there really a lot of good use cases for partitioning in WordPress? Well, there's certainly one. WordPress.com, Automattic: they have a very well-known use case, and they're using HyperDB. Most users, probably not. And the reason that I mention this: to be honest with you, I submitted two talk ideas and this is the one that won.
I was a little bit surprised, because in point of fact, my recommendation with HyperDB is that there are probably a lot of things you can do before using HyperDB that will really get you a long way. So I didn't touch on a lot of stuff in HyperDB specifically, because you probably would get a lot more mileage out of upping your buffer pool, for example, or moving MySQL to its own server on dedicated hardware. There are a million things that you can do prior to needing to partition your data, especially with MySQL. And if you do need to, you're gonna have an interesting time. It's not an easy thing to get right. It's not an easy thing to do at all. Fortunately, you probably never will have to.
Yeah, and I wasn't trying to contradict you. My problem is, it's very difficult to get it right. It's incredibly easy to get it wrong. Yes. And I've seen people say, oh, I'm gonna grow, so I need to go ahead and federate now. And that to me is the... I mean, I'm glad you said that. I moved my MySQL to its own droplet on DigitalOcean, and that did more for my WordPress performance than anything I've done since moving to PHP 5.6. So for the sake of the cameras, I need to go ahead and repeat what you just said, but the point that you're raising is exactly true. If you move MySQL to its own server, and it can even be a cloud server, nine times out of 10 that is all you need, especially for a WordPress site. There are definitely cases where you do need to partition your data even within MySQL, I'm sorry, within WordPress. They're not very common. And you'll probably get a lot further with other things first: aggressively cache everything you possibly can on the front end. Terminate those database connections as quickly as you possibly can. As a DBA, that's the thing I see most commonly from developers that just makes me want to scream: terminate your connections, guys.
You'd be surprised how many of these connections are sitting there for like, you know, 300 seconds because PHP didn't terminate the connection. It's a very, very common thing, and it can tank your server pretty quickly.
So, I have three questions, it looks like, and I have absolutely no idea whose hand was up first. Go ahead. Hopefully mine's quick. One thing that I hear a lot of people mention is that on a WordPress multisite, the standard is 2,000 sites per database. I guess it's just a standard, but I guess it's based on the load, or how many people are on it and how heavily they're hitting the database. So the question is, on a WordPress multisite, is there some sort of hard and fast line in the sand about 2,000 sites as opposed to 5,000 or 500? The answer's no. You do have to be mindful of the number of connections going to MySQL at any one point in time. The default as of 5.6, I think, is 100 concurrent connections. Does anyone know off the top of their head? I've seen people take it as high as 1,000. Not very commonly. I guess the idea is you should do some separate analysis on your own and then make that call. Yeah. For example, a multisite is actually a really easy scenario where you could partition, right? If you're really running more than 2,000 sites on a network, run two servers, not one. The odds that these connection problems would start leading to very real performance degradation are pretty high. 2,000 users logging in and blogging about whatever they're blogging about, that's hard for any database to maintain. But there's definitely no line in the sand. That's not a hard and fast, this-is-the-law kind of number. I think that answers your question? Yeah, that's it, thank you. Okay, yes? So I have like three questions, but I'll do one and then let the other person ask their question, maybe.
So, and I think maybe the problem was I came in just a little bit late.
We were all supposed to... so this guy came in late, and we were all supposed to stand and point at him and make him feel bad for it, but since he admitted it, I'll just make him feel bad now. Oh yeah, how dare you come late to my talk?
HyperDB, it's not real clear what it does and why you would use it.
Okay, thank you for letting me know that. So, he's saying it's not very clear what HyperDB does and why you would use it. HyperDB is, to be honest with you, really just a glorified WP config. All it's doing is letting you define your database configuration. If you download the code and read through it, you'll literally be able to just use it. It'll let you say, okay, I want my SQL servers, not MySQL, anyway, I want to point at this server and this server, and I want to send 50% of my requests to this server and 50% to this other server. Or I want to move table ABC to here and table DEF to there. It just lets you define that sort of logic. It also lets you define the failover protocols: if this database goes down, use that one. Or this database is a master, so read off of the slave, for example. It literally just lets you define the configuration of whatever MySQL instance, or maybe MySQL cluster, you have behind it. In terms of why you would use it, again, if you're WordPress.com, if you're Automattic, it's very obvious why you use it, because you have a lot of people using WordPress. If you're CNN, maybe, if you're one of these large, high-traffic or high-volume sites, there may be very good reasons why you would need to have multiple databases.
Maybe it was a very simple question. It was more, technically, why would you use it, not just...
To take advantage of the things that MySQL can do. I mean, WordPress out of the box is extremely simple, right? It's using MySQL, but it's kind of using it in a SQLite sort of way.
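The kind of routing logic described above (split reads 50/50 across servers, send writes to the master, fall back when a server is down) can be sketched roughly as follows. This is not HyperDB's code or its configuration format; it is a Python illustration, and the host names, weights, the SERVERS list, and the pick_server function are all made up for the example. The sketch uses "primary" and "replica" for what the talk calls master and slave.

```python
import random

# Hypothetical server list: hosts, roles, and weights are illustrative only.
SERVERS = [
    {"host": "db-primary", "role": "primary", "weight": 0, "up": True},
    {"host": "db-replica-1", "role": "replica", "weight": 50, "up": True},
    {"host": "db-replica-2", "role": "replica", "weight": 50, "up": True},
]


def pick_server(query, servers=SERVERS, rng=random):
    """Return the host for a query: writes go to the primary,
    reads are spread over live replicas in proportion to their weights."""
    # Crude read detection, good enough for a sketch.
    is_read = query.lstrip().upper().startswith("SELECT")
    primary = next(s for s in servers if s["role"] == "primary")
    if not is_read:
        return primary["host"]
    replicas = [
        s for s in servers
        if s["role"] == "replica" and s["up"] and s["weight"] > 0
    ]
    if not replicas:
        # Failover: no replica is available, so read from the primary.
        return primary["host"]
    weights = [s["weight"] for s in replicas]
    return rng.choices(replicas, weights=weights, k=1)[0]["host"]
```

With the list above, a SELECT lands on one of the two replicas with equal probability, anything else goes to db-primary, and marking both replicas as down sends reads to db-primary as well.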
I mean, it doesn't take advantage of a lot of the really advanced, well, not quite advanced, but it's not taking advantage of a lot of the things that MySQL can do out of the box, like replication, partitioning, failover, et cetera. Oh, not failover, but it doesn't really leverage any of the things that make MySQL actually good at what it does. Keep in mind, you guys know what Facebook is built on, right? It's MySQL. So, can MySQL do a lot of things? Absolutely, if you take advantage of it and really use it the way it's supposed to be used. HyperDB is a way to modify the default WordPress interaction with MySQL so that you're actually taking advantage of all this really cool stuff that MySQL can do. Cool, if you're a DBA. There was a question over here, and then I saw your question.
It was more of a comment than a question. One of the use cases that I got into WordPress for was when you've got a web app that needs a published print. So, I build a custom plugin that processes this boatload of data. I don't have that many people hitting the website, but I'll just put one example up. Let's say it's an inventory system for a massive e-commerce solution. So we've got these custom tables, and we're gonna be running all sorts of reports. Hey, we've got 100 people hitting them.
Sure.
In the site.
So, one of the reasons why you would take advantage of HyperDB, for example, is that there are things happening behind the scenes, mostly at the plugin level, like your e-commerce settings, or perhaps BuddyPress or bbPress, membership settings, et cetera, that are extremely database-intensive.
I absolutely agree with that statement. Because to me, and that was actually why I came to this, we call it going deep or wide. Do we buy more hardware, or do we rebel?
And, yeah, so the question that's been around for as long as system admins and DBAs have been around is, do you buy a bigger box, or do you buy more boxes?
And HyperDB is one way that you can buy more boxes. Believe me, it's not actually an easy thing to split MySQL across 20 different servers. It's a relational database, relational as in math; this is relational theory. You start breaking that mathematics pretty quickly if you start splitting these servers up. I mean, the secret sauce in MongoDB, sharding, leads to a lot of unintended consequences if you don't understand what you're dealing with. I'm not actually a huge fan of partitioning, by the way. I'd much rather go to NoSQL, if you haven't already figured that out by now. The second people start saying partitioning, I start leaving SQL. But yes, I think a very good use case, in fact, is: suppose I've got an e-commerce site, suppose I've got a forum, a BuddyPress, something like that, a membership site. It's not at all impossible, or even unlikely, that I would have a lot of people beating up the database. Just being able to split that load on the back end can make a huge difference for you. So definitely, the plugin. And by the way, that's kind of a pet peeve for me. I'm really tired of seeing social media icon plugins. We don't need any more. Build me a plugin that lets me manage a stock portfolio. Build me a plugin that allows me to check in somewhere. Are you guys familiar with AppPresser at all? Yeah, so you can literally run WordPress websites as applications on an iPhone, for example. I believe it's on Android as well. To the extent that WordPress can take advantage of these cutting-edge technologies, I think you should understand a lot of the things that are going on behind the scenes, not necessarily because you want to go out and be a DBA because it's so much fun.
And that way, when somebody asks you, hey, can we turn WordPress into a portfolio management system? Can we write a plugin for that? Well, yes, it's just PHP and MySQL, big deal. What do we need to think about? Hardware, how much we're hammering the database, how many writes we're making concurrently, and so forth. That's kind of my bigger hope. What I'm hoping you come away with is that there are probably a lot of things we could do with WordPress, but they're gonna involve a little bit more work on the database side, I think. You had a question, and then you had a question.
So since you're a huge fan of partitioning.
Yes, I love partitioning.
I heard you talk about large multisite, kind of scalable multisite. If you do go down the path of provisioning your databases, would it be more efficient to have that on one powerful machine with multiple databases on it, or to get cheaper VPSs and spread that out across multiple machines? Now, obviously, you have the maintenance of the multiple machines versus I/O, I/O, I/O.
So the question in a nutshell is, if you have to partition your data, what's the best strategy for doing so? Do you use lots of little machines, a fleet of them perhaps, or do you use fewer but more powerful machines? The problem with partitioning is that there is actually no one correct answer to that question. And that goes back again to what your goal with partitioning should be: you basically want each server to be its own little universe. If you can partition your data in a way that server A doesn't technically need to be aware of the data on server B, or server C, or server D, the effect of partitioning your data is actually gonna be very minimal. And in the extreme, suppose that I'm partitioning the user table, and instead of splitting by S's, T's, U's, V's, I take an MD5 hash, right?
And based on the content of that MD5 hash, I say you go here, you go here, you go here, et cetera. I'll get consistent hashing, relatively consistent hashing, using something like an MD5 or a CRC32 to spread my data around the cluster. As long as everything that I'm spreading around the cluster is self-contained, like if these four guys live in their own little universe on this server, and these four guys are in their own little universe on their own little server, it doesn't matter. You can have 100 VPSs, or you can have two really powerful servers. That being said, I think one of the biggest technical challenges is getting to that little island scenario where you're in your own little universe. A lot of times you are gonna have to go with better hardware, and you're gonna need to keep all of the posts on all of the databases and then some of the user records, for example. You run into those sorts of situations a lot. An example would be membership. You would probably want to put all of the membership content on all of the different sites, so that you're not dealing with network latency. The bigger point, I guess, is: it depends. There's no one answer to that. Do I do more, do I do fewer? You are gonna have a very long day, is basically the best I can tell you. It's not an easy decision to make. And it's one that people tend to make incorrectly, even if they know exactly what they're dealing with and exactly what trade-offs they're making, because it's really hard to predict the evolution of how a technology is being used. And it goes wrong a lot. So there is no good answer to that. I apologize. You have a question, and I see another question back here.
Sure. Have you ever seen a case where you would do something like partition your database for a reason other than performance?
Absolutely. So, have I ever seen a good use case for partitioning your data set?
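The hash-based placement described a moment ago (hash the user, then let the hash decide which server the row lives on) might look something like this. It is a minimal Python sketch under stated assumptions: the function names shard_for_md5 and shard_for_crc32 are hypothetical, the key is assumed to be a user login, and the placement rule is a plain hash-modulo, not a true consistent-hash ring, so changing the number of servers would move most keys.

```python
import hashlib
import zlib


def shard_for_md5(key, num_shards):
    """Map a key (for example a user login) to a shard index using MD5."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # Treat the hex digest as one large integer and reduce it modulo the
    # shard count; the same key always maps to the same shard.
    return int(digest, 16) % num_shards


def shard_for_crc32(key, num_shards):
    """Same idea with CRC32, which is cheaper but has a smaller hash space."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def group_by_shard(keys, num_shards, shard_fn=shard_for_md5):
    """Group keys by the shard they would be stored on."""
    placement = {index: [] for index in range(num_shards)}
    for key in keys:
        placement[shard_fn(key, num_shards)].append(key)
    return placement
```

Compared with splitting users alphabetically, this tends to spread rows more evenly, but it only pays off when everything a user needs lives on that user's shard, which is the "own little universe" condition from the talk.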
Sure, so maybe you have two companies and they form a partnership. They each have their own existing data sets, but maybe there's something they wanna share.
I actually misunderstood; I just wasn't thinking when I heard your original question. Have I seen a good use case for partitioning your data that's not related to performance? Which, by the way, isn't really about performance, it's about keeping a machine breathing. You run out of space, you have to partition. Or you run into connection limits. It's not that I'm trying to increase my throughput, it's that I'm trying to stay above water. No, actually, I really have a hard time thinking of one. You mentioned the example, what if I'm dealing with two companies merging together? Would they maybe partition their data? I guess maybe, but if I were the DBA on that project, I'd say, use different databases, guys. Why do you need to share that? If that were really a concern, like, oh, we need to share credentials, use an API. I would not lean that direction. I don't know of any good reasons to do so.
So, Amazon RDS, or running SQL on an EC2?
So the question is, do you use Amazon RDS or Amazon EC2? I don't know if my speaker protocol allows me to answer questions like that. I can talk to you about that afterwards. I definitely have an opinion about it, but I'm not sure I can officially, on the record, say use this one, not this one. I don't know. Does anyone here know if I can answer that question?
It puts us hands down. It'll be tweeted.
It'll be tweeted?
Whatever you say will be tweeted to the entire world, though.
Exactly, and then I'll get a phone call from Amazon.
They're on the record. You're on the record.
Exactly, and then I'll get a phone call from Amazon, you know, saying cease and desist, you owe us $30,000 or something like that.
Yeah. Yeah, that's a risk you're gonna have to take.
I'm not gonna answer the question.
I'm afraid you're gonna have to hear it.
I'm still using Amazon services, so would you put your money into one bucket or the other at Amazon?
It again depends. I don't have a tremendously strong... I think you can do a lot better with EC2. There are certainly some cases where RDS has great use cases. One of them is, you know, we're moving to a NoSQL shop, but we're not quite ready to throw our billing data into Cassandra and hope for the best. So we need a MySQL cluster that we can set up and fire and forget. That's exactly what we're in the middle of right now at my company: we're using less and less SQL and more and more NoSQL, but we still want our billing data in, you know, Postgres, because we care about ACID compliance and so forth. For those sort of fire-and-forget cases, where you don't really want to tweak it and don't really need to get a whole lot out of it, RDS can be a very good deal. That being said, I believe all RDS storage is EBS-based, I think, and, you know, EBS has an interesting track record. So that's certainly a lot of network latency, a lot of noise. I can't say for sure, you know, definitely use EC2 or definitely use RDS, but in more use cases than not, I'd probably lean towards EC2. Yep.
So, the database API, so wpdb or WP_Query, how do we close the connection and...
With WordPress, how do you close the connection? The WordPress database API will close the connections for you. I mean, it's very well written. Where you start running into problems is where developers don't know the API, for example, and they're just like, I'm just gonna write some SQL.
When you say just write some SQL, I assume you're referring to not using wpdb, but instead actually using the MySQL commands.
Yes.
Yes, okay.
And that happens. So the question is, am I referring to people who basically do it the wrong way? They don't use wpdb, or they use the... What's that?
Well, we can do that. Why are you even talking about it?
I've never seen it happen.
Ever. No, I mean, basically, this is know your tools. You know what the WordPress database API is. It's pretty well written. There probably are some edge cases where it wouldn't terminate a connection. There may not be, I actually don't know. But in the instances I've seen where connections aren't being closed, it's because some guy just wrote his own PDO code and started firing off database queries.
I was worried. I was sitting there going, I can't believe I'm not doing this, but I'm just saying.
You probably are. Third question. Yep. Third time's a charm. You were late.
What was your preferred talk?
My preferred talk was, you guys should be in all kinds of trouble. EC2 versus RDS, and I've probably got an email from Amazon right now. At the end of the day, I was basically not even trying to do a developer talk today. I was gonna try to talk to intro developers or intro designers, for example, about, okay guys, what's functions.php again? What's a core functionality plugin? Sort of raise the bar in terms of these best practices. I mean, again, this is a fairly boring topic. It's an important one, but I think it's a very boring one. But one of the things I'm hoping everyone comes away with is that there are best practices, and there's a lot of technology here in the stack. PHP alone is a very big animal, right? It's a very big and complicated piece of software if you really invest in it. Likewise MySQL; DBAs exist for a reason. HTML, CSS, et cetera. Something I've noticed a lot with WordPress developers is the number of people who don't know any other languages. Go learn Python, go learn Java, go learn something else, just because it will actually make you think about all of the code you're writing a little bit differently. Usually in a good way.
I'm a huge advocate of that. If I can get two takeaways from my talk, one is, look into your database a little bit, and think about what you can write plugin-wise that's not another social media icon. And two, learn something that has absolutely nothing to do with WordPress. Hopefully you'll come away a much better WordPress developer for it. So, I don't know if I'm out of time or not. I'm certainly boring, and I apologize for that. Yes, you have another question.
So you mentioned that plugin developers don't write good queries.
Did I mention that plugin developers don't write good queries? None of them do, period. No, I can't make a broad sweeping generalization like none of them ever. Is there a lot of bad SQL out there? Yeah.
So what resource do you recommend?
The WordPress Codex section on the database API is really good. If you just follow the recommendations of the WordPress Codex on that topic, you will not go wrong. At least, if you can, I'm not aware of how. That's kind of a boring answer too, isn't it? I'm sorry. You're like, man, that's boring.
A lot of people don't do it. Basically, they just take PHP and they do PDO calls.
Yeah. People don't use the Codex like they should, yeah.
So you're just saying, just use the database API.
Absolutely. If you're writing queries the way the WordPress Codex says you should, you're not the one creating problems. It's the guy who writes a PDO function and starts randomly submitting SQL queries which may or may not be injectable. Sometimes they don't even write a PDO. They're just a giant gaping SQL injection attack waiting to happen. There's a lot of stuff out there, so.
So can you go to the place on the WordPress Codex that has what you're talking about?
Absolutely.
There's one there and I'm not seeing.
Sure. Now you guys are going to take it out on me, right? Like, show me where it says that.
Take it out on me if you feel like it.
Exactly.
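The difference between an injectable query and a safe one comes down to whether user input is pasted into the SQL string or passed as a bound parameter. The sketch below shows the bound-parameter form. It is an illustration in Python using the standard library's sqlite3 as a stand-in for MySQL, not WordPress code; the users table and the find_user and demo functions are hypothetical. In WordPress itself, the database API covers the same ground with $wpdb->prepare().

```python
import sqlite3
from contextlib import closing


def find_user(conn, login):
    """Look up a user with a bound parameter instead of string concatenation."""
    # The ? placeholder keeps the input as data; it is never parsed as SQL.
    return conn.execute(
        "SELECT id, login FROM users WHERE login = ?", (login,)
    ).fetchone()


def demo():
    """Run one honest lookup and one injection attempt against a throwaway table."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT)")
        conn.execute("INSERT INTO users (login) VALUES (?)", ("alice",))
        # If this string were concatenated into the SQL, the OR clause would
        # match every row; as a bound parameter it is just an odd login name.
        hostile = "alice' OR '1'='1"
        return find_user(conn, "alice"), find_user(conn, hostile)
```

Here the honest lookup finds the row for alice, while the hostile string matches nothing because it is compared as a literal value.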
Payback.
Exactly. Hang on a second. Does anyone else have it or not? Yeah, but it's just like turning my off. I can't even find my Wi-Fi icon right now. Sir, I will find you after this talk, once I have internet, or I'll find it on my phone or something like that. But are there any other questions? Not so much. Well, thank you very much for your time. Sorry for being boring.