All right, so welcome to this session about scaling MySQL and MariaDB. I have both in the title because most of what I'm going to talk about applies pretty much equally to both, but there are some differences, and when they come up I will mention them. Who of you was at Peter's talk this morning? Okay, not a whole lot. Peter talked about sharding, and when to shard and when not to, so this topic is very similar, except that I'll talk more about the technologies, how it's actually done with MySQL and MariaDB, and not so much about the holistic picture that Peter covered. So the two talks complement each other quite a bit.

A few words about myself. My name is Max, I'm from Finland, and I live in the US, in Georgia. I don't have a Southern accent yet, but maybe one day. I work at MariaDB, so obviously I'm not totally objective about all of these things, and I used to work for MySQL AB back in the day as well, so I've been doing this way too long. And I have no comments about this picture; it's the marketing team that put it here.

What we're going to talk about today is scaling, or scalability. I looked it up on Wikipedia, and this is the phrase that best describes what we're talking about: scalability is the ability of a system or network to handle a growing amount of work in a capable manner. So it means that as load increases on our machines or our system, we're able to cope with it in some form. What that means in practice depends on your definition of scalability and what the limits are. Should you be able to cope with 5x the traffic without users noticing anything, or do you have other constraints? That's up to the service provider to define. It's a bit similar to high availability, where the definition is also pretty loose and depends a lot on your requirements. Do you really need the system to be fully available and ready to use at any point in time? Typically, the harder your requirements, the more you have to pay, and not only in money but also in time and admin work and so forth. It's the same with scalability: the more you want to be able to scale, the more you have to invest in time, money, hardware, and so forth.

Typically when we talk about scaling, we distinguish between horizontal and vertical: vertical means you add more resources to one machine, and horizontal means you add more machines. In the MySQL world we've been doing a lot of horizontal scaling, adding more machines, and that's pretty much what I'm going to focus on here; it's a lot less interesting to just add hardware to one box. But as those of you who were at Peter's talk heard, horizontal scaling adds pain, especially if you do sharding: when you add more nodes, the system becomes more complex to manage. So adding more resources to one machine is typically the much easier solution. It's just less interesting, which is why I'm not going to talk about it.

So when do you typically need to scale, in particular horizontally? When the resources of one machine are not enough. It could be that your data set is very large, say several terabytes of data.
And, well, performance is not as good when you have several terabytes. InnoDB is typically very good in scenarios where your memory size is in the same range as your data size: if you have, say, 256 gigs of memory and your data is about 300 gigs, InnoDB still performs pretty well. But when your data grows to 10x your memory size, InnoDB's performance typically drops, so it has a real impact. And then the question is what performance degradation you can live with, and at what point you need to do something. So that's typically when you need to scale, and there are a few examples here of what that could be.

Then, looking at the technologies available in MySQL and MariaDB: I'm going to talk a bit about doing sharding at a different layer than the application, and so forth. But the technologies available for some kind of scaling are replication, which most of you probably know, Galera clustering, and then the sharding technologies. Those are pretty much what I'm going to cover here.

So let's start with replication. Who has used replication? All right. Actually, who has used MySQL or MariaDB? Okay, who hasn't? That's the better question. Okay, everyone has. Good, you're in the right talk, then. So most of you have used replication. Replication was added to MySQL a long, long time ago, almost 15 years ago, so it's been there for quite a while. It's been upgraded and changed a bit throughout the years, but the basic technology is the same. You have a master, and every query that changes data is logged in a binary log. Then you have slaves that connect, ship these log events into their own relay logs, and apply the same changes. It's an architecture that's entirely slave-driven: the master doesn't force the slaves to do anything; the slaves connect and keep track of their own position themselves.

From a scaling point of view, two things are interesting here. First, this is not synchronous: there's no way of guaranteeing that a slave is in sync with the master. There's a delay; it might be large, it might be small, but it's undefined. Second, all nodes have to perform all writes: if you do an update on the master, every single slave will also do the same update. So you cannot scale writes with replication, because every node has to do every write. Replication can only be used for read scaling.

So those are the basics. If you look at the phases of replication, there are basically three. In the first phase, when you commit a transaction, it's written to the binary log on the master at the same time. This part is actually optional; you can let it happen in two phases, but if you care about your data integrity, you want it in one phase. There's an option called sync_binlog in MySQL. Who knows that option? No one. So sync_binlog is the option that forces MySQL to synchronize your binary log with your commits.
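As a minimal sketch, here is how you would check and enable it at runtime; the variable is real, but the session below is just illustrative, and you would normally also set it in my.cnf so it survives a restart:

```sql
-- Check the current setting; 0 means the server leaves binary log
-- flushing up to the operating system.
SHOW GLOBAL VARIABLES LIKE 'sync_binlog';

-- Sync the binary log to disk on every commit. This only lasts until
-- restart, so also put sync_binlog=1 under [mysqld] in my.cnf.
SET GLOBAL sync_binlog = 1;
```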
If this option is not enabled, your binary log is not necessarily synchronized with your transaction log, which means that if your master crashes, you might have transactions in the binary log that were never actually applied on the master. So if you care about your data integrity, you should always have sync_binlog on. I'm wondering if the default has changed. Colin, do you know? It's on by default now, as of 5.7? Right, so before 5.7, by default it's off, which means your binary log and your transaction log are not synchronized. That's phase one. Phase two is shipping the changes to your slave, and phase three is applying the changes on the slave. So it's three phases.

To the question: yes, it's the redo log I mean, the InnoDB redo log, which is the crash recovery log. This is not a problem if your master never crashes, but if it does crash, your binary log and your redo log might not be synchronized, which means you have a problem; your data is simply not the same anymore. If you're using something other than InnoDB, well, you typically already have a problem: with MyISAM there's no integrity at the table level to begin with, so it doesn't make a difference anymore.

Then there's a thing called semi-synchronous replication, which ensures that at least one slave has received the changes. Basically, phases one and two are made into one phase: when you commit on the master, it waits for a slave to acknowledge that it has received the changes. That adds a small wait on commit, but it doesn't change the nature of replication at all.

Right, so what about using replication for scaling? You can, and it was actually originally developed for scaling. Replication exists in MySQL because MySQL was used heavily on websites, and on websites you typically have far more reads than writes; the customer who asked for the feature wanted to scale read traffic. The idea is that you direct all your writes to the master, but the reads can go to any of the slaves. Now, the problem is that the slaves are not necessarily in sync, so you don't necessarily read the latest data, and you need an application where that's acceptable. For example, if you store blogs, it might not be so important that every user sees the very latest comments on every blog, so there you can easily read from the slaves; if there's a delay, there's a delay. But say you're dealing with monetary transactions; there you don't want delays. You don't want someone to see more money in their account than it actually has.

So, I calculated some numbers for using replication for read scaling. If you have an application where 95% of the traffic is reads and you add four slaves, then compared to having just one machine, the load on each machine goes down to 24%. That's a fairly good distribution: you went from 100% on one machine to 24% on each machine by adding four slaves. But now the read share of the work on each machine has dropped to about 80%.
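The arithmetic behind these numbers is worth sketching out. With a write fraction $w$ of the total load and $N$ machines that each apply every write but split the reads between them, the load per machine relative to a single server is roughly:

\[
\text{load}(N) \approx w + \frac{1 - w}{N}
\]

For $w = 0.05$ and $N = 5$ (a master plus four slaves), that gives $0.05 + 0.95/5 = 0.24$, the 24% above, and the read share of each machine's work falls to $0.19 / 0.24 \approx 80\%$.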
So if you now add four more slaves, you don't get the same benefit anymore, because the write share of the load is heavier. Replication gives you less and less additional value per node as you add nodes. And if you start with something like a 50% read-write ratio and add four slaves, you still have 60% load on each machine, because you can't scale the writes, so here the benefit is much, much smaller. After that, each machine's work is only about 16% reads, so adding anything more is useless. From a scaling point of view, replication is great if you have a lot of reads; if you don't, it pretty much doesn't help. You can use replication for other things, like a hot standby, but not for scaling in that case.

To the question: no, it goes down for all of them, because the slaves have to apply all the writes. That's the problem, basically. And it's not quite one-to-one, because originally what you shipped in the binary log was the actual statement: if someone said insert this, you would ship that statement and re-execute it, so the work was exactly the same on the slave as on the master. Now there's also a row-based format for the binary log, where you ship row operations instead of statements, so it's not exactly the same operation on the slave, but the difference in time is very small. You still have to apply every single change on every slave. So that's the problem.

All right, so that's kind of replication. Any other questions? Yeah. At 50% reads, you gain so little by adding machines that you probably pay more than you get. You'd first want to look at some kind of vertical scaling, trying to get as much as you can out of the one box, and after that you have to look at sharding options; some kind of sharding is pretty much the only way you can scale writes. It depends on your use case, though, and you can use the slaves for other things as well: if you also use them for backups or a standby, then it makes more sense.

One thing to add here: with this setup you need a proxy, or your application needs to be aware of the topology, because you need to be able to direct all the writes to the master and all the reads to the slaves. So either you have a middle layer, or your application knows the underlying topology.

And that's much the same with Galera cluster. Who here has used Galera cluster? Okay, only a couple of people. Who has heard of Galera cluster? Yeah, many of the talks today have mentioned it. Galera is the underlying technology, and you can get it in Percona XtraDB Cluster or MariaDB Galera Cluster. It's basically a clustering technology where the nodes are kept in sync: when you commit a transaction, before it's actually committed locally, it's certified by all nodes in the cluster, and they apply the transaction at the same time. So you get a cluster with all nodes in sync. It's mainly an HA technology; it's there so that you can have several copies of the data. And the main difference from replication, besides that it sits below InnoDB, is that it doesn't use the binary log anymore.
It uses the wsrep library underneath, which handles the group communication between the nodes. But the main difference is that it's synchronous replication. You don't have to worry about slave lag, or at least there are options you can set so that slave lag doesn't happen, so your replicas are always in the same state as the node you wrote to. That also means that from a cluster point of view there is no master: all nodes are equal in a Galera cluster. You can write to any node and you can read from any node. But similarly to replication, every node has all of the data, so every node has to apply every write. So again, you don't really get any write scaling.

Same as with standard replication, it can be used for read scaling, because now, if you have three copies of the data, you can read from any of them. The difference is that you can use this for reading any kind of data; you don't have to worry about slave lag anymore.

Now, you can actually get a little bit of write headroom from Galera, just because you can relax some of the durability requirements on each individual disk. You don't have to worry about durability on a single node, because you have durability across the cluster: as long as your transaction is committed, it already exists on other nodes in the cluster. So you don't have to insist that the redo log is flushed on every commit on a specific node (think of settings like innodb_flush_log_at_trx_commit). You can do small optimizations like that, but it doesn't provide true write scaling; it just gives you some benefit from relaxing durability requirements that you'd normally need locally.

And same as with replication, you typically need some kind of load balancer in front of Galera to make this work. All right, any questions about how Galera works? I explained it, but very quickly. Mainly an HA technology, but it can be used for read scaling.

So let's look at adding a load balancer, because neither replication nor Galera clustering provides one by itself, so you need to put something in front. That could be the application itself, but typically you don't want this logic in the application, because you want to be able to change the underlying topology of your data cluster without changing the application every time. What if you add nodes? What if you remove nodes? Do you have to change the application? Typically you want a middle layer that takes care of this.

Here's a list of proxies and similar technologies you can use. I'm going to show a few slides about MaxScale, partly because it's our product, so of course, marketing, and partly because it's fairly good as well, but again, I'm not objective here. MaxScale is a proxy we built; it has great features, it's very modular, and all those things, but from this point of view the main positives are that it's a proxy built specifically for MySQL and MariaDB: it has built-in monitoring that's specific to the replication and Galera technologies. And one thing we have in MaxScale is the read-write split router, which means you don't have to worry about splitting reads from writes in the application; MaxScale does it for you. So if you have a replication setup, you have a master and you have slaves.
All you have to do is send your queries to MaxScale, and based on whether each one is a write or a read, it will send it either to the master or to a slave. That's great, because you don't have to do it in the application. MaxScale also monitors all of the servers: it checks that they're up, and for replication it also checks the lag, how far behind each slave is (what you'd see as Seconds_Behind_Master in SHOW SLAVE STATUS). You can create a rule saying: I don't want to use a slave that's too far behind; and because MaxScale monitors this, it will simply stop sending queries to a slave that lags too much. So MaxScale is just a proxy, but it has built-in functionality for both setups.

That was the replication part. You can also use it with Galera, and there it likewise monitors the state of the servers, because in Galera a server can be up but not synced with the cluster: if a node crashes and comes back, it first needs to synchronize itself from another node. So it can be running but not synced, and MaxScale monitors the state of the Galera nodes and only sends queries to the ones that are in sync. These are the benefits of having a proxy: the application doesn't have to think about any of this, because the proxy does it for you. And MaxScale is fully open source; it's on GitHub.

An interesting thing with MaxScale and Galera, though: you can use it to just round-robin queries across the nodes, but one of the potential drawbacks of the way Galera is architected is that you can get deadlocks upon commit. I spoke a bit about how Galera processes transactions: all of the transaction processing is local. When you send a transaction to a Galera node, it does everything locally, and only when you commit is it sent over the network. That's great, because it means there are no network hops in between your individual operations, but the drawback is that the same row might have been changed on a different node at the same time. So you can have a conflict. The good news is that Galera is aware of this: it detects the conflict, and its conflict resolution is to roll back the transaction. So when you issue COMMIT, you might get an error saying the transaction was rolled back due to a conflict, and then you have to redo your transaction or handle the error.

Now, this only happens if you write to multiple nodes, and only if you write to the same rows at the same time, and an easy way to avoid it is to just direct all writes to one node. You don't really gain anything from writing to multiple nodes anyway, because every node still has to apply every write. So you can combine Galera cluster with the read-write split router and write to just one node. This node doesn't really become a master; it's only a master because we say so. You can switch at any point in time and it doesn't affect anything. But by using the read-write splitter this way, you avoid all the potential deadlocks or conflicts between the different nodes, because you only send writes to one node, so everything is handled locally.
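If you do let writes go to multiple nodes, the application has to be prepared to retry, because a certification conflict surfaces to the client as a deadlock error (1213) at commit time. Here is a rough sketch of the shape of that retry, written as a stored procedure; the table and values are invented, and in practice you would usually put this loop in application code rather than in the database:

```sql
DELIMITER //
CREATE PROCEDURE transfer_with_retry()
BEGIN
  DECLARE attempts INT DEFAULT 0;
  DECLARE done BOOL DEFAULT FALSE;
  WHILE NOT done AND attempts < 3 DO
    BEGIN
      -- 1213 = ER_LOCK_DEADLOCK; a Galera certification conflict is
      -- reported to the client as this error on COMMIT.
      DECLARE EXIT HANDLER FOR 1213
        SET attempts = attempts + 1;  -- the transaction is already rolled back
      START TRANSACTION;
      UPDATE accounts SET balance = balance - 10 WHERE id = 1;
      UPDATE accounts SET balance = balance + 10 WHERE id = 2;
      COMMIT;
      SET done = TRUE;               -- only reached if COMMIT succeeded
    END;
  END WHILE;
END //
DELIMITER ;
```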
With all writes on one node, if there's a conflict, it's handled by local locks and not by aborted transactions. So that's also a good use case for MaxScale.

Yeah, that's a good question, but because Galera is still a synchronous cluster, it doesn't matter. If your transaction is committed here, it's also committed on the others, because they're in sync. You send your transaction here, you issue COMMIT, and if you get an OK, it means it's also on the other servers. So if this node crashes, you just use another one, and it's already in sync. You don't have to do any failover like you do with replication: there, if your master crashes, you might want to promote one of the slaves to become the new master, but then all the other slaves have to fail over to the new master, so you have a failover process. In Galera you don't have that, because all nodes are in sync all the time.

Can you scale MaxScale? You mean, can you have multiple MaxScales? The answer is yes. The caveat is that they don't communicate with each other, but you can point multiple MaxScales at the same underlying database cluster. If a MaxScale crashes, you basically have to start all of those sessions over; there's no session recovery at the moment. So you can have some kind of virtual IP between two MaxScales, but that's for HA, not really for scaling: if one crashes, you use the other one, but all the open connections on the crashed MaxScale are lost.

All right, so that's basically what you can do for read scaling with replication or Galera clustering. To the question: MySQL Cluster is totally different. I'll get to it in a moment, because MySQL Cluster is based on sharding, so it's completely different, whereas in Galera cluster every node has all of the data. Galera is more like what you'd normally think a cluster is, whereas what used to be called MySQL Cluster is, by the way, the worst product name in the history of MySQL, because no one ever understood what it actually was. I used to be part of the team implementing and selling MySQL Cluster, so we always had to explain what it was.

Yeah, so about Galera and failures: as soon as you have multiple copies of the data and no central system, in a shared-nothing architecture the problem is always how to distinguish network failures from node failures. And there is no good way of doing it: if the other node is not communicating, you don't know whether it's because you can't connect to it or because it's not running. So Galera uses a majority vote. That's why you need three nodes: if you have only two nodes and they can't communicate, both will stop, because neither has a majority. You can have an external arbitrator node in Galera (garbd), but it still has to run on a third machine, so you might as well make it a cluster of three and be done. So three nodes is the minimum, and maybe 80 or even 90 percent of our customers use three nodes; I've seen a few with larger clusters.
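You can see this cluster membership state from SQL; as a quick sketch, these status variables are standard in Galera-based builds:

```sql
-- How many nodes are in the cluster component this node can see.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';

-- 'Primary' means this node is in the majority component and may serve
-- queries; a node partitioned away from the majority reports otherwise.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';

-- Whether this node is synced and ready to accept queries.
SHOW GLOBAL STATUS LIKE 'wsrep_ready';
```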
And that's a good point about load as well: with only two nodes, if each is running above 50% load, then one node alone can't cope with all of the traffic anymore.

All right, so that's read scaling. But if your problem is writes, if you need to scale the writes, then neither replication nor Galera helps you, because every node has to apply all of the writes. Then you have to go into sharding, which means you have to somehow partition your database across multiple instances. As for how you do it, this is a high-level overview of the different options.

The best way to do it, in a sense, is in the application logic: have some clear way of dividing your data or your use cases, so that you can use different servers for different parts of your application. If you can shard in the application, that's typically the best way, because then you also control exactly how it's sharded. But in some cases you just can't do it like this: you have one big table, and it basically needs to stay one big table, for example. And there, whether you shard in the application or lower down, it doesn't really matter, because you still need to be able to put the results together from queries against that big table. There are some connectors that offer sharding functionality, like Connector/J together with MySQL Fabric. You can do it at the proxy level: MaxScale, which I talked about, actually has some very primitive schema-based sharding, so if your data is divided across different schemas, it can shard so that each server holds only one of the schemas, while from the application's point of view it looks like one server has all of them. But in the case of the one big table, that doesn't help either, because you need to split the single table into multiple pieces. You can use external tools for that, like ScaleArc, and there are also storage engines with sharding built in, like NDB, which is MySQL Cluster (I prefer calling it NDB, because that's less confusing), or Spider, and some others.

Peter kept mentioning this: sharding equals pain. There are a lot of reasons why you shouldn't shard, a lot of disadvantages. Your queries become more complex if you want to be able to aggregate the data together: you typically can't do it in one query unless you use a sharding technology at the storage engine level, like Spider or NDB; at any higher level you can't get combined results. If you shard at the application level, there's basically no query that can give you a result across all the shards; you have to do it in the application layer, so you're no longer able to solve your problems in SQL. It's harder to manage, because you have multiple servers. High availability is much harder, because you need to make sure every node is highly available. Backups were also mentioned: you have to somehow make sure they're all synchronized from a data integrity point of view. And if you do a query that touches multiple shards and one shard crashes during the operation, how do you make sure you don't end up with three shards that have the change and one shard that doesn't? You have to somehow deal with those kinds of things as well. Pretty much everything becomes more complex when you shard.
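The classic building block for that last problem is two-phase commit, which MySQL and MariaDB expose as XA transactions. As a rough sketch, coordinating a change across shards looks something like this; the transaction id and table are invented, and the orchestration (tracking which shards prepared, deciding what to do after a crash) is left entirely to you:

```sql
-- On each shard that the change touches:
XA START 'order-42';
UPDATE orders SET status = 'paid' WHERE id = 42;
XA END 'order-42';
XA PREPARE 'order-42';   -- the shard promises it can commit

-- Once every shard involved has prepared successfully:
XA COMMIT 'order-42';    -- or XA ROLLBACK 'order-42' if any shard failed
```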
So, I mean, Peter had a 50-minute talk about sharding, and basically all he talked about was how not to shard, because of all this. He has good points, but I'm not going to go into that here; if you want to know more, look at Peter's talk.

As for where you can shard, I talked about this: you can do it in the application, in a proxy, or below the database, because there are storage engines that provide sharding. Someone asked about MySQL Cluster, or NDB, and NDB is basically a sharded storage engine. Originally it was actually a sharded key-value store. In NDB you have the MySQL or MariaDB servers up here, and the lower level is basically a sharded key-value store with functionality for getting values. NDB has built-in sharding, based on a mod of the hash value of the primary key. So if you have a table with a primary key, by definition it's sharded, it's all based on the hash of your primary key, and you can't change that. You basically can't start MySQL Cluster without sharding. It also has built-in HA and all that kind of stuff, so it's fairly fancy.

So, who here has used MySQL Cluster or NDB? A couple of people. Was it easy to use? No, exactly: a lot of trouble. We had customers who tried using it, and for a generic OLTP use case there are a lot of drawbacks. Many of them are simply because it's sharded, and you can't do the same things when your data is sharded. Because it provided a full SQL interface, people would try to do joins and aggregations and things like that, and because everything was sharded, those queries took a really long time.

I remember a case where a customer had a fairly small table, a couple of gigs, plus a few smaller tables, and they put it all in MySQL Cluster, I think on four nodes. We ran a query, a join, and it took eight hours to get the result. Because the tables were only a few gigs, we put everything into InnoDB on a single local server, and the same query came back in seven seconds. Eight hours versus seven seconds.

Why? First of all, at the time MySQL had no join algorithm other than nested-loop joins, which work like nested for loops: give me the first row of the first table, then the matching rows of the second table, and so on, and every single lookup is a network hop, because the data is distributed across the network. If you have a table with a few billion rows, you do a few billion network hops, and that's for every table in the join; join seven tables and it's the latency times a few billion, and it starts to show. So it was all latency: none of the machines was running at high CPU load; it was all network hops. That's the problem with sharding. Some of these issues have since been improved in Cluster, so it's not as bad anymore, but you can't do complex queries as efficiently if you shard. It's just out of the question.
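Some back-of-the-envelope arithmetic shows why latency dominates here; the numbers below are purely illustrative, not from the actual case:

\[
10^{8}\ \text{row lookups} \times 0.3\ \text{ms network round trip} \approx 3 \times 10^{4}\ \text{s} \approx 8\ \text{hours}
\]
\[
10^{8}\ \text{row lookups} \times 0.2\ \mu\text{s in local memory} \approx 20\ \text{s}
\]

The join isn't doing more work on the cluster; it's just paying a network round trip per row instead of a memory access.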
Right, so that was NDB; I'm not going to talk much more about it. We still get customer calls like: hey, we have this MySQL Cluster, we have data on it, we set it up four years ago, no one has touched it since, can you help us get rid of it? Because no one remembers any of the commands anymore; it just runs. So we still get these, but typically it's customers who want to figure out how to get rid of it, because no one knows how it works anymore. I guess MySQL gets more of those calls than we do.

Another sharding technology I wanted to mention is Spider. Has anyone heard of Spider? No one? All right. Spider is built into MariaDB, so if you download MariaDB, you actually get Spider with it. It looks like a storage engine, but it's not really one; it's more of a layer on top of where the data is actually stored. What Spider does is use the partitioning interface. Who knows that MySQL and MariaDB have partitioning? One person? There's a PARTITION BY syntax where you decide how you want a table divided into different partitions; it could be based on a date value, or lists, or whatever. You basically decide how to split your table into smaller pieces, and that's been in MySQL since 5.1. What Spider does is let you use this partitioning syntax, but instead of the pieces of the table being local, you can put them on remote servers. The Spider node itself doesn't store any data; it's basically a relay to where the data is actually stored. So this lets you, at least from a syntactical point of view, very easily split up a table, and you decide completely how it's split, because you say PARTITION BY with any of the built-in partitioning mechanisms, and Spider takes care of distributing the data.

Now, Spider is not exactly an enterprise-grade feature. You have to do a lot of things manually: you have to make sure the tables exist on the other nodes, you can't re-shard easily, that has to be done by hand, and so forth. There's a lot that could be improved with Spider, but the basic concept works, and most of all, it provides write scalability. If I have a Spider table here, split three ways based on, say, different customer values, and I send a write query to the Spider node, it doesn't do any real work itself; it just relays the query to where it should go. With that setup I could do three times as many inserts per second as with one node, or store three times as much data, or technically both. Of course, there's a small overhead for the Spider node, but it's fairly negligible.

Spider is also transactional, which is great: it uses XA transactions, so when a transaction spans multiple nodes, it actually creates a distributed transaction so the pieces can be rolled back if there's an issue. And technically the remote tables can be anything, so you could use different table types depending on the use case. The remote servers know nothing about Spider; it's all done by the Spider node. And you can have multiple Spider nodes attached to the same set of backends, similar to having multiple MaxScale nodes in front of one cluster.
So it's an easy way to do sharding, and it's built into MariaDB. Here's an example of how you do it (a sketch of this DDL follows below). I create a table on the Spider node, I say PARTITION BY and give the partitioning scheme, and for each partition I basically say where that partition should live. That's it. Then on all of the backend nodes I create the same table, but there's no mention of Spider there, because those nodes are not Spider-aware. Then if I insert three rows on the Spider node, it looks like one insert, but it actually becomes three inserts, and each backend node gets just one of the rows. And there you go.

If you look at the performance of Spider: key lookups are really fast, but anything more complex, similar to NDB, like joins, won't be as fast anymore. However, Spider has some optimizations. For example, if you have small tables, like an auxiliary table you often need for joins, Spider has built-in functionality for doing those joins locally on the backend nodes. So instead of doing the join on the Spider node, which would mean fetching all the rows first and then joining, you can tell Spider to do the joins locally and then ship the result. So you do have some optimization. And writes, inserts in particular, are generally faster, because each backend node is independent. So basically you can scale both reads and writes, and that's pretty much why you'd use sharding, or Spider, in the first place.

Right. Any questions about Spider or sharding? How does a Spider node fail over? Basically, you have another Spider node: similar to MaxScale, you'd have two Spider nodes with the same setup, some kind of VIP in front, and if the active Spider node fails, you switch to the other one. But again, I'd stress that Spider is not an enterprise-grade tool. It works, we have customers using it, but we acknowledge that it's not enterprise-grade, and I wouldn't recommend it to anyone who doesn't know what they're doing. The customers we have using it have been told this, and they have a use case where they need something like it. And no, I don't think there's anything else built quite like this: this is low-level sharding, below the database layer, while most sharding solutions sit above it. I can't think of another one like it.
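To make that example concrete, here is roughly what the Spider DDL looks like; the host names, credentials, schema, and the hash-based split are all invented for illustration, so check the MariaDB documentation for the exact syntax:

```sql
-- On each backend node: a plain table, nothing Spider-specific.
CREATE TABLE customers (
  customer_id INT NOT NULL,
  name        VARCHAR(100),
  PRIMARY KEY (customer_id)
) ENGINE=InnoDB;

-- On the Spider node: declare the backends once...
CREATE SERVER backend1 FOREIGN DATA WRAPPER mysql
  OPTIONS (HOST '10.0.0.1', DATABASE 'shop', USER 'spider', PASSWORD 'secret', PORT 3306);
CREATE SERVER backend2 FOREIGN DATA WRAPPER mysql
  OPTIONS (HOST '10.0.0.2', DATABASE 'shop', USER 'spider', PASSWORD 'secret', PORT 3306);
CREATE SERVER backend3 FOREIGN DATA WRAPPER mysql
  OPTIONS (HOST '10.0.0.3', DATABASE 'shop', USER 'spider', PASSWORD 'secret', PORT 3306);

-- ...then the Spider table itself, with one partition per backend.
CREATE TABLE customers (
  customer_id INT NOT NULL,
  name        VARCHAR(100),
  PRIMARY KEY (customer_id)
) ENGINE=SPIDER
  COMMENT='table "customers"'
PARTITION BY HASH (customer_id) (
  PARTITION p0 COMMENT = 'srv "backend1"',
  PARTITION p1 COMMENT = 'srv "backend2"',
  PARTITION p2 COMMENT = 'srv "backend3"'
);
```

The CREATE SERVER statements keep the connection details out of the table definition; each partition's COMMENT then just names the backend it lives on.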
All right, so to summarize, on scaling in general: I typically agree with Peter that sharding is a pain and you should avoid it at all costs, but there are of course reasons why you might need it. Peter had an example about a customer who wanted to shard, and when he started asking questions, there was basically no real reason; they just wanted to shard because they'd heard it was a cool thing to do. We've had similar cases, even worse. We had a customer with performance issues, and when we looked at what they were doing, they were sharding across 16 nodes using an external sharding tool, with about 10 gigs on each node. We looked at their traffic, and it was the same story: a couple of hundred queries per node. So we asked them, why are you sharding? And the answer was: well, we contacted the sharding company, and they told us to shard. I was like, well, yeah, of course; they had sold them their sharding technology, right? So we said: the best solution to your problem is actually one node; get rid of all the sharding and put everything on one node, because your performance problems are with complex queries that have to aggregate data from all of these nodes. We have these conversations a lot. So you should only shard when you really have to, when there's a really good reason for it: typically one table (though it could be more) that genuinely can't live on one node. And if you have a high read ratio, remember that replication was built for read scaling, so just use read scaling; don't start with sharding.

All right, any questions? Is Galera a commercial product or is it free? Galera is open source, so in that sense it's completely free. The company behind Galera is called Codership (I think they're going to change their name to Galera), and they build just the replication part, the underlying library. What you want is a binary with this built in, so you can get Percona XtraDB Cluster or MariaDB Galera Cluster. And as of MariaDB 10.1, Galera is built into MariaDB itself; it's not a separate binary anymore. Before 10.1 it was a separate binary, but from 10.1 on, you just get MariaDB, you enable Galera, and it's there.

Galera is great in the sense that it's synchronous, so it's a very good HA technology as well. But then it depends on your load, because Galera is a bit more complex. Standard replication has its own drawbacks: slave lag and all that. But if you have huge amounts of reads and you need to scale to 30 or 40 servers, then I wouldn't use Galera. So it depends a bit: Galera is great for smaller setups where three nodes are enough, or maybe four or five, but if you need to scale massively, Galera is probably not the right tool. Another thing with Galera is that because it's synchronous, you don't want the nodes far from each other network-wise; you'd want all nodes in the same data center. So even if you use Galera for HA, I wouldn't stretch it to a remote data center; I would use standard replication from one Galera node to the remote data center instead. So: standard replication for a larger number of nodes and for remote sites, and Galera more for HA, from that point of view.

Any other questions? All right, if not, thank you very much for listening to me this long.