We should all go for that. So yeah, I do stuff at Codership, I do stuff elsewhere, and I've done a lot of MySQL for a long time. So what is Galera, really? It's virtually synchronous replication. Some of you managed to stick around for the earlier talks about Group Replication. It's also high availability with no data loss, and consistent data across all nodes: copies of the data across your three different nodes, for example, with no single point of failure. I think it was the OceanBase talk that said you could reduce the size by a third even though it's distributed, and there's a reason why they do that. Whereas we follow REPEATABLE READ, very consistent, three copies of the data, very much like NDB Cluster. There's quorum-based failure handling, so again, you need three nodes: if one node fails, you still have two thirds. Commits use optimistic concurrency control, so whatever is written to one node is shipped and certified before the OK is sent back to the client. It's multi-primary, or multi-master: all your nodes are equal, and that's a core feature of the product by design. So it does transaction conflict detection, and you can issue your transaction to any node, though many people do put a proxy in front of it. It works in clouds without issue. It obviously supports SSL, so it's encrypted in transport as well. No external framework is required for automatic failover. It does parallel apply. And it's got literally thousands of users; it's still doing extremely well.

Of course, you have some trade-offs. It'll never be as fast as asynchronous replication, which just commits on the primary, or semi-synchronous, which commits on the primary and at least one secondary; we will never beat the laws of physics. And of course, storage: storage is going to be tripled. Across more than a decade now of deploying Galera at various companies, you realize that sometimes people want five nodes, seven nodes. But then you're replicating the data five times, seven times, which is not really the most practical thing to do. You should really aim for a three-node Galera cluster, not a two-node Galera cluster, which a lot of people also think they can do and get away with.

So there are multiple distributions of Galera; I just want to mention the highlights. The Codership upstream variant is the one that comes with clone SST as a plug-in option, so your SSTs don't have to happen via XtraBackup. I'm sure you've seen some nice presentations about why the clone plug-in is much faster: it's kind of like rsync, but with the right amount of locking, so it's not just rsyncing a live data directory. Clone is much faster than XtraBackup, which I'm sure anyone from Oracle MySQL will tell you, and that's also true when it comes to full state snapshot transfers. For the non-Galera folk in the room, Galera has two forms of state transfer. One is the incremental state transfer (IST), which is what you get if you have an existing Galera cluster and a node goes away for a period of time and then comes back: it receives an incremental set of data, like a diff. The full state snapshot transfer (SST) is for provisioning new nodes, because it literally copies everything over, either via rsync or using this clone method. Picking between them is roughly the my.cnf sketch below.
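As a rough sketch of choosing between those transfer methods in my.cnf: the clone method name follows the Codership MySQL-wsrep builds mentioned above, so treat the exact values as assumptions to check against your distribution's documentation.

    [mysqld]
    # Full state snapshot transfers via the clone plugin (Codership builds)
    wsrep_sst_method = clone
    # Alternatives, depending on distribution:
    #   rsync          (simple, but blocks the donor while it runs)
    #   xtrabackup-v2  (Percona XtraDB Cluster)
    #   mariabackup    (MariaDB)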
MariaDB was actually the first to ship Galera Cluster in a distribution, and it's included by default: in MariaDB, async and semi-sync replication are built in rather than plug-ins, and Galera is not a plug-in either. And you obviously get all those amazing MariaDB features that multiple people talked about earlier today as well. And then there's Percona. Percona XtraDB Cluster is based on Percona Server; it comes with ProxySQL, and now, with the operators, it comes with HAProxy. The strict mode is kind of nice: it disallows MyISAM tables and tables without primary keys, and it forces you to ensure the binlog format is set to ROW. And it has automatic configuration of SSL. That is maybe the most interesting thing in Percona XtraDB Cluster 8, given that you probably run a database in the cloud now. I hate to break this news to you, but all MySQL replication happens in plain text unless you turn on SSL. So if you're not turning it on, I don't know what you're doing.

So, some feature highlights for Galera. For one, there is intelligent donor selection for an incremental state transfer: it'll find the node that is best suited to give you the data, preferably one with less load. pc.recovery=ON is enabled, so in the event that a cluster node crashes, or even the whole cluster crashes, persistent cluster information is maintained and you don't need a bootstrap, which is much easier. This is actually how many tools nowadays manage to do an automatic recovery of a cluster: they can pick up the state from the grastate.dat file fairly simply. GTIDs are different in MySQL and MariaDB, and I think the MariaDB GTID design was a good idea. The good news about Galera is that whether you use MySQL or MariaDB, it follows the native GTID implementation of the database server. Pro tip: Galera originally came with its own GTID. Galera was the first to come up with a GTID implementation, followed by Tungsten Replicator, which kept this in a transaction history log, and then MariaDB, and then MySQL. Foreign key support is very good. When I say improved foreign key support: previously, all you saw in the error logs were lots of errors whenever foreign keys were encountered. It turns out they were just warnings, not errors; your foreign keys were working fine, just stuffing messages into the error log. So by improving it, we've made it less verbose. That's good.

There are some new tables in the mysql schema: wsrep_cluster, wsrep_cluster_members, and wsrep_streaming_log. wsrep_cluster and wsrep_cluster_members will tell you useful things about your cluster and the members in it, could be three nodes, nine nodes. And then there's wsrep_streaming_log, which brings me to streaming replication. This is what replicates transactions of any size. Before Galera 4, your maximum transaction was two gigabytes, so if you were doing a LOAD DATA INFILE and exceeded two gigabytes, it would fail. You could control that, and typically people told you to keep transactions much smaller. With Galera 4, you control it via streaming replication, which means you can ship 10,000 or 20,000 rows at a time across the network even before the transaction is applied: the rows are already on the other nodes, just waiting to be committed. That's much better than shipping gigabytes of data and then finding out you have to roll back later. You turn it on per session, as in the sketch below.
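To make that concrete, here is a minimal sketch using the Galera 4 session variables wsrep_trx_fragment_unit and wsrep_trx_fragment_size; the file and table names are hypothetical.

    -- Ship this transaction in 10,000-row fragments instead of one huge write set
    SET SESSION wsrep_trx_fragment_unit = 'rows';
    SET SESSION wsrep_trx_fragment_size = 10000;
    LOAD DATA INFILE '/tmp/big_import.csv' INTO TABLE orders;  -- hypothetical names
    -- A fragment size of 0 switches streaming off again for later transactions
    SET SESSION wsrep_trx_fragment_size = 0;

The fragments are certified and written to mysql.wsrep_streaming_log on the other nodes as the transaction runs, which is why a late rollback is so much cheaper than with one monolithic write set.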
Poor networks: this is an area where I think Galera has benefited a lot from learning from Group Replication and InnoDB Cluster, because the Paxos-based protocol that Group Replication uses handles poor or slow networks reasonably well by default, and Galera had to improve this going from 3 to 4. So Galera 4 handles networks that drop and degrade much better, without sacrificing your data consistency. This is another benefit of having open source out there: everyone benefits from each other. The reason I put this slide here is that you can actually get some of these features inside open source Percona Server. Black-box error logging is open source in Percona Server. Non-blocking operations for online schema changes are also open source inside Percona Server. And I believe it won't take much longer for Percona to also port GCache encryption, to ensure that your data directory is fully encrypted. Nobody here has talked about the fact that MySQL supports at-rest data encryption, and it is full at-rest encryption, so your entire MySQL data directory is encrypted. In our early morning talk today we alluded to the fact that you could use HashiCorp Vault for key management, Oracle solutions like KMS, and there's also Amazon's key management service, which MariaDB ships a plug-in for. So there are multiple ways for you to do key management, plus the ability to encrypt your entire data directory. But if you happen to use Galera, the GCache is the only thing not fully encrypted, and I honestly believe a port is coming fairly soon.

The biggest hurdle I've heard from most people in terms of upgrading is: we don't want to migrate to MySQL 8. Most people want to stick with 5.7, or they're MariaDB users, and they just say, oh, we don't want to migrate to MySQL 8. If you're still using MySQL 5.7, you probably shouldn't be sitting here any longer. You should literally go out and plan how to migrate to MySQL 8, because not migrating is no longer an option.

So, the most common setup I've seen, and the minimum viable setup we'd recommend, is three cluster nodes in one data center. The other common setup is nine cluster nodes across three data centers. Those three data centers can be Singapore, Frankfurt, New York, no problem, presuming you're running inside Amazon or Google Cloud. If you tell me you want to run across clouds, the latency will eventually hurt your application, especially if it's insert-heavy. Because remember, database operations are local, and you can segment them. What happens with a nine-node Galera cluster is that if you write to Singapore, it writes to the three nodes in Singapore (segment 0), then to one elected node in Frankfurt (segment 1), and to one elected node in New York (segment 2). That means it doesn't send the same transaction across the network nine times, but only five times, and only twice over the WAN, so it's quite efficient in that sense. And the latency penalty is minimal because it's just certifying before anything else. Segments are a per-node provider option, as in the sketch below.
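A minimal sketch of assigning those segments via the gmcast.segment provider option; each node carries the segment number of its own data center, and in a real config all provider options share a single wsrep_provider_options line.

    # my.cnf on the Singapore nodes (segment 0)
    wsrep_provider_options = "gmcast.segment=0"
    # my.cnf on the Frankfurt nodes (segment 1)
    wsrep_provider_options = "gmcast.segment=1"
    # my.cnf on the New York nodes (segment 2)
    wsrep_provider_options = "gmcast.segment=2"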
Of course, some people also use this together with asynchronous replication, though I like to reference Marco Tusa's blog post and his YouTube video about why running a cluster across multiple data centers may not work out well for you, and why you may prefer asynchronous replication between two data centers. For one, we find not everyone really has three data centers; many people are happy with disaster recovery across two. So telling them to find another data center is an expensive proposition, and telling them to stand up another three nodes is expensive too. So sometimes people do async across two as well. You must always remember the weighted quorums. This is why you don't run a two-node Galera cluster, or a four-node cluster, or a six-node cluster: odd numbers are good.

So, realistic common setups I've seen over the years. Yes, there is the two-node cluster; no, you should not run it. The three-node cluster across two data centers usually comes from the telcos; they say, we have very fast interconnect between the two, we can do this. OK, so long as you know what happens when it fails. Three-node Galera clusters across three data centers, not even segmented, so it's one big Galera cluster with latency across three data centers: not the best idea. Five nodes spread across two data centers: you've got the five-node part correct, but the two-data-center part not quite; I'll introduce you to the arbitrator daemon after this. And this one's in production: a seven-node cluster in one data center with async secondaries hanging off it, at a big e-commerce site.

Then there are the typical things you probably need in your my.cnf. These are pretty basic; of course, you list all your nodes. Remember the segments; that's what most people forget to do. Actually, I have a few more tips for you. Make sure you have segments for each data center. You can increase the replication windows and lengthen the timeouts, even setting them above the maximum round-trip time across nodes. You must monitor flow control: everything with FC in the name is what you look at. You can also run Galera Cluster on a dedicated network. Because Galera is port-based, you can route traffic to separate network interface cards: port 3306 goes through eth0 and the Galera ports go through eth1. That's doable; in fact, NDB Cluster does this quite often. And then you need to tune flow control; its gcs.fc_master_slave setting is being renamed, by the way, because we can't call it master-slave any longer. In 2023, we must be politically correct. There are causal read timeouts, and EVS auto-eviction. Your gcache size needs to be set, otherwise you will not get an IST, an incremental state transfer. You need to open up the port for your incremental state transfer, otherwise you'll always get an SST. And if your SST method is something like rsync, make sure SELinux is not going to block rsync. You should probably have wsrep_retry_autocommit set to something like 5 or 8. This is maybe the most basic thing I see most people not do: they don't give the application the ability to retry autocommits, so that when there's a cluster conflict, with a victim transaction and a priority transaction, this will at least retry for you, up to five times. Several of those settings are gathered in the sketch below. (I have five minutes left; I'll be on time, with questions.)
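Pulling those tips together, a minimal my.cnf sketch; the sizes and timeouts are illustrative assumptions rather than recommendations, so set them against your own round-trip times.

    [mysqld]
    wsrep_retry_autocommit = 5    # retry conflicted autocommit statements
    # gcache big enough that a briefly absent node gets an IST, not a full SST;
    # EVS timeouts comfortably above the worst round-trip time between nodes
    wsrep_provider_options = "gcache.size=2G;evs.suspect_timeout=PT30S;evs.inactive_timeout=PT1M;gmcast.segment=0"
    # Keep 4567 (group communication), 4568 (IST) and 4444 (SST) open in the
    # firewall, and watch flow control with:
    #   SHOW GLOBAL STATUS LIKE 'wsrep_flow_control%';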
Then MySQL likes to give you options to shoot yourself in the foot, like wsrep_certify_nonPK=1, for example. There are other options I didn't put here, like the one to ignore split-brain, things that you should never use, really. If you don't use primary keys in MySQL with InnoDB, guess what happens? You get one anyway, yes. But it's internal, so it'll actually cause problems with cascading deletes and so forth. So really, use primary keys. In fact, MariaDB is a little smarter here, in the sense that you've got the innodb_force_primary_key option you can put inside your my.cnf. It's kind of nice: it forces the developers to be smart. This one, wsrep_replicate_myisam, should really always be 0, because you should not be using MyISAM. However, if you use MariaDB, you may find that such an option exists because MariaDB's system tables and temporary tables use Aria, and that replication needs to work as well. So I'd say it's more or less experimental, though the documentation may say it's usable. The answer should be: switch everything to InnoDB.

Now, if you have two data centers and you still want to run this way, the best approach is a Galera arbitrator node. It can sit in Amazon, DigitalOcean, wherever. If you have an even number of nodes, or you can't afford more than two nodes in one data center, have the arbitrator daemon running on, say, your application server. All it does is watch the traffic and act as a voting member; it doesn't actually store any data. So it's a reasonable way to fake another node, as in the sketch below.
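As a sketch of running that arbitrator with the garbd daemon from the Galera packages; the cluster name and node addresses are hypothetical and must match your wsrep_cluster_name and your real node IPs.

    # Runs on a small third-site machine or an application server;
    # it votes in the quorum but stores no data
    garbd --group=my_wsrep_cluster \
          --address="gcomm://10.0.0.10:4567,10.0.0.11:4567" \
          --log=/var/log/garbd.log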
You probably want to use some proxies; I think we talked about proxies quite extensively this morning. There is Galera Load Balancer, and HAProxy; they're both layer-4 load balancers. I highly recommend ProxySQL, or MaxScale if you can stomach the license fees. So really, just go use ProxySQL. It's pretty good, it's open source under the GPLv3, and it natively supports Galera and Group Replication host group types, so you can reconfigure things within ProxySQL using what looks like SQL, really. In terms of backups: backups are good to have. You should use XtraBackup or mariabackup. But if you want to provision new nodes, clone SST is actually really good. I'm surprised Percona has not managed to port it yet. I mean, it is open source, it is a plug-in; it will get ported soon, I'm sure. It cannot run in MariaDB, because MariaDB's architecture is now too different for clone; that argument has already been had with Monty, and it would take a lot of work.

So, common setup and runtime issues: SELinux, firewalls. DNS might be a problem; everything could be a DNS problem, really, so maybe you should just use IPs. And this one: don't start or restart two nodes at the same time. People do this more often than you'd think, and it actually brings your cluster down. Also, if you have really long-running queries, maybe that's what an asynchronous replica is for. MySQL has this thing called max_execution_time you can use; in MariaDB, you can just kill queries. These functions are not that important. Lots of adoption, not so important. Plenty of things to improve on, probably not so important, but you can look at the slides later. Lots of reading: I've given you plenty to read. Yeah, really, I'm open for a bit of questions, for whoever is still here. So, any questions? Otherwise, I can ask one.

I have a quick question. I can go back. Actually, sorry, I managed to forget my question. I'll take it over to Mibu.

Achievement unlocked: he's forgotten his question.

So, the thing about shooting yourself in the foot is that it's in the name of convenience. It just makes things easy.

Yeah, that's true. MySQL has always been about making things very easy for the end user. And I think, to be fair to MySQL, we've suffered a lot less recently compared to, say, MongoDB, with people going, oh, it's web-scale, blah, blah, blah. They got web-scale eventually. I mean, if you've been around long enough, you'd remember that people used to laugh at the MySQL documentation that said, hey, you don't need transactions, you can work around it.

You said the same thing back in the day. So, I do remember my question. Let me just turn that off. Is that a bomb?

No, that was my timer.

So, you mentioned telcos sometimes running three-node clusters in three different data centers.

Two, actually.

Or two, OK. The question I have about this is that just because you have fast interconnects doesn't necessarily mean they're low latency. So have you actually had problems with long fat pipes in those cases?

Of course. Slow response times. And we go: well, obviously it's your setup, what do you expect? The laws of physics do apply; we can't fix them. But obviously, for some people, asynchronous replication is probably the better choice, or even semi-sync.

Absolutely. But you can run into long-fat-pipe problems even there.

True. And I think Facebook was a large proponent of semi-sync, as was Google. And even Facebook realized at some stage that semi-sync wasn't enough for them: let's go Raft. But you were not here in the morning when we talked about that.

No, nope.

Yes, OK. Well, thank you very much. Fascinating talk.

You're welcome. Thank you. Thanks.