Absolutely, that's the intent with that last one. This next tool is similar: it's not an Oracle tool, it's an open source tool, and it's not built into the client packages for MySQL. But again, it's open source, so it's easy to find and grab a package for it. Like the previous one, it's a multi-threaded logical backup; the big difference is that it produces multiple files, one file per table. And of course it supports consistent backups across different storage engines, because that's the whole point when you're backing up non-transactional engines such as MyISAM. I rarely see people using the next one, mostly because most of my clients use the fully open source tools. It works by copying the data files, and while it's doing so it's copying the stream of changes out of the redo logs, so it's basically pulling things as fast as it possibly can and, of course, streaming them somewhere else. Once all the InnoDB tables are copied, it takes a lock for the non-transactional tables. This is awesome because it minimizes the global lock for, say, your MyISAM tables: it copies all the transactional InnoDB tables first, then takes the global lock real quick, grabs the MyISAM tables, releases it, and it's done. There's a short full lock at the end to make sure everything is consistent, and then it releases all its locks.

My actual go-to for physical backups is XtraBackup. It's an open source tool developed by Percona, and it's been out forever now. This is the go-to, at least for me and for pretty much most of my clients. It works pretty much the way MySQL Enterprise Backup works, but no license is required; it's open source software. It follows the MySQL release cycles, but obviously Oracle doesn't share its source code ahead of time, go figure, so Percona needs time to test and develop the features whenever MySQL releases a version with a backwards-incompatible change. So definitely, before upgrading MySQL to the absolute latest and greatest point release, make sure the XtraBackup you're using still works. Otherwise you could be running the new shiny and suddenly not have backups.

MySQL 5.6 is end of life. I hope not too many people are using it anymore, although I still see a lot of it in my business. But it's done: no more security fixes, no more bug fixes, please get off it, and if possible jump straight to 8. 8 has been out since, what is it, 2018? That's a long time, four years. Please get your testing done and move to 8; I see people going from 5.6 to 5.7, and you're killing me here. Anyway, with 5.6, if you still have it, you're using XtraBackup 2.3; with 5.7 you're using XtraBackup 2.4. MySQL 8 is where you want to be careful. Oracle just a couple of days ago released 8.0.30, and XtraBackup only just became compatible with 8.0.29, which I think was a week or two ago; double-check my numbers, never trust anybody. The point is, just be careful: make sure your XtraBackup works for the version of MySQL you're using or planning to use. The turnaround time seems to be pretty quick for Percona on XtraBackup; it just depends on whether there was a backwards-incompatible breaking change and how complicated it is to add that compatibility.
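For reference, a minimal sketch of an XtraBackup run and the prepare step that replays the copied redo log (the target directory and account are made up; check the XtraBackup manual for the exact options for your version):

    # Take the physical backup (copies data files plus the redo log stream):
    xtrabackup --backup --target-dir=/backups/full-2022-07-28 \
        --user=backup --password='...'
    # "Prepare" replays the copied redo log so the backup is consistent:
    xtrabackup --prepare --target-dir=/backups/full-2022-07-28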
Snapshots are awesome. If you're not using them and you have the capability, you need to look into it. There are a lot of different ways you can do snapshots depending on... yes sir? You know, I'm not sure if it's parallel or not. I'm getting nods in the background, thank you, sir, including a colleague of mine. So yes, parallel. Yes sir.

Disk snapshots are awesome, and there are a couple of different ways this can happen. For example, if you're running Linux and your volumes are set up with LVM, you can do LVM snapshots. If you've got VMware, you can do snapshots. Google has snapshots; so do Amazon and Microsoft, and your SAN if you're using one, which is painful for a database, but it just depends on your infrastructure and your needs. Your SAN probably has backups, and of course there are others I'm not covering; I think NetApp has some. There are a lot of options, but if you can do snapshots, they're awesome. Many of the modern storage solutions and cloud providers have them, and it's basically a really, really fast copy of your database: minutes to do hundreds of gigabytes, and terabytes are very quick too, maybe just a couple of minutes for a terabyte of data. If you're running XtraBackup, or God forbid one of the logical backup tools, on a terabyte of data, it's going to take a while; a logical backup on terabytes of data, you're probably talking about something like a week, obviously depending on the speed of your storage, your network, all the different things.

And by the way, in case I forget to mention it elsewhere, because I don't remember if I have it in my slides: if I'm doing backups, again, don't keep them on the same server. Typically you're going to run the backup on the server rather than remotely, it just depends, but I usually like my backups to originate on the database server, meaning I'm running the command there and streaming the output elsewhere. Streaming it elsewhere can be as simple as: I've got a SAN volume or a GCP disk or whatever that I've attached to this server and mounted. So you've got a new drive, a new folder, a new volume representing a storage device attached to your server, and your backup can literally just write to that directory; it's not actually on the same server. That solves the problem of the server going down and you needing a copy somewhere else, and it makes it really easy: you can just detach that volume when you're done, whatever your needs are.

When you're doing these snapshots you need to do a little bit of work, because you want to make sure everything is completely consistent and flushed to disk. So you'll want to do something like a FLUSH TABLES WITH READ LOCK, which stops all new writes coming into the database. Depending on how busy your database is, this could take a minute; it also depends on how you implement it, and a little later I'll talk about how the implementation might make the FLUSH TABLES WITH READ LOCK easier. So you grab that lock, that full lock that prevents any writes to the database (you can still have reads), and while it's active you grab your coordinates: SHOW MASTER STATUS, SHOW SLAVE STATUS, or on newer MySQL 8 releases the renamed versions such as SHOW REPLICA STATUS.
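Put together, the consistency dance looks roughly like this (a minimal sketch; the mount point and the snapshot mechanism itself are placeholders, and the filesystem freeze is covered next):

    -- Session 1: keep this connection open so the lock stays held
    FLUSH TABLES WITH READ LOCK;
    SHOW MASTER STATUS;        -- record file + position (and GTIDs)
    SHOW SLAVE STATUS\G        -- SHOW REPLICA STATUS on newer 8.0 releases

    -- From a shell on the same host (hypothetical volume and commands):
    --   fsfreeze --freeze /var/lib/mysql
    --   ...take the LVM / cloud / SAN snapshot here...
    --   fsfreeze --unfreeze /var/lib/mysql

    -- Back in session 1:
    UNLOCK TABLES;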
Obviously that renaming hasn't been backported to the older versions of the database, because they're older versions. Anyway, the important point is that you're grabbing your point in time, your current consistency, where you're at right now as far as the source and the replica are concerned. Then go the extra mile and freeze the file system, because MySQL and/or the OS might still be thinking, "oh, I've got this data in a cache somewhere, there's not a lot going on at the disk right now." No, that's bad. So freeze the file system and make sure absolutely nothing else is happening. Now you can grab the snapshot and be sure nothing else is going on. Then you unroll it in the opposite order: unfreeze the file system, verify it, unlock the tables, verify it, make sure everything's good to go. And again, even with terabytes of database this is super fast. Yes? I'll talk about that later, but you're ideally going to grab it from the replica; we'll be talking about backups from replicas in detail.

There are obviously a ton of other tools, too many to go into, but the point is that a lot of times, under the covers, they're basically just using one of the other tools. For example, the Holland backup tool uses mysqldump or XtraBackup, and even some of the commercial tools, such as Commvault, include support for the physical backup tools. There are a lot of different options. But what I see most of the time, the solution I recommend to clients, is snapshots if you've got them, plus maybe a logical dump as well. Not at the exact same time, but do logical backups maybe once a day or once a week, and do the physical backups more often, again depending. That way you've got the flexibility of both, because you could have a problem where, like I said, somebody wiped out one or two rows or just one table, and if you've already got that logical backup and it includes the data you need, you're golden. You don't have the extra time of restoring a full backup and then plucking out the particular data you need.

Best practices. How am I doing on time? Pretty good. These are just some things I've seen or experienced; feel free to disagree, I'm always up for a discussion. I say it's mostly common sense, but of course so many things are common sense only once you hear them, so if you don't already know it, I'm hoping you'll hear it here. And if I don't cover it, I've got a good friend in the back of the room who's also giving a talk on backups this afternoon; his name is Pep, so you can always go heckle him this afternoon.

Like I said, it's mostly a lot of common sense. Back up your binary logs. If you've got that full backup from whenever, and your binary logs are also being backed up, you're going to significantly lower your recovery point objective. Stream them to another server, obviously, just like the backups; you want them somewhere else, so stream them to another server, a network file system, a mounted volume from another device, whatever. Binlog servers were a thing a while ago, and you could certainly still use one: a binlog server is basically another type of replica, but a replica that's not normal; it doesn't get used for the things a replica would normally get used for.
So you can't read from it or use it to run reports or queries against. Should I make a joke, Dave, about you yawning in my class? I'm sorry. But binlog servers are one way to stream the binlogs; sorry, Dave, I didn't mean to call you out there, I thought it was funnier than it was in real life. You can also use the mysqlbinlog tool to do that. When I'm setting up a server, in an ideal world I'm going to have the data on one volume and the binlogs on another. There are a lot of advantages to keeping things separated; this way, for example, your logs can't fill your data volume. Sometimes things happen: you might get slammed with traffic all of a sudden for whatever reason, your log files start going nuts, and you might be filling things up. Maybe you forgot to set the rotation properly so it starts getting rid of the older ones; you don't want to keep an infinite number of logs or binlogs.

And here are some of the settings. You can see what binlogs you have on the server with SHOW BINARY LOGS from inside the MySQL CLI. In this example there are just two of them on the server, and these file sizes are super tiny; by default the binlogs are usually about a gigabyte each. And this is also showing whether or not they're encrypted. As a side note, not really a backup concept: if you ever have to delete some of the binlogs, because let's say the server is getting slammed, the disk is full, and all of a sudden you realize you've got 1,000 binlogs, don't just delete the files. There's an actual purge command for getting rid of them. And if you've got 1,000 of them, which is about a terabyte on the normal settings, don't get rid of them all at once either, because all of a sudden you're telling your storage device, "hey, whack that terabyte of data." Do it in chunks. But you have to use the purge command, because otherwise some of the different pieces of MySQL will go, "hey, I've got 1,000 binlogs registered on this server, where are they?" Use the appropriate tools.

Incremental backups increase your recovery time objective, again because of the process you have to follow: restore the full backup, then incremental, incremental, incremental to get to where you need to be. But they also help if you've got that terabyte or petabyte of data and doing a full backup takes a long time, or your write volume is really high, a write-heavy database. So it's more complex, but it makes your backup time shorter. And you can restore the full backup like I was talking about earlier and then just apply the binlogs, which is basically an incremental backup. Full backups plus binlogs help with your point-in-time recovery because... yes sir? So the backup, let's say it's an incremental, you still have to pull down the full backup, right? But compare the binlogs to an incremental backup: with incrementals, say you're running them once an hour, at minute 59 your last incremental is almost an hour old. If you're backing up the binlogs, you've got the data right up until practically the last second, give or take. So the speed is good, probably not exactly the same. Yes sir? Yeah, it can be your incremental backups; that's the way I see it. You do your full snapshots or full copies, whatever, and then keep the binlogs for, say, the last 24 hours.
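For reference, the purge command mentioned above looks like this (the file name is made up); you can purge up to a given file, or by age, which pairs nicely with a keep-the-last-24-hours policy:

    SHOW BINARY LOGS;
    PURGE BINARY LOGS TO 'binlog.000123';
    PURGE BINARY LOGS BEFORE NOW() - INTERVAL 1 DAY;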
If you can. If you're a really big company like a Shopify or whatever, you're probably only keeping binlogs around for about an hour; actually I think it might even be shorter. Some of those big companies have forced MySQL binlog rollovers to happen in minutes because of the amount of data. But yeah, the binlogs are better: do a full backup and keep your binlogs.

So the binlogs you're replaying against the database are the ones you've streamed somewhere else, the ones you're backing up. As you're recovering a database, it's going to be creating new binlogs, right? You've restored a full backup, which doesn't create any new binlogs, but then as you start replaying binlogs you're telling the database, "hey, make all these new changes to bring me up to date to where I want to be," and those are new changes, so the database is creating new binlogs. Obviously you don't want those to overwrite your backed-up binlogs, which is why you've moved them somewhere else; you don't want them getting wiped out.

As soon as you have an event occur, make sure you grab everything. Grab what you need: grab logs off the server, logs off the database, the OS. Make sure you've got all the binlogs you need, and move things off to the side so you don't lose them or overwrite them, because, for example, you'll be replaying those binlogs that represent the new changes to the database. Does that answer all of the questions?

So again, this process helps bring your recovery point objective as close to zero as you're going to get. Ideally you'll have all of the data with super minimal data loss; it just depends how fast your database is writing. You might only be losing seconds or minutes, but it's basically about as good as you're going to get.
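As a rough sketch, that restore-plus-replay could look like this (file names, positions, and the timestamp are all made up):

    # Restore the full backup first, then replay binlogs up to just before the disaster:
    mysqlbinlog --start-position=154 --stop-position=4242 \
        /backups/binlogs/binlog.000042 | mysql -u root -p

    # Or stop at a wall-clock time instead of a position:
    mysqlbinlog --stop-datetime="2022-07-28 14:03:00" \
        /backups/binlogs/binlog.000042 /backups/binlogs/binlog.000043 | mysql -u root -p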
And again, as I've said, you're basically restoring the full backup, identifying the point in time so you can tell the mysqlbinlog tool where to recover to, and then feeding the binlogs into the database. The thing that specifies what that objective is, where you want to recover to, could be anything; it could be a dev thinking they were on a development database going, "okay, I'm done with this data, let me truncate it and put new test data in," and little did they realize they were in a prod environment. Happens all the time: devs do it, DBAs do it, if you're not careful. Sorry for the segue, but I think it's important, especially when we're talking about disasters: slow down. If you're about to hit Enter on anything that could cause damage, if it could be the wrong server, if one of the parameters could be wrong, if you could be in the wrong environment, stop and verify. Are you on the right server? Are all your parameters right? Is the command right? Are you in the right directory? Don't do an rm -rf from the root directory. All these different things just take a few seconds, and for so many companies, big to small to whatever, just a few extra seconds and you would no longer be seeing that thing in the news about so-and-so being down this week.

So, I think we've talked about this: there's no one-size-fits-all. It depends on your needs, your RPO, your RTO, all that good stuff, and what your capabilities are. Can you do snapshots? Do you have a terabyte or a petabyte or 5 gigabytes of data? How big and how fast are your devices? Your solution needs to fit you. And again, having those logical backups helps, especially if you often have devs saying, "hey, I lost this data, can you help me out, I need a table or a particular database," usually something small, less than a full backup. So do the logical backups too, even just every once in a while; squeeze them in somewhere, on a separate replica, whatever.

Segue: don't do the backups on the source or master database, the server that's taking all the production write workload. Have a replica; if you can, have a replica just for your backups. You can take it offline, you can lock it, you can do whatever you need to with it. For example, say you're not doing those logical backups, but all of a sudden you just need one particular table, it's small, and you don't want to have to recover the full backup, replay the binlogs, and then do the dump. If you've got a replica just for backups, stop the traffic to it, stop taking any more writes, grab whatever you need off it, do a logical dump or a physical dump, whatever. You're not impacting your production environment that way, you're not impacting your servers, there's no downtime, life is good. Obviously you do need to lock it if you're still sending traffic to it; hopefully you're not sending any writes to your replicas, that's not what they're for, you will cause problems. And of course you can stop the replication threads to quiesce things temporarily, excuse me.

But best practice, please: offsite backups. Again, don't let those backups live on the same server, and sometimes you don't even want them in the same data center. I know somebody who used to be in a data center, and they had a fire, and the firemen were running into the data center with a hose that was dripping water.
Literally, we have a picture of it; we usually show it in all of our talks. Don't keep your backups in the same place, put them somewhere else. If you're using the cloud, that's awesome, but even then you might want to move them to a different region or zone, depending how paranoid you are. Backups: be paranoid. You could even use a managed service for your storage if you need to, like Iron Mountain; I don't usually see the need, it just depends, again, no one-size-fits-all.

Schedule your backups. Don't be that person manually running the backups once a day or once a week, because you're going to forget, you're going to get busy, whatever. Schedule them. There are a ton of scheduling tools, anything from cron, which is built into Linux, to a billion other scheduling tools. Be lazy, automate everything you possibly can. I love automation, I like to be lazy, and automation doesn't make human typos; if you get it right and you've tested the automation thoroughly, it'll do the same thing over and over, more or less, unless something else happens, disk failures, stuff like that.

So is all of this really worth the effort? This is a lot. Can I just say, well, I do those backups weekly, manually, that kind of thing? It's a lot of work to do backups right, but I hope by this point I've already answered this question; if not, I've probably done a bad job. Things happen: disks go bad, data centers catch fire, devs, DBAs, application bugs, there are so many ways to lose your data. And nowadays, if a lot of companies lost all of their data, it would put them out of business. Think about that: if you're the DBA, the systems engineer, and all of a sudden you lost all the data from, let's say, your servers that all run on MySQL, you could put the company and yourself out of business.

And backups are not high availability. High availability: think of a car driving down the street. It's got four tires; that's not high availability, because it needs all four tires to run. But say you've got run-flat tires; that's kind of high availability, because the car is now in a degraded state but you can still drive it. The lug nuts on the car, there are usually multiple lug nuts on each wheel, that's also redundancy, a high-availability kind of thing. The spare tire in the trunk is kind of like disaster recovery: it's the spare, something goes wrong, we swap it out. But backups by themselves are not high availability.

So today we covered some basics: types of backups, a handful of different tools, the main ones I think, and some best practices. Was it worth the effort? Does anybody have any questions? It absolutely is, that's an extremely good point: malware is getting to be very bad right now. It does seem to be getting worse, with hackers encrypting your hard drives and then demanding a ransom. A lot of big organizations, hospitals, corporations; here on the east coast we had one of our major utility providers have it happen to them. Was it power or water? Gasoline; our gasoline services were locked up in several states of the US because that one company had its hard drives encrypted and somebody demanded a ransom. Backups that are offsite, on an unmounted drive, maybe in a bucket somewhere, but it has to be somewhere that's not always reachable from the server. So if you're always backing up to a mounted volume, unmount the volume and move it elsewhere, maybe even back up that volume to somewhere else.
But make sure... sorry, sorry, thank you, Dave. I hope everybody enjoyed the talk. Thank you.

And now? All right, let's start. Who wants to hear about the boring topic of security? All right, good. Let me give you a quick introduction. I started in 2006 at MySQL AB, the company behind MySQL, and then worked for Percona. I joined Amazon Web Services as a database engineer; two years ago I switched to security, doing some security work, and I'm currently leading a database security team. We do penetration testing; we're trying to find as many security issues as we can before the bad guys find them. And by the way, I'm hiring security engineers, so if anyone wants to work at AWS doing penetration testing on databases, please come see me after the talk. So what do we do? We run databases as a service: we try to make it easier for people to start and operate a relational database as a service, specifically MySQL, and we also have our own MySQL-compatible database, Aurora MySQL.

All right, I usually do this at the beginning of the talk: how many of you are running different versions of MySQL in production? Well, probably everyone in this room has worked with MySQL; that's not my point. Which specific version are you running? Who actually runs MySQL 8 in production? All right, good. 5.7? All right. 5.5? 5.1? 5.0? 4.0? 3.23? Not many people are running unsupported versions, and no one in this room is running 3.23, I'm kind of surprised. I started working with MySQL at version, I think it was 3.21 if I remember correctly, but I definitely worked with 3.23; it was somewhere around the year 2000. So it's good that people are using the latest versions of MySQL, because there are far fewer known security issues, and that's actually bad for hackers, for what we call a bad actor, a black-hat hacker trying to attack MySQL.

So here's a brief history of security issues and security features in MySQL. It all started with the old version, 3.21; 3.23 was the real MySQL version with many features, and at that time we had authentication with passwords and authorization in the form of grants, but that was pretty much it from a security point of view. Then over the years lots of new features started popping up in MySQL: 4.0 introduced encryption in flight between the client and the database, you can imagine there was no such option before, then better MySQL password hashing, and then starting with 5.0 and 5.1 lots of new features started to appear, encryption, better passwords, and so on. Lots of new features have been added, but what I think is most important is secure defaults: a database needs to be secure by default.

Here's a quick outline of the MySQL security features, probably not all of them but the interesting ones. Encryption: we have encryption in flight and encryption at rest. We have authentication with a new authentication protocol, new authentication methods, and pluggable authentication, so you can even write your own if you want. And I think one of the biggest things that had been missing in MySQL for a long, long time is roles; traditionally there were no roles in MySQL, and roles finally appeared. But what do you think is the most important thing? In my opinion, the most important is the secure default: if you are running a database, you want to make sure it's secure by default. And why is that important?
Because recently there was a report of a security company scanning the internet and finding over 3.6 million MySQL servers exposed to the internet. That's basically an insecure default. This probably has nothing to do with MySQL itself, it's really network configuration, but because there are lots of insecure defaults out there in the wild, it is still possible.

So here's our agenda, but before we get to it I want to talk a little bit about what I'm going to cover. For a talk like this, on the state of security in MySQL in 2022, it's really hard to create something that is really awesome and exciting but at the same time complete. I was thinking to myself, well, there's this new exciting feature in MySQL 8, I should talk about that, and I ended up with 500-plus slides for one hour. That's not possible, and it's not exciting. So what I decided to do is focus on four topics. For each topic I will look at the security feature from an attacker's perspective, from a hacker's perspective, what a bad guy can do with it, and then, after we understand what's possible, we can see how the security feature applies and how it helps prevent it. We'll talk about encryption, authentication, authorization, and auditing. We are here right now, and we'll start with encryption, specifically encryption in flight.

For a long time the default, again talking about secure defaults, was that the MySQL connection was not using SSL, and that means the data goes over that connection across the network unencrypted. Let's see what that means and what an attacker can do. I recorded this video, a screen recording, and I'll start it. I'm running MySQL 8 here and you can see that SSL is not in use, and then I'm selecting from the credit card table, and on the right, well, on the left from your perspective, I'm running tcpdump. With tcpdump I can see the traffic, and in this case the attacker, the black-hat hacker, has successfully intercepted my credit card number, which is 1, 2, 3, 4, 4, 5, 6. So we have intercepted the traffic successfully. So let's take a look: we have this number here, we have this bad actor who intercepted the traffic, so what should we do? We should always use SSL for the encryption. The good news is that starting in MySQL 5.7 we have SSL enabled by default: when a connection is established to MySQL, be it from the application layer, from the command line, or from Workbench, it's using SSL. And how do you see that?
You can run \s in your command-line client and you can see that SSL is enabled. So let's repeat that with SSL enabled; this is version 5.7, but of course it all applies to 8.0 as well. Let's repeat the test and see what an attacker can do, again with the example from MySQL Workbench. Here I'm doing the same thing, but this time I have SSL enabled. Actually, I first tried to do the SELECT * FROM cc with no database selected, but you can already see that the intercepted traffic is garbage; it's encrypted, and an attacker cannot directly read it. Now I'm selecting the credit card number, and I can see it, but the attacker sees this piece of garbage. So we have successfully protected against a man-in-the-middle attack where an attacker listens with tcpdump and reads the traffic.

So always use SSL where you can, and configure your MySQL users with SSL. How do you do that? The best thing is to add REQUIRE SSL when you create a user; that way we ensure the connection will always be encrypted. But what if the connection someone is trying to establish is not encrypted? I can always say skip-ssl, right? But in that case, even if I specify the right password, the connection will not be established; it will be refused. That's the REQUIRE SSL piece working. We can also protect other things, you can read about that, I don't want to repeat the manual: there is a configuration option for the source and replica, for MySQL replication between nodes. That's a very good question: there are security benchmarks you can run that will give you very generic checks of how MySQL is configured, there is a CIS benchmark you can run to determine that, and specifically for MySQL and SSL you can just select from the mysql.user table and see if the SSL enforcement flag is there; there is a field you can check. For other stuff you can also do some manual verification checks.
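A minimal sketch of that REQUIRE SSL piece and the check against mysql.user (the account name is made up):

    CREATE USER 'app'@'%' IDENTIFIED BY '...' REQUIRE SSL;
    -- or for an existing account:
    ALTER USER 'app'@'%' REQUIRE SSL;
    -- quick check of which accounts enforce SSL:
    SELECT user, host, ssl_type FROM mysql.user;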
So what else? Encryption at rest, encryption on disk. This is also important. Let's imagine an attacker, a bad actor, is able to get onto the database server or obtain a backup somehow. What will this person be able to do? Let's run a very simple test. On a MySQL server I created a table and I put my name there, PII data, and then I wanted to see: if I simply grep for that piece of data, will I see something? I wrote it into the database, my data directory is where I am, and I just do a recursive grep searching through all the files there, and I can see there are lots of matches. First of all, the InnoDB log file matches, which means my PII information is there; the binary log matches; the actual tablespace matches; and finally the doublewrite buffer matches. So lots of information is stored there unencrypted on disk. What that means is that if someone is able to obtain a backup, or able to get onto the database server and copy the files, the information is there. Game over. PII is stored on disk in clear text.

How do we protect against that? We can use transparent data encryption. Transparent data encryption only protects against that specific scenario: it only encrypts data on disk, the data in memory is not encrypted, so if someone is able to connect to MySQL directly through the MySQL protocol, it doesn't apply. You can read more online about tablespace encryption and how to implement it. There are lots of options here: you can encrypt the InnoDB files, you can encrypt binary logs, you can encrypt the redo logs, you can encrypt temporary files; different versions have different encryption features, and I would suggest using MySQL 8.0. Yes, yes, of course. What I want to demonstrate here is encrypting everything: I put this in my my.cnf file, I encrypt everything, and I repeat the test, my name goes in again, I do a recursive grep, and now nothing is found, because the data is encrypted. When I search for a specific string it will not be there. So in this particular case, if an attacker is able to get the data from disk, they will not be able to decrypt it unless they have the key. There are lots of other things to say about transparent data encryption that I won't be able to cover, but the important thing is: if you need encryption on disk, use it, go ahead. That's a good question; I did a previous presentation about encryption at rest, and you have the option to store the keys outside of the MySQL server, on a shared drive, in a vault, whatever, but I just don't have time today to talk more about that.
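For reference, "encrypt everything" in my.cnf looks roughly like this on MySQL 8.0 (a sketch; it assumes a keyring component or plugin is already configured, and option names should be checked against the manual for your version):

    [mysqld]
    default_table_encryption = ON    # new tablespaces encrypted by default
    innodb_redo_log_encrypt  = ON
    innodb_undo_log_encrypt  = ON
    binlog_encryption        = ON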
So our next stop is authentication. When we talk about auth we usually distinguish between authn and authz: authentication applies to connecting to the database, username and password, and authorization applies to what a user can see and access, and so on. So we'll talk about username and password and connecting to MySQL; in security terms, authentication, authorization, and accounting. For MySQL, authentication is username and password, authorization is usually grants, and accounting is auditing. We'll start with authentication: how MySQL stores usernames and passwords, and how an attacker can potentially take advantage of that.

The basic MySQL auth plugins are: the original old-password scheme from the early Monty days, don't use it, it's very old; the newer implementation called mysql_native_password, also don't use it, except in some exceptional cases where your application is unable to use the newer one; and the recent one, caching_sha2_password, which is available since MySQL 8. Again, let's look from the attacker's perspective. Say we create two users with the same password using the original mysql_native_password: we will see the exact same hash for both. What that means is that there is no salt involved, and that matters, because an attacker who obtains the hashes can immediately see which users share the same password, and there is also a specific attack using rainbow tables that you can use to get the actual password back from an unsalted hash. You can also simulate this with the PASSWORD() function, which has actually been removed in MySQL 8, but in MySQL 5.7 you can do SELECT PASSWORD('...') and see that the same password always generates the same hash; the algorithm behind it is basically SHA-1 run twice.

Now let's see how the newer algorithm works. If you do the same thing and create two users with the same password, the resulting hashes are actually different: they are salted, a much better algorithm. But let's see why we want this, and to answer that we need to look at how a bad actor gets a password from a hash. There are basically two options. The first is brute force; there are open source tools available, one of them is hashcat. And in some cases, if you are able to sniff the traffic and there is no SSL enabled, there is a specific attack, you can read about it online, it's very interesting: you basically have a modified MySQL client, and if you're on the same server where MySQL is running you can use that client with the hash you've collected and authenticate.

But let's see what we can do with brute force. There's a great article on the Percona blog, published in 2020, about how to brute-force a MySQL password from a hash, so I decided to give it a try. I spun up an EC2 instance with a GPU, a p3.2xlarge, installed the hashcat open source tool, and used a word-list dictionary. The user we created has the password "password1", which is obviously in that word list, and it can be recovered in less than a second. So obviously don't use "password1" as your root password. So I run hashcat; actually, this guy in the picture is me looking at this code.

This is a very simple case of SQL injection, where input from the user is passed into the SQL query without any sanitization and the query string is just concatenated together. So what can an attacker do with it? We can add to that string: we can use a UNION and add an additional SQL statement that reads directly from the mysql.user table, and then we grab the root password hash, and we know what to do with that: run hashcat. And how do we usually configure grants? The root user has each and every privilege, and the web application user is supposed to have its privileges isolated, but in this case it was granted everything on *.*, and that's the wrong way of doing it; it basically means we have no isolation anymore.
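A minimal sketch of the difference (the account and schema names are made up):

    -- The wrong way: the application account can touch everything, including mysql.*
    GRANT ALL PRIVILEGES ON *.* TO 'app'@'%';

    -- Scoped to the application's own schema instead:
    GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'app'@'%';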
But it's even worse than that. If you have a user that has INSERT, UPDATE, DELETE, SELECT on *.*, that user can select from mysql.user, get the password hash, run hashcat, and reconnect to the database with more privileges. And even worse, you don't even need to do that: if you have GRANT INSERT, UPDATE, DELETE ON *.*, you can just go ahead and update the password for the root or admin user, and that's it, you've gone from access to just one database to everything and everyone. So this is not good for any isolation between users. And you can do other things too: you can change your own permissions, your own grants, and become a superuser on the database, so you don't even need to update anyone's password.

How do we fix that? The original way of fixing it was with the percent sign: you get some isolation by granting on whatever prefix you use for your databases. That doesn't give you full cross-customer isolation, but at least it protects the mysql system database, so users can't go changing permissions and things like that. It's still not great, but at least a user can only touch databases with that prefix. But what if we don't have a prefix, what if there is no distinct pattern in naming our databases? Yes, especially in the older versions, you could just INSERT INTO mysql.user and that's it; if you have access to the mysql system database you can do lots of different things. And it still works even though MySQL 8 moved the grant tables to a different tablespace; if you have INSERT on it, you can do it. We'll talk about MySQL 8 and how to configure it so this can't happen. So it's very easy to make this mistake, and it's a clear privilege escalation path inside the database: if you have global read-write access, you can update the mysql.user table and get new MySQL privileges you were not supposed to get.

So what was introduced to actually fix that? There are three good options here: partial revokes, roles, and dynamic privileges. What we really want is to grant everything except, in this particular case, the mysql system database, and that was not possible before. Starting with MySQL 8.0.16 you can do it: you can grant globally with exceptions, and we want to exclude the mysql system database to solve at least this privilege escalation case. If you try to do that in MySQL 5.7, revoking a privilege you have not specifically granted before, you get an error: you haven't specifically granted the mysql privilege, so you cannot revoke it. In MySQL 8.0.16 you can, as long as you enable the new partial_revokes variable. So here I enable partial revokes, I grant all on my application user, and then I can go ahead and partially revoke some of the privileges: I revoke access to the mysql system database, which is exactly what I want. Then I try to create a new database with an arbitrary name, which works, and after that I try to update mysql.user to get my privilege escalation, and I'm unable to do it, because this user doesn't have access to mysql. This is called partial revokes.

Another good thing in MySQL 8, also related to security, is roles. A MySQL role is a named collection of privileges, and the reason I bring it up is that it's a very good and easy way to maintain your users and to maintain these partial revokes.
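The partial-revokes sequence just described looks roughly like this (the account name is made up; requires MySQL 8.0.16 or newer):

    SET PERSIST partial_revokes = ON;
    CREATE USER 'app'@'%' IDENTIFIED BY '...';
    GRANT ALL PRIVILEGES ON *.* TO 'app'@'%';
    -- partially revoke the mysql system schema, closing the escalation path:
    REVOKE ALL PRIVILEGES ON mysql.* FROM 'app'@'%';
    SHOW GRANTS FOR 'app'@'%';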
So what I do is create a new role, app_role, that is meant for my application user, and on this role I revoke everything on the mysql database; I basically want to protect the mysql database, and I want to do that protection at the role level. Then I apply this role: again, I revoke from the role, not from the user, and then I grant the role to my application user. First I revoke all the pre-existing grants from the application user, then I apply the role, and because the role already has the partial revokes in it, the user will not be able to access the mysql database. If I do SHOW GRANTS for the app user, I see that it has been granted the role, and if I try to update the mysql.user table, I can't, because the role protects me. All right, so for the authorization part we were able to use the new MySQL 8 features, roles and partial revokes, to help with this local privilege escalation issue.

Finally, a very quick note on auditing. Why do we need an audit log? There are different types of logs in MySQL: there is the general log, which stores everything, and the binary log, which only stores the changes coming into MySQL. The MySQL audit log was introduced in, I think, 5.7, and Percona Server made an open source version. So why do we need the audit log? We really want to find out whether there has ever been a SQL injection. In this case I have enabled the audit log, this is Percona Server, and it's using the XML format; you can use other formats, it doesn't matter. What I'm seeing here is the SQL injection, which I should never see here, and I can see the username and host. But what is the most important part here? Any guesses? Exactly: the timestamp. Why do we need the timestamp? To be able to correlate this with other logs, because at this point we know the query came from the application, we know the application host, but we don't know who the actual user was that sent the SQL injection. That's the unique part, and that's why you really need the audit log and not the other types of logs: in the audit log you have the timestamp, you have the query, and you have the username and host. Using those three things you can go back to your application server, if you want to do a little forensics, and ask what happened at that time; if you dig into your web server logs you may find something at that timestamp that explains the SQL injection, and then you can trace it back to the IP address of whoever did it. Yeah, assuming all the clocks are in sync; you can get a window of time, and you can also try to find the data some other way, because you have some fingerprints to search for, but yes, make sure your time is in sync across the servers.

I think that's all I wanted to talk about. This is my LinkedIn profile, feel free to connect and ask any questions, and by the way, did I say that we are hiring? We are hiring security engineers, see me after the talk if you are interested. Questions? Correct, yes, correct; I haven't discussed lots of things. I was thinking to myself, there are lots of security features that came into MySQL, and I was thinking, well, I will run out of time, so I have other presentations. I did a presentation before at Percona Live specifically about encryption: encryption in flight, encryption at rest, how you do column-level encryption, and how you store PHI,
and another presentation I did, what was it, another security talk, just about storing PHI information. Correct, they have been recorded; they should be on the Percona website, and the slides are available. All right, any other questions? All right, thank you very much.

Hello. Let's give it a couple of minutes more... let's start. Well, welcome, and thank you very much for coming here today. Today we are going to talk about point-in-time recovery for MySQL; I made a joke with the presentation name, instead of Back to the Future, now we are going forward to the past. My name is Pep Pla. I am a father of three kids and three cats, and in my spare time I work as a consultant at Percona. You can find me online as Pep Pla, which is a bit redundant. I am currently located in Barcelona and I am Catalan; if you speak Spanish, you can ask in Spanish if you want.

Disasters happen. Failure is not an option, but it is a possibility. Even if we design our systems taking care of almost everything, there is always the human factor, the bugs in the code, the security issues. We can have SQL injection; we can upgrade to a version of a product made by a third party, not even our own code, that has a bug, and this bug makes your database inconsistent under some circumstances. Or there is a problem and somebody has to push code on Friday afternoon to fix a bug because the site is down, there is a lot of pressure to do something, quality assurance is on holiday, the code looks fine, but when you push it you are merging code from something else while the database is running. We have seen people running DELETEs or UPDATEs without a WHERE clause, and I have seen DBAs doing this, and I have seen DBAs doing it by accident. Sometimes, for example, and this is not database related: I once removed a whole directory just because I had typed the remove command, then said, "I have to fix this and ask for confirmation," and while I was editing it I pressed Return instead of Delete; I had removed the file name and left just the directory, and in the middle of editing it, the command ran and removed the whole directory. It was an accident, but it happened, and I am not talking about something that happened to somebody else; this happened to me. So disasters happen.

How can we avoid or reduce the impact of a disaster? One option is to make frequent backups: we make frequent backups and store them in a different location. But the problem is that we can't always be making backups, and in theory we can't use the backup software for this kind of thing, because backups take time. Even during the time of the backup there is the risk of losing transactions, so there is always a gap between backups. We need a way to store all the changes that happen in the database, and this is what point-in-time recovery is: restoring a database to a specific date and time. And it's important that the specific date and time is previously unknown. You are not doing point-in-time recovery when you take a backup before pushing new code to the production environment, and you are not doing point-in-time recovery if you, for example, make a backup before upgrading a version. Point-in-time recovery is always done when you don't know the point in time beforehand.

So what do we need for point-in-time recovery? We need three things. The first one is a valid backup, obviously, which will be used as the base for the recovery. The second is a copy of all the executed transactions from the backup until the disaster. And the third, which is not easy at all, is to identify the transaction or transactions that led to the disaster.
In some cases the third point is not really needed. For example, if the disaster is that I physically removed the database files, I don't need to know which transaction caused the disaster; I just need to restore all the transactions that I have. There's no need to make the effort to identify the transaction, because it's clear. And how can we do this? The good thing is that MySQL provides not one but two mechanisms to store all the changes that take place in the database: one is the InnoDB logs, which are always active, and the other is the binary logs, which can be enabled or disabled.

What are the InnoDB logs? The InnoDB logs are a mechanism to speed up how transactions are applied to the database. Performing random reads and random writes involves a lot of different operations, so if you want to perform a lot of changes, it's easier to pack those changes together in a sequential file, and when you have time, you apply those changes to the corresponding pages. Also, by doing this you can pack changes together: for example, if you make several changes to the same page, and those changes are in the logs, they can be applied together, so you reduce the number of I/O operations. In the InnoDB logs you have everything that is written to the database; this means every change is first written to the InnoDB logs and then written to the tablespaces, to each of the tables or other database files.

The problem is that there isn't any tool that lets us use the InnoDB logs to perform point-in-time recovery. Well, that's not completely true: there are actually two tools that use the InnoDB logs to perform recovery. One is the database itself: when there's a crash, the database reads the InnoDB logs and applies all the changes, and this is recovery, because we have an inconsistent database and we use the InnoDB logs to make it consistent. The same goes for backup software like Percona XtraBackup: one of its stages is the apply-logs phase. The problem is that we don't have a method to permanently capture those InnoDB logs. Probably this will change in the future, but currently we don't have it. The other alternative is the binary logs. What are the binary logs?
The binary logs are a sequential log of all the executed transactions. The concept of sequential is very important here: it means that all the transactions executed on our database, regardless of the order in which they started, are stored in the order in which they were committed. We can have a transaction that starts right now, but it will not be written to the binary log until we commit it, even if that transaction takes one hour to complete. This sequentiality is very important because it allows us to find exactly the place where the disaster happened; if the binary logs were, say, parallel logs, it would be more complex, but as they are sequential, it's easier to identify.

And why do we have both the InnoDB logs and the binary logs? The main reason is the design of MySQL. MySQL has a two-layer design: you have the SQL layer and you have the engine layer. The InnoDB logs belong to the engine layer, while the binary logs belong to the SQL layer. The good thing is that this means the binary logs are engine independent: you can use the binary logs for point-in-time recovery even for engines other than InnoDB. During this talk we will talk mostly about InnoDB, but some of these ideas can be used with different engines too.

The binary log is a logical log, not a physical one. What I mean by logical is that we store information about the logical structure of the database: we don't store information about pages or blocks, we store information about tables, rows, and database structures in general. In the first versions, the binary log stored exactly the SQL statement that the transaction executed; in recent versions, I think it started in 5.1, you can store row modifications. I said "can"; really, you should store the row modifications. I'm not going to explain all the reasons, but row is more reliable than statement. There are three formats: statement, mixed, and row. The default used to be statement, then it switched to mixed, and in version 8 it's row. There's a lot of discussion about the performance impact of row versus statement. Some people say statement generates smaller binary logs than row because, for example, if you run an update of one million rows, in statement format that's a one-liner in the binary log, just the UPDATE of the whole table, while in row format you need to store all the modified rows. That's true when you modify a lot of rows, but for a lot of applications you are modifying just one row, and, what is more important, sometimes you are modifying no rows at all. In the case of modifying no rows, the binary log in row format writes nothing, but in statement format it writes the statement. So depending on your application and the patterns it uses, it can generate more log using statement than using row format. And this isn't only about queries that find no rows to modify; it also applies to queries that "modify" something to the value that is already there. If I update a value from one to one, statement format writes the statement into the binary log, while row format says "I modified nothing" and writes nothing. Also, if an UPDATE statement needs to scan one million rows to modify one row, then with statement format, when we replicate that change we need to scan one million rows again; with row format, if you scan one million rows but modify one row, you write just one row to the binary log, and it will be replicated as just that one row.
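As an aside, checking which format a server is running is a one-liner (ROW is the default on 8.0):

    SHOW VARIABLES LIKE 'binlog_format';
    -- and whether binary logging is enabled at all:
    SHOW VARIABLES LIKE 'log_bin';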
So it's not really true that statement format is better for performance; it depends on the case. For some cases it probably is, for others it won't be. Also remember that this is a sequential log, so we write things sequentially, and several changes will probably be packed together, so the impact on performance will not be that big.

We also need a consistent backup, and we need to know the position of that backup in the binary log to avoid applying the same transactions twice. What does this mean? The transaction is written into the binary log as part of the commit, but whether it actually hits the file on disk depends on the value of a variable called sync_binlog: I can write into the binary log, but if I don't flush, maybe it's not really there yet. If you want 100% durability you need to set sync_binlog to one, though this also depends on the underlying storage; if you have battery-backed caches and that kind of thing, you can relax it. Well, we need to know the position of the backup in the binary log to avoid applying the same transaction twice. Clearly we are going to restore a backup and apply the transactions from that backup until the disaster, so we need to know the backup's position, because otherwise we can either start applying after the backup and miss some transactions, or apply transactions twice. In one case we would have corruption or an inconsistency, and in the other the point-in-time recovery would probably fail because we're trying to modify data twice.

How can we get the position of a backup? Almost all the tools that can make a consistent backup already provide this information. Percona XtraBackup stores (sorry, there's a typo there) the position in a file called xtrabackup_binlog_info; mysqldump stores the information at the beginning of the dump file when you use the right option; and mydumper stores the position information in its metadata file. So in any case, if we use these tools, and Oracle's Enterprise Backup also stores this information, we have the initial position for the recovery.

To GTID or not to GTID? Well, there is no question. What is a GTID? A GTID is a global transaction identifier: an identifier for each and every transaction that happens in our database. Why is it global? It's global because it's supposed to be unique for your database in the whole universe; in theory there shouldn't be any other database out there generating the same GTIDs as you. How is this done? There is a UUID for each server, generated using different factors, so in theory your database has a unique identifier, and then you also have a sequence number for the transactions: UUID plus sequence is a GTID. In the past, positions in the database were expressed as a combination of file and binlog position, and the binlog position was something as simple as the number of bytes into the binary log file at that moment; so if somebody says file 1, position 1000, that means file 1, byte 1000. By keeping track of all the executed GTIDs, an instance is able to get the position automatically. Here is the trick: since I have GTIDs that identify every transaction, I know all the changes I have executed, and if I want to do point-in-time recovery, I know which changes I haven't executed yet.
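So the same point can be expressed either way; purely for illustration (the UUID and the numbers below are made up):

    -- Old style: binary log file plus byte offset
    SHOW MASTER STATUS;        -- e.g. File: binlog.000001, Position: 1000

    -- GTID style: server UUID plus a transaction sequence
    SELECT @@global.gtid_executed;
    -- e.g. 3e11fa47-71ca-11e1-9e33-c80aa9429562:1-23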
So this is the example I used for this presentation. I created a table for point-in-time recovery, quite simple; I inserted 3 rows and after that I made a backup. The backup is simple too: nothing complicated, all databases, events and routines, in a single transaction to make sure the backup is consistent, and, as I said before, storing the position in the backup. If we look at the contents of the backup we will see the GTID executed, and we also see the binary log position: here is the position and here is the binary log file.

How can we capture the binary logs? There are different methods. You can copy the binary logs physically, just using operating system commands. But the nice thing is that mysqlbinlog, the utility to process binary logs, has an option to read the binary logs from a server and store them locally. So we can take one database and, from another server, execute this command: we read from a remote server, we provide the IP address, and we use raw format because we don't want the binary logs to be processed at all, we just read and write. Here we are mimicking the replication process. In MySQL you can set up replication that copies all the changes from one database to another; what we are doing here is acting as a replica, and to act as a replica we need to connect to the database — I used these credentials, extremely secure. The next option is stop-never, which means I will keep collecting binary logs forever, and the last argument is the first binary log I am going to request.

Beware, because stop-never does not really mean forever: it means as long as I am able to retrieve the binary logs. For example, if the database restarts, mysqlbinlog will stop and you need to restart it, otherwise the copying of binary logs will not continue. If there is an incident in the network or something like that, mysqlbinlog will time out. So if you are going to use mysqlbinlog to grab the binary logs, you need to monitor that it is running, and if it fails you need to find the last file it downloaded and restart it from that file.

[Answering a question] It is the SQL layer, not the engine; it's a bit confusing, but the answer is quite simple: the binary logs are generated regardless of the storage engine, so whether we use MyISAM or any other engine, the binary logs will be generated. And yes, the binary log is written in the data directory on that host; I read the binary logs from that host and store them locally. This is not that strange, it is something MySQL already does for replication, so there is a protocol that supports it. It's not like the replica keeps asking "do you have something for me?" — it's a communication between the server and, in this case, the pseudo-replica: when there is a new binary log entry, a process in the server notifies the replica and sends it, so it's not permanent polling, it's a bidirectional communication. This does add some load, related to the number of CPUs you have, but it's very little.
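A minimal sketch of that streaming command, with a hypothetical host, user and starting file (the real credentials and file names will differ):

```bash
# Act as a pseudo-replica: pull binary logs from the source, store them
# locally in raw (unprocessed) format, and keep waiting for new events.
mysqlbinlog \
  --read-from-remote-server \
  --host=10.0.0.5 --user=backup --password \
  --raw \
  --stop-never \
  binlog.000001
```

Remember that this process dies if the source restarts or the connection times out, so in practice it needs to run under some supervisor that restarts it from the last file it downloaded.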
So the next step: we already made the backup and we are capturing the binary logs. Now I insert three additional rows and I truncate the table. This simulates a disaster; unfortunately real disasters are not this tidy, they tend to be a bit more complex, but we will discuss that later.

So now, what do we have? We have a backup, we have the binary logs, and we have a disaster. Actually we only need the first two, because you don't need to arrange disasters — they just come. What's missing? We need to find, somehow, the moment in time the disaster happened. How can we do this? There is a command, SHOW BINLOG EVENTS. It's a pity, because I don't know if you can see it, but at the bottom of the screen, if you make the effort, you will see the TRUNCATE TABLE (I'm not supposed to move around because of the recording). What we have here is the name of the binary log, the position, and the event type, which is really important: in this case it's a Query event, it's executing something. We also have Write_rows events; the Write_rows applied to the PITR test table are the inserts I made before, and here we have the truncate. So with SHOW BINLOG EVENTS, in this case, we can retrieve the position of the disaster.

Everything is fine, we are very happy. The only problem is that usually your system is very busy and your binary logs have maybe millions of events. It's not going to be SHOW BINLOG EVENTS with 24 or 30 events; you'll have millions of events, which is enough to drive you crazy trying to find the one you want. The trick here is that there is a command called pager in the mysql client that lets you define a pager. In theory you'd use more or less, but you can do more advanced things, for example use grep to search. Then, when we run SHOW BINLOG EVENTS, it will search for the line that contains "truncate table pitr" and show one row before it. So again we have the position in the two left columns, and in the first row we have the GTID of that change. We can use these kinds of filters to capture the position.

But if we go back to the previous screen and make the effort again, we see that sometimes the exact SQL statement is not there. Why not? Because, as I said, the binary log stores rows; for DDLs like TRUNCATE, however, it stores the statement, since for a truncate it makes no sense to record the removal of every row — it's easier to just store the DDL. That means this event is easy to spot. But if the disaster, instead of a truncate, is an UPDATE without a WHERE clause that sets all the amounts to the same value, then you will just find row events on that table, and SHOW BINLOG EVENTS will not help that much. You can still do things with this information: remember that the position is a number of bytes, so if the distance between two events is very big, that event modified a lot of rows, and we could process the output a bit to extract that. The problem, again, is that this returns a ton of rows, and it's difficult (maybe somebody can) to write a pager command with grep that compares the position of the previous event and gives you the distance between two events.

The other option is to use mysqlbinlog again. mysqlbinlog was originally created as a decoder of the binary logs, so for row format you can run it with -v for verbose and --base64-output=DECODE-ROWS, and it will dump all the commands that were executed. In this case I wrote a small one-liner that does something similar to the previous grep, giving us the GTID and the position.
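A minimal sketch of the pager trick, run inside the mysql client; the binary log file name, table name and grep pattern are just placeholders:

```sql
-- Filter SHOW BINLOG EVENTS through grep, printing the matching line
-- plus the row before it (which carries the GTID of that change).
pager grep -iB1 'truncate table pitr'
SHOW BINLOG EVENTS IN 'binlog.000042';
nopager
```

For the decoder approach, the shell equivalent would be along the lines of mysqlbinlog -v --base64-output=DECODE-ROWS binlog.000042, piped through whatever grep or awk one-liner extracts the GTID and position you are after.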
We have the position, because we know the binary log file and the offset, and we also have the GTID. Important: this was a test and I had just one binary log file. Sometimes you don't know when the disaster happened, and then maybe you have 300 or 3,000 binary log files; you need to guess somehow, or otherwise you need to process all the binary logs — there is no other option. One of the tricks is to search for binary log files that are much bigger than the rest, because a transaction is never split across binary logs: in theory binary log files have a fixed size, but a huge transaction will exceed that size. So if all your binary logs are one gigabyte and you have one that is 10 gigabytes, that's probably where the bad update is. The binary logs are numbered sequentially and you have the date of each file; obviously in the header of the binary log you have timestamps and a lot of other information, but if you have a lot of files, opening and processing all of them is slower than just looking at sizes and dates in the operating system.

So, finding the position is not trivial. Why? We can have multiple similar events: imagine that every hour we run an update that touches 100,000 rows, and one of those updates was wrong — identifying the right one is not easy. Even writing the grep to find the offending event can be complex: sometimes the event is written over several lines, so what you're looking for might be in the WHERE clause on the third line, for example, and you need a more complex search. The DML statements are not visible in SHOW BINLOG EVENTS. If you have a lot of huge binary log files it can be really time-consuming to process them. Sometimes you don't know the exact syntax of the query, and, even worse, maybe the disaster is not just one query but a bunch of queries executed together: one query reading data, then modifying one table, and another table, and another. So identifying the exact position can be complex. The clues you can use, as I said before, are the position of the following event and the size of the files. It's important to understand that this is not easy to automate: even if you try to write a runbook for point-in-time recovery, it's hard; you can only write guidelines — investigate this, search for that. The important thing is that you understand the process, because when you're doing this there has been a disaster: data loss, your site is probably down, business pressure, you need to fix it quickly. If you know the process, you can speed up and reduce the time you need to fix the issue.

What we're going to do now is the traditional way of doing point-in-time recovery. We restore the backup; in this case it was a dump, so I just execute the dump, and if I check the contents of the table we see the rows I inserted before the backup. Then I apply events. As I said before, mysqlbinlog is a tool that decodes the binary logs and transforms them into SQL statements that can be applied by the database. So what I do is run mysqlbinlog, giving it the position we retrieved before, and pipe it into mysql; this applies all the changes. And if I run the query, you will see that we have both the pre-backup and the post-backup rows. So I did point-in-time recovery; life is wonderful.
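A minimal sketch of that traditional flow, with hypothetical file names and positions (the start position comes from the backup metadata, the stop position is the disaster event we located):

```bash
# 1. Restore the logical backup.
mysql < backup.sql

# 2. Replay everything between the backup position and the event just
#    before the disaster, decoding the binlog into SQL statements.
mysqlbinlog \
  --start-position=1000 \
  --stop-position=53421 \
  binlog.000042 | mysql
```

Note that this replay is single-threaded, which is exactly the problem discussed next.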
Could we skip the position and use the GTID instead? That's something we will discuss later. The problem with mysqlbinlog is that it applies events sequentially. Imagine we make a backup once a week, because our database is really big and there isn't time to make backups more often, and we have a ton of binary logs: we back up on Sunday night and somebody decides to break our database on Saturday morning. We have a ton of binary logs to process, and if the process is sequential it can take days to load them all. Another problem is that we can't tell mysqlbinlog to stop on a GTID; we can tell it to stop on a position, but not on a GTID, so we need to provide the binary log file and the position in the file.

How can we make this faster? The trick is that MySQL itself already knows how to apply events in parallel, because that was implemented for parallel replication. So what we are going to do is, somehow, get the database to apply those events in parallel. Everybody knows what replication is in MySQL: the binary logs we've seen are applied on another database — the same process we did here, but done by MySQL itself. I make a backup of one database, restore it on another server, the replica collects all the changes and applies them, and then we have two databases that are synchronized: all the changes written on the source are applied on the replica. We can make our recovery faster by using parallel replication, and there are two options.

The first, which I will only describe briefly, is to create a dummy instance where the binary logs live. Imagine we write the binary logs to a storage server, a box that just has plenty of disk. I can install MySQL on that box and start a dummy instance with minimal requirements. Before starting it, I remove any binary logs the dummy instance may have, move my binary logs into its binary log location, generate the index file for those binary logs, and start the instance. Then I tell the database I restored from the backup to become a replica of this dummy instance. It's funny, because the dummy instance is serving binary logs that are not really its own, but it works. This method is very useful if you have multiple replicas to rebuild: you restore the backup on all the replicas, apply the binary logs on all of them so they all reach the same level, and then rebuild your replication topology however you want; and since we are using GTIDs, that is very easy.

The other method is to trick the server into believing these are relay logs. Instead of putting the logs on a dummy server, we place them as if they had arrived through the replication process. Replication works like this: the replica connects to the source database, asks for the binary log, and stores it in a file called the relay log. The good thing is that the format of the relay logs is identical to the format of the binary logs, so we just need to copy the files, and that's it. So here is the process. Since we want to apply in parallel, we change replica_parallel_type to LOGICAL_CLOCK (if you are using 5.7, replace the word "replica" with "slave" and "source" with "master", because the syntax changed) and we increase the number of workers. Then we copy the binary logs, renaming them to the relay-bin naming scheme (the hostname you'll see is my machine's name, which is the word for anchovy in Spanish), we give ownership back to the mysql user, and we generate the relay log index file. There is one more thing to take care of: the variable server_id. Every server in a replication environment has to have a different server ID, and since we are replaying binary logs that come from ourselves, I temporarily change my server ID. After that I configure replication pointing at a dummy, non-existent host, specifying the relay log file and position 1; I could use the exact required position, but in this example position 1 is fine because we are running GTIDs. And this is what happens: SELECT * shows only the rows from before the backup; I start replication UNTIL the GTID just before the disaster, and all the missing rows are back. After that we just clean up the replication configuration, restore the server ID to its original value, and get rid of the transactions that were executed after the disaster.
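A very rough sketch of the relay-log trick, assuming MySQL 8.0 syntax, GTID mode, a relay log basename of myhost-relay-bin, and made-up GTID and server_id values; on 5.7 the variables and statements use the slave/master wording instead, and the exact file locations and positions depend on your configuration:

```bash
# Make the SQL applier parallel (8.0 names; 5.7 uses slave_parallel_*).
mysql -e "SET GLOBAL replica_parallel_type='LOGICAL_CLOCK';
          SET GLOBAL replica_parallel_workers=8;"

# Drop the captured binlogs in place as if they were relay logs,
# then rebuild the relay log index and fix ownership.
cp binlog.0000* /var/lib/mysql/
cd /var/lib/mysql
for f in binlog.0000*; do mv "$f" "myhost-relay-bin.${f#binlog.}"; done
ls ./myhost-relay-bin.0000* > myhost-relay-bin.index
chown mysql:mysql myhost-relay-bin.*

# Temporarily use a different server_id, point the SQL thread at the copied
# relay logs (the source host is a dummy, it is never contacted), and replay
# up to, but not including, the GTID of the disaster transaction.
mysql -e "SET GLOBAL server_id=999;
          CHANGE REPLICATION SOURCE TO SOURCE_HOST='dummy',
                 RELAY_LOG_FILE='myhost-relay-bin.000001', RELAY_LOG_POS=4;
          START REPLICA SQL_THREAD
                UNTIL SQL_BEFORE_GTIDS='3e11fa47-0000-0000-0000-000000000000:7';"
```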
Usually, and this is thanks to GTIDs, most of the documentation will tell you that you can exclude just the GTID of the offending transaction, using different methods that are explained there. But only do this if you are sure you are leaving a consistent database. Why? Because there are relationships between the contents of your databases, and there are processes that do things with the information you have. For example, if you have a table that stores the average amount of the invoices of the last hour, then during the time you experienced the disaster that data will be wrong. So sometimes it's not a good idea to re-apply everything, even data that is not directly affected by the truncate or the update; you need to know the relationships inside your database. Well, questions? [Q] I think you need to be careful with what you do. And feel free to contact me.

[Speaker change; audio adjustments] There we go. Oh man, he just called the previous presentation amazing — no pressure. Is that loud? I talk louder than Pep does. Is there a guy back there messing with the sound? I can absolutely keep on talking. You can't hear me at all? Okay, is it okay for everybody else if I talk like this? She says it's okay; what about the guys in the back? Front row, you're at a rock concert now. Okay, hang on, new window. I just started moving everything over to Google Slides, so I'm still learning — how do I make that full screen? Is everybody okay with that? All right, back-to-back sessions; everybody loves back-to-back sessions.

Okay, let's talk query optimization in MySQL, a little bit of a 101. How many of you in here — that's a stupid question — how many of you are using MySQL? Stupid question. I've got to stick to my slides; I have a bad habit of going off the slides and then running out of time. So, query optimization 101: we're going to talk about all of this. I have 55 minutes and about 63 slides, so do the math on how we're going to get through it. This is actually a redacted training session, so a lot of this material is part of the standard training we do at Percona, which I'll get to in a second. We'll talk really quickly about how query planning takes place in MySQL,
what some of the common query mistakes are that I have seen, and that some of the other consultants at Percona have seen as well, how you find unoptimized queries in your MySQL system — things you can do other than EXPLAIN, which is what we'll talk about in the first section — and then what you can do going beyond what MySQL gives you at the basic level.

Quick introduction: how many of you have actually heard of Percona? Everybody in the room — that slide is done, great. For those of you who don't know me, my name is Matthew, not Matt, that's somebody else. I've been with Percona — I just did the math — 9.67 years. I am a senior architect in the professional services department at Percona. I specialize in MySQL and training, so if you want any type of MySQL training, also ProxySQL, Group Replication and Percona XtraDB Cluster, and if you want Mongo or Postgres training, we do all of that now. For me in particular, I've had MySQL experience going back as far as 2005, 2006-ish, in everything from SaaS to massive sharding to hosted environments, and I even dabbled in IP telecom early on, right out of college, "hey, come work on this system" type stuff. So I'm familiar with a lot of things.

All right, let's jump into query planning. You've got a query in MySQL; obviously you want it to go fast, right, otherwise why are you using it? What's the process with MySQL when you want to know how this query is going to be executed? You ask MySQL: what do you intend to do? That starts with a keyword known as EXPLAIN. You ask MySQL: here's my query, what do you intend to do with it, how are you going to access and find that data? MySQL is going to give you an answer. That answer could be right, it could be wrong; MySQL doesn't tell you that, MySQL just says "this is what I'm going to do". It's up to you to interpret what MySQL tells you and whether or not it's a good thing. Hopefully I will show you what some of those bad things mean, so that when you see them you can say, "oh yeah, Matthew said that's a bad thing, we need to do something to make that query better." Like I said, it all starts with the keyword EXPLAIN. Obviously we don't have time to go through the entire user manual of what's in EXPLAIN, so go check it out; there's a ton more — this training material is usually about three hours long. Uh oh, I was just referencing the current version of MySQL, which is four years old now, which we've all kind of been harping on a lot. Somebody asked in one of the previous talks who's still using MySQL 4, and I almost raised my hand and said I have a client with MySQL 4 on Windows NT. Sadly enough, we still have those, and then the ones I get: "we're forced to use 5.6" — no, nothing's forcing you.

Anyway, let's take a look at this example; hopefully everybody can see it, and I apologize if you're in the back and can't read my tiny font, but I have to smush it all onto one slide. My example query says EXPLAIN — that keyword we prefix to our SELECT statement; actually this works for INSERTs, UPDATEs, DELETEs and SELECTs — so EXPLAIN, then my query, and then this is the output the optimizer gives you. The query goes to the optimizer but basically doesn't execute; this is not actually executing the query, it's just the optimizer going in and saying: how am I going to find these rows?
Is there an index to use? If so, let's use it — can I use it, and is it the better of multiple indexes that are available? Those are the choices. I'll point out one red flag, right here where it says type: ALL. If you take away anything from this, take that away: that is a full table scan. It's going to read every single row in that table; worst case it's all on disk and you have to read all those disk IOPS and pull it into memory. The output also shows which table we're using, which indexes we could possibly have used, which one we did use, how long that index is, approximately how many rows we're going to have to read, and a whole bunch of other stuff. If you want to know what the whole bunch of other stuff means, you'll have to go read the manual, or have me come out and I'll go into more depth than our standard material.

All right, so this is it: we asked MySQL what it intends to do, MySQL gave us this, and Matthew just said that means bad. So what can we do? Obviously we're going to have to add an index. Where? Ideally we look at our equalities, because that's what we're filtering our data on, so let's add an index on that title column. Now, one thing I didn't show you was the schema of that table. We try to add our index, and in this particular case — a unique situation — we get an error message. I can tell you that this column, title, is of type TEXT. Does anybody know how many characters fit into a TEXT field? 65,535 — everybody knows their powers of two, right? So there's a problem, because there are internal limitations on how long an index entry can be (on the order of a few thousand bytes at most, depending on engine and row format). Since this is a TEXT field that could hold up to 65,000 characters, that's obviously longer than the limit. This is a unique situation; most of the time you probably have VARCHAR(20)s, VARCHAR(30)s, things like that, and you'll never run into this, but if you have really long string columns you might hit this problem. So how do we fix it? In this particular case, MySQL has something known as built-in prefix indexes: I can say take the first 30 characters of that column and index just those. I've been told that's a fairly unique feature to MySQL; other databases have ways of doing it, but it's not as easy as this. So, yay.

Okay, so now we've added our index. As I said earlier: we ask MySQL what it intends to do, MySQL tells us, we decide whether or not we like that answer, and we make some kind of change. That change can be modifying the schema, or it could be modifying the query, and in some cases it could also be modifying the data. By that I mean archiving: instead of having 4 billion rows in the table, you archive off a bunch of old stuff and now you only have 1 million rows — that's a change as well, and the optimizer is going to treat it differently. So now that we've made a change, let's ask the exact same question again: what do you intend to do now? For those of you nodding along, this obviously looks a lot better. This is no longer a full table scan, this is now a ref lookup, and a ref lookup is an equality on a secondary index. So that's good, right?
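A minimal sketch of the prefix-index fix on the hypothetical movies table from the slides (index name and prefix length are just examples):

```sql
-- Indexing the full TEXT column fails: a TEXT key needs an explicit length,
-- and the whole column would exceed the maximum key length anyway.
-- ALTER TABLE movies ADD INDEX idx_title (title);      -- rejected

-- Indexing just a prefix of the column works and still helps equality lookups.
ALTER TABLE movies ADD INDEX idx_title (title(30));
```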
And the row estimate: previously we were looking at approximately 3.3 million rows, now we're looking at approximately 4. I keep saying approximately because this is based on the statistics the engine provides, not actual lookups. Good question: the key length is 52 because this is a utf8 column, and a utf8 column can be up to 3 bytes per character. So, to the question: this prefix is the first 30 characters, not the number of bytes, and because the column is encoded as utf8, each character can be up to 3 bytes. Oh, I'm sorry, my math is bad — I see your problem now, you're doing the math and getting 30 times 3 is 90, not 52. It's most likely because I copied this slide from a previous training and this number is stale: this slide used to use title(50)... in a previous iteration I updated the query but didn't update this number, so you caught me, thank you for calling it out. What key_len is telling you is the maximum length of the index entry that needs to be examined; at some point this will need to look at that many bytes. That's really only useful when you get into composite indexes: if you have multiple columns inside one index and you want to know how effectively you're using all the pieces of that multi-column index, you do the key-length math to determine whether you're using this column, this one and that one. Does that make sense, or did I just confuse you even more?

Okay, what are some other ways we could access this table? Primary key. Maybe I have this ID cached somewhere already: primary key lookup is a const-level access, and you can't get faster than that — you cannot access an InnoDB table faster than by primary key. Hopefully you're all familiar with this as well: a wildcard expression, in MySQL we use the percent sign, and this is now a range scan. So we've seen full table scans, secondary index lookups, primary key lookups, and this is a range scan. Why is it a range scan? That's a trailing wildcard, so we're looking at everything from BAMB-blank to BAMB-ZZZZ. You've probably also seen BETWEENs, greater-thans, less-thans; those are all ranges of data we're looking for. Why are those ranges? Back to my graphic: we use B+ trees inside InnoDB, and they have limited depth — I believe it's four, if my memory serves me correctly; my drawing doesn't go four levels deep, it's just my picture. If you're unfamiliar with what a B-tree is: it has a point of entry and then you decide whether to go left or right. In this case we're looking for everything that starts with BAMB, so we ask: is BAMB less than or greater than "Kill Bill"? Go over here. Less than or greater than "Casapunca"? And so on until we reach the leaf nodes, which inside InnoDB actually hold our data — the internal nodes are just road signs along the way, and the data is stored down here. We will scan all of the pages between here and here; that's why we get a range scan, we're looking at a range of data.
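As a small sketch of those access types, assuming the hypothetical movies table with a primary key id and the prefix index on title from before (row counts and values are made up):

```sql
-- Primary key equality: type = const, the fastest way into an InnoDB table.
EXPLAIN SELECT * FROM movies WHERE id = 42;

-- Trailing wildcard: type = range, the index is walked between 'BAMB'
-- and the last key that starts with 'BAMB'.
EXPLAIN SELECT * FROM movies WHERE title LIKE 'BAMB%';
```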
This type of query comes up very frequently with our support staff: is this a range? How many of you believe this is a range query? How many of you believe it's a full table scan? It's telling you right there that it's a full table scan; my question is why. You can't answer any more questions — all right, there you go: a preceding wildcard. So what is that saying? Think back to our tree: Matthew just said we have to hit an entry point and then decide, do we go left or right? With a leading wildcard we have to go both ways. That's why this ends up as a full table scan: we go left and scan the left half, we go right and scan the right half, and we've effectively done a full table scan. This comes up frequently — this is a bad example, but it usually comes up with email addresses. You want to find all your Gmail users, so you write percent-sign-at-gmail.com against the users table, and then people complain: "well, I have an index on the email column, but it's still full table scanning." This is why — it has to do all of that. Yeah, this will always full table scan. Yes: if you said fred@percent and you had an index on it, then you would be able to use the "fred" part, find all the freds, and then apply the wildcard after fred. And no, you would just use the index — you can't use part of the index and also full table scan.

All right, we answered that one. Here's another one that comes up. This one is "find all the movies that start with Z". The explain plan looks okay: it's not full table scanning, it's using an index, about 25,000 rows — looks good. I'm going to change one thing about this query; watch for it. I changed one thing: all I changed was a Z to a T, and I went from using an index to a full table scan. Anybody have any idea? Nothing else changed. He's got it: basically, how many movies start with Z versus how many start with T? This is where your data itself is critical to how the optimizer chooses your index. The optimizer in MySQL is a cost-based optimizer; it's trying to make a decision about which is more expensive, in terms of disk I/O, memory, CPU, all of that. What's more expensive: go to the index, scan a bunch of entries, and then go back to the primary B-tree and grab all of the row data — or just go ahead and read all the row data and do the filtering there? Do I want to do two steps, or one step that may read a lot more data? Essentially, if the optimizer says you're going to look at a small amount of data, the index is cheaper, keep going; but at some point you're looking at so much data in the index, and we also have to go back to the main B-tree to get the rest of the columns — I don't have id and production year here as part of my index, the only thing in my index is title, so I have to scan the title index to find all of those values, then go back to the main clustered index to find id and production year. It's a two-step process, and at some point it becomes more expensive to do both steps than to just do a full table scan. Yes: if you only selected title, that would be what's known as a covering index, and then it would effectively be an index-only scan. And yes, it depends on the size of your table — if you had a 10-megabyte table, those rows probably all fit inside, not one page, but something like 50 or 60 pages.
Yes, this axis is all of the rows in the table, or some of the rows; each page in there is 16 kilobytes, so you take your row length, divide, and that's roughly how many pages you have. Whoa, in the back, first: this graph is not an exact science, I'm just illustrating that at some point the crossover will happen. It might be at 10% for this table, it might be 25% for that table; it all depends on the size of the table, the row length, what indexes are available and which aren't — there's a lot more to it than a single simple point, and you can't really move the crossover either. On my example data set, yes, that was the case, but if your data set and your table are different, then obviously that changes as well. The disadvantage of forcing an index is that you will still do the two-step process: if the optimizer says it's cheaper to do a full table scan but you say "no, use this index", it might take longer — it could be less memory and less CPU, but it'll take longer because of that two-step process of going back to find all the data. There's usually a reason why you feel you have to use FORCE INDEX, and it might be that your statistics are out of date. If you just did a big bulk data load into that table and the table hasn't had a chance to refresh its stats, the optimizer might be choosing based on data it doesn't know about — it doesn't know you just inserted 5 gigs of data — so a quick re-analysis of the table's indexes might fix the problem without needing FORCE INDEX at all.

All right, so what do you take away from this? Data is critical. If you've got a development or QA environment, try your darndest to convince the upper exec people, or whomever, that your dev environment should be a copy of your production environment, mainly for this reason: you don't want to be testing the z% case in your development environment, push that code to production, and get hit by the t% effect, because your dev environment is only 10% of your production data. I understand that if you're in healthcare or somewhere with PII you can't do that; there are tricks where you can copy the optimizer statistics out of your production system, implant them into your dev system, and freeze them, so you get the same type of optimizer behavior. That's advanced query-manipulation stuff which we can help you with if you want to know how to do it. Input values are critical as well: too often I see clients testing the z% case and never thinking about the t% case. Same situation: "this query worked great in our dev environment", they rolled it out, and all hell broke loose because the first user who came along typed the t% case. So make sure you're not just running your smoke tests; grab some real-world values that actually come from your users and test with those.
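Since stale statistics are the usual reason people reach for FORCE INDEX, here is a minimal sketch of refreshing them instead (the table name is hypothetical):

```sql
-- Recompute index statistics after a large bulk load so the optimizer
-- sees the new data distribution; this often removes the need for FORCE INDEX.
ANALYZE TABLE orders;
```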
All right, good questions. A few more common query mistakes that we unfortunately see too often. Now you can take a look at the schema for this table and then at the query — this one was David's query that came up a month or two ago, and we said "oh, we should throw that into the training." We've got an orders table, we've got an order ID column, we've got an index on order ID, and there's data in the table: SELECT * FROM orders WHERE order_id equals some number. Why did it do a full table scan? Somebody said it. How many of you believe, looking at this, that order ID is a number? If you look at the column type, it's a VARCHAR, but what the user has passed in is an integer. Internally — there are three warnings on this, by the way, and if we do a SHOW WARNINGS (I've redacted some of the junk around it) — this is basically saying that the optimizer has to convert the data stored in the table from string to integer, and if you convert the data, you can't use the index. Modifying the query to include quotes turns that value into a string, now we have a string comparison, and we can use the index. So you very much have to be aware of that. Dates have to be in quotes as well — if you put a space in there with a timestamp... the only difference would be DATETIME versus TIMESTAMP, because a DATETIME is stored effectively as a string while a TIMESTAMP gets converted to an integer, but that shouldn't change how this behaves.

How about this one? I've got an index on production year, I SELECT *, and I'm calling the MAKEDATE function. If you don't know what that function does: it takes the year, say 1922, and creates a date out of it, January 1st 1922. All the query is doing is finding everything where that date is greater than a year ago — and again, full table scan. (You're not allowed to answer, and neither is Pep; Kenny's not in here, but Kenny's not allowed to answer questions either.) Anybody know? [Audience: the index is there, but possible_keys is NULL.] Yep, that's right. Why? Because I'm applying a function on top of that column. The index on production year contains 1922, 1923, 1924, over and over; an index on the function result does not exist, so I can't use the index. Essentially the optimizer says: I'd have to go to each row, run this function on every single row, and then do the comparison. The optimizer looks and says, hey, I've got an index here, but you're also asking me to run this function on top of that data every time. The simple fix in this case is to move the manipulation to your constant value: I want to compare what's in my database to something better, so take last year and apply the YEAR function to it, and now I'm actually comparing 1922 to 1923, and so on. So the rule of thumb is: don't manipulate the data you're storing in your database to match your user input; match your user input to the data in your database. If you feel like you have to manipulate the data in your database, that means you're storing the data wrong. Ah, since it was brought up: it's not on this slide, but in MySQL 8 (well, the groundwork is technically in 5.7) they introduced indexes on functions, so in MySQL 8 you can now say ADD INDEX on MAKEDATE(...) and so on. How is it implemented under the hood? As a virtual column, and those have existed since 5.7 — so even back in 5.7 you could have altered this table, added a generated column with the MAKEDATE expression, put an index on top of it, and effectively done the same thing.

Good so far? Let's find some of these bad queries. Instead of digging through your application and logs and so on, how can MySQL tell us?
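A minimal sketch of the two fixes, assuming the hypothetical orders and movies tables from the slides (column and index names are assumptions):

```sql
-- Implicit conversion: order_id is a VARCHAR, so compare against a string,
-- not a number, and the index can be used.
SELECT * FROM orders WHERE order_id = '2036';

-- A function on the column defeats the index; move the work to the constant side.
-- Instead of: WHERE MAKEDATE(production_year, 1) > NOW() - INTERVAL 1 YEAR
SELECT * FROM movies
 WHERE production_year > YEAR(NOW() - INTERVAL 1 YEAR);

-- 5.7-style alternative: index a generated column that holds the function result.
ALTER TABLE movies
  ADD COLUMN production_date DATE AS (MAKEDATE(production_year, 1)) VIRTUAL,
  ADD INDEX idx_production_date (production_date);
```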
MySQL can tell us in a couple of different places. One is the slow query log, which has existed since the beginning of MySQL's existence, I'm pretty sure. The slow query log is a file on disk, and you'd want to use something like pt-query-digest from the Percona Toolkit — a free, open source tool written in Perl — that will read the log file, parse it, and aggregate all the statistics, because the slow query log is just a never-ending list of queries and the stats behind executing each one: how long did it take, did it full table scan, and so on. If you're on something like RDS or GCP and you don't have access to the disk, the performance_schema is where they're moving more and more of this; there are specific tables in the performance schema you can query — with lovely names along the lines of events_statements_summary_by_digest and events_statements_history_long — or you might want to look at the sys schema, which ships with MySQL since 5.7 (and is available for 5.6), where there are some very simple views that make looking at that data easier. If you don't want to do either of those, just go install PMM, and we grab all that data for you; it can use either the slow query log or the performance schema to gather and aggregate everything. I'll show you a quick screenshot of PMM in a minute.

[Question on overhead] I've not seen numbers comparing them, do you know of any? The performance schema essentially manages itself — it only keeps a certain number of queries depending on which table you're querying — whereas the slow query log grows essentially forever. I've only seen performance issues on the slow-query-log side when it grows unbounded, and at that point it's MySQL itself having the problem. If you use PMM with the slow query log, PMM will auto-rotate it for you. And no, I don't think PMM needs that particular instrument, but yes, there are a few things: if you turn on too much inside the performance schema, you'll start hurting yourself. Quick poll: who's using Percona Server? Excellent. MariaDB? Cloud-hosted, like RDS or GCP? On-prem — sorry, that was the other question. Some of you didn't raise your hands, so now I'm curious what you're using; I thought I'd covered them all.

All right, turning the slow query log on and off is simple enough, and this is where the slow query log file is located. This is the one you want: long_query_time. It determines whether or not a query is considered slow; only queries that take longer than that amount of time get logged. So if you set long_query_time to two and a query takes 1.9 seconds, it won't get logged and you will never know of its existence. long_query_time = 0 does not mean off; it means zero, which means any query that takes longer than zero seconds — which is every query — gets logged into the slow query log. That's usually what you want: you want to know about every query, even the fast ones, because sometimes even the fast ones can still be optimized in some way. This can be fractional too; you can set it to 0.1 if you want something like that.
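A minimal sketch of turning it on dynamically (the values are just examples; long_query_time set globally applies to new connections):

```sql
-- Log every query, not just the slow ones; 0 means "slower than 0 seconds".
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 0;     -- fractional values like 0.1 also work
SHOW GLOBAL VARIABLES LIKE 'slow_query_log_file';
```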
log_slow_verbosity is a feature of Percona Server: we put a lot more extended statistics inside the slow query log itself, and I'll show an example of that. We also do slow log sampling, so you can rate-limit. If you're scared of setting long_query_time to zero because you've got 30,000 queries per second and you'd churn through a slow query log very rapidly, but you still want good data, you can sample either by session or by query. It depends on how you set it: if you set log_slow_rate_limit to 10 with session-based sampling, it logs all of the queries of every 10th session; if you set it to query-based sampling with a rate limit of 100, you're sampling 1% — basically one out of every 100 queries that come through. The downside of these methods is that you may miss some queries; if you're only looking at every 100th query, with bad luck you could catch the same query every time. log_slow_sp_statements is another Percona Server feature: if you have stored procedures, in Community you typically can't see any of the statements executed inside the procedure, but with this turned on we add those to the slow query log as well. The last one is slow log filtering: in Percona Server you can say, set long_query_time to zero, log everything, but only log the ones that do full table scans — and you can combine criteria, so "full table scan, comma, temp table on disk" would be pretty neat.

This is what PMM looks like. If you haven't seen PMM, come to our booth; we have PMM running against a chaos box where you can twist and tweak a whole bunch of values and see it change instantly on the graphs, so have fun with that. This is our query analytics: we grab all that data and aggregate it, you can click on one of these queries and it shows you more stats down here, 30 days of retention by default, so you can absolutely ask what happened two weeks ago to that same query and watch how its performance changes over time. Oh hey, look, there's a timer.

All right, what else can we do beyond basic EXPLAIN? We have EXPLAIN FORMAT=JSON. This gives you the same information regular EXPLAIN does, but spits it out as a JSON string; there's some more in the JSON, usually cost info and extra bits off to the side, but the core of it is everything you already get in standard EXPLAIN output. Why would you want this? In theory you could use it in your CI/CD, where you're checking queries as they get committed to your GitHub or GitLab repo and vetting them automatically: your tooling could hit a dummy system, run the EXPLAIN, parse the JSON, and basically say "if the access type is not const or a primary key lookup, or whatever, fail the merge request." You could really get deep into integrating that and checking against it. I would say fail the merge if a code push introduces a new query that full table scans — if the access type is ALL, fail the merge — or maybe check... this won't have rows examined, because you need to actually execute the query for rows examined; if you get to EXPLAIN ANALYZE, then a rows-examined versus rows-returned ratio that's badly skewed would be the red flag there.
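A minimal sketch of what such a CI check would look at; the query and table are hypothetical, and the JSON simply carries the same decision that plain EXPLAIN reports:

```sql
-- Same optimizer decision as plain EXPLAIN, but machine-readable: a CI job
-- could parse the output and fail the merge when it finds
-- "access_type": "ALL" (a full table scan) for a newly added query.
EXPLAIN FORMAT=JSON
SELECT * FROM movies WHERE title LIKE '%something%';
```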
Now, there was a question about the cost numbers in the JSON explain plan. It shows you a number, but I can't read that number and tell you what it means, because that number has 50 different algorithm metrics behind it. It's a cost-based optimizer that's taking into account things like whether a page is already in the buffer pool; you can actually tweak those cost constants — you can tell MySQL whether you have SSDs or spinning disks, and that will modify the numbers as well, which is a bit weird. The cost numbers really only become meaningful when you're doing optimizer tracing: with an optimizer trace you can see all of the choices the optimizer considered, and it's really just picking the cheapest one. So you'd have to turn on optimizer tracing to see why it chose this plan as the cheapest, and tracing will give you much more information about the decisions the optimizer is making. And that score can change, too: the same query could cost a different amount five minutes from now, which is why a query will not always cost "this amount".

How would you find a query that's low cost but highly repetitive — it executes frequently in your system but each execution is cheap? Say you have a bug in your code that's causing the same fast query to run far more often than it should. I would say you need PMM for that: you have to be logging every single execution, aggregating it all, and then you'd set an alarm in PMM that fires if there are more than some number of executions per second or per minute. One of the other tools I briefly mentioned, pt-query-digest, can do it too: run a report against your slow query log, run another report later, and compare — obviously a manual process, whereas with PMM it's just always up to date.

Cool. The top 10 is the first page you look at — top 10, actually that says top 11, I don't know why that one says 11 — over the last 12 hours, so in the last 12 hours these are the top 11 queries that ran across my entire environment, ranked by load, which is a combination of how long an individual query took and how many times it ran. You can click where it says "Load" and change that column to sort on any of the different stats we collect for queries — it's sorted by load by default, but you can change it to query count or query time — change the window to the last hour or the last 20 hours, come over here and filter to say "I only want this cluster", or only my prod environment, or only staging. And yes, it supports custom tags as well: when you add servers you can attach custom tags and filter on those too, and you can add availability zones — not specific to Amazon, you can put availability zones in there regardless.

EXPLAIN FORMAT=TREE just gives you a more, I don't know, visual representation of the nested join actions taking place. Usually when you look at EXPLAIN the steps are all on the same level and you have to infer that they happen in that order; this just gives you a little more structure. As a matter of fact, that's not an artifact of my slides — that's actually what the indentation and the little arrows mean.
So this is basically saying: we did a full table scan on title, then we did filtering, and those were kind of merged together; then we did a union up here between this result and that result. We have the cost, we've got rows — again, those are all estimates — so it's just a slightly different way of looking at the data, and there's more data in it, too. [On the slides] They probably should be available; I'll put them somewhere, I don't know where, but I'll put them somewhere — they didn't ask me to submit them. They're on my Google Drive; I'm actually presenting this live from my Google Drive, so if you want I can share them with you; I don't have a link or SlideShare or anything to point at yet.

The new one, which I recently started using — and actually used very effectively as cannon fodder against a client to show him how bad his queries were — is EXPLAIN ANALYZE. This came out in 8.0.18, so it's fairly recent. EXPLAIN ANALYZE actually executes the query and gives you the same tree output, but with a bit more: how many rows it actually read, the actual cost, the actual timing, like time to prepare and then actual execution time. I've never tried it against a write — thanks, I didn't try it against a write; I would imagine it does something similar to standard EXPLAIN, where if it's an update it rewrites it as a select, but I haven't tried it as a write. What version was that, where an EXPLAIN on an UPDATE actually executed it? Really? I did not know that, because I've done that many times on 5.7 and I know it's not an issue on 5.7 — hopefully nobody here is still running something that old. Starting over here on the left, because I wasn't paying attention: the first number — I believe it's the time it took to basically prepare, and then the total execution time. I know I'm not explaining that correctly, and I'll honestly have to go back and check, because these two numbers don't quite make sense to me; this one makes sense, that it took that amount of time to prepare and that amount of time to execute, but I'll have to check the other one for you, my apologies.

Yes — is this information already inside the slow query log? Well, let me take that back, that's not correct: the slow query log is not going to contain the fact that you took a step to union two things together, then a step to do this, and so on; the slow query log doesn't have that information, so PMM won't have it either. But PMM will have the query in it, so the query will still show up, and you can click the button that says "show me an example of that query", copy it, paste it — you can even run a live EXPLAIN from inside PMM, but we haven't extended that to EXPLAIN ANALYZE yet. There would be a fair amount of overhead; I know in Percona Server you can turn on query profiling for the slow query log, but that's extra overhead and still not the same thing as what we're seeing here. And no — that's an old version of MySQL, not 5.7, not 8.0; that behavior is a bug from an old version. With EXPLAIN ANALYZE, yes, it will execute the update.
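A minimal sketch of the syntax (available from 8.0.18; the query is hypothetical):

```sql
-- Unlike plain EXPLAIN, this actually runs the query and reports actual
-- row counts and timings alongside the optimizer's estimates.
EXPLAIN ANALYZE
SELECT title FROM movies WHERE title LIKE 'BAMB%';
```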
I would say that if you want to EXPLAIN ANALYZE an update statement, you could just turn it into a select yourself, because an update is essentially a read and then a write — we have to find the rows first. So if you took your UPDATE ... SET whatever WHERE something and turned it into a SELECT ... WHERE something, at least you've got stage one of that update handled, and then you can reason about the actual writing part separately. So do a select — unless you really do an EXPLAIN ANALYZE on the update, in which case, as we said, it has to execute it. Could you automate that? You'd have to write a tool for it. pt-query-digest — you can connect pt-query-digest to a running MySQL server and it will run the EXPLAIN for you, but it doesn't support EXPLAIN ANALYZE yet. You could, because pt-query-digest is written in Perl, edit the Perl, find where it executes the EXPLAIN, and change it to run EXPLAIN ANALYZE instead — you'd get that output, just by changing the code. It's not going to know what to do with those extra fields, though; it's a little hacky. You could, but it's a little hacky.

All right, going beyond EXPLAIN. EXPLAIN itself doesn't really give you post-execution analysis unless you're doing EXPLAIN ANALYZE, so what are some other things we can do to find out what that query really did? The first one I want to explain is the handler stats. Contrary to popular belief, InnoDB is still technically a plugin to MySQL — it's just now a required plugin — and it still behaves as a pluggable engine. Whenever you write a query and it's sent to the optimizer, the optimizer turns your SQL into a series of API calls, and those API calls are what go down to an engine, whether you're using InnoDB, MyRocks, or whatever; engines just implement those API functions. So when you say "I want to select from this table", the optimizer says: oh, there's an index, I'm going to call this function on the engine.
Effectively it tells the engine: hey, Mr. Engine, find this data using this index. The optimizer decides which index to use; the engine does not make that decision. And you can see counters — MySQL is already keeping them for you — for every time the optimizer makes one of those particular API calls down to the engine to fetch data. These exist at both session and global scope, so you can do FLUSH STATUS to reset the counters for your session, execute a query, look at the counters, and if Handler_read_rnd_next holds all the big numbers and everything else is zero, that means we did a full table scan — it's all random reads, and we don't like random reads very much. These are also automatically collected by PMM; it's usually the first graph I go to when I'm analyzing a client's machine. I look at them globally, and right off the bat I can say: on average you have more queries that don't use indexes than queries that do, so you have a query problem, not a hardware problem, in most cases. If read_rnd_next is the highest line on that graph, we've got a query problem — let's fix your queries. Fixing queries is so much cheaper than adding CPU or adding RAM.

You can also do profiling. If an Oracle engineer wants to tell me why they're deprecating this, I'll fight you: this is so easy, and even on 8 it says deprecated. The performance schema way is so clunky and cumbersome; this is easy — turn it on, run your query, look at the profile, done. Profiling shows all the different stages your query went through during execution: checking permissions, opening the table, optimizing, and — oh, lovely — creating a temporary table, sorting, converting HEAP to MyISAM. This is old; anybody know what that means? The gentleman laughing — what does it really mean? This is a 5.7 server, so: anybody know what "converting HEAP to MyISAM" might mean? Think of this one, and now think of that one — go ahead: this is a temporary table in memory, and that is the temporary table going to disk. That's what it is. Here it's a tenth of a second; on a system I jumped onto and ran this, it took the majority of the total time. So if you see "converting HEAP to MyISAM" and it takes 76 seconds, either (a) the table is giant and has to spill over to a temporary table on disk, or, as in my case, the SAN the volumes were mounted on was not configured correctly, and I got to use this as cannon fodder against their SAN engineer, who swore to me I had the best-performing disks mounted on the database server. He had it misconfigured: I saw this, took sysbench, ran some basic file I/O tests, and said, dude, this SAN can't write faster than 50 megabytes a second, you can't tell me this is the best this multi-million-dollar EMC SAN can do. An hour later I get a message: "yeah, okay, I didn't have this configured correctly, try it again" — and boom, magic, look at that. Not every query goes through every stage; there are certainly stages missing here, and if your query doesn't create a temporary table on disk, you're not going to see that stage. The other one I hit once was checking permissions: I saw checking permissions take 10 seconds, and that should be really fast. It turned out that client was using hostname-based user accounts, and every time you log into MySQL it has to do a reverse DNS lookup; their internal DNS was having issues — slow queries, who would have thunk it? But profiling worked it out great.
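A minimal sketch of both techniques in a session (the query and table are placeholders):

```sql
-- Handler counters: reset, run the query, then see how the optimizer asked
-- the engine for rows. A large Handler_read_rnd_next with little else
-- means a full table scan.
FLUSH STATUS;
SELECT * FROM movies WHERE title LIKE '%something%';
SHOW SESSION STATUS LIKE 'Handler_read%';

-- Per-stage profiling (deprecated but still handy): shows the time spent in
-- stages such as "Creating tmp table" or "converting HEAP to MyISAM".
SET profiling = 1;
SELECT * FROM movies WHERE title LIKE '%something%';
SHOW PROFILES;
SHOW PROFILE FOR QUERY 1;
```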
They are, but it was still just running slow? Yeah, mm-hmm. You could; if you're just a regular app user, you can profile your own queries. Pretty sure you can profile your own queries; there's nothing sensitive in it that should prevent you from doing that. It's for your session only, it's per session, so I would turn it on for my session, any queries I ran would be profiled, and the profile would be recorded in memory. Go run a gajillion queries, you're going to be fine. No, I don't think you can turn profiling on globally, and I wouldn't want to either. Actually, in Percona Server you can turn on profiling and it will show up in the slow query log; I would not do that, though, that would be some big overhead. Maybe in a dev environment where nobody cares, but not production. 54 minutes, sweet. If you have questions, I am extremely active in our forums, and our forums are extremely active themselves; it's free community support by people like me and Pep and the other people at Percona. So if you've got questions about MySQL, MariaDB, Mongo, Postgres, or you want to know what the next hot movie is... no, we actually do have an uncategorized category, so, you know, whatever. Is there a place for me to upload the slides? Okay, there will be a place for it; I will do that. If you want the slides, I will do what he says. Not at all: it's all built using open source software, and our stuff as well; basically everything we do is on GitHub, everything we do is open source. It's Grafana with a bunch of stuff we've added on to it to make it better. Don't go anywhere, SSH in right now and go do it. Are you on Amazon or on-prem? What are you running your production system on? You don't have a production system? Okay, never mind. If you're in Amazon we have a published AMI, you can just one-click deploy, but it's usually a Docker container; it's like four lines, copy paste, copy paste, out of our docs. I use our documentation in my classes to point out the fact that it's copy paste, copy paste, copy paste, and you've got PMM running. Very simple. It's an agent, so you install the agent on your MySQL server and they communicate back and forth, and so on. If you don't like agents (this guy is shaking his head), we do remote as well: you have your PMM server, and if you have an Amazon RDS or a GCP instance, or basically a MySQL server anywhere in the world that has internet access, you can click "add remote instance" and it'll connect and start scraping the data. It doesn't scale as well, and you miss out on all the CPU, disk, and memory metrics; all that stuff is rudimentary. You know how many people have ever commented on my B-tree drawings? You are literally the only person in the 14 years I've been doing MySQL that has commented on my B-tree design; you win the award for that. Most of the time people don't even know what a B-tree is. The first time I showed it, I got slammed by someone who said "actually, they're B+ trees"; I was originally just trying to stay high level, and it was "they're B-trees" and da da da, and "actually, they're B+ trees," great. So I went and updated my slides; they are B+ trees. And yes, you are correct, so now I have to go update my slides again and pull up draw.io and draw new trees. Cool. I'll be down at the booth; I should be at our booth.
If you go downstairs, we're in the booth right next to the MySQL guys; if you've got questions, come ask us, that's why we're here. We're hiring: if you know MySQL, Postgres, or Mongo, come talk to us as well; we're a work-from-home remote company. Who's up next? You, Pep? Pep's gonna go, and he'll be right back. Hello, hello. No, these glasses are for medium distance; I actually need these glasses to see the screen, but then I can see, and I should use the progressive ones, but I don't like them. Well, let's start another backup talk today: MySQL backup strategies and methods. My name is Pep, I'm a father of three kids and three cats, in my spare time I work at Percona, and this talk is dedicated to my parents, for obvious reasons. Well, why is your backup strategy probably wrong? What's a strategy? Strategy is the skill of making or carrying out plans to achieve a goal. So backups are not a goal: you don't get up in the morning and say "my goal for today is making a backup." A goal is to protect your data; a goal is to ensure that your business is up and running; a goal even is to achieve legal compliance or satisfy some regulation. But backups per se are not a goal, and if they are not a goal, they are not the subject of a strategy. You should focus on something that is a goal, and a goal is, for example, recovery. When something happens, you should focus on recovery, because that is the real problem; having a backup is not a problem. I used to say that if 99% of your backups fail but you succeed at the one recovery you need, that's fine; if 99% of your backups work but you fail one recovery, you have a problem. This is an edge case, obviously, but you should have a recovery strategy. Also interesting, for me as a non-native English speaker, is the difference between recover and restore. Usually when we focus on backups and restores, we focus on returning something to an earlier good condition: I have a backup from one week ago, I restore the backup, that's all. Well, that's not enough; depending on your business, you will restore the data but your business will not recover, because you have one week of data loss. So the really important thing is to recover, not just restore the data. And the problem is that the number of different things that can happen is huge: a disaster, a criminal attack, a natural disaster, employees that are unhappy or simply crazy, hardware failures, wars, major natural disasters, and you can have combinations of these things, or even situations you can't plan for in advance. So absolute protection is impossible, and if you try, you have to pay a lot of money. The first thing you need to do is define the set of incidents, the types of incidents, you want to protect yourself from: for example, those which are most likely, those that bring higher risk, or simply a combination of cost versus risk. Maybe you are willing to assume that a server can crash and the data loss is not bad enough for you to make backups; this happens, for example, in development or test environments where you don't make backups, but it can also happen with old data environments used for reporting: if you lose that data you can't do the reporting, but that's something you can live with. Then you have to design the backup, restore, and availability methodologies to ensure you are able to recover from those different types of incidents. Well, backups: we can have different types of backups, and a database usually
consists of two types of files. On one side we have the DB files, which contain the data, and on the other side we have log files, which are used to store data temporarily and are used for consistency. In that model, writes to the log files are sequential and all the transactions are written there, while writes to the DB files are random. If we want to make a cold backup, what we do is somehow freeze writes and then copy everything; the traditional way to do a cold backup is to stop the database, but you could perform a full cold backup without stopping the database, as long as you make sure everything has been flushed to disk. Hot backups copy the data files ignoring the changes: since we have the data files and we have the log files that contain the changes, we copy everything without trying to be consistent, and then with the log entries we turn that copy of the data files into something consistent. And there's something I like to call the warm backup: a warm backup is a cold backup without closing the database, or closing it only for a short period of time. For example, snapshots are a warm backup, because you don't close the database but you do need recovery afterwards. And there's also one method that can be ugly but very effective if you have to back up a huge database: using, for example, rsync with the database open. You transfer the huge files that are never modified via rsync while the database is open, and then you close the database and transfer the rest of the files. What was the technology used for that snapshot? It was probably a snapshot in virtualization, or GCP. Yes, sometimes this fails because it's not a storage-level snapshot, it's a virtual storage-level snapshot, and the problem is that you need to make sure different physical storage devices are snapshotted at the same moment. I don't like snapshots because if somebody writes something, for whatever reason, to a different filesystem, then the snapshot is not consistent, and this happens very often with VMware and things like that: the snapshots are made on datastores, and for whatever reason the server is using more storage, so another file gets created on a different datastore; the operating system sees those two files as a single disk, or LVM is able to merge them into the same filesystem, but VMware has two files, and creating a consistent snapshot of two files placed on two different datastores is complex. Another option: sometimes people try to avoid making backups by using replication. "I have no problem, if something fails I can just promote one of the replicas." But this protects us from a reduced set of incidents, and it provides a false sense of protection: if somebody drops a table, the drop is replicated immediately. If we don't want it replicated immediately, we can use delayed replicas. How much delay? Sometimes things happen on Friday and people realize on Monday, so two days? And what if somebody executed an UPDATE that corrupted data? As I said, the number of possible incidents is really high, so the delayed replica often just gives a false sense of security.
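As a rough illustration of two of the ideas just mentioned, the delayed replica and the ugly-but-effective rsync warm copy, something like this could work; host names, paths, and the two-day delay are just examples, using the MySQL 8.0 replication syntax:

```bash
# Delayed replica: apply changes two days (172800 seconds) behind the source.
mysql -e "STOP REPLICA; CHANGE REPLICATION SOURCE TO SOURCE_DELAY = 172800; START REPLICA;"

# Warm copy of a huge datadir: first pass with the database running,
# then a short stop so the final pass only transfers what changed.
rsync -av /var/lib/mysql/ backuphost:/backups/mysql/
systemctl stop mysql
rsync -av /var/lib/mysql/ backuphost:/backups/mysql/
systemctl start mysql
```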
Then we have logical backups and physical backups. A logical backup is usually defined as generating the SQL code needed to rebuild the database, but this is no longer accurate. Originally, logical backups in MySQL were performed with mysqldump, which generated a SQL file that you could run against a database directly, but that's not how it works anymore with the more advanced tools like MyDumper or the MySQL Shell dump utilities. These utilities use the same method to extract the data as mysqldump, that is, queries, but the data is stored in intermediate files that do not contain SQL statements; usually you have metadata plus the content of the database. So I prefer to define logical backup and restore as operations that execute SQL code: to extract the data in the case of the backup, and to rebuild and load the data in the case of the restore. Logical backups are usually small in size. In the case of MySQL they benefit from the approach to multi-versioning, because you do not need locks: in other technologies logical backups require a lock at the database level, but because REPEATABLE READ is implemented as a snapshot in MySQL, you can use REPEATABLE READ as the method to create an image of the database. The problem with logical backups is that rebuilding, restoring, can take a really long time, because you need to re-insert all the data and recreate all the data structures and indexes. They are really good for partial recovery: it's very easy to recover just one table, and you usually have a human-readable copy of the data, which is very good if you need to do a partial recovery, not in the sense of recovering one table but in the sense of recovering a bunch of rows from one table. In MySQL the impact of taking them is really low; it's just queries. And the last thing: you need a living instance to recover into, so you need to install and start a MySQL instance to recover a logical backup. Among the logical tools, the first one is the good old mysqldump: just one thread, just one file. It's very easy to stream mysqldump using netcat, so you can do mysqldump, pipe, netcat, send it to another server and apply it at the same time. You can also use it to dump only the table definitions, and there are options which are not used very often but are interesting, for example generating delimited data using --tab: you can dump the schema, the schema plus the data, or, with the delimited data, just the data. Recovery can be very lengthy. mysqlpump is Oracle's tool for doing a parallel dump; it has some interesting features. For example, mysqldump dumps (or used to dump) the mysql.user table, while mysqlpump dumps the user creation statements; it supports compression, and it supports advanced object filtering where you can define patterns. The problem is that it is not valid for creating a replica, because it does not return the position; so in theory it's consistent, but yes. MyDumper is an open source tool, actually two tools, MyDumper and MyLoader. It's actively maintained by the community, and I say actively because it's a project that was sleeping for some time but is now quite active, and some interesting features have been added recently. It allows you to perform a parallel backup, a parallel restore, and a consistent backup; it lets you filter the tables and objects you are going to process using Perl-compatible regular expressions; it generates multiple files; and it also allows you to stream the backup. The way the backup is streamed is quite interesting: MyDumper performs the backup in parallel locally, generates the files locally, and then a dedicated thread takes each file and sends it to the destination, so there's an additional thread that takes care of just the streaming.
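A hedged sketch of how those logical tools are typically driven; all names, hosts, and ports here are made up, and exact options vary a bit by version:

```bash
# mysqldump: one thread, one file; easy to stream with netcat.
nc -l 9999 > full.sql                                                  # on the receiving server
mysqldump --single-transaction --all-databases | nc backuphost 9999    # on the database server

# Schema only, or tab-delimited data files (--tab needs the FILE privilege and a writable dir):
mysqldump --no-data mydb > schema.sql
mysqldump --tab=/backups/tabs mydb

# MyDumper / MyLoader: parallel, file-per-table, regex filtering of objects.
mydumper --threads 4 --regex '^mydb\.' --outputdir /backups/dump
myloader --threads 4 --directory /backups/dump --overwrite-tables
```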
Yes, but the stream itself is always single-threaded, because usually the network is the limit. Yes, but in the case of MyDumper there's a specific thread just for the stream. MySQL Shell has a set of utilities to dump and load tables, schemas, and instances. In the case of the export it supports multiple formats, meaning the data is exported and you can define the delimiters and things like that; it also supports things like rate limits to avoid loading the server too much; it supports storing the dump in S3-compatible storage; and it can make a consistent backup. The utilities are: loadDump to load a dump, and then you can dump an instance, dump schemas, dump tables, and export a table, which is the one that generates multiple dialects; TSV is tab-separated values. Physical backups perform file- or block-based operations to do the backup and restore. I divide physical backups into two types: the ones that have awareness of the database, and the ones that don't have awareness of the database structure. Snapshots are the main physical hot backup that exists, but as we said before, they are not aware of the database: they perform the snapshot the same way regardless, and, for example, they don't check whether all the files belong to the same snapshot; that's something you have to handle separately. Also, if for whatever reason you are forced to have files on different storage types, then creating a snapshot is probably impossible, or at least very difficult: how do you make sure you are freezing the two disks at the same moment in time? You freeze one and then freeze the other, and they are not consistent anymore. Among the database-aware physical hot backup tools we have MySQL Enterprise Backup and we have the clone plugin. What does awareness mean? Each database page has an LSN, the log sequence number; it just tells us the number of bytes written into, I was going to say the binary log, into the InnoDB redo log. It's quite simple: it's a number that always grows, and it's used to identify whether a page or block is newer or older than another. If we are aware of that, we can make an inconsistent copy and turn it into a consistent one: for each log entry that we have, we can compare it with the database blocks and decide whether that log entry needs to be applied or not. Why do we need this? Because in the database lots of changes take place at the same time: we have concurrency, lots of transactions of different lengths executing in parallel. What happens when there's a crash? When there's a crash, we have a bunch of transactions that were committed and some that were not, and we also have information stored in the log files; with all that information we're able to perform the recovery operation and roll back the yellow transactions on the slide, the ones that did not complete. That's the situation when we boot after a crash, and I'm explaining this because when we perform a backup it's basically the same: we finish the backup at one moment in time, so instead of having the crash here, we could say "backup end", and when we restore the backup, what we do is remove the partial transactions. XtraBackup I like to call the Swiss Army knife, because you can do almost everything: you can perform a full backup, you can perform an incremental, you can perform a partial restore, you can encrypt your backups, you can make backups to a cloud, and you can compress the backups.
It's portable: you can make a backup on one operating system and move it to another operating system, as long as they're both the same endianness, little endian or big endian. For example, I've used XtraBackup to restore backups on Windows, even though there's no version of Percona XtraBackup for Windows; you need to do some tricks, but you can do it, because the InnoDB defaults are compatible, as long as you have a Linux machine that can see the Windows volume. You can stream backups. It's compatible with Oracle MySQL and Percona Server for MySQL; it used to be compatible with MariaDB, but that's no longer true. And it's free, open source. To make a backup, the steps are the following: first we make the backup; after that we prepare the backup, and preparing the backup is what I showed yesterday, applying all the changes and then rolling back all the transactions that were cut in the middle at the end of the backup; and then we can restore the backup wherever we want. The next tool is MySQL Enterprise Backup. It's licensed software, it's multiplatform (it supports Unix, Linux, and Windows), it supports backup to tape using Oracle technology, and it also supports, and this is really very interesting, binary and relay log backup: when you perform a backup, you can also copy the binary logs and the relay logs. This can also be done with MyDumper. Usually you run the prepare when you need to restore; you could do it right after the backup. Yes, but you can also do a partial prepare, because that way you can do incrementals. In the prepare stage you do two things: you apply the logs and then you roll back the pending transactions; there's an option to just apply and skip the rollback, so that you can still apply further increments on top. The way the InnoDB redo logs are treated is changing: for example, the new versions are already able to copy the redo logs themselves, so the redo-log copy that XtraBackup did originally is no longer needed; you can start the backup and use the copy provided by MySQL. Normally, when you make a backup using XtraBackup, what you do is copy the data files and the InnoDB redo logs, but this is no longer needed. Why? Because you can now tell MySQL itself to copy the InnoDB redo logs somewhere for you.
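Putting the backup, prepare, and restore cycle just described into commands, a minimal XtraBackup sketch; the target directory and service name are hypothetical:

```bash
xtrabackup --backup  --target-dir=/backups/full     # copy data files while capturing redo
xtrabackup --prepare --target-dir=/backups/full     # apply redo, roll back unfinished transactions

# Restore: mysqld stopped and an empty datadir, then copy back and fix ownership.
systemctl stop mysql
xtrabackup --copy-back --target-dir=/backups/full
chown -R mysql:mysql /var/lib/mysql
systemctl start mysql
```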
That server-side redo-log copying is a new feature, and it's why I said before that this is changing; it probably means that in the future we will have some sort of log-based replication, because otherwise, why implement this? Well, it makes sense for Enterprise Backup, but if you want to create a real InnoDB cluster, then you use the other tools for replicating things. Okay, last but not least is the clone plugin: it's mostly used to rebuild instances for replication, but it also allows local cloning to a directory. Don't just trust any of this: test and validate your recovery procedures. Disasters always happen at the worst moment, and when there's a disaster you need time to decide; you don't have time to do research or to learn how to restore. And something very important: if the crisis is huge, you need to involve the business, because if the backup will take three days to restore, you probably need to decide the order in which you restore things, and make compromises that are beyond the job of a DBA. I enjoy saying that a backup is like Schrödinger's cat, because a backup can suffer from quantum superposition: it completed successfully, and at the same time you may not be able to restore it in a relevant amount of time. What does that mean? Maybe it takes too long to restore, because you made the backup using mysqldump, or it's an incremental backup and you said "I make a full backup every month and incrementals every hour"; restoring that will take a long time for sure. Or there's something missing in the backup, or there's a bug in the backup software, and this happens unfortunately quite often with the new MySQL policy of implementing feature changes in minor versions. We've actually been forced to add a check that does not allow you to perform a backup when the MySQL version is newer than the XtraBackup version; you can say "go ahead and make the backup anyway", but by default it checks the version, because we discovered one change where backups completed successfully but could not be restored, due to something MySQL added. Or something as stupid as the infrastructure not being what you need: sometimes you make backups, but you don't have a support contract, your on-premises server crashes, and the provider says it will need three weeks to deliver a server like that. Questions? I've been quite quick. I was a manager at a company which was quite successful, but they had problems with treasury and cash, and they needed to operate every day; they could not stop for more than three days, because if they stopped for more than three days they couldn't return to business, they didn't have enough cash. It was a large company, maybe 400 million euros per year, but with very low margins, so for them, being able to recover quickly was critical. No, because it's LVM that does the snapshot, LVM is able to control it; LVM is a bit like a database in that you have the files but you also have a log, so it's able to perform a snapshot using different devices. What I'm saying is: if you are using, for example, a snapshot on storage, and on top of that storage you have LVM, and that storage is actually two network storage devices, how do you make a snapshot and make sure that snapshot is good? Or two SANs under the same LVM volume? Sometimes this happens with things like VMware, where you have datastores, and also very often the providers try to move the snapshot down to a lower level, so you think you are making a snapshot in VMware but there's a driver that moves the snapshot down to the storage.
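For the simple case where the whole datadir sits on a single LVM volume, the snapshot sequence being discussed might look roughly like this; the volume group, sizes, and paths are made up:

```bash
# In a mysql client session that you keep open for the whole operation:
#   FLUSH TABLES WITH READ LOCK;
#   SHOW MASTER STATUS;            -- record the binlog coordinates
# From another shell, while that session still holds the lock:
lvcreate --snapshot --size 10G --name mysql-snap /dev/vg0/mysql-data
# Back in the first session:
#   UNLOCK TABLES;

# Mount the snapshot, copy it off, then drop it.
mount /dev/vg0/mysql-snap /mnt/snap
rsync -av /mnt/snap/ backuphost:/backups/snapshot/
umount /mnt/snap
lvremove -f /dev/vg0/mysql-snap
```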
Okay, so with snapshots, you have to test that they actually are consistent; on paper it's safe. We've seen, I think it was in RDS, I don't remember if it was RDS or Aurora, an instance running on ext3; the underlying file system was ext3, and this brought a lot of performance issues. With those providers, the problem is that you have no visibility; you only have what they tell you. That means that usually snapshots are reliable, but you don't know if the underlying storage is 100% reliable. I'm talking about, for example, EC2 instances; RDS is a different thing: if they say the snapshot is consistent, it should be consistent, because they are selling it as a consistent snapshot, but it failed. Probably, if you use a replica, it's easy; otherwise, why are you using a snapshot? I mean, yes, but why are you using a snapshot instead of using Percona XtraBackup? It depends on the parallelism and how fast your disks are. I don't have a number for GCP, but I've seen terabyte-sized backups running really very fast, an hour, on the order of one hour. I'm not saying that snapshots are not good; what I'm saying is that if it's not a snapshot designed for the database... if it's Cloud SQL, then it should work, because they have methods to make sure that everything is flushed, and that when everything is flushed it's really flushed; because this can happen: the file system driver for your storage tells you that it's flushing and it's not flushing; it can happen that it ignores the sync. Thank you very much. Hello, hello, hello, can you hear it? Is there a delay? Too loud, too quiet? Perfect, thank you. Not a lot of people yet; yes, it seems so, and it's Friday at 4, so I'll start in a minute or so. I don't have a mouse clicker, so maybe some questions for you in the meantime; and you can interrupt me at any time, right, if you have questions, but maybe I have some questions for you. Who's using MySQL in this room? Okay, some people are not. And from those who are using MySQL, who's using replication, MySQL replication? Okay, most of everybody. And from the people who are using replication, who's using ReplicaSet, Cluster, or ClusterSet? No, that's Galera; that's Galera, okay, but that's good to know. So have you heard about it, or are you using it? Are you using InnoDB Cluster, then? Okay, good, we have a user. Are you using InnoDB ClusterSet? That's what the talk is about today. Have you heard about group replication and XtraDB Cluster from Percona? Good, good. Oh no, thank you; so I need to change, and I need to be quick. Okay, good, so some people have heard about it; are you using it? Good. So we're a small group, so really, just feel free and talk, right?
I'm a person with a lot of slides, and I'm already skipping some of them because of the audience, which is fine. But first, before I start: I'm going to talk about database architectures. I'm one of the product managers at MySQL, and among my products are MySQL replication and the high availability and middleware products; that's what I'm a product manager of at Oracle. The database architectures I'm talking about are mainly high availability and disaster recovery solutions. Everything I'll talk about, and in case it's not very clear, I'm not trying to sell you anything; I'm just trying to show you some products we have that are open source. Whatever I talk about is freely available; everybody can use it, try it out, break it, whatever. But before we get to that, I want to give a little recap on business requirements, because when I talk about different MySQL database architectures, the question is how you choose which architecture, what is good for which business. The two concepts I suppose most of us are familiar with are recovery time objective and recovery point objective. RTO, recovery time objective, is how long it takes to recover from a certain failure, and recovery point objective is how much data we lose, or can afford to lose, when a failure occurs. We have this lined up here: we have time, there's a failure, the RTO is how long you can afford to be down, and the RPO is how much of that data you will lose when a certain failure happens. This is an important question to ask your business and your application; it's important to know this before you can go and design and choose a certain database architecture for anything, really. Now, about the types of failure: there are a couple of different types, and to summarize, we have three. We have high availability, and that is within a single region or data center, however you want to see it: a single server failing. So let's say we want high availability: we want no data loss, RPO 0, when a single server fails or when a network partition happens; that's high availability. Disaster recovery is when a region fails: how much time do you have, what's the RTO, for activating another data center? Is it seconds, minutes, hours? And then we have a third kind of failure, human error, like a DROP DATABASE by accident, or an application bug: how fast can we recover from that? There are a couple more different types of failures, but for human error, let's say that's where you have to recover from a backup and do point-in-time recovery, something like that. Now, it's important to know (I've done a lot of consulting and architecture design, and this means I'm talking too much about one slide) that when I ask the business "okay, what's your RTO and RPO?", it's always "uptime needs to be 100%, RTO 0, almost RTO 0, no data can be lost." But it's important to look at the context of things, because I think it's fair to say we can achieve an RPO of 0 relatively easily for high availability within one region, but if you say "I want no data loss when a disaster happens, when a whole data center goes out," things get a little bit more complicated. And I hope you're not using the internet for all your communication, because the internet is not necessarily a stable environment, right?
So it's important: different RTO and RPO depending on the type of failure. And of course it's very important that MySQL (we're talking about MySQL here), or the database, is just a part of it; the network also has to be stable, or designed to sustain this. I've had customers say "I want no data loss and automatic failover, but my network often goes out," or "I'm using the internet as my wide-area network and I want automatic failover between regions." Well, you're going to have a lot of issues with that. The network also has to support these claims. So those are the concepts, and I'll explain a little bit about the different types of architectures that I recommend, or rather the solutions we have built in MySQL, and then we'll tie them to RTO and RPO, what they do and what they don't do. But before I do that, and this is why I asked those questions: what about how people deploy? And I'm sorry, I'm going to plug in the power here because I think it will help; it won't go out as fast now. What's important is that we talk about the past and what people are doing. One person said "I'm using InnoDB Cluster," but I feel a lot of people are still using plain MySQL replication, or maybe Percona XtraDB Cluster or a Galera-based solution, which I think is good. But if we talk about the past, and I see many companies, even bigger companies, that just deploy MySQL, a lot of the work a DBA has to do is all about setting up and automating the database architecture. You have a primary and maybe you want two secondaries; well, in the asynchronous world, let's say more than five years ago, 2016 or before, using MySQL you would have to set up the primary, then restore a backup onto the secondary, configure replication, add a user, get the GTIDs set correctly, things like that. There's a lot of work to be done, and what MySQL did was give you technical pieces: okay, you have replication, there's a way for you to take a snapshot of the database; we gave you pieces, but we never fully integrated them. That's what we have changed since 2016, when our first product, InnoDB Cluster, came out: we started integrating all of this to really make it easy, because there's a lot of work DBAs have to do. For example, my partner is a DBA as well, a MySQL DBA (I met her at a MySQL conference, by the way, interesting), and a lot of the work they do is automating this setup, making it work, and making it their own. But I think a lot of companies, and there are special companies out there that use MySQL differently, a lot of companies need to solve the same problem, and if we talk about high availability and disaster recovery, it's about RTO and RPO; there are some common needs that companies have. They might differ, but they're all going to be in the same ballpark. I'm not talking about read scale-out in this slide deck; we're talking about HA and DR specifically. So in 2016, MySQL released something called MySQL InnoDB Cluster, and that's basically a name; I wouldn't say a marketing name, but it's the name of the product, and it's just a set of different technologies put together. By that I mean we have MySQL Group Replication, which is being used and which I'll get into more, and MySQL Shell: you have the MySQL client, but since MySQL 5.7 we also have something called the MySQL Shell.
And then we have MySQL Router, which is kind of a load balancer that can balance traffic: reads, writes, and so on. All these products together, together with the MySQL server itself, we fully integrate, and we make it very easy to set up an architecture like this, a standardized architecture. One important thing, and there are a lot of features we added to MySQL 8.0, which went GA four years ago, so it's nothing new, is InnoDB Clone. In the past, when we added a secondary, you had to provision it: restore a backup and so on. Now with InnoDB Clone (and you can use this feature standalone yourself, if you don't want to use the integrated solutions I'll be talking about), you basically have an empty server, you just say "clone from over there", it does MySQL authentication, MySQL connections are opened, it sends a physical snapshot of the data to the secondary, the secondary restarts, and it's up and running; you configure replication and you're up and running. This is very important, because InnoDB Cluster and the other solutions I'll talk about are all run from MySQL Shell, and MySQL Shell runs somewhere remote and says: okay, you're a cluster, add a member, add another member, and it does everything for you. For Router you just say "bootstrap: I want to bootstrap and connect to this cluster", and that's all you have to do. So it really only takes a minute or two to set up an environment like this; of course it takes longer if you have two terabytes of data to copy, I mean, for an empty setup it's just super easy, and otherwise it's the time it takes to take the snapshot and copy everything over. What we have with InnoDB Cluster is an RPO of zero, so no data loss in the case of one server failing. FYI, we have three nodes here; you can go up to nine nodes with Group Replication. No data loss if one of those fails; if more than one fails at the same time, then we do not guarantee no data loss. And the RTO is seconds; it's configurable, but the minimum is five seconds, so you can have it fail over after a node has been unreachable or network-partitioned for five seconds, and so on. I could give a whole hour, multiple hours, of talking about this, but I'll try to be brief because there's more to come; this is only the beginning. So in 2016 we released this, we have a lot of users, a lot of our customers are using InnoDB Cluster, but then we found another need. The first thought was: okay, we have Group Replication (I'll get into more detail about Group Replication), that's easy to set up, and that's high availability built in, but not everybody needs this. What if, as a user, you don't need automatic failover, maybe your network is not as stable, but you still want the same ease of use? Well, that's why we have InnoDB ReplicaSet. What is that? It's basically asynchronous replication, configured for you: the Shell configures everything, does the provisioning, configures the replication, adds the users, and there's the same Router integration. Basically the same commands you type, but you just change "cluster" into "replica set", and you have an asynchronous replication setup. You can add as many read replicas or secondaries to this as you want; there's no limit. Question, or... okay, no question, no problem. So that's 2020, that's when we released ReplicaSet.
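Going back to InnoDB Clone for a moment: used standalone, the provisioning it does under the hood looks roughly like this. The donor host, user names, and passwords are invented, and this is a sketch rather than a complete grant list (the clone user needs the appropriate privileges, e.g. BACKUP_ADMIN on the donor side):

```bash
# On the donor (an existing member):
mysql -e "INSTALL PLUGIN clone SONAME 'mysql_clone.so';
          CREATE USER 'clone_user'@'%' IDENTIFIED BY 'clone_pass';
          GRANT BACKUP_ADMIN ON *.* TO 'clone_user'@'%';"

# On the empty recipient:
mysql <<'SQL'
INSTALL PLUGIN clone SONAME 'mysql_clone.so';
SET GLOBAL clone_valid_donor_list = 'donor-host:3306';
CLONE INSTANCE FROM 'clone_user'@'donor-host':3306 IDENTIFIED BY 'clone_pass';
SQL
# The recipient pulls a physical snapshot from the donor and restarts on it
# (it needs to be able to restart itself, e.g. under systemd); after that,
# replication or Group Replication can be configured on top.
```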
What we released at the end of last year, beginning of this year, is InnoDB ClusterSet, and this is simply a very frequent request we get from our users: we had InnoDB Cluster, which really works fine, but "we want disaster recovery, give us disaster recovery." A lot of our users know the time before InnoDB Cluster, so what they do is: okay, let's just set up another InnoDB Cluster and manually configure replication between them. Trust me, that won't work. Why? Because this is an integrated solution, there's a lot of automation happening through the Shell, and it won't work as you expect, because of things like the metadata schema; the integration will fail. But a lot of users did set it up anyway, despite all the drawbacks they had to take care of. So we took some time, quite a lot of time, to make it work, and we introduced InnoDB ClusterSet. Basically it's multiple InnoDB Clusters: you can have a region, one cluster here in Rome and one in Brussels, and asynchronous replication happens from one cluster to the other. And instead of three minutes it takes five minutes to set this up; again very simple, fully integrated, all automatic. What we provide with this is, and in ReplicaSet (maybe I forgot to mention this) there is data loss, because it's asynchronous, not semi-sync; we're not doing semi-sync here, and if you want semi-sync you're in InnoDB Cluster territory. So with asynchronous, some data loss is possible and failover is manual; ReplicaSet is like a basic version. Why don't we do automatic failover there? Because the only way for us to do automatic failover would be to have a monitoring node make decisions, like Orchestrator or MHA; there have been several open source tools available that do this, but we don't want to do that. We do not want to make a monitoring tool that makes decisions and does failover. Why? Because false positives often happen with this: if the monitoring host has issues, it might make the wrong decision, and so on. That's why we have InnoDB Cluster: it does have automatic failover, and it's basically using a Paxos-based algorithm, Paxos with some variant to it, so it does automatic leader election, does the failover, and it guarantees that the data is replicated to the majority before it acknowledges. There are a lot of features built in that make it not need a monitoring host, so it's a much better solution. What we did with InnoDB ClusterSet is that we still have high availability, with RPO 0 and an RTO of seconds within a region, but then we have a manual failover between regions with some possible data loss. This is asynchronous replication, so if the network bandwidth is not limited and the servers are not overloaded, we're talking about milliseconds of data loss; usually it's not going to be a lot of data loss, if any. This architecture is very important: it might look easy, but we had to build a lot of features for it. One is that we have Router, and Router is everywhere; Router knows about the whole topology, and Router can connect to both topologies depending on the setting you use, and I'll show you more of the kinds of things we've built into this. Now, the good thing about this is that between Rome and Brussels it's asynchronous, meaning there's no write-performance impact: if you write a transaction here, you don't need consensus, you don't need a node from the other region to acknowledge it, so you have the same performance as a regular InnoDB Cluster, but you have a disaster recovery solution.
Yes, so why can't we do automatic failover between regions? Again, for that we would need a third region, one that witnesses and then makes a decision. There are a couple of ways to implement it, but if you want automatic failover with this, we can't, because we don't have a monitoring host in another region that decides who's the primary. Also, Group Replication stays local; we do not, in Group Replication, have the concept of an arbitrator at this moment (something that you're actually using), because an arbitrator is a possibility, but the problem with an arbitrator is that it needs to receive and take part in every transaction. It receives all the data, so you need a real data center with low latency as well, because it will be impacted. If you want automatic failover, I'll show you later: you just deploy a single InnoDB Cluster across three regions, and then you'll have automatic failover. Yes, because this is asynchronous it's not... yeah, we'd need that monitoring host to make the decision; that's why it's not automatic. Oh, a follow-up, sure. Is the limit hard-coded, and is there a reason? There's just concern from engineering that if you do more than nine nodes, the communication overhead is too much, and we haven't had users ask for more. I don't necessarily see the use case (maybe we can talk later) for having more than nine nodes, because usually you want that for read scale-out or something, and you can add asynchronous nodes to a cluster; there are actually a lot of nice features that help with that, and I can show you later what makes that possible. Yes, sure, no problem. Okay, so in InnoDB Cluster, and I think that's my next slide, you can write to all nodes, but there are some limitations. Foreign keys in MySQL are implemented in the InnoDB layer, which happens at the storage engine level, so cascading foreign key actions are not known to the binary log; the upper levels in MySQL don't have the information available about foreign keys, so cascading foreign keys are a problem. And DDL is still a problem in multi-primary: you can drop a column on this node (you can, actually, and it will replicate), but if you do a write that uses that column, it might break. So there are limitations, and we'd need to do quite a lot of refactoring in MySQL to solve it properly. Yes, but note that every member has the full dataset and needs to apply every transaction already; but I understand. In ClusterSet we do not even allow multi-primary mode; it is single primary, you cannot enable multi-primary in a ClusterSet, so we're going to have only one primary and all the rest is going to be read-only; even the primary of the replica cluster is going to be read-only. This is one of the features we implemented: if you do a clean failover here, we guarantee no split brain, because we control that this one stays read-only. When we do a manual failover, we make this one read-only and that one the primary, and we do it in the right order, so we can guarantee no split brain will happen, which is something some older products did not do very well; they had the potential for split brain, things like that. Okay, but I'll talk more about InnoDB Cluster now, if that's okay. InnoDB Cluster, as I mentioned, has four different components: we have MySQL servers, we have Group Replication, we have Router (or Routers), and then we have MySQL Shell, and I'll elaborate on each of them a little bit.
And again, I'm staying very high level here, because I have 49 slides and I'm only at slide 14, and I skipped the first five, but anyway, it can go quickly. Okay, so the goals of InnoDB Cluster: we have MySQL and we have so many products in MySQL, but this is about bringing all those features together. We have InnoDB Cluster, and that drives the features we need, wherever they live: is it in the Router, the Shell, Group Replication? They drive "okay, this has to work", and they all have to work well together; we integrate it all together and we test it all together, so we have a QA team that does testing on this whole environment. That's really good to have; before that, we didn't really have it. We tested replication, we tested our technologies, but we never tested the full solution, because it wasn't a product. So that's important. Easy to use: it's very simple, and I'll show you a little bit what the commands look like; it's JavaScript or Python mode, it's quite simple. Then, if we talk about Group Replication: what Group Replication provides you is fault tolerance and automatic failover. As Alan asked, we can have active-active anywhere, but with limits, and I mentioned those limits around foreign keys and DDL. It has automatic membership, so adding and removing members is automatic; provisioning happens automatically, via a physical snapshot or asynchronous recovery, and that's all part of this. The Shell doesn't do anything there: the Shell doesn't copy any data, the Shell just initiates "add the member" and all the rest happens automatically. Network partitions and failures are all handled automatically, within the bounds, of course, of what type of network partition it is. We have conflict detection and resolution: if you write to multiple members at the same time and the writes conflict somehow, we have basically a first-committer-wins system, so the first one to commit will win and the other one will roll back; that's part of the protocol. So we prevent data loss in this case: RPO zero, no data loss when a member fails. The difference between the Galera-based solutions and Group Replication is that in Galera, if you have three members, each member has to acknowledge that the transaction was received before the member that executed the transaction sends back to the application "okay, we're good to go, the commit has succeeded." In Group Replication, the majority is enough, so if one member is slow, the transaction will still continue because the majority has received it. That's a subtle difference between Group Replication and Galera. This also means that if I have one member in another region, you're not going to have that overhead: the consensus will happen locally, whereas in a Galera solution it has to get an acknowledgement from the other side, so if that's 200 milliseconds away, every commit really adds that to it. In Group Replication it doesn't; well, sometimes it does: there's a mode (single-primary mode, by the way, and some settings I'm not going to go into detail on, I can talk about it more later) where you can enable that. The downside, of course, is that with Galera you know the other side has received it, but in Group Replication you're not sure, because the majority can be reached here in one region, so if this region dies, there's no guarantee the other side has received everything. But yeah, anyway, I drift off sometimes, sorry. So Group Replication: it's total order, so writes are ordered; that's basically what it ends up as, one sequential binary log.
Right. Yes, question? Yeah: the majority will continue, and if the majority does not have the primary, it will elect a primary and the write role will be enabled, and the minority will go into isolation; it will wait until the network partition is resolved. Yes, three and two; yes, correct. And the other two are isolated as well, already, okay, yeah. So let's say we have five, and then two of these go away: the majority will be among those three, because that's more than 50%, and these two will isolate themselves, go read-only, and not accept any write transaction. If the one that got isolated was the primary, a new primary will be elected among the majority members. You can configure weights, and you can prefer this one if the others are in another location; that's still the same thing. But when one of them becomes the primary, when it's chosen to be the primary, you can have a setting that makes it apply its queue before it activates, so it will take a bit longer, and you can also disable that setting; there are two modes, either wait or not wait. Transactions to apply, yes, the queue, the backlog, applying everything first, yes, correct; that's the Group Replication consistency level for failover, BEFORE_ON_PRIMARY_FAILOVER, there's a setting for that. Okay, we have half an hour left. So, Group Replication: writes are ordered; it's XCom, which is a kind of Paxos implementation, based more on Mencius, which is another variant of Paxos. We have configurable consistency guarantees, similar to what Galera has but implemented differently: you can have eventual consistency, meaning that if you read from a secondary you're not guaranteed to see the absolute latest, but if you want, you can configure read or write consistency levels in several ways, depending on what you need. It uses the MySQL replication framework, so the same team that develops MySQL replication also develops Group Replication; well, it's different teams, but it's the same engineers and they all know about it. It uses binary logs, GTIDs, everything you use in asynchronous replication, Group Replication uses as well. It's been GA since 5.7, end of 2016 I believe, and it's supported on all our platforms, so whatever platform we support, Group Replication is supported as well, including Windows; not that that matters often, but okay, who's running on Windows? Nobody, okay. But we do have it, and we actually have some big users using Windows. Okay, so: no data loss, RPO 0, I've mentioned that; split brain is prevented because of the way the quorum calculation works, and I'm not going to go into much detail; automatic failover; automatic partition handling; you can do read scale-out with it, so you can add more members and send read traffic to them, that all works. There's automatic handling of replication lag using flow control: there are like 20 or 30 settings to control what we do if a secondary is lagging behind. The primary will start slowing down its throughput, and how much it delays, all of this you can configure with those flow-control settings. The consistency levels I briefly mentioned. You can do active-active, but you need to be careful: cascading foreign keys are a problem, and doing a lot of DDL will pose a problem, so that's important.
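The consistency levels and flow control mentioned here are ordinary system variables; a small sketch of the kind of knobs involved (the values shown are the documented defaults or simple examples, not recommendations):

```bash
# Read/write consistency for this session: EVENTUAL (default), BEFORE,
# AFTER, BEFORE_AND_AFTER, or BEFORE_ON_PRIMARY_FAILOVER.
mysql -e "SET SESSION group_replication_consistency = 'BEFORE'"

# Flow control: thresholds (in queued transactions) at which the group starts
# throttling writes when a member lags behind.
mysql -e "SET GLOBAL group_replication_flow_control_mode = 'QUOTA';
          SET GLOBAL group_replication_flow_control_applier_threshold = 25000;
          SET GLOBAL group_replication_flow_control_certifier_threshold = 25000;"
```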
Now, the good thing about this solution is optimistic locking, so write performance is not too bad. I think we have environments that do something like 20,000 transactions per second, actual write transactions per second, with Group Replication; I think that's about the highest we've seen so far, on low-latency networks. So that's good. And the added latency: because of the way it's implemented, we only add one round trip. It's not like a two-phase commit where we need to prepare all nodes ("okay, everybody prepared? commit"), which is two messages back and forth to the nodes. It's not done like that: you basically send the traffic and the nodes acknowledge, so that's one added round trip, and that is enough for Group Replication to know "okay, the majority has received it, I can continue." Then certification happens to guarantee there's no conflict, and that happens without communication between nodes, because every member has the full data set. During a transaction, no extra communication happens; it's only at commit of the transaction that any network communication takes place, so you stay local within a transaction and you get InnoDB performance just like you're used to with a single server or asynchronous replication. Okay, briefly a slide on MySQL Router. This is our integrated routing solution; there are actually new features that we released last quarter. Basically, you configure your application to connect to the Router, and the Router will figure out where to go. At this moment the Router is a layer-4 load balancer, so it does not do transparent read/write splitting: there is a port for reads and there is a port for writes, so be aware of that, and Router knows "reads, I need to use the secondaries; writes, I need to go to the primary" based on the port. It also has TLS offload, so the TLS/SSL connections can happen locally if you install Router on the application servers. We now also have connection reuse, where connections to the server remain open and we just reuse them. Authentication never happens in Router; Router always proxies the authentication to the server. If you're familiar with ProxySQL: ProxySQL does the authentication, so your user management has to happen in ProxySQL; with Router, no. This is as lightweight as it can get, just a very small binary: it proxies the authentication to a server, the server does the authentication and sends it back, and that's it; it will never authenticate itself. But we have this connection reuse, so we do the SSL locally, because SSL is a lot of back and forth and that adds latency; you can do that locally very fast, and then we reuse an already-open SSL connection that we have to the server, and that way you can reduce latency.
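To make the port-based split concrete: after a bootstrap, Router by default listens on separate classic-protocol ports for writes and for reads (6446 and 6447 unless changed; check the generated mysqlrouter.conf, and the host name here is made up):

```bash
mysql -h router-host -P 6446 -e "SELECT @@hostname"   # routed to the primary (writes)
mysql -h router-host -P 6447 -e "SELECT @@hostname"   # routed to a secondary (reads)
```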
We have users that run hundreds of Routers per Group Replication cluster, per InnoDB Cluster. Why is that? Well, we have some features for that. MySQL has the MySQL port, 3306, that's the classic MySQL protocol, but we also have the X protocol on port 33060, and one of the features of the X protocol, apart from the Document Store (that's our NoSQL API alongside our regular SQL API), is notices. What we have there is that when the topology changes, a notification is sent to all the Routers that have a connection open, so they don't have to poll every half second for the status; they don't have to monitor the status of the group, they'll be notified. That's why we can support hundreds of Routers on a single group: we don't need to do health checking that often, because it would overload the database servers. So there are a lot of little features there. I wish our documentation was a little bit better, and I hope we can work on that, because Router has quite a lot of features but I think we need more how-tos written. I think I'll give a presentation soon at another conference, Cloud World I think, about all these use cases, all the things Router can do. Also, Router works for ReplicaSet, Cluster, and ClusterSet; it doesn't matter, it's the same Router, so that's also very easy. And then we have the Shell. The Shell was also released in 5.7; you have JavaScript, Python, and SQL, and the default when you log in is JavaScript. You can write your own JavaScript plugins; you can actually write plugins that do all kinds of things, so it's very easy to use. It supports the Document Store, our document store, just like our connectors do, so it speaks both protocols, the MySQL protocol and the X protocol. Maybe I have an example here of what it looks like; I don't have a demo today because, yeah, I still have 25 minutes, but that's not enough to do a demo if I want to get through the other twenty-something slides. So here's how you connect: you start the Shell, it's in JavaScript mode, you connect as user admin on mysql1, and you say "I want a cluster object", and that's dba.createCluster().
That's how you create a cluster, a one-node cluster. By default it will error out, because the configuration isn't applied yet: you're connected remotely to a server and it will say "oh, this binlog setting is not good", or this or that, or "the server ID isn't set yet." Well, dba.configureInstance() on admin@mysql2, or mysql1 (let's say we want to add member two) will configure it. It will say "okay, all these settings need to change, do you want me to apply that?" Yes. "Do you want me to restart the database if a setting is not dynamic?" Yes. In MySQL 8 you can change things with SET PERSIST or SET PERSIST_ONLY, and there's RESTART as a command, so you can restart the database; everything we can do remotely from one machine. So if you have containers, which is kind of common nowadays, you can just connect to MySQL; you don't have to get a shell on a container to execute commands, everything happens remotely over those MySQL connections, and it can configure it, set any variable, restart, and so on; it will do all of that for you. Let's say you want to add an instance: we have the cluster object that we created, it's a one-node cluster, and we do cluster.addInstance('mysql2'). It will say "okay, this instance has no data, do you want me to provision it? I can use the binary logs or I need to clone it, what do you want?" If it knows there are not enough binary logs available, it will say "you need to clone it." So you choose clone, it does all the cloning for you, and it adds the member to the cluster. If you then want to add mysql3, it's the same command, addInstance for mysql3. And if you then want to configure Router, this is the only thing you need to do: run mysqlrouter, wherever you want to install Router, as the mysqlrouter user (that's the Linux or Unix user you want it to run as), with bootstrap; that's all you need to do. Well, you give it a host name, mysql1 for example, and then you start it. When it bootstraps, it connects; there's a metadata schema, and based on the metadata schema it knows we should have mysql1, mysql2, and so on. Router generates a config and then updates a kind of state file as it runs, monitoring that metadata to see whether there are any new members and what the status of these members is. Then you can get the cluster status (it's one of the commands, there are many), and it gives you JSON output with the status: we have the primary, mysql1, in read-write mode, and then we have two read-only members here, mysql2 and mysql3. You can do an extended status, you can remove instances, there are all kinds of operations you can do, but that's basically how you deploy a cluster. And if you want to deploy a ClusterSet, which I'll talk about later, it's basically cluster.createClusterSet() plus another member, and there you have it, you have another cluster; it's almost the same commands to run. ReplicaSet, one slide only on that: it's less frequently used, but we still have users for it. It's the same integration, but asynchronous; it still does the cloning, the user handling, and the configuring of replication, and failover is manual, but the Shell provides all the functionality to do that manual failover. Even in a cluster you can say setPrimaryInstance, you can change the primary here; you can also say "setPrimaryInstance, I want this one to be the primary because I need to do some maintenance here." Sure, you can do that; the Shell will automate all of this for you.
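Pulling the commands just walked through into one place, here is a hedged sketch with made-up host names, where everything runs remotely from MySQL Shell in its default JavaScript mode:

```bash
# Fix the configuration of each instance (SET PERSIST / restart handled for you):
mysqlsh --js admin@mysql1 -e "dba.configureInstance('admin@mysql1')"
mysqlsh --js admin@mysql1 -e "dba.configureInstance('admin@mysql2')"

# Create the cluster and add members (clone-based provisioning where needed):
mysqlsh --js admin@mysql1 <<'JS'
var cluster = dba.createCluster('myCluster');
cluster.addInstance('admin@mysql2', {recoveryMethod: 'clone'});
cluster.addInstance('admin@mysql3', {recoveryMethod: 'clone'});
print(cluster.status());
JS

# Bootstrap a Router against the cluster metadata:
mysqlrouter --bootstrap admin@mysql1 --user=mysqlrouter

# Later, growing this into a ClusterSet with a replica cluster elsewhere:
mysqlsh --js admin@mysql1 <<'JS'
var cs = dba.getCluster().createClusterSet('myClusterSet');
cs.createReplicaCluster('admin@brussels1', 'brussels');
JS
```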
Okay, any questions so far? Now we go to ClusterSet. I'm just talking all the time and nobody else is — I see people nodding, so that's good; a few people are falling asleep, but that's also fine, you know. No questions? Okay.

This is one of the latest developments: ClusterSet. We talked about InnoDB Cluster; now we have ClusterSet. It's the exact same ease of use, the same kind of commands to set it up. You can have one cluster, but you can also have multiple clusters. So let's say the primary cluster is in Rome, we have one in Brussels, and you can have one in Lisbon — you can have as many as you want. There's no limit; we didn't actually define a limit. There's a limit of nine nodes per InnoDB Cluster, but there's no limit on the number of clusters in a ClusterSet, although I think there will be a practical limit of some kind. You're also not required to have three nodes in every cluster; you can add a third data center with a single node if you want. You won't have high availability in Lisbon, but you can do it if you want — nobody's going to prevent it, it just works.

Okay, simple. Let's say we have this environment — oh, a question? Sure, ask the question first... I'll explain it; I have a couple of slides, but I'll explain it here already because I have to put some context around it. So let's say we have two data centers, Rome and Brussels. The primary is in Rome, and we have four Routers, two in each, and each has a target mode — it's a bit hard to see, I need to change the font here. You have different target modes, and that's important to know. Here's something nobody has asked yet, but I get this question a lot: can I do bi-directional replication between clusters, and can I do active-active between clusters? The answer is no. Why? It's asynchronous; we cannot guarantee that there won't be any conflicts, so we are not going to make a product that is going to make you lose data or have a split brain. Either we have a fully supported solution that guarantees there's no split brain, or you go and deploy your own Group Replication and do whatever you want — but not in our solution. You're laughing, but a lot of people are doing this, right? So many people, and I have to talk a lot of people out of it. Yes, and you too, right, at your work?
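(To make the Rome/Brussels picture concrete: turning the single cluster from before into a ClusterSet is roughly the following — the cluster names, host names and options here are placeholders, and the prompts are trimmed.)

    var cluster = dba.getCluster()                       // connected to the Rome (primary) cluster
    var cs = cluster.createClusterSet('myClusterSet')    // promote it to a ClusterSet

    // Add a replica cluster in the second data center; cloning and the
    // asynchronous replication channel are configured for you:
    var brussels = cs.createReplicaCluster('admin@brussels1:3306', 'brussels', {recoveryMethod: 'clone'})
    brussels.addInstance('admin@brussels2:3306')
    brussels.addInstance('admin@brussels3:3306')

    cs.status({extended: 1})                             // JSON view of both clusters and the replication channel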
Okay, so Router knows about this. The default way Router works is target mode "primary", which means it will send writes — that's in red — and reads, in green, to the primary cluster. But you can also change the target mode. This one is in target mode Brussels, and this one is in target mode Rome, which means I've made it a reporting application, because it only does reads. But if you send writes to it, the writes will go to the primary, because the primary is in the primary cluster — Rome is the primary cluster, it has the read-write node. So with target mode Rome, if we send a write here, it goes there. Now, there's a little decision we had to make: if this one is in target mode Brussels and a write comes in here, the write will not go through, because there is no writer here. That's the current situation: you cannot say target Brussels but still have the writes sent to the primary. We could add that as a feature to the code, but it would have delayed the release of this product, and we decided we needed to release it as-is and not delay it. One thing that's important here is that Router is so lightweight: just run one Router in a read-only mode and another Router in a read-write mode. The read-write one can stay on target "primary", so you send your write traffic to that one and it will follow the primary, and the one in target Brussels will always send the reads locally. That works because Router is so lightweight. And this Router will know within seconds that a machine has shifted — let's say this one became the primary; Router will detect it within seconds.

What else is important here? Oh — you can change target modes. And with the features we're adding to Router now: as I mentioned, it's lightweight; we do not want to do authentication there, we want to keep it lightweight. The way you configure Router for these modes and the other settings we're adding is through Shell: you just say, this Router needs to change its target mode. What happens is that Shell updates the data in the metadata here, and Router notices — oh, my target mode has changed — and it changes the way it behaves. So what we avoid is you having to go change a config file over there. We also have a REST API to check the health status of every Router, but knowing that Routers can be deployed everywhere, we don't want Shell to have to connect to each one and send an API request. No, we can do this all centrally, because there's a database — hey look, we can use the database, it's highly available, the data is there. So we heavily rely on that: Shell just updates the configuration in the metadata and Router learns about the changes. It's a very neat, small, simple design, but it works, and it makes this really flexible.

Oh, two questions. Yes — yes, this fully uses GTIDs. Oh, good question. So the question is: what about GTIDs? GTIDs, global transaction IDs, were introduced in MySQL 5.6 — or was it 5.7, I don't remember. It used to be that if you configured replication you had to configure the binary log file and position — CHANGE MASTER TO back then; now it's CHANGE REPLICATION SOURCE TO, the new syntax — saying, okay, this is the file, this is the position. The problem is that if you want this machine to replicate from that one instead, the binary log file and position are different over there. With GTIDs, global transaction IDs, there's a way to just say: replicate from this one; you know what you've applied, so with the GTID sets it's basically "here's what I have, give me whatever I don't have yet." This relies heavily on that. Group Replication uses one GTID per cluster, so we have two GTIDs here — well, to be more precise, there are two GTIDs per cluster, because there's a management one that we had to implement. It's an implementation detail; we had to add some features to make this work, and we had to add another, management GTID, so you'll regularly see four GTIDs if you check the GTID set. Yes, it's the view change UUID — the Group Replication view change UUID. The reason is that when a primary switches here, or you add a member, a view change event is sent with the new topology; but in a split brain situation that would create GTID inconsistencies between the clusters. That's why we put it on a different UUID: we can reassemble things, and because we know it's management traffic we can ignore those management GTID events.
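(Coming back to how Router gets reconfigured through Shell rather than through local config files: changing a Router's target mode is roughly the following. The router identifiers here are placeholders — use whatever listRouters() reports for your installation.)

    var cs = dba.getClusterSet()
    cs.listRouters()          // the registered Routers, as stored in the metadata
    cs.routingOptions()       // their current routing options

    // Point one Router at the Brussels cluster for local reads,
    // and keep another one following whichever cluster is primary:
    cs.setRoutingOption('brussels-app-host::', 'target_cluster', 'brussels')
    cs.setRoutingOption('rome-app-host::', 'target_cluster', 'primary')

    // Router picks the change up from the metadata within seconds:
    // no config file edit on the Router host, no restart.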
Yeah, okay, sure — yeah, because then we'd need to have users and passwords in the Router, and we don't want to do that. SSL happens locally; we could have that, but then the authentication is just the authentication string, and security-wise you'd have to start replicating users — well, the Routers could sync them, could fetch them, yeah. And there's also the audit logging we might want to do: in our Enterprise version we have an audit log, so we want to know who authenticated, so we want to keep that control on the server. As for the overhead — we don't have users complaining about that specifically, and we're also adding connection reuse into the mix, so connections are reused and the overhead becomes smaller: you don't have to open a new connection every time, and you can configure the pool however you want it.

Yes — yeah, whatever, there's no limit. Yes, you can connect a replica through a Router, yes, that works — but we've added features so that you don't need to do that, and I can show you that in the next couple of slides. Three, four questions — yes, go ahead. Yes — so there's a timeout configuration for that, and there's a limit on how many you want, and it's not a total either; it's actually per user and per connection capabilities, because, remember, the authentication — we don't want different users to share the same kind of connections, so there's a little bit of difficulty around that. And what was the last part? Yeah, okay, yes, correct — I think the configuration is: if you have more than that many idle, start closing them, so you keep as many idle as you want. Yes — but Router by itself cannot open connections, because it doesn't know how to authenticate; it needs a user to open an original connection first, and there's a bit of a trick: we need to keep state — connection tracking, session tracking — there's a lot of state we have to manage and keep compatible for the different users, so things get quite complex. Yes, question — the Router instance, or new connections, have to be opened, so the application authenticates again, and Router is basically a man in the middle; that's what it does. It knows some things, but it doesn't know the clear-text password; it can use whatever hashes are there to forward the authentication, at least, but it doesn't have access to the actual password.

You bootstrap once — question — I'll show you; that's in the scenarios. After these slides I have some scenarios: what if a region dies, what do you do. You bootstrap against any of them; it doesn't matter, because Router figures out what to do. Router will see the metadata, it will know which one is the primary, it will register itself there, and that metadata is replicated everywhere — that's it. You don't have to know the primary, which makes it easier for automation: when you automate this, just give it one of them and that's enough. Sure — no, no — yes, I think we switched to, what is it, event-based I/O; we don't do a thread per connection anymore, so that's better. The scalability kind of depends on Router, but from what our users say, they haven't raised any issues on that yet. Anyway, I have a couple more minutes and 19 slides. Yes — yes, correct, you can actually switch if you want.
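(On the connection reuse that came up a moment ago: in recent Router versions the pool is a small section in the Router configuration, something along these lines — I'm quoting the option names from memory, so double-check the Router manual for your version before relying on them.)

    [connection_pool]
    # keep at most this many idle server-side connections around for reuse
    max_idle_server_connections = 64
    # close idle pooled connections after this many seconds
    idle_timeout = 5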
Okay, let's show some use cases and scenarios; let's get down to it — I have seven minutes, okay, good. Let's say we want to switch the primary cluster. This is manual; changing the primary cluster like this is a clean operation that I'm going to show you. You have two regions and you just want this one to become the primary, because the workload shifts there, or you want to do some maintenance, whatever. It's one command, set primary cluster, and you just say: change the primary across clusters to here. In purple is everything we do for you: replication is reconfigured, this one becomes the primary and this one becomes the secondary, and Router also knows — oh, I have to connect over here now — and it starts connecting there. Target "primary" means the writes come here, so the writes suddenly move there; these writes will also move to that one. So it's this one command, and it automates all the work necessary to do this for you.

A second or so — it's very fast. What do we need to do? Well, we need to put the old primary into read-only, and all ongoing connections need to commit or roll back, so it depends a bit on your workload. Yes. Okay, then Router — I think I've mentioned most of this already, so I'm going to skip some things: I've mentioned the target modes, I've mentioned all of that.

So, scenarios — as was asked, what if something happens? Scenario number one: the primary dies. Not the whole cluster, just the primary — the read-write member, the primary member of the primary cluster, dies. What happens is that the cluster, Group Replication, just elects a new primary; this one becomes the primary, and the asynchronous replication automatically moves to the new primary as well. So that's fully automatic. We had to add some features to asynchronous replication to make this all happen — and you can actually use them yourself with regular asynchronous replication: in MySQL 8 you don't have to point the replication source at a single host, you can give it a list of candidates, and the replication channel even knows about the group, so it kind of knows about the whole cluster. Now let's say this one dies — the primary of the replica cluster, which is read-only but is the one receiving the transactions. If it dies, replication is restarted and continues from here, fully automatic, so you don't have to do anything. Those are the common cases, and if you do the clean switch, it also handles all of this automatically.

Now, the use case you mentioned: let's say it's partitioned. They didn't go up in flames, they just lost connectivity. We have two clusters, this is asynchronous, so the application is stuck — nobody can access this application. So what do you want to do? The first thing is fencing: you want to fence it, and you can actually fence it; we added features for that. You can say "fence writes" — because writes can still happen on this one, it's the old primary, and we need to reduce the split brain risk as much as we can. So you fence writes and it puts the whole cluster into read-only, or you can shut it all down; you can use what you want, assuming you can get to it. Yes, correct — if you can get to it; maybe the power just went out, and that's the good case, right? But if you cannot get to it directly, there should still be ways to reach that isolated side: maybe you can access it through some console, some other way, or there's another internet line to get in. Yes, correct — yeah, you put the whole cluster into read-only. Yes, correct — not the ClusterSet, correct, you do it on a cluster. And this one over here is still a replica at this point, it's still read-only. So you fence first if you can; you don't have to, but we do recommend it, of course.
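(The clean switchover I just described, plus the fencing and the forced failover I'm about to get to, are all one-liners on the ClusterSet object. Roughly like this, assuming the two clusters are named rome and brussels — names, hosts and the exact sequence are a sketch, not a runbook.)

    var cs = dba.getClusterSet()

    // Planned, clean change of primary cluster (both sides healthy):
    // replication is reconfigured and Routers in target-mode "primary"
    // start sending writes to Brussels automatically.
    cs.setPrimaryCluster('brussels')

    // Disaster case: Rome is cut off. If you can still reach a Rome member
    // (console, second line, ...), connect to it and fence writes there so
    // stray writes (cron jobs and so on) can't cause a split brain:
    //   shell.connect('admin@rome1:3306'); dba.getCluster().fenceWrites()

    // Then, from the Brussels side, promote Brussels even though Rome
    // cannot be consulted:
    cs.forcePrimaryCluster('brussels')

    // Once the partition heals, bring the old primary cluster back in
    // as a replica cluster:
    cs.rejoinCluster('rome')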
So now suddenly everything is read-only. What you want to do next is a force: not set primary cluster, but force primary cluster, and you say, make this one read-write. Router notices — oh, I have a new primary here — and it starts sending the write traffic there. And because we fenced, if the application is still doing something — a cron job might still be running and doing updates on the database — without fencing that would still happen on the old side and we'd have a split brain situation; with the fencing we reduce that possibility. Now, what happens when the network partition goes away and the old side recovers? These Routers will notice that there's a new primary — in the metadata they can see that this one should be the primary now, not that one anymore — and they will automatically send all the traffic there. Yes — it will know as soon as it can connect to that cluster again, yes. So here it's read-write, and that one would be read-write, but if we switch, it knows. Correct.

And I have one minute left and a lot of questions. Now, to summarize — you can see the slides online — what to use when: if it's high availability you want, we have InnoDB Cluster: no data loss, and the RTO, the time to recover, is seconds. With ReplicaSet there is the possibility of data loss, and the RTO is minutes. For multi-region disaster recovery, which is what we've shown you here, there's InnoDB ClusterSet: within the region the RPO is zero and the RTO is seconds, write performance is great because the cross-region replication is asynchronous, and the RTO is higher for disaster recovery because you have to fail over manually. One thing I haven't talked about is that you could also deploy a single cluster across three regions, and then you have fully automatic failover and no data loss when a region fails. You can add two nodes in each region if you want, so you also have some local high availability. That's possible too, but you need a very good network for it, and your write performance will be affected by the latency. So that's another option. Is it synchronous? Well, none of them are considered synchronous — it's virtually synchronous, or whatever concept you want to call it — but you can get effectively synchronous behaviour if you want to go that way.

Anyway, that's it. There are some links; everything is available open source, so use it — that's the direction, and a lot of effort has gone into it. If you have any more questions — I have to stop, because the next talk is coming up. The slides I will put online, on SlideShare or on the Oracle site; Tina will add them, and I'll upload them today. Okay, thank you. Sorry, Dale, sorry, I have to say it again — we drift off all the time, right? You have questions because you were falling asleep? I was listening, I know, I know. There are more people — it keeps people more awake.