in the shared notes. So our very special guest today is Mr. Ivan Ma. He's the MySQL user group lead in Hong Kong and has over 20 years of experience in enterprise system infrastructure and software technologies. He's also currently a MySQL Principal Solution Engineer, Asia Pacific, with Oracle, responsible for pre-sales consulting, technical education, and product promotion. He joined Oracle's systems team through the Sun Microsystems acquisition, and prior to that he was a principal solution architect with Sybase, covering the APAC region. So now let's welcome Mr. Ivan Ma. Mr. Ivan, over to you.

Thanks. Thanks, everyone, for joining these sessions. Today we have already covered two sessions: one from Helen with a quick tour of MySQL Shell, and in the last session we got an idea of the MySQL service on the cloud. So this session is about handling data consistency. Data is the key; data is very important. How can we actually achieve consistency, and how do we do it in MySQL? That's what we cover today. As mentioned, I'm from Oracle, I'm the MySQL user group leader in Hong Kong, and I have done a lot of work with Exadata, Oracle, and Sybase for over 10 years. So databases, I believe, are what I'm good at.

So let's look at what database consistency is. Basically, consistent means it follows the rules, and the rules have to be defined beforehand. In a database system, that means database transactions and changes must only happen in ways that have been defined as valid. Note that this does not guarantee the correctness of the transaction in every way you might want; it guarantees the defined rules are followed.

Okay, so there are some related terms, the so-called ACID and CAP. ACID is atomicity, consistency — which is the important one for today — isolation, and durability. An ACID transaction is a logical unit of work that either completes as a whole or fails as a whole. Everyone using a database needs transactions. But over time we also see distributed database systems, and ACID alone is not enough there. These are multiple systems, multiple nodes actually running together, so we need more than just ACID; there is another set of terms called CAP.

So there's a theorem called the CAP theorem: consistency, availability, partition tolerance. Consistency here means that when multiple systems are doing updates concurrently, the updates have to be applied in the same order everywhere. Availability means a binary outcome for each request, yes or no; if it takes too long to respond, it is effectively not available. And partition tolerance covers what happens when the network splits the nodes from each other. The theorem says you can only achieve two of the three, so for example consistency or availability, but not all of them at once. That is the so-called CAP trade-off. Of course, somebody is always trying to challenge this theorem. Anyway, today we talk about consistency: we want it done right, and we want to see what it actually means inside the database.

So for MySQL, we have the transactional storage engine, which is InnoDB. Today, 90, 95% or even more people are using InnoDB, and it supports transactions. There are other storage engines in MySQL, like MEMORY, CSV, BLACKHOLE, or some others, and MyISAM in the old days. But today, 95% or more people use InnoDB.
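Just to make "transaction" concrete before we go on, here is a minimal SQL sketch; the accounts table and the values are made up purely for illustration:

```sql
-- Hypothetical InnoDB table: accounts(id INT PRIMARY KEY, balance DECIMAL(10,2))
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
SAVEPOINT after_debit;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- ROLLBACK TO SAVEPOINT after_debit;  -- would undo only the second update
COMMIT;  -- atomic: both changes become durable together, or neither does
```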
So as the transactional storage engine, InnoDB supports BEGIN and COMMIT, as well as SAVEPOINT and ROLLBACK, and actually RELEASE SAVEPOINT too. Those are important. And from the database management side, we need to shut down the server, and from time to time we need to restart it. All of this comes back to undo and redo, right?

So in the database we have undo logs, which are important for crash recovery, especially when the system or the power fails. The system crashes, we restart it, and something was still in progress. How can this be recovered? Based on the logged data: whatever was in flight but had not yet committed needs to be undone — we clear that data out of the data files using the undo segments. And whatever has been committed but has not yet been written into the data files needs to be applied, based on the redo logs. So at startup, the server reads the redo logs and applies all of this. That brings the database to a consistent state, and then we can start to use it.

Again, when we talk about consistency, we also talk about how we back up, restore, and recover. When we recover from backup data, it has to be consistent. It is not just about the restore; it's actually about how we take the backup. If the backup is inconsistent, then when we restore, there is no way to recover — we're in an inconsistent state. So backup and recovery is one of our topics.

And then there is HA. That means redundant servers, with the data in one place on shared storage. People talk about active/passive access, maybe using a cluster system or VMs: you shut one server down, bring another server up looking at the same shared storage, and start it with the same IP. It's looking at the same data, so data consistency there is based on undo and redo as well, in that case.

What about replication? That is actually based on the binary log rather than the redo log — how does it work? And there are newer things: replication and InnoDB Cluster. That is multiple servers that can handle requests: it can be a single primary server, with the other server nodes providing read capability. As we write data, it is passed to the other servers, and people may access the data from those servers. How can we avoid reading inconsistent data, how do we get write consistency, and how can conflicts actually be detected and handled properly? That will be one of the topics today — and 25 minutes is quite short. There are also other multi-server topics in InnoDB Cluster; they are not the focus today, but anyway, I put them here.

As well, there are applications people create programmatically, with a whole bunch of logic. It's not just one transaction; it can be something else, it can actually span different systems. So programmatically, how can we do it?

Back to backup: how can we create a consistent state, so that when we restore, it is a consistent backup and restore? Basically, we can bring down the server so nobody is updating the data. As files on storage, the data is then in a consistent state, and we just back up the files, something like the sketch below.
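As a rough sketch of that offline (cold) backup idea — the paths and the service name are assumptions for a typical Linux install, not something from the talk:

```bash
# Stop the server cleanly so InnoDB flushes everything to the data files
mysqladmin --user=root --password shutdown

# With mysqld down, the files on disk are self-consistent; archive them
tar czf /backup/mysql-cold.tar.gz /var/lib/mysql

# Bring the server back up (service name varies by distribution)
systemctl start mysqld
```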
But with that approach there is downtime. Today we have always-available systems running in production, 24 by 7, with people doing updates every minute, every second. So against a running server, people may try to say: oh, we take a snapshot. How can we take a snapshot? Freeze the whole VM so it's not doing anything — the hypervisor is capable of just freezing the VM so nothing works inside it. So does that actually work? Yes and no. Basically, when we freeze this VM, the data in memory has not actually been applied to storage. It's like a crash: restoring that snapshot puts you into the crash-recovery model. So it works, but not really well. As well, if this is a cluster system, with InnoDB Cluster replication going on, when one server tries to ping the frozen server, it does not respond. Because it's frozen, the node will be kicked out of the cluster. So this is not a good idea.

So we believe a better way of doing this backup is to use locking and apply the logs, to bring the backup data into a consistent state. Okay, so we introduce MySQL Enterprise Backup, from the Enterprise Edition, which can back up the data with a multi-threaded process. It backs up the data much, much faster, physically — it is not a logical backup, it is a physical backup. And it brings the data to a consistent state by applying the transaction log, so it brings the data back to a known point, right? Restore can also be faster because it can run in parallel; it's not like SQL, which has to be applied line by line. And there are a lot more features in the Enterprise Edition: full and incremental backup, monitoring, encryption, and all of this. Also, when the MySQL data is encrypted, we need to read the data in a more secure way, and that is handled by the Enterprise Edition.

Anyway, people also use mysqldump. So let's take mysqldump as an example of why a backup can be inconsistent, and how we can actually make it consistent. Like in the picture down here: at one point in time, table one has 10 records and table two has 100 records. A moment later, table one has gained two more customer records, and table two has changed even more, to 120. But the backup process starts in the middle: table one is captured with 10 records, and by the time table two is dumped, the backup captures 120. At the end, the backup data has 10 records for table one and 120 for table two. Is that consistent? It is an inconsistent backup; when we restore it, this backup is not really usable.

So how can we fix this? With mysqldump, basically people have to use --single-transaction, or else use LOCK TABLES. --single-transaction only works for InnoDB, okay? If there are other kinds of storage engines, you may consider the other one, LOCK TABLES. Particularly when you're using MySQL 5.7, there are some system tables, in the data dictionary, that are not InnoDB, so to back up the full database you need more than just --single-transaction.

So what are --single-transaction and LOCK TABLES? --single-transaction relies on the MySQL isolation level REPEATABLE READ. When I start the transaction, it captures a consistent snapshot: all the data as of that snapshot time. From then on, all reads of the InnoDB tables are based on the point in time at which we started.
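Put together, a consistent logical dump of an all-InnoDB server might look like this; the output path is illustrative, not from the talk:

```bash
# --single-transaction: one REPEATABLE READ snapshot covers every InnoDB table
# --quick: stream rows instead of buffering whole tables in memory
mysqldump --single-transaction --quick \
          --all-databases --routines --triggers > /backup/full-dump.sql
```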
So that can actually be quite helpful for a session. But remember, this works for a single session only. If you break off in the middle and try to do it table by table in separate runs, it doesn't work, because a single transaction only lives within a single session. And the other way is LOCK TABLES: we can lock all the tables up front, and then nobody can touch them. This may work for a system having more than just InnoDB.

There's also another option for too many rows, a big table: the --quick option, which is very useful. It dumps the data row by row. Consider 10 million records: retrieving them in one go may overflow your memory, so dumping the rows from a table like that may take quite long or fail. --quick streams them instead.

Then there are the MySQL Shell dump utilities. This was mentioned in an earlier session, so just a refresher here. MySQL Shell is quite good, okay, for exporting data, for logically backing up the data and bringing it back. Basically there is util.dumpInstance(), and util.dumpSchemas() to dump many databases within the instance, plus util.dumpTables(); you can also export a table in CSV or some other format, with fields terminated by some delimiter, and import it with the table import utility, which underneath uses LOAD DATA. And to restore a dump, there is util.loadDump().

Does it also consider consistency? How can this be? Firstly, this is a multi-threaded dump. The way it works is this: at the very beginning, when it starts, MySQL Shell tries to issue a lock, the so-called FLUSH TABLES WITH READ LOCK, and then starts all the sessions, all the threads. Each thread, when it is started, sets the session transaction isolation level to REPEATABLE READ, and then starts its transaction WITH CONSISTENT SNAPSHOT. Meaning: at the very beginning, all the threads see the data in the same consistent state, as of time zero. It is consistent, okay? And it uses the REPEATABLE READ isolation level no matter what — even if you have set your default to READ COMMITTED. So this is quite good: a multi-threaded dump, with consistency.

When we load the data back, it is big data. So how can it be just one long process? Maybe in the middle a connection fails, something goes wrong — it's a long process. So what can we do? It can be partial: we start to restore, then in the middle we crash or stop, and then we can continue. We can actually use MySQL Shell to load the data partially and resume. This is a very good tool.

And in MySQL 8.0 we have a new feature, the clone plugin. Clone is also a kind of backup, okay — from a running server, we want to reproduce another running server. How can we do this? There is a sender and a receiver; we call them the donor and the recipient. On the donor, we have to install the plugin, so INSTALL PLUGIN clone, and create a specific user that the remote server will use to connect back here, and grant the specific privilege, BACKUP_ADMIN, to this user, okay. Then on the recipient side, we install the plugin as well, we set where we can actually get the data from — the valid donor list — and create a user and grant it the clone privilege. Then we can clone the instance from that specific donor server.
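A minimal sketch of that setup; the host names, user name, and password here are illustrative placeholders:

```sql
-- On the donor:
INSTALL PLUGIN clone SONAME 'mysql_clone.so';
CREATE USER 'clone_user'@'%' IDENTIFIED BY 'secret';
GRANT BACKUP_ADMIN ON *.* TO 'clone_user'@'%';

-- On the recipient:
INSTALL PLUGIN clone SONAME 'mysql_clone.so';
SET GLOBAL clone_valid_donor_list = 'donor-host:3306';
-- Run this as a user that holds CLONE_ADMIN on the recipient:
CLONE INSTANCE FROM 'clone_user'@'donor-host':3306 IDENTIFIED BY 'secret';
```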
So having said that, what is actually the progress of a clone? We can see it here: it is like a backup from the donor side, then sending the data over the network to the recipient, and then applying the redo log — so it is consistent. It is like a consistent backup and restore in one step, okay. This is a very good tool and a good feature in MySQL 8. And MySQL Shell and InnoDB Cluster support using this clone natively, natively using this clone to reproduce a server. Here you can see the redo log copy: we copy the redo log across and apply it when the server is restarted. So we apply the redo log and bring the database to a consistent state.

There is also programmatic SQL, where we have to consider GTID consistency — it has to be consistent. How can this actually be done? Before MySQL 8.0.21, people had been asking about CREATE TABLE ... SELECT statements. You can see: CREATE TABLE ... SELECT is DDL plus DML. The CREATE TABLE part is DDL — on its own it either happens or not. The SELECT part can be a transaction, but CREATE TABLE had no transaction concept. So from 8.0.21 on, we support this atomically: it can be rolled back as one unit, okay, and it has GTID consistency. And there's also this: if we consider multiple nodes, and someone executes DDL on one node and DML on another, it might not be very consistent. So we have to bring consistency to how we combine DDL and DML running on the same server.

Also, when we programmatically consider things, there is the data dictionary, the system data, which can be MyISAM, like in MySQL 5.7. In MySQL 8.0, the data dictionary uses InnoDB.

There are other things in programmatic SQL for consistency control. When we want to provide consistency, we have to provide locking mechanisms, so the critical region of data is accessed in a serialized manner. We can use LOCK INSTANCE FOR BACKUP: while it is held, you cannot create tables, you cannot run that kind of DDL, but DML — updates, inserts — yeah, that's okay. And as mentioned earlier, there is FLUSH TABLES WITH READ LOCK, or FLUSH TABLES naming a few tables to give a few locks. There is LOCK TABLES for specific tables, a few table locks, and programmatically there are the mutex functions GET_LOCK() and RELEASE_LOCK(). And there's also row locking, okay? Locking can actually work at the row-locking level. And metadata locks.

So all along you can see — let me flip over — replication. Replication is more than just data; there are also the repositories. When a server crashes, at some point the GTID and position information gets put into the so-called repository. In the old days, in 5.7, the repositories were actually files; in 8.0 they are tables by default, and the tables are in InnoDB. So when the server crashes, you will not end up with a partial file here and partial data in the database; it will all be in the database, committed together. So it is crash-safe, and the database is consistent — not just on one server, but across the whole HA setup, okay. The master info repository defaults to TABLE, and the relay log info repository defaults to TABLE.

And there are other things, because consistency with redundancy means multiple things happening, multi-threaded work here. When replication starts, the slave — which we now call the replica — has to apply the data in order. We can actually execute with parallelism, LOGICAL_CLOCK, configure how many workers to run, preserve the commit order, and choose how well we can track transaction dependencies, as in the sketch below.
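A rough sketch of those knobs, assuming the 8.0.26+ variable names (older versions use the slave_* names instead):

```sql
-- On the source: WRITESET dependency tracking lets more transactions
-- run in parallel on the replica than plain COMMIT_ORDER
SET GLOBAL binlog_transaction_dependency_tracking = 'WRITESET';

-- On the replica: parallel applier workers that still commit in order
SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL replica_parallel_workers = 4;
SET GLOBAL replica_preserve_commit_order = ON;
```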
Meaning: which transactions, which sessions can actually run in parallel. It can be based on a writeset, like hash values, or it can be based on the commit order. So there are a lot of things in 8.0 we can do better and faster.

So last, it is about InnoDB Cluster. InnoDB Cluster is more than just one server; it's multiple servers, and we need write and read consistency. How can this be done? The consistency level can be EVENTUAL, it can be BEFORE, it can be AFTER, BEFORE_AND_AFTER, or lastly BEFORE_ON_PRIMARY_FAILOVER.

What is EVENTUAL? It means all this data will eventually be consistent: no matter whether you write on server A, server B, or server C, eventually all servers are consistent. With EVENTUAL, when we execute in a session and write data, the write is executed and sent to all the members, and it will eventually be committed everywhere — but each member does the apply work on its own. So at one point, member one may have the latest data but member two may not have it yet. That can actually introduce delay.

So what is BEFORE consistency? When we execute a read, we first ensure that all the outstanding data in the relay log backlog on our server is committed, and only then do we execute the statement. So before our statement runs, if there is an outstanding backlog, it is applied first, and when we read, the data is there. With BEFORE consistency, we get consistent reads.

AFTER means when we write the data — update the data, create the data — and we commit, we actually wait until all the servers have committed and applied the change. Everyone commits; it is like full synchronization. And BEFORE_AND_AFTER combines the BEFORE behavior and the AFTER behavior.

And this last one, BEFORE_ON_PRIMARY_FAILOVER, is about what happens when we fail over or switch. With InnoDB Cluster, the router attaches us to a server. When the primary server we were updating goes down, there may be a backlog on node two and node three. We switch over, and a new primary comes up. The backlog on node two, once it is elected as the new primary, will be applied first, before we get the connection. So the data will be consistent; we guarantee consistency when a failover of connections occurs.

So in summary, MySQL, I consider, gives you the latest features: MySQL Shell, clone, InnoDB Cluster, and many things. And consistency can actually be handled programmatically, in backup and restore, in replication, in InnoDB Cluster, and in maintenance, all these ways. Yeah, that's all for today. Time's up, I know. Any questions?