And then, hi and hello. I just thought, like, a tiny reminder: can you put your phones on silent? For, like, the newcomers. And then, thank you, Haran. It was really nice. And now we have Venkatesh Devirella talking about new replication features. Here you go.
OK. Thank you. My name is Venkatesh. I'm going to present on MySQL replication, about the new things that we added in 8.0.
This is the safe harbor statement. What it means is, whatever we are going to tell you, later you cannot sue us. You are on your own. You have to do the testing before it goes into production.
Since morning, you have been listening to many things about MySQL 8.0. And if you observe, every presentation has something to do with replication. So we are the heart of MySQL. And we have added so many features. And you can say there is no debate: it's never too late to replicate with version 8.
So we have divided the features into these categories: performance related, monitoring related, operations related. And we have added many more, which we cannot cover in the time that we have got. So I have just one slide to explain the other features.
This is the typical replication picture. Before 5.7, each of these used to be just one node, and then there is a link between them. Now that we have group replication, we can have a picture like this. At every point you have a cluster of nodes, which represents group replication. The link between this cluster and this cluster is asynchronous replication. And I'm going to talk about asynchronous replication. There is a presentation at 3 o'clock that talks about the group replication in these clusters. So you can put them anywhere in the world and asynchronous replication will work across them.
So, just to give you the background on how replication works: you have a master and you have a replica here. By the way, we changed the replication-related terminology.
I will be telling you a few of these whenever I use a new word. So we are going to use the word replica instead of the word slave. If you look at the documentation of 8.0, you will start seeing a few new words, especially related to replication. So if I say replica, it is nothing but the slave that we had. There is no change in the technology, just in the terminology.
So this is the master and this is the slave. You write something here, that goes to the binary log and then over the network. It is picked up by the receiver thread, which is the IO thread if you are familiar with the old terminology. We call it the receiver thread now. So the receiver thread picks it up from the network and puts it in the relay log, and then the applier threads, which in the old terminology are the SQL threads, pick it up from the relay log and apply it on the replica. And if the replica has log_slave_updates enabled, we write it to the replica's own binary log. If it is not enabled, there is no binary log here. So this is how regular replication happens.
My main focus for the first feature will be on the replica side. That is the reason I have removed the other part of the picture. So what happens is, you have the packets coming through the network, and there is a coordinator thread that picks them up and puts them on the parallel worker queues, and the worker threads take them and apply them on the replica. This is how it looks if you have enabled the parallel threads. Otherwise the applier thread will pick directly from the relay log and put it on the replica; this part of the diagram will not be there. So that is the background.
So a replica can apply more transactions in parallel. That is there in 5.6, that is there in 5.7, and we have improved more in the same area.
What we have done in 8.0, I will show you. In 5.6, what is the parallelism that you can achieve? You can achieve parallelism based on the database. Anything related to DB1 goes to worker thread 1, anything related to DB2 goes to worker thread 2. So if there are no transactions on DB2, worker thread 2 will be sitting idle forever. So we can do much better.
In 5.7, what we have done is this: if these three transactions happened at exactly the same time on the master, that is, all three of them were in the commit phase at the same time, we can be 100% sure that they are not conflicting. That is the reason they were in the commit phase at the same time. So we were able to take all three from the master and execute them in parallel in 5.7.
But we can still get more parallelism here, because these three transactions have to wait until this one is done, even though they are not conflicting with it. That improvement is done in 8.0. We know which transaction touches which rows (tuples); that information we already have. Using that information, if you increase the parallel threads on the slave, and we see there are rows 1, 2, 3 and 4, we can run all of these in parallel. Let us say the next one is just row 5 and you have 5 threads running on the slave: we will still be able to take all of these and execute them in parallel. As long as there is a free thread and there is something that is not conflicting, we will be able to achieve it. In fact, if you have more parallel threads on the slave, you can even achieve much more performance on the slave than on the primary.
How do you enable it? There are slave_parallel_workers and slave_parallel_type. These two are common to 5.7 as well.
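
The writeset idea described above can be illustrated with a small Python sketch. It is only a toy model, not how the server implements it: each transaction is assumed to be a name plus the set of row keys it modifies, and the function name parallel_batches and the row labels are made up for the illustration. It also ignores commit-order preservation and any limit on how much history is tracked.

```python
def parallel_batches(transactions):
    """Group transactions into batches that could be applied in parallel.

    transactions: list of (name, set_of_row_keys) in commit order.
    Two transactions are treated as conflicting when they share a row key.
    """
    last_writer_level = {}  # row key -> batch index of the last transaction that wrote it
    batches = []
    for name, rows in transactions:
        # A transaction must run after every earlier transaction
        # that wrote one of the rows it touches.
        level = 1 + max(
            (last_writer_level[r] for r in rows if r in last_writer_level),
            default=-1,
        )
        for r in rows:
            last_writer_level[r] = level
        while len(batches) <= level:
            batches.append([])
        batches[level].append(name)
    return batches


if __name__ == "__main__":
    # Hypothetical workload: five independent rows, then one that reuses row1.
    workload = [
        ("T1", {"row1"}),
        ("T2", {"row2"}),
        ("T3", {"row3"}),
        ("T4", {"row4"}),
        ("T5", {"row5"}),
        ("T6", {"row1", "row6"}),
    ]
    for index, batch in enumerate(parallel_batches(workload)):
        print(index, batch)
```

In this made-up workload, T1 to T5 touch different rows and end up in the same batch, so five worker threads could apply them together, while T6 has to wait because it touches row1 again.
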
So in 5.7 you can also achieve the same thing using LOGICAL_CLOCK, and this is the main thing that we have added: binlog_transaction_dependency_tracking should be WRITESET. So it is based on the writeset that we are doing it now in 8.0. If you enable this one, you get the behavior that I told you about; if you do not have it, you get the behavior of 5.7.
And along with this writeset, slave_preserve_commit_order = 1 is what should be used. slave_preserve_commit_order is nothing but maintaining the commit order of the master on the replica. Let's say you have transaction 1 and transaction 2. Since they are not conflicting, we are able to execute them in parallel on the replica. But there are some applications where you don't want a state on the replica which never existed on the master. For example, transaction 1 is not yet executed but transaction 2 is done. The application doesn't want that to happen; it wants transaction 1 followed by transaction 2. That's why you have to set slave_preserve_commit_order to 1, so that you get the same behavior as on the master.
And there is one more performance-related thing, for which you don't have to set anything; the code is just enabled by default. What we observed is that the receiver thread is writing to the relay log file and the coordinator thread is also reading from the same file, and we were seeing a lot of contention there, a lot of lock-related issues, which was reducing the performance. So we have done a lot of code improvements there, which give you a huge performance difference from 5.7 to 8.0. So this is one more reason to upgrade to 8.0 if you are using 5.7, and you don't have to set any variables for it.
And in 5.7 we introduced JSON in MySQL, and with replication in 5.7 what happens is this: you update a JSON document.
So with the full document, if the JSON is 1 MB or 10 MB and you modify one particular attribute, the full data is transferred from master to replica, which obviously nobody wants. So the new improvement in 8.0 is that if this is the only thing you are actually modifying, only that particular thing is transferred. There are settings for this. For some reason you may say no, we don't want only that particular part, because we have some special mechanism that reads the replica relay logs and we want the full JSON document; that is the default. But if you want this particular behavior, we have settings.
So this is the setting that you have to do. This one is the minimal setting, which exists even in 5.5, where you say that only the modified columns have to be transferred. That you have to set anyway. And within the modified columns, if there is a JSON column and you want only the part of the JSON which you actually modified, then you have to set this: binlog_row_value_options is PARTIAL_JSON. You have to set this to enable it. The same applies to the other functions.
OK, so there are many more performance-related features that I will cover on the slide at the end, but these are the important ones.
The next section is monitoring. Starting from 5.7, many people converted to RBR; very few people use SBR, statement-based replication. But with RBR, the main issue that we hear from the community is: the replica is hanging there and we do not know what is happening. Is it a long-running query that is being executed? What is the query that is running? All that information is not there in 5.7. So we have improved in this area: in 8.0 you can see what is happening and you can see what the query is. How do you do it?
So now, in the threads table, if you look at the processlist info, earlier it was just saying something like row event; that's what you see in 5.7. But now you can clearly see the query, exactly what you ran on the master. You can clearly see INSERT INTO t1 VALUES (1, 2, 3), even in RBR.
And what we have done is this: for every DML and every DDL, internally we have some stages. If you are familiar with the processlist table, there is a column called info that will tell you it is taking a system lock, it is doing so and so. But that still does not tell you what percentage of the query is done and how much is left; that information is not there. Now we have added it in the events_stages_current table, where you can see that this is the query that is running, and work completed is 1 out of a total of 4, so 25% is done and 75% is what we estimate remains. So you keep checking this status and see: OK, my replica is not hanging, it is progressing. If you see this 1 become 2, then 3, and so on, you can see the progress now.
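
The arithmetic in that example can be written down as a small Python helper. This is a sketch only: it assumes you have already read the WORK_COMPLETED and WORK_ESTIMATED columns of events_stages_current yourself and pass them in as plain numbers, and the function names are invented for the illustration.

```python
def progress_percent(work_completed, work_estimated):
    """Return the percentage of work done, or None when there is no estimate."""
    if not work_estimated:
        return None
    return 100.0 * work_completed / work_estimated


def is_progressing(samples):
    """Return True when successive WORK_COMPLETED samples never decrease
    and the last sample is higher than the first one."""
    if len(samples) < 2:
        return False
    never_decreases = all(a <= b for a, b in zip(samples, samples[1:]))
    return never_decreases and samples[-1] > samples[0]


if __name__ == "__main__":
    print(progress_percent(1, 4))      # the example from the talk: 1 of 4 units done
    print(is_progressing([1, 2, 3]))   # samples taken a few seconds apart
```

Polling a few times and feeding the samples to is_progressing is one simple way to tell a slow replica from one that is really stuck.
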
Another monitoring-related thing. Typically, in real deployments, you don't have just A and B; you have a lot of nodes in the replication chain. We always referred to the lag of D with respect to A, the original one. We never had the lag with respect to the immediate one. Did machine C receive it a long time back, or did it only just receive it? Where is the problem: is it between B and C, or is it between C and D? All that monitoring was next to impossible until 5.7.
So this is what I was saying. Here we don't know when it was actually added, when the coordinator thread picked it up, when the worker queues picked it up; none of that is there. So what we have done is introduce a lot of hooks for monitoring in all these places, and we have introduced new terminology. Let's say an event originated on A: we call that timestamp the original commit timestamp. If you see an event with an original commit timestamp, that is when it actually started. And for D, the timestamp from C is what we call the immediate commit timestamp. So if you see some information on D with these two, you can say: OK, this is C's immediate commit timestamp, and it actually started a long time back on A. That information is there now.
And for the various steps here, like when it was added, when it was done, when it was executed, for all these we have Performance Schema queries. You can get the transaction latency versus the primary and compare it with the last hop. You can see when it was added to the worker queues, when the coordinator picked it up, when it came off the network, when it was available in the relay log, and when the relay log was written on the replica.
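
As a rough illustration of how the two timestamps split the lag per hop, here is a Python sketch. It assumes three values in microseconds since the epoch: the original commit timestamp (from A), the immediate commit timestamp (from the direct master, C in the example) and the time the transaction finished applying on the local replica (D). The function and key names are hypothetical, and reading the values from the Performance Schema is left out.

```python
def hop_latencies(original_commit_us, immediate_commit_us, applied_us):
    """Split the replication delay seen on a replica into two parts.

    All arguments are microseconds since the epoch.
    """
    microseconds_per_second = 1_000_000
    return {
        # Time spent getting from the originating server to the immediate master.
        "upstream_seconds": (immediate_commit_us - original_commit_us) / microseconds_per_second,
        # Time spent on the last hop, from the immediate master to this replica.
        "last_hop_seconds": (applied_us - immediate_commit_us) / microseconds_per_second,
        # End-to-end delay with respect to the originating server.
        "total_seconds": (applied_us - original_commit_us) / microseconds_per_second,
    }


if __name__ == "__main__":
    # Made-up numbers: committed on A at t=1s, on C at t=4s, applied on D at t=4.5s.
    for key, value in hop_latencies(1_000_000, 4_000_000, 4_500_000).items():
        print(key, value)
```

With these made-up numbers most of the delay is upstream of C, so the place to look would be between A and C rather than the C to D link. The figures are only meaningful if the clocks of the servers are synchronized.
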
So for all those places we have different information that can be seen from the Performance Schema tables.
The next one is related to operations. That is more about the variables that we added and a few things that we removed.
This is multi-source, which is the feature that we added in 5.7, and filters. You have master 1, you have master 2, and there is a replica here that is getting data from both nodes. If you want to filter some data, there is the concept of replication filters, which has existed for a long time. But the problem is that if you have a filter on this channel, it is applicable to this channel as well; they are global filters. Now in 8.0 we have introduced the concept of per-channel replication filters. So you can say there is no filter here, all the data comes from this master to this replica, but from this other one I do not want anything related to the users DB. That kind of per-channel replication filter you can enable now in 8.0.
We have changed a lot of defaults in 8.0 because of the community feedback that we received. In real deployments nobody runs without replication; people want replication to be enabled, so why an extra step for the DBAs to enable it? So we have changed the defaults. log_bin, which was OFF in 5.7, is now ON by default in 8.0. Because it is ON by default, we have to give a value to server_id, so that is set. log_slave_updates is enabled. expire_logs_days was 0; it is 30 now. This is the main thing that you really have to take care of, in case you do not want the logs to be expired after those days. Again, we changed it based on the community feedback. And the metadata information that we store on the replica was being stored in files, which nobody likes nowadays because it is not crash-safe, so we changed the default to TABLE now. And transaction_write_set_extraction, which is related to the first feature that I was talking about, was OFF; now a hash is calculated by default, and this is the name, XXHASH64.
Another performance-related default we changed based on the bugs we were receiving. The old default was index scan and table scan. In many issues, replication was lagging, it was taking too much time, and when we debugged it, it was nothing but a table scan being done by default because there was no index. So everybody would face it, and only after facing it would they change the setting. So we did a lot of research on which default value works best for most users. Now, if there is an index for the query that you are running, there will be an index scan; otherwise it is going to be a hash scan. The default is not a table scan anymore.
These are the new variables that we introduced; some of them I have already covered. binlog_expire_logs_seconds: there are some users whose data is so huge that they can't give the expiry in days; they want it in, say, half a day. For example, if you take Facebook, many gigabytes or even terabytes could be generated in just half a day. They do some work where all the binary logs that are generated are transferred to some other place, and they don't want them to stay on the master even for half a day. So that's the feature; it was actually contributed by the Facebook guys. With binlog_expire_logs_seconds you can now say, I want two and a half days, or let's say half a day, which is 43,200 seconds. So the granularity you can specify is now seconds.
And binlog_row_metadata is a feature that you can enable if you want extra information on the replica. We now write all the information about the table columns. Earlier we were not writing any of that into the binary log, but if you enable binlog_row_metadata, a lot of metadata information is written into the binary log, and it is then available in the replica's relay log. There are some people who actually take the information from the relay log after it is saved, and they were missing all this metadata. That's the reason we added a new feature where the metadata is transferred.
And this is the one related to what I told you about JSON: you can just say binlog_row_value_options = PARTIAL_JSON, and only the modified attributes of the JSON will be transferred from master to replica.
And this is the one that I was talking about, writeset: you have to set it so that the new 8.0 feature is used, parallelizing the transactions as long as they are not conflicting. For that, there has to be a particular set of transactions that you compare against. If your load is not too high, you can say: OK, I am pretty sure that transactions more than 100 apart will not conflict. So you can set it to 100; we will reduce the history to 100, and the conflict checking will happen only against those 100 transactions.
And if you are familiar with 5.6 and 5.7 replication, these are pretty common variables that you use: what is the heartbeat period, what is the last heartbeat that you received. All this information was there as variables, but since we are moving towards the Performance Schema, we removed all of them and replaced them with information in the Performance Schema.
So, many more. Earlier, when the server's disk was full, there was no way you could execute any of the replication commands. In a disk-full situation the server has acquired the log lock, which is very important for the binary log, and we make all the commands related to replication hang until you clear the disk. But if somebody wanted to see what is happening, why it is hanging, there was no way in 5.7. In 8.0 we added the Performance Schema
related tables, which do not have anything to do with those locks and still give you all the information. So that feature is there.
And then atomic DDL: you heard about it in the morning from my colleague. Since it is there, we have made more improvements on the replication side, so DDL recovery after a crash works even when you have replication enabled.
Also interoperability, which is the one I was talking about: more metadata is now added to the binary log, and the transaction byte length is also added; that is part of the metadata that we have added.
And earlier, in 5.7, if the server was non-empty, it was not possible to restore with GTIDs. Now, even on a non-empty server, if you are enabling GTIDs we can still do it; you can set some variables so that it works. So that feature is there. For all these things we have a lot of blogs and a lot of documentation already written, so you can go through the MySQL documentation.
And RESET MASTER to a specified binary log file. RESET MASTER always goes back to 1; the binary log name goes back to 1, which nobody wants. If they have already gone through 100, after RESET MASTER they want to continue from 101, right? You have 100 binary logs, you have stored them somewhere because you don't want the disk on the master server to fill up, and then you execute RESET MASTER for some reason. It comes back to 1. Nobody wants that, because which one is 1 now? You have 1 again. So now there is a new thing: RESET MASTER TO 101. You mention it, and the new binary log will start from 101, not from 1.
Yeah, so this is the one that I was talking about, where you can specify the expiry with second precision.
And for maintenance purposes we have removed the 4.x and 5.0 compatibility code. There was a huge amount of code related to 4.0, 5.0 and 5.1 that we kept on supporting: what if the master is from 4.0, what if the master is from 5.0? We don't want to maintain that anymore, because nobody is actually still on 4.x. So we have removed it completely. If you are still using 4.0 or 5.0, we are not going to support it anymore. So that's for maintenance.
And as you know, we are open source; we take contributions from many people. So we just wanted to say thank you to these people, who have contributed a lot of things. In case you want to contribute, go to bugs.mysql.com. When you face some issue, and you are able to figure out what the issue is, you just have to sign an agreement and then upload your patch. If we find that there are no issues with your patch, we will take it and give you the credit.
And if you can't wait to download 8.0, this is where you can download it, and this is where you can read more information. The moment a feature is released we write a very good blog about it, and it will be here. And as I said, if you find some issues, you can report them here, and if you already have a patch, you can put it there. Thank you.
Can I have three questions?
Sure.
You mentioned that you can have some hooks, correct? So is it layer 7, or does it have any sort of, like, networking layer? Layer 7? Does it have any sort of coordination with the ASX or like...
No, it's just that when it is received, there is an in-memory buffer that we maintain, saying that it is here, and then the Performance Schema reads that in-memory buffer: this is the time that it was received, and we display it to you. So it's at the application level.
Second one. You mentioned the per-channel filters, correct? What's the isolation level of that? Is it at the level of the database or at the level of the table?
Yeah, from the DB level: you can say ignore DB, you can even say ignore table. So we have it down to the table level. We cannot do column level, but for DB there is a do-DB and there is an ignore-DB. You can even use wildcards, the regular expressions. If you don't want anything that starts with db, you can say db*; with wild-do and wild-ignore you can use the wildcards. It's not a new feature in 8.0; even in 5.5 and 5.6 you can do DB level, and table level as well. The only thing that we added in 8.0 is the per-channel part. It was at a global level: all the channels were affected if you set it on the replica. That is what we have changed; you can now specify it per channel. That's all, I think.
Thank you.
You mentioned the binary log. Can you query the binary log for audit purposes?
Yes, SHOW BINLOG EVENTS is the command that will give you all the information. It's human-readable, but of course if you are on RBR it's not human-readable. But there is a variable called binlog_rows_query_log_events. You have to enable it for us to write the query into the binary log. Some people don't want the query to be there, for security reasons or whatever it is. So even in RBR, if you want the query to be seen in SHOW BINLOG EVENTS, you have to set the variable binlog_rows_query_log_events, so the query events will also be written to the binary log. If you have written it like that, then in the comments there you will see INSERT INTO t1 followed by the row format. The row format is obviously not human-readable, but you have the query there. So SHOW BINLOG EVENTS is the command, but to execute SHOW BINLOG EVENTS you need certain privileges; not just anybody can execute it, for security reasons obviously.
Thank you so much.
Thank you.