 Good morning, ladies and gentlemen. My name is Sveta Smirnova and I work as a MySchool Support Engineer for quite a long time. For so quite a long time that I remember that MySchool User Manual contains warning about using asynchronous replication built in MySchool asynchronous replication in active-active mode is not supported and not recommended for use. But watch, I do not remember. I do not remember a big number of companies who followed this advice. Eventually, this advice was removed from the User Manual because my school developers and my school documentation writers accepted the fact that people used this way and the world did not end. But sometimes I still see this advice in different places where people are recommended for others to not do it. And here is my talk, it's review, it's why we still have this in year 2020. And watch is real, really in a bunch of rules using active-active with asynchronous MySchool replication. And why do we need these active-active setups when we have synchronous clusters such as Galera orDB cluster. First, watch is the issue with asynchronous replication, so let's remind ourselves how it works. So, first is Slave, it initiates actual replication, it asks for update from a master to apply it on its site. Master reads this update from binary logs and sends it a packet. Slave, it's actually that's all. It's this time, master doesn't know if slave received packet. And even if it's received, if it's stored, its packet was stored successfully. And if it was stored successfully, it was not updated, was update happening or not. It also doesn't know if update, even update is happening, even if it updated without error. Was it, like, actually that's update, which is same data on master. There are options which allow some control. Like in the first year versions of MySchool, there was only statement-based replication, so it practically was like you wrote statements. It's no matter, like, it was executed the same way on Slave. In version 5.1, we have a raw-based replication, then MySchool ensures that exactly the raw which you wanted to update is updated. Using primary key and the way to identify it. In version 5.6, we have global transaction IDs, which also prevent users from applying the same transaction points. But it still doesn't answer on the questions of widgets. Was it, it's just like this protections help to ensure that transactions came and transactions applies its same raw as on master. But it doesn't, actually, we still don't know if it was successful. It was not, like, some kind of error. Like, for example, if master and slave table definitions are different. So what happens, it's like, it's a problem for, which we just described. It's a problem for even active passive configuration. But what happens, we have to active nodes. If to active nodes, we have more issues. Let's first, we have two nodes, and we try to insert a row with same ID. It's 42. This once insert happens, they both send updates to each other. They both receive the update, apply the update, and both nodes get a duplicate key error. Because they already inserted this row. At this time, replication stops. In case of delete, it happens similar things. So we are trying to fix previous error. We are deleting this error. Again, restarting replication, sending updates, receiving updates. And now we are getting key node founder. But this is actually not, well, it's, I think, some recoverable. Because in the same time, in the third time, DBA can come in and decide what to do. But in case of update, we may end up with a worse situation. In this case, we are having updates at a different value for the same row. Again, we are exchanging the updates. And now we are having node 1, which originally said ID i equal 42. We are having value coming from node 2. And on node 2, we are having value coming from node 1. And this is, which is worse, it's data corruption. It's data corruption that can get unnoticed, not metized. And then it can lead to forfer errors when we update the same table. And we actually have data, which we don't know which data we have. So in this case, why to write to multiple nodes, why people need it? So why don't just write to one? So, in this case, there are many scenarios. And again, as a support engineer, I'm trying sometimes something, which looks for you crazy. And sometimes it's actually a very good business reason. It's like it may have, each node can have its own purpose, like for different applications. It's possible to use multiple machines for better scalability. I put here word read, because writes needs to be updated on all nodes, so that we cannot scale writes this way. But we can read from multiple nodes, and we can distribute a lot this way. Nodes can be in different geographical regions. And, for example, in Belgium and Brazil, it can be delayed, network connection can be interrupted. And sometimes you just cannot rely on a Brazil server to get data in Belgium relatively fast. Sometimes you just need hot stand by. This safest way to use active-active setup. So it's practically set as active-active. But, importantly, it doesn't accept any writes and reads. While active nodes, it's active everything. And then it dies. Hot stand by comes into place. Only thing with this setup, which is almost safe, is you should not like, for example, in case of recovery, you should make sure that they do not write, writes don't go to both servers at the same time. For a short period of time. For example, when this server comes back. So it should be done like proxy SQL, and some other tool, which can distribute balance. We also have, in 2020, we have synchronous solution. I put them here. It will give you very short overview. It's actually, which is our topic. I don't explain the difference between them. But what is the issue with fully synchronous cluster? So in this case, we are getting update on node 1. Send the packet to other nodes. They receive the packet, the plus changes, confirms. And actually node 1, it's updated, still waits ready for new update. Of course, you can do updates in multiple frets. So it's not, that means, so that's full server, full node start. But that's update. It's client waits for this answer before continue. So sometimes this takes too long, and sometimes it's not acceptable for application, especially on geodistributed locations. So it's actually, it's speed of slowest member. So like if it's right, so it should wait. To enable, it's actually to enable for parallel reads and writes, but not for the client, which is performing update. And what is most important, it's like if georeplication, like in the servers in different countries, in different cities, that synchronization, you either should accept that synchronization will take a lot of time, you applications have to wait, or nodes will, and also nodes may often disconnect, in case, like, because changing data between data centers, exchanging data between center centers is more risk-prone than exchanging data in single data center. Okay, so watch, we need how to set up, save asynchronous master-master. At first, I did not have it on slides, but I assume you do not use bad practices. It's usually, in the user manual, it's called unsafe statements. That's like using update with limit, but without order bug. It's like insert ignore and things like this. I assume it's like a raw-based replication and a raw-based replication, and GTID is also protection, protection field, but I won't cover this here, because it's also true for active-passive and actually should be used this way. So, actually, most important thing for active-active is you need to divide and rule. It's based on business logic or database objects. So, let's see it for example. So, for example, it's very easy way. You can just agree that each server should write to its own set of tables or databases. Like this yellow nodes writes only to this, and these are replicated. And same for orange nodes, which writes only here, and these are replicated. This way, you won't be able to corrupt this data, and orange nodes won't be able to corrupt this data. So, this is relatively safe. For tables, sometimes you have to write into multiple nodes, into active-active nodes, into same table. In this case, it's a little bit harder, it's practically more airplane, but you just need to see agreement on which rows to write. For example, yellow nodes writes to even rows with even ID, and orange nodes writes to ID. It's easier like using after increment. After increment, you have to set its number of servers, and after increment of set, I put here server ID, but it's not server ID, it should be like if you have two servers, it should be for node 1, should be 1, for node 2, should be 2, node 3, should be 3, etc., etc. You should stop before starting 1. For if you are using custom index, which is not after increment, you should create your own unique pattern yourself. I will do this way. Should you create your unique pattern yourself for each node, and follow this pattern. So this is actually practically what you need to curve in to be safe with asynchronous active setup. First, make sure you do not use unsafe statements, but practices for replication, which we have in user manual. Second, divide and rule, and make sure each node writes exactly to the predefined number of objects, which belongs to either tables, databases, or split rows using some patterns. And conclusion, it's a hot stand by, easy to set up and maintain. It's like you just need to care about you do not write into servers, in case of recovery happening to active node. For real active setup, separating writes by database table is comparatively easy to set up and maintain. Separating writes by row is challenging, but again it can be done, and this way you can use this setup easily. More information is all active links. This QR code, they are where works. Slides will be available on my slide share account and also on the conference website, and you can just click and follow. It's MySchool user reference manual. It's a webinar about troubleshooting MySchool replication. And this is a book MySchool High Availability by Chuck Bell, Lars Starman, and Matt Skindal. This is actually the first book, which accepted the fact that asynchronous replication and active-active setup is something which can be used in a real production environment and actually successfully used. It has best practices how to do it safely. Thank you. Please ask your questions. Please. So it's not a question. It's in MySchool cluster. We also have automatic conflict resolution. Do you speak about MySchool NDB cluster? It's about NDB cluster. Any other question? Please. It's up to you. You can do it on application level. I think you can do it on proxy level too. So suggestion to add an article about MySchool replication is a bit option, and I should consider to add it to my slides. Maybe for the next version. Thank you very much. Some people still agree it's bad. Okay, so that's it. Thank you.