All right, hello everyone. My name is Luís and I've been working on MySQL replication for a little over 10 years now. I lead the replication team, I'm based out of Portugal, and I've spoken here at FOSDEM a few times already; it's one of the places I like to come to talk about the deeper, more technical things. I could talk about replication for hours, as some of my colleagues are well aware, but today I'll be talking about one very specific thing: the binary log. More specifically, how the binary log is created inside the server, how it is used by different components inside the server, including replication and group replication, and how we can look into this framework that exists inside the server, ending the presentation with a little bit of boilerplate code, if you will, for a plugin that can actually tap into the server and extract these changes. So I'll start, of course, with a little bit about MySQL replication, but not that much, I promise, and then I'll describe the binary log. I'm pretty sure that everybody here knows what the binlog is. Actually, a quick show of hands: who here uses statement-based replication only? And who here uses row-based replication? The rest, right? So we will be focusing on row-based replication, for obvious reasons when it comes to change data capture using the binlog. Then I'll look into a couple of use cases, for instance the MySQL group replication plugin, which extracts the changes from the server, pushes them out to its Paxos communication framework, does all this coordination, decides something, and tells the server to commit or abort transactions. And then I'll conclude the session. That is basically the plan for this 20-minute session.
So, MySQL replication. This is a very common slide that I usually pull up in our sessions. Replication is very simple: there is a master, and there's a log that is shipped around. This log records the changes that happened on the primary, or master, and these changes are then propagated to the slave; the slave applies them, and if it generates its own change log, its own binary log, then you can build all the different topologies that, for instance, Peter was talking about earlier. For better or for worse, MySQL replication is almost like a Swiss Army knife, or a loaded gun, as Peter said as well: you can hurt yourself plenty with it, but it can really get you out of some nasty problem situations that you may run into. The interesting thing we're talking about here today is the binary log: the master generates the binary log, the one that is shipped around, and the slave also generates a binary log. There's also a persistent buffer on the slave, which we typically call the relay log, which basically means: get the changes, store them on disk while I'm trying to apply the log, so that we don't saturate memory, for instance. This was very important a long time ago, but not so much nowadays, because now we have parallel slaves and different parallelization schemes with writesets, so we can come to an almost optimal execution history in terms of parallelization. This persistent buffer is not as needed as it used to be, but it's still there. We can build multiple replication topologies: a simple primary/secondary (or master/slave) pair, a chain, a tree, a star, a circle, or now a group with group replication, where we can have external secondaries outside the group, so you can mix and match group replication with regular asynchronous replication, or you can have an external primary and replicate into a group. Right? And why is this important in this session?
It's important because we can have all these combinations of topologies together, and therefore we can have dedicated MySQL instances where we can actually play around and do some interesting things related to change data capture without disturbing the rest of the topology. Over the years I've seen a lot of people doing this kind of thing, like having a dedicated slave with row-based replication enabled to be able to mine the binary log, transform it, and load it into some other system like Hadoop or something else. OK, so the binary log. The binary log is used for replication, but not only replication: point-in-time recovery, integrating with other technologies, rolling upgrades, which is a very important one because it makes it so much easier to do a rolling upgrade on your topology, and more. And as I said, we're focusing on the row-based replication format, because it's a feature-rich format for capturing the changes that happen inside the server as they are recorded in the binlog. Over the years, the metadata that we put into the binlog in row format has been extended, so that for you who are actually implementing these special cases of capturing, transforming, and loading the binlog into something else, it becomes easier. For instance, in MySQL 8.0 we have more metadata when it comes to table definitions: the table map log event, for those who know the contents of the binlog, has additional information such as the names of the columns, which fields are the primary key, whether a column is signed or unsigned, the character set, and so on. And at its core, the binlog is really just a sequence of events. Some of these are control events, like rotate, format description, GTIDs, and so on.
And then there's the data itself, which is carried as query log events or row-based events. In the end, it's a sequential history of the execution that happened on the master, with the changes produced by it. And how is this change log formed? First of all, and mind you, I reiterate, we're focusing on row-based replication. The way it is formed is that when a transaction begins and a statement is executed, the changes are intercepted at the handler level, between the SQL layer of MySQL and the storage engine. MySQL, as many of you probably know, is built on a layered architecture: there is the SQL layer, which has the runtime environment and so on, and then there's the handler interface that separates the SQL layer from the storage engine itself. So when the data is propagated from the SQL layer to the storage engine, we intercept the row changes, the record changes, we capture those, we put them in a memory cache, a memory buffer, and we buffer this until the commit comes. When the commit comes, the transaction gets prepared in the storage engine; internally we run a two-phase commit between the binlog and the storage engine. But right before we flush these changes to disk, to the binlog file, before we persist this cache in the binlog file, we notify a plugin, or plugins, that could be listening to these changes. Think about group replication, for instance: at this point in time, before we flush the transaction to disk, we notify a listener for this change, group replication in this case. Group replication takes this data and pushes it out into Paxos, and by the time it gets back from Paxos, OK, the transaction is actually committed: we flush it to the binlog, and then we notify plugins that asked to be notified once the transaction commits.
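To make the "sequence of events" idea concrete, here is a small, self-contained sketch of decoding the 19-byte common header that every v4 binlog event starts with. The field layout (little-endian timestamp, type code, server id, event size, next position, flags) follows the documented binlog event format; the struct and function names are my own, not the server's.

```cpp
#include <cstdint>

// Common header of a binlog v4 event: 19 bytes, all integers little-endian.
struct EventHeader {
  uint32_t timestamp;   // seconds since epoch
  uint8_t  type_code;   // e.g. 0x13 = TABLE_MAP_EVENT, 0x1e = WRITE_ROWS_EVENT v2
  uint32_t server_id;   // id of the server that originally wrote the event
  uint32_t event_size;  // total event size, header included
  uint32_t next_pos;    // byte offset of the next event in the file
  uint16_t flags;
};

// Read a 32-bit little-endian integer from a byte buffer.
static uint32_t le32(const unsigned char *p) {
  return static_cast<uint32_t>(p[0]) | static_cast<uint32_t>(p[1]) << 8 |
         static_cast<uint32_t>(p[2]) << 16 | static_cast<uint32_t>(p[3]) << 24;
}

// Decode the fixed-size common header from the first 19 bytes of an event.
EventHeader parse_header(const unsigned char *p) {
  EventHeader h;
  h.timestamp  = le32(p);
  h.type_code  = p[4];
  h.server_id  = le32(p + 5);
  h.event_size = le32(p + 9);
  h.next_pos   = le32(p + 13);
  h.flags      = static_cast<uint16_t>(p[17] | p[18] << 8);
  return h;
}
```

Tools like mysqlbinlog walk the file by reading this header, then skipping `event_size` bytes (or seeking to `next_pos`) to reach the next event.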
But the important thing to note here is that the capture is all done in memory, and by the time a transaction commits, we share these captured changes with plugins or other listeners inside the server. In this way, you can think of it as an event bus for data changes running inside the server. And this stream of changes materializes as a set of files on disk. The term "binlog" is really very overloaded nowadays, so it means many things, but usually when people talk about the binlog they mean this set of files, whose contents you can go in and inspect with the mysqlbinlog tool, as Shlomi was showing earlier today as well. So, use cases for change capture using the binlog. I'm going to show some examples of projects that build on the binlog to do some kind of change propagation, or some kind of procedure automation, that really benefits from this log. I'm not going to talk about things like flashback, because Shlomi already did and I don't have it in the slides, but that's another use case: replaying the change log backwards to recover an instance, where possible, to a certain point in time, replaying it in reverse order; it's actually a compensation action rather than a true reversal. OK, change tracking. I think René is somewhere here; yes, René. In ProxySQL, for instance, René taps into this change stream, and in the ProxySQL case, the process that mines the stream actually ignores the row changes, but captures the metadata that is in the change stream.
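The "event bus" analogy can be sketched as follows. This is a hypothetical, simplified model, not the server's real API: listeners (think plugins such as group replication) subscribe a callback, and each transaction's buffered changes are offered to every listener right before the flush to the binlog; any listener can veto the commit.

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy in-process "event bus" for transaction change buffers. The names
// ChangeBus, subscribe and before_flush are illustrative stand-ins for the
// server's internal observer machinery.
class ChangeBus {
 public:
  // A listener sees the transaction's buffered changes and returns
  // true to allow the commit or false to veto it.
  using Listener = std::function<bool(const std::string &txn_cache)>;

  void subscribe(Listener l) { listeners_.push_back(std::move(l)); }

  // Called right before the cache is flushed to the binlog file.
  // Returns true only if every listener agrees the transaction may commit.
  bool before_flush(const std::string &txn_cache) {
    for (auto &l : listeners_)
      if (!l(txn_cache)) return false;  // a veto rolls the transaction back
    return true;
  }

 private:
  std::vector<Listener> listeners_;
};
```

In this model, group replication would be one subscriber that ships the cache to the group and vetoes the commit if certification fails.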
It extracts the GTIDs that run through this change stream and builds global knowledge of the system, to know and understand what state each of the replicas is in, so that it can make decisions about routing queries to the proper replica. This is, very roughly, how ProxySQL implements consistent reads across a set of replicas. Shlomi also has this tool called gh-ost, GitHub's online schema transformation tool, which basically tries to do a userland online schema change by copying data from one version of a table to the newer version of the table, and in the meantime taps into the change stream produced while this process is running, collecting those changes to later apply them to the transformed schema as well, all of this in an automated fashion and online. I already mentioned online rolling upgrades, a really powerful use case and a really good fit for logical, change-stream-based replication: you can have servers on different versions, which probably have different on-disk layouts, different physical layouts, so you can rely on the logical stream to make sure you have a sound way forward with respect to rolling out different versions of the server. Data integration: this is, again, a very common use case. I put Oracle GoldenGate here, but a lot of people do it with, for instance, Maxwell and other frameworks, where they tap into the binlog stream, connecting to a master as a fake slave, for instance. They pull the binlog, process it, transform it, maybe into some more language-neutral format like JSON, or re-encode it in protocol buffers or something like that, and push it out into some message bus, in this case Kafka, or something else.
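The consistent-read routing idea described above could be sketched like this. This is a deliberately simplified model under stated assumptions: GTIDs are reduced to a single monotonically increasing sequence number per source, and the class and method names are hypothetical; real ProxySQL tracks full GTID sets per replica.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical router that mines GTIDs from each replica's change stream,
// keeps a per-replica watermark, and routes a read only to a replica that
// has already applied the transaction the client depends on.
class GtidRouter {
 public:
  // Called as GTIDs are observed flowing through a replica's stream.
  void observe(const std::string &replica, uint64_t seqno) {
    auto &w = watermark_[replica];
    if (seqno > w) w = seqno;
  }

  // Pick any replica that has applied at least `needed`;
  // returns an empty string if no replica is caught up enough.
  std::string route_read(uint64_t needed) const {
    for (const auto &e : watermark_)
      if (e.second >= needed) return e.first;
    return "";
  }

 private:
  std::map<std::string, uint64_t> watermark_;  // replica -> last applied seqno
};
```

The point is the shape of the technique: the router never looks at the row data, only at the GTID metadata riding in the same stream.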
Some people actually use that as well to move data back and forth between different databases: from MySQL into Oracle, from Oracle into MySQL, or from MySQL into some other database, and back. And for point-in-time recovery as well, right? We can take a snapshot and rely on the binlog to roll forward some of the change log that happened after this snapshot, up to a certain point in time. So we can roll forward the changes; just like Shlomi, in his flashback presentation, said, oh, I want to roll back, compensate, revert the state of my server up to this point, we can do it in the opposite direction, as we all know. So, advanced change data capture. Coming back to this interesting diagram: we capture the changes while the transaction is executing; once the transaction is about to commit, we notify that it's going to be committed; then we flush these changes to the binlog; and then we notify that these changes have been flushed to the binlog and committed in the storage engine locally. So let's have a look at the group replication use case. Group replication is a plugin. It taps into the server. It has all these fancy diagrams and a very nicely layered architecture: different modules for capture, the applier, recovery, conflict handling, and so on. And it has this very interesting module here, the capture part. The capture part is really the component that implements these hooks to tap into the server. When you load the plugin, it says: OK, I'm going to register myself as a listener for the events that are propagated, and at commit time I want to be informed that these events have happened, because I want to take part in the decision of what goes on in this transaction's execution timeline, and I want to know what was actually changed. So it intercepts the change buffer and sends it out to the group, in the case of group replication.
The execution takes place, the plugin captures it, then pushes it further down the stack to the group communication engine, and at the very low end a Paxos consensus round is performed: a majority acknowledges, we all make the same decisions everywhere, we have total-order delivery in the system, and everybody advances in the same order. At the end, this results in a distributed, replicated state machine. But what I'm really interested in showing here is how group replication intercepts these changes. The interception takes place right before we flush to the binlog. Note that at this point in time, the changes are still in memory, in this buffer. It's an IO cache, for those who have actually looked into the code, and that means the cache is mostly in memory: it has a fixed-size memory buffer, and if the transaction's changes are larger than this buffer, it spills to disk, it swaps. But at the end of the day, we can consider it just a memory buffer. So we take this buffer and we share it with group replication. This is the same diagram with slightly more detail. We execute, we commit, we prepare, we start the two-phase commit protocol inside the server. Then we take this and replicate it in the before-commit hook: we give it to group replication, and once group replication says, yes, OK, you're good to go, we commit the transaction. Otherwise, we roll back the transaction: maybe there was a conflict, maybe this member was expelled from the group, maybe something wrong happened. But if group replication says yes, the commit procedure continues: we flush and sync to the binlog, and we commit in the storage engine. Committing in the storage engine means we externalize to the SQL layer what has been changed: all locks are released, and now you have a different view of the data.
When we flush and sync to the binlog, it means that we externalize the transaction for consumers of the binlog: slaves will already see it, and other components in the system listening to this replication stream will already see it. When we propagate to group replication, it means we externalize it to the members of the group; that's when the actual commit takes place, so it's externalized to the group there as well. And what are the internal APIs for this? If you go into the source code and look, you'll notice that group replication implements these APIs: the before-commit hook, and the parameter for this callback, which contains the caches that hold the changes. So it gets this notification with the caches, and the group replication plugin can go in, open these caches, learn the changes, and so on. This is the oversimplified version of the before-commit hook: it basically says, I got the caches, I did whatever I needed to do with them, and at the end of the day I sent them out to the group; I issued a send message to the group communication system. At its core, it's that simple. So the key point is that the capture and extraction are the same as for regular replication, except that the notification happens earlier, in the commit phase. And just to finish this session, there is this observer plugin, a kind of boilerplate that is also shipped in the source code of the server, which implements a very dummy capture plugin. This is, in summary, the source code of that plugin: it just implements the before-commit hook, and the only thing it does is increment a counter.
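The shape of that before-commit hook could be sketched as below. This is a hedged, self-contained approximation: the observer gets a parameter holding the transaction's caches and returns 0 to let the commit continue or nonzero to force a rollback. The types and the send_to_group() call are illustrative stand-ins, not the server's real declarations.

```cpp
#include <string>

// Stand-in for the parameter the server passes to the hook: the two
// change caches captured during transaction execution.
struct TransParam {
  std::string stmt_cache;  // non-transactional changes
  std::string trx_cache;   // transactional (e.g. InnoDB) changes
};

// Pretend group-communication send: "replicates" the caches and acks.
// Here it simply fails if there is nothing to replicate.
static bool send_to_group(const TransParam &p) {
  return !p.trx_cache.empty() || !p.stmt_cache.empty();
}

// The hook itself: 0 means "go ahead and commit",
// nonzero means "roll this transaction back".
int before_commit(const TransParam &param) {
  return send_to_group(param) ? 0 : 1;
}
```

The real plugin does far more work between receiving the caches and answering (certification, total-order delivery), but the contract with the server is this simple go/no-go return value.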
And the way it works is that when this plugin is loaded, it registers this hook in the server, so that it intercepts the notification in the diagram I showed earlier and then does something; in this case, it just increments a before-commit counter. And if you're really interested in looking at the source code, of course you can: just go to the MySQL server repository on GitHub and look at the replication_observers_example.cc file. So, to conclude: the binary log is a very useful building block. We see it every day; even today we had Shlomi's presentation, for instance, explaining it. It is not just for replication. And since a long time ago, since early 5.7, we have had this framework that plugins can tap into to listen to notifications. One comment before I close. We didn't have it back then, but now, in MySQL 8, we have this new infrastructure, which we call the service registry, where it's much easier to register services inside the server, so that a different set of consumers can just go to the service registry, look up the service they want to listen to, register as listeners, and so on. We are thinking about moving the hooks I was explaining here into this service registry, to make it even more general for you, me, anyone to write a plugin that makes use of this infrastructure in a much easier way going forward. So that's that. Thanks for listening; this is really cool stuff, and I hope you enjoyed it as much as I do. OK, thank you. Any questions? OK, good. The question; well, maybe I just answered it with that last sentence. I think the problem we see today is, like you say, that there are many drivers, many processes, that tap into the binary log, written in different programming languages.
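The load-time registration the speaker describes can be boiled down to the following sketch. It mirrors what the example observer plugin does conceptually (register a before-commit callback at init, bump a counter each time it fires), but the registry here is a hypothetical stand-in for the server's real observer registration API.

```cpp
#include <functional>
#include <vector>

// Stand-in for the server's list of registered before-commit observers.
static std::vector<std::function<void()>> before_commit_hooks;

// The dummy observer's only state: how many commits it has seen.
static int before_commit_count = 0;

// What the plugin's init function would do when the plugin is loaded:
// register its before-commit callback with the server.
void plugin_init() {
  before_commit_hooks.push_back([] { ++before_commit_count; });
}

// What the server does at commit time: notify every registered observer.
void fire_before_commit() {
  for (auto &h : before_commit_hooks) h();
}
```

Everything beyond this (the real signatures, the deinit that unregisters the hook) is in `replication_observers_example.cc` in the server source tree.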
And any time the binary log format changes, or there's a new data type or whatever, everybody needs to update their code in a dozen different programming languages. So the question is, could there be a mechanism where we would just tap into the client protocol or some API and just get the changes, as JSON or whatever, something that is easy to consume? Let me repeat the question. The question is: will there be a less opaque format for the binlog, so that we can make full sense of it? We are really looking into that, because the binlog is effectively an API for integrating MySQL into other systems. Someone on this stage earlier today was saying: we have this, which is very good, and we have that, which is very good, but the real power comes when we combine the two together. Any two components, MySQL and some other framework, need to talk to each other to provide combined value, and we see that many times. So we are looking at making the binlog format much more interoperable, if you will. All right, if you have more questions, just grab me outside and I'll be happy to talk to you. Thank you.