Okay, thanks. For those of you that don't know me, my name is Nick Vyzas. I've been working with René as a ProxySQL contributor over the last few months, and I'm also a MySQL DBA. Before I kick off this presentation, just a show of hands so I can gauge the room: who's familiar with asynchronous replication? Semi-sync? ProxySQL? Okay. GTID? Okay, great. So let's kick off with ProxySQL. Since everyone's familiar with it, I won't focus on it too much. It's a layer 7 database proxy. Its primary integration is with the MySQL protocol, and there is also some support for ClickHouse. By the way, there's a birds-of-a-feather session at quarter to five in room H3228 for those of you who are interested. For those who are not familiar with ClickHouse: it's blazing fast, you should check it out. Moving back to ProxySQL: it's built for high performance and high availability, and it's feature-rich. There's a wide variety of things you can configure to get various functionality, as well as to tune how you'd like the proxy to behave. A quick overview of the architecture: clients connect, their requests are evaluated, and depending on the configuration you've put into ProxySQL, certain actions are performed. The key areas are read/write split, sharding, and query rewriting, and in this session we'll be focusing on read/write split, specifically the challenges we have with master-slave replication. We're all familiar with replication. The biggest problem is replication lag and being able to retrieve data that is fresh enough from our slaves. We have this problem in asynchronous replication. In semi-synchronous replication we can be more sure that our data has reached our slaves, so the integrity is there, but we can still read stale data. In order to avoid reading stale data, especially when an application connects both to a master and a slave, it has to maintain two sets of connections.
It also has to be aware of how fresh the data is on a slave. Looking at this in more detail: typically an application will write data to a master. That data may or may not have been replicated to the slave yet. If we're doing a read/write split, the application will then connect to the slave and read that data, so stale data can be received. This makes our developers unhappy. With ProxySQL's read/write split we overcome the problem of maintaining separate connections to masters and slaves, because we have the concept of reader and writer hostgroups. The application can just blindly send all requests, reads and writes, to ProxySQL. ProxySQL will route the writes to the master and the reads to the slaves, depending on the query rules we've configured. The big benefit is that we don't have to make application changes in order to route our read queries to a slave or our writes to a master. All traffic is sent to a single IP and a single port, so you don't have to reconfigure the application on failover, and we can dynamically modify the ProxySQL configuration to take slaves down when we need to do maintenance, or to add more slaves when we need to scale. So that area is covered, but even with this configuration stale data can be an issue. ProxySQL itself does provide replication lag monitoring: we can set a threshold of, for instance, 30 seconds, or even one second, and say that if a slave's data is older than that, don't read from it. That mitigates the issue, but there's still a possibility that our data will be one second stale. In addition, this monitoring happens from ProxySQL by polling the slaves at a configurable interval, which has a minimum of one second, so you can see up to two seconds of delayed data. So we have to think about how replication works, and go into a bit of detail, to see how we can overcome this problem.
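The lag threshold described above is configured per server through ProxySQL's admin interface. A minimal sketch (hostgroup ID and threshold are example values):

```sql
-- Admin interface (port 6032): take a slave out of rotation whenever its
-- Seconds_Behind_Master exceeds the threshold. ProxySQL's monitor polls
-- each slave at mysql-monitor_replication_lag_interval (minimum 1 second).
UPDATE mysql_servers SET max_replication_lag = 30 WHERE hostgroup_id = 2;
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```

The server is automatically re-enabled once its lag drops back under the threshold.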
Asynchronous replication, as we all know, basically operates with a binlog file and position. All the events are written to the file, sent to the slaves, and processed one by one. In terms of processing that's a bit of a bottleneck: things have to happen in sequence, and masters and slaves have to be completely synchronized. In addition, when we're doing things like failover, we have to ensure that our slaves are at a certain position in order to connect them to a new master. GTID was introduced to help solve this problem. The main difference is that we have a unique identifier for every transaction, which is unique not just on that host but across every single host in the cluster. The specific thing I want us to keep in mind for today's session is the structure of the GTID: it's a colon-separated identifier made up of a source ID and a transaction ID. So why is GTID important for us? Essentially, it gives us the ability to detect which transactions have and have not been executed. It allows auto-positioning, so we can restructure our master-slave topology as we wish. And starting from 5.6.5, and later in 5.6.9, functions were introduced so that we could actually delay our selects on a slave and wait for a certain GTID position to be reached before executing the statement. This is quite good, a better approach: it allows us to get the data we want and avoid stale reads. But there is still a delay, because we have to wait until that data is replicated, and if it doesn't replicate within a certain time, we get an error on the connection. Then in MySQL 5.7 a pretty important variable was introduced. It's available in Percona Server and Oracle's MySQL, but not yet in MariaDB. The variable is `session_track_gtids`.
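The two mechanisms mentioned above can be sketched roughly as follows (the UUID and timeout are placeholder values):

```sql
-- (a) Wait-based approach (5.6-era): block on a slave until its SQL thread
--     has applied the given GTID set, or until the timeout (seconds) expires.
SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS(
    '3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5', 1);

-- (b) OK-packet approach (MySQL 5.7 / Percona Server): ask the server to
--     attach the GTID of each committed transaction to the OK packet
--     via a session-state tracker, so no extra round trip is needed.
SET SESSION session_track_gtids = OWN_GTID;
```

The wait-based function returns an error-style result when the position is not reached in time, which is the connection error the talk refers to.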
And basically, when you enable this, you can get back either the own GTID, meaning the GTID generated for the last transaction you executed, or the full set of GTIDs, returned in the OK packet. So whenever I execute a write, I know what GTID position a server has to have processed in order for my read to be consistent with that transaction. Having this information, I can query a slave and say: what's your GTID position? Okay, you've already reached that point, I'm good to execute my select statement. So we thought about how this can be leveraged in ProxySQL, and while René was designing this, there were basically two approaches to solve the problem. The first is making a request to the slave and asking it for its GTID position. That implies polling, which implies some delay; essentially it's the same problem we have when monitoring for slave lag with a regular hostgroup. We're polling, and we have the delay of the polling interval. It can work, but it's not the best approach. So what's the other option? Instead of polling: pushing. I want to ask if there are any questions, because otherwise the next section will be very difficult to follow if something up to here isn't clear. Is everything clear? Any questions? It's very important that everything discussed up to this point is completely clear for the next slides. So, any questions, any gray areas? Okay. So the push method: instead of polling slaves for their GTIDs, why not have a process running on the MySQL server itself that pushes the GTIDs that have been processed to all the connected ProxySQL servers? This way we avoid having to request the data, and we avoid having to wait. The slaves just process binary log events and then send out their GTID positions.
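The polling ("pull") approach described above boils down to asking a slave whether a given GTID is already contained in its executed set. A sketch, with a placeholder UUID:

```sql
-- Run against a slave: returns 1 once the given GTID set is contained
-- in the slave's @@GLOBAL.gtid_executed, 0 otherwise.
SELECT GTID_SUBSET('3e11fa47-71ca-11e1-9e33-c80aa9429562:23',
                   @@GLOBAL.gtid_executed) AS already_applied;
```

Every such check costs a network round trip per slave, which is exactly the overhead the push model removes.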
It's more optimized, and it's especially important in a large-scale deployment where you have a lot of hosts, both MySQL servers and ProxySQL servers. To solve this problem, René came up with the ProxySQL Binlog Reader, which is essentially a lightweight process that runs on a MySQL server. It's designed with ProxySQL's principles of high availability and high performance: it's very lightweight, and a lot of work was done to minimize CPU usage and network overhead. How does the process work? It processes binlogs as if it were a slave, but running locally on the same MySQL server. It strips out all of the information except the GTID, and it even strips the source ID, sending only the transaction ID until the source ID changes. So there's very little network traffic and very little CPU overhead. It also has an auto-restart mechanism in case it fails, and a reconnect mechanism in case ProxySQL is not able to contact it. Let's just have a look at this on a diagram. [Audience question] Yes, the source information, you mean the server UUID? Yes. It will send it once and then not resend it until it changes. Basically what happens is that it takes the set of all GTIDs executed on the server, and as soon as a client, in this case ProxySQL, connects to the process, it sends the full GTID set. From that moment on it sends the first GTID executed with both the server UUID and the transaction ID, and after that only the transaction ID, until a new UUID appears. It's just a way to minimize network bandwidth as much as possible, because it's designed to have hundreds of ProxySQL instances connecting to every database server, so of course you should try to minimize network usage as much as possible.
So MySQL sends the replication data, all of the GTIDs and whatever needs to be written to the slaves; the Binlog Reader runs locally, reads all of this as René just described, and sends it to all the ProxySQL instances. So every single ProxySQL instance that connects to a MySQL server will, if it's been configured to collect this, have the exact GTID position for every instance. At this point we know what position our slaves are at, but how do we achieve consistency? ProxySQL can be configured to enforce this consistency on specific hostgroups. If you've enabled this, then when a session is initiated and transactions are occurring, ProxySQL will check the status of a host before routing a select to it: it checks whether the last GTID that was executed for that session on a group of servers has already been executed on the slave before sending the query. Let's take a simple example of two slaves and one master. ProxySQL will check the first slave in its own internal in-memory list; if it doesn't find the GTID there, it checks the second; if it still doesn't find it, it routes the query to the master. So you'll always have fresh data, your latency will be low, and you can even configure this on just one hostgroup; you don't necessarily need a replication hostgroup. Let's look at this in terms of a replication hostgroup. The application sends reads and writes to ProxySQL. DML is routed to the writer hostgroup and written on the master. The next request comes in, some select or something like that, and ProxySQL evaluates which slaves are up to date; if none of them are, it routes the query to the master, provided that the master is included in the reader hostgroup. So to sum up, let's have a look at what is supported for GTID-consistent reads in ProxySQL.
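The per-rule enforcement described above is configured in ProxySQL's query rules via the `gtid_from_hostgroup` column. A sketch, assuming hostgroup 1 is the writer and hostgroup 2 the reader:

```sql
-- Admin interface: route matching SELECTs to the reader hostgroup (2),
-- but only to slaves that have already applied the session's last GTID
-- written on hostgroup 1; otherwise fall back per the hostgroup config.
INSERT INTO mysql_query_rules
    (rule_id, active, match_digest, destination_hostgroup,
     gtid_from_hostgroup, apply)
VALUES
    (10, 1, '^SELECT', 2, 1, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```

Because this is a per-rule setting, consistency can be enforced only for the select patterns that need it, as noted later in the talk.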
With basic master-slave replication, we support both asynchronous and semi-synchronous replication. For multi-master, InnoDB Cluster and group replication are also supported. Finally, a quick note about the requirements. GTID is required on all of the servers that are part of the hostgroup, and of course MySQL 5.7, so that we can have the GTID information returned in the OK packet. And at the current moment, the binlog format must be set to ROW. So, what time is it? Time for René to do a live demo, and for all of you to see how this works in action. Take it away. Questions so far? Yes. Basically, the way it works is that the GTID of the transaction executed by the client is returned in the OK packet. So for every client session, ProxySQL tracks the GTID for that specific client. When it needs to execute a select statement, it checks which slaves have that specific GTID. It does not check the last GTID written on whatever connection, just that specific client's. So technically, yes, you might still send queries to a slave that is a few seconds lagging behind, but you can ensure that what has been written by that one specific client has reached the slave. One important thing is that you don't have to enable this feature for all traffic: you can specify for which select statements you want it enabled. You don't enable it globally; you enable it for a type of select statement. And to be technically correct: ProxySQL doesn't query the slaves to check if they have the GTID; it already knows internally whether those slaves have the data. The principle is to check whether the slaves have the data first, and then the master, because normally, if you're using a replication hostgroup, the master is in the same hostgroup as the slaves: it's part of the reader hostgroup, but with a lower weight. So the probability of a query being sent to a slave is higher than it being sent to the master.
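The MySQL-side requirements listed above amount to roughly this configuration on every server in the hostgroup (shown as runtime settings; the `my.cnf` equivalents are the usual ones):

```sql
-- Prerequisites for GTID-consistent reads, per the talk:
SET GLOBAL enforce_gtid_consistency = ON;   -- must be ON before gtid_mode
SET GLOBAL gtid_mode = ON;                  -- note: on a live 5.7 server this
                                            -- must be stepped OFF ->
                                            -- OFF_PERMISSIVE -> ON_PERMISSIVE -> ON
SET GLOBAL binlog_format = ROW;             -- currently the required format
SET GLOBAL session_track_gtids = OWN_GTID;  -- GTID returned in the OK packet
```

MariaDB is excluded for now because it does not implement `session_track_gtids`.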
What if the GTID has not been executed on any of the slaves, and ProxySQL doesn't yet know it was executed on the master either? Technically it is possible, even if unlikely, that the Binlog Reader reports slightly late. The same principle applies as when ProxySQL has no server available in a hostgroup: it just waits for a server to become available. So technically you might have a slight delay in the worst case, and the error will come after 10 seconds, which is the default for a server not being available in the hostgroup. Let me go back; let me see which is the best slide... yeah, probably this one. So the way it works is that the application writes here. The master replies with OK; ProxySQL reads the GTID from the OK packet and replies to the client with OK. By the time the client sends another query, you can pretty much ensure that at least the master has sent that GTID to the proxy. The probability of the client sending a query before the master has returned the GTID to the proxy is extremely low; technically possible, but extremely low. And then it depends how fast the slaves are at processing the GTID and reporting back to the proxy. But the idea is that we eliminate all the extra network round trips, because when the proxy executes a select statement, it already knows which GTIDs have been executed both here and there. It doesn't have to check them; it already knows. Yes, every client has its own state. [Audience question about tracking writes across proxies] If you have only one proxy, what you are describing is very easy to implement, because that proxy will know all the queries that have been sent to the database servers, master, slaves, whatever. But if you have multiple proxies, then one proxy does not know what has been executed through the others. Technically it could be implemented in some other way, relying only on the Binlog Reader.
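The 10-second default mentioned above appears to correspond to ProxySQL's server connect timeout, which can be inspected and tuned in the admin interface (variable name as in recent ProxySQL versions; treat as an assumption):

```sql
-- How long ProxySQL waits for a usable server in the hostgroup before
-- returning an error to the client (value is in milliseconds):
SELECT variable_name, variable_value
FROM global_variables
WHERE variable_name = 'mysql-connect_timeout_server_max';
```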
So, whatever binlog has been executed up to that moment. But a consequence of this might be... let me think. It would probably have to wait some interval to make sure that nothing else was executed in between. I'm not really sure how to implement this or what the challenges are. Okay, so, demo. Oh, you cannot read this? I'm trying to zoom. No, it's okay, right? Yes. Okay. So there were two things I wanted to show. One is a single ProxySQL instance collecting metrics from 30 database servers at the same time; I wanted to show the CPU utilization and the network utilization, but unfortunately the VPN doesn't seem to work, so I'm only going to show the other one. So, in this session... okay. I don't know how many of you are familiar with the way ProxySQL is configured. It has an admin interface which basically executes DML to configure the proxy; the configuration lives in tables. For example, the table `mysql_servers`: in this table are all the servers configured as backends for ProxySQL. For these new features we have introduced a new column, `gtid_port`. So we have hostname and port, and then `gtid_port`, which is basically where the Binlog Reader is listening. For example, we can run `SELECT hostgroup_id, hostname, port, gtid_port FROM mysql_servers`, removing everything else. We have two hostgroups, hostgroup 1 and hostgroup 2, where `gtid_port` is enabled, so it's not zero. ProxySQL will connect on that port and retrieve the GTIDs, and all of this happens in real time. There are also new tables: `SHOW TABLES FROM stats`. The `stats` schema is where all the metrics are collected, and specifically we are interested in this table: `SELECT * FROM stats_mysql_gtid_executed`. Here you can see that ProxySQL is collecting GTIDs from those three servers.
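The demo steps above correspond roughly to the following admin-interface session (hostname, ports, and hostgroup IDs are placeholders; column names as in ProxySQL 2.0):

```sql
-- Tell ProxySQL where each server's Binlog Reader is listening:
UPDATE mysql_servers SET gtid_port = 3307 WHERE hostname = '10.0.0.11';
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

-- Inspect the configuration:
SELECT hostgroup_id, hostname, port, gtid_port FROM mysql_servers;

-- Watch the GTID sets ProxySQL is collecting in real time:
SELECT hostname, port, gtid_executed, events
FROM stats.stats_mysql_gtid_executed;
```

A `gtid_port` of 0 means GTID tracking is disabled for that server.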
This one is the master and those two are the slaves. In real time, it tells you the executed GTID set and how many GTID events have been read so far from each server. If you execute it a few times, you will see that of course those counters change, and you can also guess which one is the master and which are the slaves. Oh, wow. Yeah, okay. So, `SHOW CREATE TABLE stats_mysql_connection_pool`: this table collects statistics for all the queries that are being executed, and there are now new columns, `Queries` and `Queries_GTID_sync`. So let's select queries and `Queries_GTID_sync`. Here, per hostgroup, there are some servers that don't have this feature enabled, because this ProxySQL is actually connected not to one cluster; there are multiple clusters behind it. It tells you where the queries have been executed: a lot of queries have been executed without GTID enforcement, while other queries have been executed with GTID enforcement, meaning that when reading, ProxySQL made sure the server was in sync with the master. Time is over. Okay, is there any other question? Actually, the demo was supposed to be a bit longer. Yes... two hours? Yeah, we can finish the questions then. Sure, I'll stick around afterwards if you have more questions.
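For reference, the per-server counters shown at the end of the demo can be queried like this (column names as in ProxySQL 2.0; treat exact names as an assumption):

```sql
-- Compare, per backend, how many queries ran with and without
-- GTID-consistency enforcement:
SELECT hostgroup, srv_host, srv_port, Queries, Queries_GTID_sync
FROM stats.stats_mysql_connection_pool;
```

`Queries_GTID_sync` counts only the selects that were routed under the GTID-consistency guarantee described earlier.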