Okay, thanks. For those of you that don't know me, my name is Nick Vyzas. I've been working with René on ProxySQL over the last few months, and I'm also a MySQL DBA. Before I kick off this presentation, just a show of hands so I can gauge the room: who's familiar with asynchronous replication? Semi-sync? ProxySQL? Okay. GTID? Okay, great.

So let's kick off with ProxySQL. Since everyone's familiar with it, I won't focus on it too much. It's a layer 7 database proxy. Its primary integration is with the MySQL protocol, and there is also some support for ClickHouse. There's a birds-of-a-feather session at quarter to five in room H3228 for those of you that are interested. For those that are not familiar with ClickHouse, it's blazing fast. You should check it out.

Moving back to ProxySQL: it's built for high performance and high availability, and it's feature-rich, so there's a wide variety of things you can configure to get various functionality, as well as to tune how you'd like the proxy to behave. A quick overview of the architecture: clients connect, the requests are evaluated, and depending on the configuration you've put into ProxySQL, certain actions are performed. The key areas are read/write split, sharding, and query rewriting. For this session we'll be focusing on read/write split, and specifically the challenges we have with master-slave replication.

We're all familiar with replication. The biggest problem is replication lag, and being able to retrieve data that is fresh enough from our slaves. We have this problem in asynchronous replication; in semi-synchronous replication we can be more sure that our data has reached our slaves, so the integrity is there, but we can still read stale data. To avoid reading stale data, especially when an application connects both to a master and a slave, it has to maintain two sets of connections, and it also has to be aware of how fresh the data is on each slave.

Looking at this in more detail: typically an application writes data to a master. That data may or may not have been replicated to the slave yet. If we're doing a read/write split, the application then has to connect to the slave and read that data, so stale data can be received. This makes our developers unhappy.

With ProxySQL's read/write split, we overcome the problem of maintaining separate connections to masters and slaves, because we have the concept of reader and writer host groups. The application can just blindly send all requests, reads and writes, to ProxySQL. ProxySQL will route the writes to the master and the reads to the slaves, depending on the query rules we've set up. The big benefit is that we don't have to make application changes in order to route our read queries to a slave or our writes to a master. All traffic is sent to a single IP and a single port, so you don't have to reconfigure anything on failover, and we can dynamically modify the configuration in ProxySQL to take slaves down when we need to do maintenance, or to add more slaves when we need to scale.

So that one area is covered, but even with this configuration, stale data can be an issue. ProxySQL does provide replication lag monitoring, so we can set a threshold of, for instance, 30 seconds and say: if our data is more than 30 seconds old, or even one second old, don't read from that slave.
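As a concrete reference, here is a minimal sketch of the kind of configuration being described, run against ProxySQL's admin interface; the hostnames, hostgroup numbers, and lag threshold are illustrative:

```sql
-- Writer hostgroup 1 (master), reader hostgroup 2 (slaves).
INSERT INTO mysql_servers (hostgroup_id, hostname, port)
VALUES (1, 'master1', 3306);

-- max_replication_lag: ProxySQL shuns a slave whose monitored lag exceeds 30s.
INSERT INTO mysql_servers (hostgroup_id, hostname, port, max_replication_lag)
VALUES (2, 'slave1', 3306, 30), (2, 'slave2', 3306, 30);

-- Route SELECTs to the readers; everything else defaults to the writer.
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT', 2, 1);

LOAD MYSQL SERVERS TO RUNTIME;  SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME;  SAVE MYSQL QUERY RULES TO DISK;
```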
So we overcome that issue, but there's still a possibility that our data will be one second stale. In addition, this monitoring happens from ProxySQL, which is basically polling the slaves at a configurable interval that has a minimum of one second. So, up to two seconds of delayed data. We have to think about how replication works, and go into a bit of detail, to see how we can overcome this problem.

Asynchronous replication, as we all know, operates with a binlog file and position: all the events are written to the file, the file is sent to the slaves, and all of these events are processed one by one. In terms of processing it's a bit of a bottleneck, since things have to happen in sequence, and masters and slaves have to be completely synchronized. In addition, when we're doing things like failover, we have to ensure that our slaves are at a certain position in order to connect them to our masters.

GTID was introduced to help solve this problem. The main difference here is that we have a unique identifier for every transaction, one that is unique not just on that host but on every single host within the cluster. The specific thing I want us to keep in mind for today's session is the structure of the GTID: it's a colon-separated identifier made up of the source ID and the transaction ID.

So why is GTID important for us? Essentially, it gives us the ability to detect which transactions have and have not been executed. It allows auto-positioning, so we can restructure our master-slave topology as we wish. And starting from 5.6.5, and later in 5.6.9, functions were introduced so that we could actually delay our selects on a slave and wait for a certain GTID position to be reached before executing the statement. This is quite good; it's a better approach. It allows us to get the data we want and avoid stale data, but there is still a delay, because we have to wait until that data is replicated, and if it doesn't replicate up to a certain point, we'll get an error on the connection.

Then in MySQL 5.7, a pretty important variable was introduced. It's available in Percona Server and Oracle's MySQL, but not in MariaDB as of yet. The variable is session_track_gtids, and when you enable it you can get either OWN_GTID, the GTID generated for the last transaction you executed, or the full set of GTIDs, returned in the OK packet. So whenever I execute a write, I know what GTID position a server has to have processed in order for my transaction to be consistent. Having this information, I can query a slave and say: what's your GTID position? Okay, you've already reached that point; I'm good to execute my select statement.

So we thought about how this can be leveraged in ProxySQL, and René was designing this and thinking about how we could do it. There are basically two approaches to solve this problem. One is making a request to the slave and asking it: what's your GTID position? That implies polling, which implies some delay. Essentially it's the same problem we have when we're monitoring for slave delay with a regular host group: we're polling, and we have the delay of the interval. It can work, but it's not the best approach. So what's the other option? Instead of polling: pushing.
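A rough sketch of this wait-for-GTID approach, using standard MySQL 5.7 features; the GTID value and table are illustrative:

```sql
-- On the write connection: ask the server to return the transaction's
-- GTID in the OK packet (Oracle MySQL / Percona Server 5.7+).
SET SESSION session_track_gtids = OWN_GTID;
INSERT INTO t1 (c1) VALUES (1);
-- The OK packet now carries something like:
--   aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1234

-- On the slave: block (here up to 1 second) until that GTID has been
-- applied, then read; a timeout means the slave never caught up.
SELECT WAIT_FOR_EXECUTED_GTID_SET('aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1234', 1);
SELECT c1 FROM t1;
```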
Before we go on, I want to ask if there are any questions, because the next section will be very difficult to follow if something wasn't clear. Is everything clear? Yes? Any questions? It's very important that everything discussed up until this point is completely clear for the next slides. So if anybody has questions, any grey areas, okay?

So, the push method. Instead of polling slaves to get their GTID, why not have a process running on the MySQL server itself that pushes the GTIDs that have been processed to all the connected ProxySQL servers? This way we avoid having to request the data, and we avoid having to wait. The slaves just process binary log events and then send their GTID positions. It's more optimized, and it's especially important in a large-scale deployment where you have a lot of hosts, both MySQL servers and ProxySQL servers.

To solve this problem, René came up with the ProxySQL Binlog Reader, which is essentially a lightweight process that runs on the MySQL server. It's designed with ProxySQL's principles of high availability and high performance. It's very lightweight; a lot of work was done to minimize the CPU usage and the network overhead. How does the process work? It processes binlogs as if it were a slave, but running locally on the same MySQL server. It strips out all of the information except the GTID, and it even strips the source ID and sends just the transaction ID until the source ID changes, so there's very little network traffic and very little CPU overhead. It also has an auto-restart mechanism in case it fails, and a reconnect mechanism in case ProxySQL is not able to contact it.

So let's have a look at this on a diagram. Yes, it will send it, and then it will not resend it until it changes. Basically what happens is that it takes the state of all the GTIDs executed on the server, and as soon as a client, in this case ProxySQL, connects to this process, it sends the full GTID set. From that moment on it sends the first GTID executed with the full server UUID and the transaction ID, and after that only the transaction ID, until the moment a new UUID appears. It's just a way to minimize network bandwidth as much as possible, because it's designed to have hundreds of ProxySQL instances connecting to every database server, so of course you should try to minimize network usage as much as possible.

Okay. So MySQL sends the replication data, all of the GTIDs and whatever else needs to be written to the slaves, and the Binlog Reader runs locally, reads all of this as René just described, and sends it to all the ProxySQL instances. So every single ProxySQL instance that connects to a MySQL server, if it's been configured to collect that, will have the exact position of every instance.

At this point we know what position our slaves are at, but how do we achieve consistency? ProxySQL can be configured to enforce this consistency on specific host groups. If you've enabled this, then when a session is initiated and some transactions are occurring, ProxySQL will check the status of a host before routing a select to it: it checks whether the last GTID executed for that session on a group of servers has been executed on the slave before sending the query. Okay, so let's take a simple example of two slaves and one master. ProxySQL will check the first slave against its own internal in-memory list. Say it does not find it.
It will check the second. If it still does not find it, it will then route the query to the master. So you'll always have fresh data, and your latency will be low. And you can even configure this on just a host group; you don't necessarily need a replication host group.

Let's have a look at this in terms of a replication host group. The application sends reads and writes to ProxySQL. DML is routed to the writer host group and written on the master. The next request comes in, some select or something like that, and ProxySQL evaluates which slaves are up to date. If none of them are up to date, it routes the query to the master, provided that the master is included in the reader host group.

To sum up, let's have a look at what is supported for GTID-consistent reads in ProxySQL. With basic master-slave replication, we support both asynchronous replication and semi-synchronous replication. For multi-master, InnoDB Cluster and group replication are also supported. Finally, a quick note about the requirements: GTID is required on all of the servers that are part of the host group, and of course MySQL 5.7, so that we can have that information returned in the OK packet. And, at the current moment, binlog_format must be set to ROW.

So, what time is it? Time for René to do a live demo and for all of you to see how this works in action. Take it away.

Questions so far? Yes. Basically, the way it works is that the GTID of the transaction executed by the client is returned in the OK packet. So for every client session, ProxySQL is tracking the GTID for that specific client. When it needs to execute a select statement, it checks which slaves have that specific GTID. It does not check the last GTID written on whatever connection; just for that specific client. Technically, yes, you might also send queries to a slave that is a few seconds lagging behind, but you can ensure that the write that was made by one specific client has reached that slave.

One important thing is that you don't have to enable this feature for all of your traffic; you can specify for which select statements you want it enabled, as in the sketch below. You don't enable it globally; you enable it for a type of select statement.

To be precise, it's technically not correct that it queries the slaves to check if they have the GTID; it internally checks whether those slaves already have the data. And the principle is to check whether the slaves have the data first, and then the master, because normally, if you're using a replication host group, the master is in the same host group as the slaves: it's part of the reader host group, but with a lower weight, so the probability of a query being sent to a slave is higher than it being sent to the master.

What if the GTID is not executed on any of the slaves, and not on the master either? Technically it is possible that the Binlog Reader replies slightly late; it's unlikely, but the possibility is there. The same principle applies as when ProxySQL has no server available in a host group: it just waits for a server to become available. So technically you might have a slight delay in the worst case. The error would come after 10 seconds, which is the default for a server not being available in the host group.

Let me go back; let me see which is the best slide, probably this one. Let me just go to the slide.
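A hedged sketch of what that per-statement configuration might look like in the admin interface; the rule number, digest, and hostgroups are illustrative, and gtid_from_hostgroup names the writer host group whose GTID state the read must have caught up with:

```sql
-- Enable GTID-consistent reads only for SELECTs matching this digest.
-- gtid_from_hostgroup = 1 means: route to a server in hostgroup 2 only if it
-- has already applied this session's last GTID written through hostgroup 1.
INSERT INTO mysql_query_rules
  (rule_id, active, match_digest, destination_hostgroup, gtid_from_hostgroup, apply)
VALUES
  (5, 1, '^SELECT .* FROM orders', 2, 1, 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```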
So the way it works is that the application writes here; the master replies with the OK; ProxySQL knows what the GTID in the OK packet is and replies to the client with OK. By the time the client sends another query, you can be pretty sure that at least the master has sent that GTID to the proxy. The probability of this client sending a query before the master has returned the GTID to the proxy is extremely low; it's technically possible, but the probability is extremely low, and then it depends how fast the slaves are at processing the GTID and reporting back to the proxy. The idea is that we are eliminating all the extra network round trips, because when the client executes a select statement, the proxy already knows which GTIDs have been executed both here and there. It doesn't have to check them; it already knows. Every client has its own state.

If you have only one proxy, what we are describing is very easy to implement, because that proxy knows all the queries that have been sent to the database servers: master, slaves, whatever. But if you have multiple proxies, then this proxy does not know what has been executed through that one. Technically it could be implemented in some other way, for example by relying only on the Binlog Reader, so on whatever binlog events have been executed at that moment, but a consequence of this might be that, let me think, it would probably have to wait some interval to make sure that nothing else was executed in between. I'm not really sure how to implement this, or even what the challenges are.

Yeah, okay, so: demo. You cannot read this, but I'm trying to zoom. No, it's okay, right? Okay. There were two things I wanted to show. One is a single ProxySQL instance collecting metrics from several database servers at the same time; I wanted to show the CPU utilization and the network utilization, but unfortunately the VPN doesn't seem to work, so I'm only going to show the other one.

So, in this session, I don't know how many of you are familiar with the way ProxySQL is configured. It has an admin interface which basically accepts DML to configure the proxy, and the configuration lives in tables. For example, the table mysql_servers holds all the servers that are configured as backends for ProxySQL, and what we have introduced for this new feature is a new column, gtid_port. So we have hostname and port, and then gtid_port, which is the port where the Binlog Reader is listening. For example, we can run SELECT hostgroup_id, hostname, port, gtid_port FROM mysql_servers, removing everything else, and we have two host groups, host group one and host group two, where gtid_port is enabled, so it's not zero. ProxySQL will connect on that port and retrieve the GTIDs, and all of this happens in real time.

There are also new tables: SHOW TABLES FROM stats. The stats schema is where all the metrics are collected, and specifically we are interested in this table: SELECT * FROM stats_mysql_gtid_executed. Here ProxySQL is collecting GTIDs from those servers; this one is the master and those two are the slaves, and in real time it tells you the executed GTID set and how many GTID events have been read so far from that server.
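For reference, the admin-interface queries from the demo look roughly like this:

```sql
-- Which backends have a Binlog Reader attached (gtid_port is non-zero)?
SELECT hostgroup_id, hostname, port, gtid_port FROM mysql_servers;

-- The stats schema collects runtime metrics; this table shows, per backend,
-- the executed GTID set and how many GTID events have been read so far.
SHOW TABLES FROM stats;
SELECT * FROM stats_mysql_gtid_executed;
```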
If you execute it a few times you will see that of course those counters are changing, and you can also guess which one is the master and which ones are the slaves. Oh wow. Yeah, okay. SHOW CREATE TABLE stats_mysql_connection_pool: this table collects the statistics of all the queries that are being executed, and there are now new columns, Queries and Queries_GTID_sync. So we select the Queries and Queries_GTID_sync columns from it. Here in the host groups there are some servers that don't have this feature enabled, because this ProxySQL is connected not to one cluster; there are multiple clusters behind it. It just tells you where the queries have been executed: a lot of queries have been executed without GTID enforcement, while other queries have been executed with GTID enforcement, which means that when reading, it's making sure the server is in sync with the master.

Time is over. Okay, is there any other question? Actually the demo was supposed to be a bit longer. Yes, I will set up a test, and I can answer questions. Yeah, do you have a question while
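The per-backend counters being read off here presumably correspond to a query along these lines:

```sql
-- Per backend: how many queries ran, and how many were executed with
-- GTID enforcement (i.e. only after the server was in sync with the master).
SELECT hostgroup, srv_host, srv_port, Queries, Queries_GTID_sync
FROM stats_mysql_connection_pool;
```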