Hello everyone. This presentation is about ProxySQL and multiplexing. [The opening remarks were not in English and are mostly unintelligible in the transcript; the speaker is René Cannaò, the author of ProxySQL.]

OK. So, basically, what is ProxySQL? It is a protocol-aware proxy that sits between the application and the backend MySQL server. It receives requests from the application, processes them, decides what to do with them, and then performs actions based on internal logic that is fully programmable.

There is a long list of features that I am not going to cover in this presentation; the most important feature we are going to discuss is multiplexing. The reason behind it is to solve one specific problem that is quite common with MySQL: when a server has a lot of client connections, performance tends to degrade very badly. The higher the number of connections — a thousand, ten thousand — the more drastically performance drops.

Here we have graphs from one specific use case in which replication was lagging, and the only reason it was lagging was that there were too many connections to that server. Those connections were completely idle; they were not processing any requests. The only thread doing real work on that server was the replication thread, and replication was simply lagging. As soon as they killed all the idle connections, replication was able to catch up. They identified the cause because they had two servers with the same specs, one with clients connected and one without, and the one without connections was not lagging. So the only reason one of the two slaves was lagging was that it had tens of thousands of connections.
A quick overview of the various threading models — this applies to any kind of software. You can have one thread per connection, or in some systems even one process per connection, as in Postgres; or you can have a thread pool. There are pros and cons to both. The pro of one thread per connection is that it is easy to develop, because every thread handles exactly one connection. One of the drawbacks is blocking I/O: that thread has to handle its one connection and nothing else. A thread pool, on the other hand, is much more difficult to implement, because a single thread has to manage multiple connections; at the same time, it is far more scalable, because you don't have so many threads.

By default, MySQL uses one thread per connection, and of course this has a cost. The cost, again, is not specific to MySQL; it applies to any application using the one-thread-per-connection model. In short, you have a huge number of software threads, while the number of hardware threads in the CPU is much smaller. So you get a lot of context switches, because the CPU constantly switches from one software thread to another and the registers have to be saved and restored on every switch. You get mutex contention between all those threads. The CPU caches become largely useless, because each context switch evicts the previous context, and for the same reason you have to go to main memory — and accessing memory is far more expensive than hitting the CPU cache.

What about the implementation of a thread pool? Normally you have one thread accepting connections, and that thread passes each connection to a pool of worker threads that actually process the requests.
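As a rough illustration (my own sketch, not ProxySQL code), the thread-pool model just described can be written in a few lines of Python: a fixed set of worker threads drains a shared queue of requests, so any number of "connections" are served by only a handful of OS threads:

```python
import queue
import threading

def run_pool(requests, n_workers=4):
    """Process all requests with a fixed pool of n_workers threads."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = q.get()
            if item is None:            # sentinel: shut this worker down
                return
            with lock:
                results.append(item * 2)  # stand-in for "process query"

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for r in requests:                  # thousands of "connections"...
        q.put(r)
    for _ in workers:                   # ...served by only n_workers threads
        q.put(None)
    for w in workers:
        w.join()
    return results
```

The key property is that the number of threads is fixed regardless of how many requests arrive, which is exactly what keeps context switching and contention bounded.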
In MySQL this is not the default, but if you are using MySQL Enterprise you have a thread pool, and MariaDB and later Percona implemented their own versions of it. The way it works is that all new connections are accepted by one listener, and the listener hands each connection to a thread group; that thread group handles the connection until the client disconnects. And of course, because of the thread pool, performance gets a lot better if you are using this.

Great. ProxySQL's implementation of a thread pool is slightly different. Instead of having one listener, all the threads in ProxySQL listen on the same port. Whichever thread gets a new connection will handle it until the client disconnects. The pro of this design is that there is no context switching between threads, because all the connection's state always stays local to that specific thread. It also has a con: the threads can end up unbalanced, but that is quite a rare case.

OK. So ProxySQL has its own thread pool. Does this solve the problem of MySQL having a lot of connections? In reality, this by itself does not solve the problem. Having its own thread pool improves the performance of the proxy, but it does not remove the large number of connections to the database server. The reason is that many proxies — layer 4 proxies, but also layer 7 proxies — create one backend connection for every client connection. There is a one-to-one mapping.
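One common way to let every worker thread listen on the same port — whether or not this matches ProxySQL's exact internals — is the `SO_REUSEPORT` socket option (Linux 3.9+), where the kernel load-balances incoming connections among several listening sockets. A minimal Python sketch:

```python
import socket

def make_reuseport_listener(port: int = 0) -> socket.socket:
    """Create a listening TCP socket that can share its port with
    other listening sockets in the same (or another) process."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEPORT lets several sockets bind the same (addr, port);
    # the kernel then distributes incoming connections among them.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen()
    return s
```

Each worker thread would create its own listener on the shared port and `accept()` independently, so every accepted connection stays local to the thread that accepted it — no hand-off, no shared accept queue in userspace.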
And of course this doesn't scale, because if you have tens of thousands of connections to ProxySQL, you will have tens of thousands of connections to the database server. So ProxySQL has a different implementation: it has its own connection pool, and after a backend connection executes a query, it normally goes back immediately to the connection pool, assuming the connection is safe to share. A connection is safe to share when there is no transaction running on it, no temporary tables, no user-defined variables, et cetera — there is a set of criteria that define whether a connection is safe to share or not. If it is safe, it goes back to the connection pool so the next client request can be executed on that same backend connection.

So, basically, we can have thousands of frontend connections that funnel into very few backend connections: this is how ProxySQL performs multiplexing. You define the number of backend connections, and the number of frontend connections can be vastly higher.

Of course, there are certain conditions that automatically disable multiplexing; they are listed here. For example, if there is an active transaction, the connection cannot be shared until the transaction completes. Or if there is a table lock — if you have executed LOCK TABLES or FLUSH TABLES WITH READ LOCK on that connection — the connection cannot be shared until UNLOCK TABLES is executed. The same happens with GET_LOCK(): if you execute GET_LOCK(), that connection cannot be shared anymore. Surprisingly, there are still a lot of applications that use it. Another condition that disables multiplexing is executing SET SQL_LOG_BIN = 0.
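A toy model of the "safe to share" decision — my own simplification, not the ProxySQL source — makes the idea concrete: a backend connection returns to the pool only when no session state pins it to one client:

```python
from dataclasses import dataclass, field

@dataclass
class BackendConn:
    # Session state that pins a connection to one client (simplified).
    in_transaction: bool = False
    temp_tables: set = field(default_factory=set)
    user_vars: dict = field(default_factory=dict)

    def safe_to_share(self) -> bool:
        # The criteria from the talk: no open transaction,
        # no temporary tables, no user-defined variables.
        return (not self.in_transaction
                and not self.temp_tables
                and not self.user_vars)

class Pool:
    def __init__(self):
        self.idle = []

    def release(self, conn: BackendConn) -> None:
        # Multiplexing: only shareable connections go back to the pool,
        # so the next client request can reuse them.
        if conn.safe_to_share():
            self.idle.append(conn)

    def acquire(self) -> BackendConn:
        return self.idle.pop() if self.idle else BackendConn()
```

A connection stuck in a transaction never re-enters the pool, which is exactly why the conditions listed next disable multiplexing.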
After you execute that, the connection cannot be shared anymore, because otherwise other clients would unknowingly run with binary logging disabled — and of course that is not something you would want. Multiplexing is automatically re-enabled when you issue SET SQL_LOG_BIN = 1: as soon as you re-enable binary logging, multiplexing comes back automatically. Other reasons multiplexing is disabled: executing CREATE TEMPORARY TABLE, using SQL_CALC_FOUND_ROWS, or PREPARE — PREPARE being the text-protocol version of prepared statements. There are also many session variables that, when set, make ProxySQL automatically disable multiplexing. I have listed some of them here, like SET UNIQUE_CHECKS, SET AUTO_INCREMENT_INCREMENT, SET FOREIGN_KEY_CHECKS, and so on. If the client executes one of those, multiplexing is automatically disabled.

By default, multiplexing is also disabled for every query that contains the @ character, which is normally used for user variables or session variables — no matter whether you are setting one or reading one. This can be problematic if you are using an ORM or a JDBC driver that wants to read all the session variables and therefore issues statements containing @. For those specific queries you can create rules that keep multiplexing enabled even when such queries are seen.

OK, questions so far? OK. One thing to remember is that multiplexing does not disable routing; those are two different concepts. You can have multiplexing disabled on a specific connection and still have the client's traffic routed across multiple backend connections — one connection to the writer and one to the reader, or, if you have sharding, one connection to one shard and another connection to another shard, with traffic load-balanced according to your criteria.
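For completeness, keeping multiplexing enabled for such @-containing statements is done through query rules on ProxySQL's admin interface, using `multiplex = 2` ("do not disable multiplexing for queries matching this rule"). A sketch — the match pattern here is only an example for a typical driver probe query; adapt it to the statements your driver actually sends:

```sql
-- ProxySQL admin interface. multiplex=2 tells ProxySQL not to disable
-- multiplexing for matching queries, even though they contain '@'.
INSERT INTO mysql_query_rules (rule_id, active, match_digest, multiplex)
VALUES (10, 1, '^SELECT @@session\.', 2);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```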
Now, let me show some examples of how ProxySQL actually helped improve performance using multiplexing. What follows are performance improvements measured at Shopify, which uses ProxySQL across its whole infrastructure. These were their initial tests — benchmarks running the same traffic directly against MySQL and through ProxySQL, so you can easily see the difference. I don't know if you can read the numbers, but here we are close to 15,000 connections, while here we are barely around 700. The number of connections was about 5% of the original, so they drastically reduced the connection count. During the same benchmark they also looked at the number of threads running — this graph with MySQL, this one with ProxySQL. Threads running with MySQL spiked up to 100, while with ProxySQL the number of threads running spiked to at most 25. Threads inside InnoDB: before, they had 80, while with ProxySQL there were a few spikes to 15. Even more interesting, the InnoDB thread queue with clients connecting straight to MySQL spiked to 100, while with ProxySQL it stayed at 0.

Next. These are graphs from another benchmark. Let's start from the bottom, actually. With traffic passing through ProxySQL, the QPS is the same — actually slightly higher, so with ProxySQL they were getting more throughput. And the total query execution time, which before was around 25,000, dropped to 6,000 once the traffic passed through ProxySQL. ProxySQL was not filtering any traffic and was not caching anything; it was just ordering the execution of the queries before sending them to the server, basically reducing the number of connections to the server. The response time was also drastically reduced: before, it averaged around 1,000 — I think this would be milliseconds.
And after the switch to ProxySQL it was 0.3 or 0.4 — a huge performance improvement. These are more graphs; this is actually production traffic: before, connected directly to MySQL, and after the switch to ProxySQL. They went from around 15,000 connections down to maybe 300 or something like that. Threads running in MySQL: before, they were hovering around 20, spiking up to 35; after the switch to ProxySQL, the number of threads running was around 8 — fewer than 10 for sure. And again they got a huge improvement in total query execution time, which went from an average of 10,000, with peaks up to 20,000, down to less than 4,000 or so. A huge performance improvement.

These graphs show some traffic spikes. When they had those spikes with MySQL, the number of threads running spiked to 439 — a huge number — and QPS on MySQL was around 35,000 at the spike. After the switch to ProxySQL, during a similar spike, QPS was around 70,000 — twice the throughput — but because ProxySQL limited the number of connections, threads running spiked only to a bit over 100.

OK. One of the misconceptions people have when configuring ProxySQL is that they configure the number of backend connections to be as high as the number of client connections: if they expect clients to open 1,000 connections, they configure ProxySQL to open up to 1,000 connections to the database server. In reality, this is something you should not do; you should always try to configure ProxySQL to open far fewer connections to the database server. What happens if all the clients execute a query at the same time — say you have 100 client queries? ProxySQL sorts the queries by their start time, queues them internally, and executes them serialized, based on the number of connections that are open. And as I was showing in the previous slides, this actually improves performance.
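The backend connection cap is configured per server on ProxySQL's admin interface, via the `max_connections` column of the `mysql_servers` table. An illustrative example — the hostname and the number 200 are placeholders:

```sql
-- Limit ProxySQL to at most 200 connections to this backend,
-- regardless of how many clients connect to ProxySQL itself.
-- Excess requests are queued internally and executed serialized.
UPDATE mysql_servers SET max_connections = 200
WHERE hostname = 'db1.example.com' AND port = 3306;

LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```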
And this is even easier to understand on the next slide. This is a benchmark done in 2016 — so almost three years ago — by Percona. It is a read/write workload comparing MySQL 5.7, Percona Server 5.7 with thread pool enabled, and ProxySQL limiting the number of backend connections to 200. At a low number of connections, the performance of both MySQL Server and Percona Server was higher than ProxySQL's — that's expected, since ProxySQL has to analyze all the traffic before executing it. But as the number of connections increased, at around 200 or 300, performance with MySQL Server dropped drastically. Performance with Percona Server dropped as well, but because the thread pool was enabled, it didn't drop as much. And, as I said, ProxySQL was capping the number of connections to the server at 200, and its performance stayed stable no matter how many connections were made to ProxySQL. Of course, if we kept adding connections here, ProxySQL's performance would stay quite flat, while MySQL and Percona Server would keep dropping.

Questions? Yes? [Audience question, partially inaudible, about a write to the master followed by a read.] OK — it doesn't duplicate the statement: it will execute it only on the writer, the master, according to the query rules you have configured. And of course, if you then execute a SELECT that must see that write, the SELECT has to be executed on the same connection; ProxySQL does not guarantee this by itself — it really depends on how you configure your query rules. This is why I normally say: do not send all the SELECTs to the reader. That would be wrong, and your example is a very clear case of when you should not send all the SELECTs to the reader.
And actually, what you said is basically what I'm highlighting here: multiplexing does not disable routing, so you can still have different kinds of routing — write statements go to the master and SELECT statements go to the slave. But if sending a SELECT to the slave is wrong for your case, you should send that SELECT to the master instead. This is why what I normally recommend is that you identify which SELECTs are safe to be sent to the slave; you don't send all the SELECTs to the slave.

Yes? I need to repeat the question. The question is about what happens if you have a query that uses session variables — so it contains an @ — and it is a DML statement that goes to the writer, and then you have configured ProxySQL to send SELECT statements to the reader. If you send a SELECT that uses the same session variables to the reader, the reader won't have those session variables, because you haven't executed the DML there. So, again, what I'm suggesting is that you don't send all the SELECTs to the reader; you only send to the reader the SELECTs that are OK to be sent there. In ProxySQL you check which queries are safe to be sent to the reader and you only send those, instead of everything.

Yes? [Audience question about the earlier graphs.] So the question was whether the query times in those graphs included the time to establish the connections. Those were already-established, persistent connections, so in those graphs you don't even see the benefit of ProxySQL having its own connection pool.

Any other questions? Thanks.
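The selective read/write split recommended in these answers is typically expressed as ProxySQL query rules. An illustrative sketch — the hostgroup numbers (0 = writer, 1 = reader) and the `reports` table in the pattern are assumptions for the example, not anything from the talk:

```sql
-- Route only SELECTs known to be replica-safe to the reader hostgroup;
-- everything else falls through to the default (writer) hostgroup.
INSERT INTO mysql_query_rules
    (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES
    (20, 1, '^SELECT .* FROM reports\b', 1, 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```

This is the opposite of a blanket "all SELECTs to the reader" rule: each pattern is added deliberately, after you have verified that those queries tolerate replication lag and do not depend on session state set on the writer.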