 Hello. Good afternoon, everybody. Thank you for coming in this rainy afternoon to hear about the various MySQL proxies. Before we start, I'm curious to know how many here use MySQL router, or I've tried to use it at least. Okay. What about ProxySQL? Oh, wow. And what about MariaDB MaxScale? Okay. Excellent. So, good. So, we can start in 20 minutes. So, I'm Colin. I work at Procona Inc. And before that, I was at MariaDB as well, doing stuff on MariaDB server. So, quick question is, what is a proxy? Basically, it's a very lightweight application between the client and the server. It has to be lightweight, because otherwise you introduce latency for not much gains. And there have been benchmarks done by multiple people, so you can see benchmarks against other proxies to see, you know, how lightweight it is. And typically, some proxies will use things like E-Poll, and so forth, and the threading will change as well. It is largely a man in the middle, you yourself, installed between a client and server. And of course, it can communicate with many clients as well as many server backends. And this idea of proxies is not very new. So, this is an image taken from Giuseppe Magia from 2007. And he couldn't join us this year at FOSTAB, but this is an image giving you an idea of what a proxy does, and leading you on to the fact that there is this thing called MySQL Proxy. MySQL Proxy was around 10 years ago. It was probably the first-ever MySQL proxy out there. It had an embedded Lua interpreter. Generally, you could define what it could do with query passing through it. And the Lua interpreter was handy, because back then, a lot of people would play this game called The World of Warcraft, and a lot of people could know how to script in Lua, so you could write really short Lua scripts for MySQL Proxy. And today other databases with embedded Lua interpreter include things like Tarantul. It is still used in MySQL Enterprise monitor from what I gather. And the Lua was extremely flexible. You could rewrite queries, you could add statements, you could filter, and you could just write these short chunks of Lua code. Lua was very easy to pick up and very easy to write for. It is unfortunate that this product is sort of archived now, but you know, never fear, because there are now multiple other products for you to use. So we'll start with MariaDB MaxScale, because I think this was, by and large, the first GA-ready proxy out there. It became GA in January of 2015, and the idea was percolating for a lot longer before that, and there were multiple alpha and beta releases before that. It is a level seven proxy router. It understands the MySQL protocol. It has a fully pluggable architecture. So what MaxScale does is it monitors the state of all database nodes constantly, and you can also react to monitoring-based information or hints, or even from filters. So, you know, MaxScale largely became sort of this Swiss Army knife, right? It was extremely pluggable. You could also do things like logging. You could write to other backends besides MySQL. The idea of MaxScale was such that it should be able for you to translate even to other non-MySQL backends if need be. It also has a DB firewall filter, which you can use because you can't parse SQL through regular expressions. You actually need to use a parse tree. You can feed it, Regex, but the parse tree needs to exist. You could also route via hints. You could rewrite queries. All this is still possible, and the query level analysis, you could also filter logs. You could also call external tools with MaxScale, things like MariaDB replication manager, or even MHA via scripting. It also provided schema-based sharding. One of the most popular use cases for MaxScale that became rather famous was the binary log server popularized by Booking.com. So a couple of talks ago, Jean-François he's got excellent talks about using the binlog server. And the idea there was to not use intermediate masters, so you would have masters served by MaxScale and then get all the slaves to read off MaxScale. And this has been obviously used in production, and other database, heavy users of databases have also got their own versions of binlog servers that are maybe not open source yet. And one of the most popular use cases was just to put it sitting in front of a three-node Galera cluster, three-node being the minimum to start a Galera cluster. And the MaxScale from 1.0 right up to 1.4 is linked against MyScale Client Library, so either LibMyScale Client or LibMarieDB Client, because you do need to have access to the parser. And for what it is worth, it is still getting development. The 1.4 branch is still getting development. If you pay attention to the GitHub tree, you'll realize that there should have been a release on the 1st of February to fix the blocking of prepared statements, but that release hasn't been out yet. MaxScale also managed to create some kind of ecosystem around it. The first known plugin was a Kafka backend written by Yves Trudeau. If you look at GitHub logs, it started probably around February of 2015, so shortly about a month after it became GA, and it was announced around middle of that year. And the context, obviously, was to have real-time change data capture that would come in via your MaxScale client to a Kafka backend, and you'd see this only appear in MaxScale 2.0, which was something that got released in August or September of 2016. And that was also the first known credible fork called Airbnb MaxScale. I say credible because you may have read news articles about another fork called GPLScale, and that was just a fork out of anger, I think. It was not a real fork that you could use. And Airbnb MaxScale, the idea was to have a database proxy focusing on connection pooling, reduce the number of direct connections to the MySQL database. They deployed Airbnb MaxScale on all of Airbnb from early 2016, and it's pretty much powering all MySQL that touches it there. And some of the features I've listed up there, and generally speaking, if a client completes a successful authentication with a backend MySQL server, Airbnb MaxScale will then serve the link between the backend connections and the client connections and the connection pool itself of the backend server. Now denialist query rejection as an addition for them was quite important because sometimes they use Ruby, and the Ruby VM will trash. And when it trashes, they've also seen bad queries go to do delete where 0 equals 0. And MaxScale would actually stop that from happening. This is obviously no fault of MySQL, it was a fault of their Ruby VM crashing. And this is not a problem that only Airbnb faced, other people have faced it too. So having Airbnb MaxScale is quite useful. So we are at FOSEDEM, and this is a free and open source conference, so I have to apologize that I spoke about MaxScale for length. But the abstract did say I was going to speak about MaxScale. So I would be cheating you if I didn't. So MaxScale 2.0 came out August or September of 2016. Same GitHub repository. It is not linked against the MySQL client libraries any longer. It's replaced with SQLite, and SQLite is now known to parse MySQL. You can also do change data capture to Kafka. You can also push your binary log events to Avro or JSON. But the most important thing was that the license had changed. It went from GPL V2 to this thing known as the business source license. How many of you are familiar with the business source license? Well, that's quite a lot of you. It was in the news. It was in the news. Yeah, hard to avoid. Yes, okay. So the business source license is time-delayed open source. It is not the first time this has been tried. GhostScript and the Aladdin license did this as well. It was GPL after a year during the GhostScript days. In this case, they tell you you are not allowed to use MaxScale in production if you have, well, more than three database instances in a free fashion. I don't know how that is enforced except through this use limitation. And after three years, it becomes GPL. And Richard Stallman himself, back in the day during the GhostScript Aladdin license debacle, he said he considered it a problematic compromise because it gave us free software after a year. But thankfully, that was back in the GhostScript Aladdin days. I'm guessing many of you even remember GhostScript or Aladdin because that was in the mid-90s when free software was sort of just sort of becoming a bit more famous. So thankfully we have choice now. MaxScale router fully GPL v2. So thank you Oracle. It became GA in October of 2015 and they made a labs release about a month before that. And I guess if you look at the release notes, you also see that they had internal releases so they were playing with this for a while. The idea there is obviously to do transparent routing between all applications and back-end MySQL servers. They also have a plugin interface via a harness that you can use. So the harness provides dependency tracking, loading and unloading of plugins, configuration, logging framework and so forth. It can do failover. It can do load balancing. It can also distribute application connections in a round robin fashion. So it will forward the MySQL packets to a back-end server without ever inspecting or modifying them so you get maximum throughput. And I think one of the coolest things is that this is one of the key features that you need if you wanted to start playing around with MySQL InnoDB cluster. And your host, Frederick, has actually got a less-than-five-minute video on YouTube that you could watch in your own time. So just search for MySQL InnoDB cluster on YouTube and you will be able to see it. It's a very good video about how you can use MySQL with router and group replication and the new shell, which I'm guessing there will be talks about. And then there is proxy SQL. The main author, Rene, is actually in the audience. It has been stable since December 2015. There has been many times at many conferences that people did talk about proxy SQL. I would say 2015 was the year we went full circle on proxies, right? Because we had three proxies become GA in one year. And the selling point for this is it's by DBAs, for DBAs. And at Prokona, we obviously like proxy SQL a lot. So we included it inside of Prokona actually to DBA cluster 5.7. We've also got a proxy SQL admin tool that you can use for configuring your PXCs to have proxy SQL in front of it. And the idea behind this of course is to improve your database operations, have H8 topologies and so forth. It also does things like connection pooling, read-write splitting, things you'd expect from a proxy. I think the very important things for me as opposed to just going through this entire list is that it's runtime reconfigurable so you don't have to restart your proxy SQL. This is extremely friendly for users or DBAs. And the monitoring built-in is also extremely useful. This monitoring can also then be pushed out to things like Prokona monitoring and management and so forth or consumed by other things. And it does query rewriting. And there have been multiple blog posts and performance tests done on this. And I think one of the best things about proxy SQL is that Rene has spent some time comparing proxy SQL to other proxies out there including HAProxy, Nginx. And he released this sometime in January of this year. So well worth taking a look at this comparison because, hey, if you find a problem with said comparison, you're always more than welcome to actually report a bug on the mailing list. Proxy SQL, of course, has a few missing features if you want to compare it to Maxcale. One of them is front-end SSL encryption. So that's going from client SSL encryption to the proxy to the application. It does support back-end SSL encryption. And there is an open issue, issue 891. This is kind of important for HIPAA environments and so forth. And I believe it is on the roadmap. It will get fixed in time. So if you care, you can track this on GitHub. There is no bin log router and from what I understand, very large Internet sites may open up their bin log routers for you to end up using. If you want to stream binary logs to Kafka, there's this wonderful thing called Maxwell's Demon that you can also use. So it's another tool that is actually used in production and bin logs to Avro. How many here use Avro out of curiosity? That's what I thought. And I think the cool thing about Proxy SQL is it has lots of great resources. It's great when other people talk about your project, not just you talking about your project. And we see things, we see Mako Tusa have extensive blogs, several lines cluster control also works with Proxy SQL and they talk a lot about it and how you can use cluster control. PCN talks about it, Prokona obviously talks a lot about it. So you can get a lot of resources. And since we are at FOSDEM, it's also worth talking about the health of these projects. So they're all at GitHub and MaxKill has been around probably the longest. Router, like every other MySQL software, is not really developed in the open so it is sort of dumped onto GitHub from time to time I guess. That's why you may see less contributors and so forth. But that's in no way the integration of how the software is being developed. It's just a different process. And Proxy SQL also has a bunch of many stars, multiple forks. The other thing is to see pull requests. There are a lot of pull requests on Proxy SQL as well. So do these statistics tell you that the project is healthy? Kind of. If you see regular commits, issues being closed, it shows that people care. GitHub has this other very interesting thing called punch card. I thought this was kind of appropriate for FOSDEM as well because this is the punch card for MaxKill which you see lots and lots of activity from Monday to Friday during working hours. Seems fair. This is the punch card for Proxy SQL, where you see activity all the time. Not because Rene is madly typing away. There are many other contributors to this but I think this is sort of the spirit of open source is you also work not when you're paid to work on it. We work on it because you love working on it and you improve it and that shows in the product. And a metric that I don't think is a metric but if you care about Google trends, this is a joke of course. This is insert time to laugh is this Google trend thing and the red is Proxy SQL and it starts spiking a lot in August of 2016 and I'm guessing you all read the news and know why. So what do you use as we come to wrap this up? I think MySQL router is going to be very interesting going forward. So it is something to definitely watch. I think Proxy SQL is great for you to use today because it's got a lot of support. It's well integrated. It works with other additional tools like PMM and so forth. And if you need the binary log router today before someone else makes a binary log router open source, there is GPL MaxScale that you can use that works and I highly recommend you if you need a bin log router to use that. But there is also one other consideration to think about is that what server are you using? If you are using MySQL and Pekona server, you can use all these tools and it wouldn't be a problem. If you are using MariaDB server it is pretty obvious MySQL router will not work for you so that one choice is now gone and that would only leave you the option of Proxy SQL and MaxScale. So again, server choices also play a role and I would be remiss not to mention Vites. I didn't put that in the abstract but Vites doesn't sell itself as a proxy either. It is a database clustering solution for horizontally scaling MySQL and it powers YouTube. That alone should make you sort of open your eyes. So if you go to Vites.io, you will actually be able to try this out with something like Kubernetes. You don't have to run it in the cloud. You can run it yourself and if you run it yourself, you install something like at CD or ZooKeeper so you also ensure your cluster view is always up to date. Vites is definitely interesting because it makes connections using very lightweight Bison that each connection only uses 32 kilobytes of RAM. This is very unlike regular MySQL connections where there could be anywhere between 256 kilobytes of RAM to even 3 megs. It also does the whole rewriting of queries so you can, you know, no limit clause, no problem. You can make sure that you get the query. You don't see a limit. You add it automatically. So, and Vites, of course, works with MySQL and MariaDB as well. So there are a bunch of resources worth taking a look at. The proxy SQL Google group is extremely active. The MariaDB MaxScale Google group is also quite active. MySQL router doesn't have a mailing list but it has this forum which you need to log in to use. I'm not a huge fan of forums but it is a discussion place and the Vites forum is extremely active as well and they have a Slack group as well so you can join the Slack group and chat. I'm guessing not many people here like to use Slack. We are at FOSTAB. Okay, so before I wrap it up, insert ad for Kona Live. If you're interested in MySQL, you should probably be there and you get your 30% discount if you use that code. 20 minutes and 43 seconds. Thank you. I'm open for questions. I'm open for questions if you have any or if you plan to shuffle. Yes. Can the proxy rewrite a subselect to a join, for example? Why do you want to rewrite subselects? Now, if you use MySQL 5.6 or MariaDB 10, they can process subselects. Join is more performant. That is a good question. I have not had to do this recently because I trust the optimizer but this is possible, maybe. Renate. If you use regular expressions, you could probably get it going. But yeah, I don't think you need to now because it kind of just works in the optimizer. Trust the optimizer. Any other questions? Here. Oh, sorry. I couldn't see. The question is, what if you want to do sharding? They do range-based sharding so you can get some level of sharding. It's not a proper sharding solution but you can get some level of sharding. Yes. They do. Well, does router? Not yet but it's in the roadmap. Yeah. The other two do. And Vitesse obviously does sharding too. But much harder to set up. By schema. By schema, yes. So they have their own limitations in terms of sharding. Question. So the question is, where would you put the proxy on the application servers or dedicated servers and the advantages and disadvantages? If you're going to use it for load balancing, maybe you want to keep it on a dedicated server. I would say dedicated would make more sense in case your application server goes down or runs out of memory and kills your proxy. So I'd just keep it on dedicated machines. Though from what it's worth, it was from Alkin and Rene a while back they made a presentation about how proxy SQL has one proxy SQL instance has hundreds of servers sitting below it as well and it works in production. So I would just use a dedicated machine overall. Okay. Thank you very much. Thank you.