Hello everyone, welcome to this session where we will be doing our Vitess maintainer talk, and the talk is titled Scaling MySQL with Vitess. My name is Deepthi Sigireddi, I'm the technical lead and a maintainer of Vitess. Hey everybody, thanks for coming. My name is Matt Lord, I also work at PlanetScale and I'm a Vitess maintainer. I've been focusing on VReplication, which is one of the things we'll be talking about today. We'll start with an overview of Vitess and then we will go on to the migration part of it and a demo. These maintainer talks are typically structured as an intro and a deep dive, and the first part will be an intro. Some of you may have already heard of Vitess, so feel free to let your attention wander in that case until we get to new features, because we will be talking about what's new. To start with, what is Vitess? Vitess is a cloud native database. It's massively scalable, and the way it achieves scalability is through sharding; there are really no theoretical limits to how big your Vitess cluster can be or how much you can scale with it. It is also highly available, and it's compatible with MySQL, both 5.7 and 8.0. Vitess was built as a scaling solution for MySQL, and that is its origin story. Vitess works with various database frameworks, with ORMs, with legacy code that talks directly to MySQL, and with third-party applications. What Vitess does is present a logical database view of many physical MySQL instances to applications. This logical database is what we call a keyspace, and the physical databases are basically partitions of your data. So if you have a large amount of data, the data is split up into multiple shards, and these terms, keyspace and shard, are important because we'll come back to them. The way to think about a shard is that every row in a table lives in exactly one shard, and if you combine all of your shards, that is your entire set of data. What Vitess is doing when you have an architecture like that is routing queries to the correct physical database. So as far as the application is concerned, it looks like a single monolithic MySQL instance, but behind the scenes, queries are routed to the appropriate MySQL instance. Vitess supports both gRPC and MySQL clients; basically, it speaks the MySQL protocol. And clients can have a single connection to a Vitess cluster, which behind the scenes results in multiple connections to individual MySQL databases. This is what the Vitess architecture looks like. From an application point of view, connections typically go through a load balancer, which is not actually part of Vitess; it can be any load balancer. Then queries go to the component that we call VTGate. VTGate is a lightweight, stateless proxy which understands the MySQL protocol but can also serve gRPC requests for data. It parses the query, plans the query, and decides where to send it. The component to which it ends up sending the queries is called VTTablet. VTTablet is a sidecar management process for mysqld. It mediates all access to MySQL, and it is how we achieve high availability. There are typically multiple MySQL instances in a shard, in a replication topology, and you have many shards. Each shard has a primary, and all writes always go to the primary. It may also have one or more replicas of one or more different types. There's a specialized type called rdonly, which is used for long-running analytics queries and so on.
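To make that concrete, here's a minimal sketch of what connecting looks like, assuming a local cluster where VTGate exposes its MySQL listener on port 15306 (the port the Vitess local examples use; substitute your own host and port):

```sh
# Connect to VTGate with the stock MySQL client, exactly as if it were
# a single MySQL server. Host and port are assumptions from the local examples.
mysql -h 127.0.0.1 -P 15306 -u root

# The one client connection fans out behind the scenes to however many
# mysqld instances back the cluster; the application never addresses a
# shard directly.
mysql -h 127.0.0.1 -P 15306 -u root -e "SHOW DATABASES"
```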
So the primary and replicas are running in this replication configuration, and Vitess will monitor for failures, repair them, and make sure that each shard is always serving, always available. The management component that tracks failures and repairs them is called VTOrc. In addition to that, there's a control daemon layer called vtctld, through which users can manage their Vitess clusters. So for instance, if you want to run a maintenance operation, you may want to plan a failover and move the primary to a different MySQL instance so that you can do an OS upgrade or a software upgrade or whatever it is, right? There's also a web UI called VTAdmin, which talks through vtctld to all of the other Vitess components to provide a management view of the Vitess cluster. In addition to that, there is what we call a topo server. This is a metadata store that allows different Vitess components to discover each other. Vitess is in production in many places. Some of the notable ones are Slack, Square, HubSpot, and PlanetScale, which provides a database-as-a-service and so has many, many thousands of Vitess clusters running. GitHub is also an interesting one, because it took them a while to migrate onto Vitess and they have a very large amount of data. So I sort of talked about this already a little bit, but these are some of the key adopters. Slack has been running 100% on Vitess for several years now. GitHub runs on Vitess. JD.com in China is running 10,000-plus databases on Vitess, and then there's PlanetScale's database service. We have a robust community. We have about 15 maintainers, and there are 81 companies from which people have contributed to Vitess in the last two years. All of these stats are from DevStats, and feel free to go look at them as well. Out of those 81 companies, we got PRs from 47, and there were 200-plus PR authors and 400-plus contributors during that two-year period. And obviously, when we say contributions, we are counting non-code contributions as well: issues, issue comments, reviews on PRs, documentation, all of those things count towards contributions. Now, let's talk about new features. We just did a GA release, that's Vitess 18.0. There are several things which are new in this release, and we talk about them in the release blog, but I'll go over a few of them here as well. In addition to that, I'm actually going to cover what is new in the last one year. So some of this may be repetitive for people who went to KubeCon EU, but I just want to make sure that I cover everything that happened since the last KubeCon North America. We've done three GA releases in the last one year, and we've also done 12 to 15 patch releases. We typically do a GA release every four months, so that ends up being three releases a year. One of the things that's new, or rather renewed, is point-in-time recovery. We added support for point-in-time recovery in Vitess several years ago, but the way it was implemented required people to run a third-party binlog server, and all the binlog servers that are available for MySQL are either unmaintained or difficult to operate. It's really not been practical for people to run those things, or it's just an additional thing that you have to run, maintain, and make sure is highly available; it just adds to the deployment burden. So what has happened in Vitess over the last couple of releases is that we have built out incremental backups within Vitess, which basically just depend on saving the binlogs.
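As a rough sketch of that flow, hedged because flag names can shift between releases (check vtctldclient --help for your version), and with a made-up tablet alias:

```sh
# Take a full backup of a tablet, then periodic incremental backups,
# which are built from the saved binlogs.
vtctldclient Backup zone1-0000000100
vtctldclient Backup --incremental-from-pos=auto zone1-0000000100

# Point-in-time restore: recent releases support restoring to a GTID
# position or, as of v18, to a timestamp.
vtctldclient RestoreFromBackup --restore-to-timestamp "2023-11-06T14:00:00Z" zone1-0000000100
```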
And using those incremental backups, we can now do point-in-time recovery, either to a specified GTID or to a timestamp. And you can actually do this across shards for a keyspace, so you can recover every shard in a keyspace to the same point in time; we'll compute what GTID that needs to be and replay the binlogs until that point. We have experimental support for foreign keys. This is one of the biggest things landing in 18.0. Historically, Vitess has said that people can run with foreign keys and certain things won't work. That is still true, but the number of things that won't work is much smaller now, because we have actually brought the logic of supporting foreign key relationships into Vitess. Currently it only works unsharded or within a shard, but in the future we plan to support cross-shard foreign keys, which is something that you simply cannot do with vanilla MySQL. We have also added native view support across shards. So a view definition can span data that's across shards and it will work: you will get data when you select from a view that physically lives in different shards. This is, again, something that you can't possibly do with vanilla MySQL if you are doing some sort of application-based or custom sharding. We have much better support for union queries, derived tables, subqueries, all of those things. We've added support for PREPARE, EXECUTE, and DEALLOCATE statements through the MySQL protocol. We already had support for the binary prepared-statement protocol, but now these statements work through the MySQL text protocol as well. We've built a session-aware connection pool. Previously, if you wanted to override certain system variables on your session, you would end up getting a dedicated connection to the backing MySQL, and that could lead to connection exhaustion at the MySQL level, because you were bypassing the connection pool. That created a lot of problems, so we finally said we are going to make the connection pool session-aware, so that the same connection can be reused by the next client if the settings haven't changed. We can now do query buffering during migration cutovers and get near-zero-downtime cutovers for migrations; previously, this was only supported for resharding within existing Vitess clusters. We also have more efficient replication through support for NOBLOB row images and compressed binlogs. We have built support for safer MySQL minor version upgrades through the Vitess operator. Previously, we would shut down MySQL with fast shutdown, and there were minor versions where the upgrade then wouldn't work, so that has been fixed. We've built a number of MySQL syntax extensions. The one I want to highlight is VEXPLAIN. Vitess has a standalone tool called vtexplain that you can use to see how queries get routed to different shards; VEXPLAIN lets you do that from a MySQL client. We've changed our CLI to use Cobra, and that just gives a much better experience for users of the CLI. Along with that, we've also built a structured API on the server side that the CLI talks to, so that responses are in JSON format and easy to parse, versus being raw strings that are much harder to parse. We've also changed our flags to use Viper, and this enables us to auto-generate the docs, and in the future we can support dynamic reloading of configuration. And we have more efficient connection pooling coming in the next release; it's already been merged to main, but it didn't get into 18.
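For example, from any MySQL client connected to VTGate, something like this (a sketch; the table names here are illustrative):

```sh
# VEXPLAIN PLAN shows the VTGate query plan; VEXPLAIN ALL adds
# per-shard execution detail from the underlying MySQL instances.
mysql -h 127.0.0.1 -P 15306 -e "VEXPLAIN PLAN SELECT * FROM customer WHERE customer_id = 1"
mysql -h 127.0.0.1 -P 15306 -e "VEXPLAIN ALL SELECT COUNT(*) FROM corder"
```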
And that's going to be much better than what we have today. Non-recursive CTE support is coming soon as well, and cross-shard foreign keys are planned, as I already mentioned. So now I'll turn it over to Matt to talk about migrating to Vitess. All right, thank you, Deepthi. So I'm just going to highlight which features we're going to be leveraging, conceptually, and a little bit about what that means for your application, and then we're going to jump right into a demo. What this talk is about, as you can guess from the title, is that at some point you're going to reach the limits of what you can do with a single MySQL instance. Normally you'd have a replica set with a primary and one or more replicas, but eventually you just can't scale that vertically anymore. You may literally not be able to get bigger machines than you currently have, it becomes cost prohibitive because the cost grows exponentially as you get larger and larger machines, or for various operational reasons it might just not be practical anymore. It becomes much more difficult to do online schema changes, to manage backups, and a whole host of other things once you're managing your entire data set on a single server. And so that's when you realize, okay, we're going to have to start partitioning our data set. That's when Vitess comes in; that's when most people start exploring different options and become interested in Vitess. VReplication is the technology that allows you to take that single standalone MySQL instance, import it into Vitess, and in the process shard that data set as well. In the demo, we're actually going to do the two types of sharding that people would typically do, at the same time. One is vertical or functional sharding, where you say, okay, we've got this subset of tables which are all related to each other, but they're not tightly related to the rest of our tables. They may all be related to your products, or related to orders, whatever it is. So you can split those out and put them in a different database. That gets you a little bit of the way, and then eventually you realize we still need to horizontally split these tables, where you take half of your table, or a fourth of your table, and that's housed in one MySQL server, the other quarter is on another, and so on. We're going to do both of those, and we'll look at how that actually works in just a second. VReplication is a framework that supports many different kinds of data flows. We're going to be looking at migrating into Vitess; there's a MoveTables command which supports that. But there are other commands as well. Once you're in Vitess, let's say you start with just two shards and you realize, okay, sometime in the next year we're actually going to need to split again and go to four shards, or it could be eight shards, 16, whatever it is. There's a Reshard command which allows you to do that. For online schema changes, Vitess actually leverages VReplication under the hood to execute those. And then there are some additional things; event streams are quite popular as well. As your company grows, you also have to integrate with other systems, a data warehouse, for example, being a typical one. And so VReplication provides an RPC which allows you to stream: copy all of your initial data if needed, and then stream the changes that are happening across your entire Vitess cluster into an external system.
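As a hedged sketch of the Reshard command mentioned a moment ago, assuming a hypothetical two-to-four-way split of an already sharded customer keyspace (the workflow name is made up; verify the flags against your vtctldclient version):

```sh
# Split two shards into four; Vitess copies and then replicates the data
# into the new shards while the old ones keep serving traffic.
vtctldclient Reshard --workflow split2to4 --target-keyspace customer \
  create --source-shards '-80,80-' --target-shards '-40,40-80,80-c0,c0-'

# Same lifecycle as MoveTables: copy, replicate, verify, switch traffic.
vtctldclient Reshard --workflow split2to4 --target-keyspace customer show
```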
The Debezium connector for Vitess, for example, is a pretty popular one; it's used by a lot of the larger shops. There are some additional things like messaging. We won't talk much about that, but if you're using something like Sidekiq, for example, which is a Redis-based job queue, you can use the messaging feature in Vitess to accomplish that same type of thing without having to spin up a whole other stack just for that particular purpose. Okay, and again, if you have any questions, we're going to have time at the end. They can be about what either of us have talked about, or about the demo that we'll see coming up. This slide gives you a visual representation of what the migration will look like. VReplication includes various primitives, and you can see a couple of them here. The row streamer is what handles the copy phase; conceptually, you can think of that as a mysqldump-style copy of all your data, consistent with a particular GTID snapshot. And the binlog streamer is what you might guess from the name: it replicates the changes that happen after that point, which is essentially the same thing as MySQL replication. This is a high-level flow of what we're going to be doing. In this example, the commerce database is where all of our data is; that's what we started with when we first created our application. We've gotten to the point now where we can't manage everything with that single instance, so we're going to split off these customer-oriented tables. The customer and corder tables are specifically related to customers, and we can shard both of those tables by customer ID, which is what we're actually doing. So a customer's record, and that customer's orders, are always housed together, and you get data locality as well. Even though you can do cross-shard reads, of course, there's a performance penalty in doing that, so data locality is one of the pretty important things when it comes to performance. And we're going to have two shards; we can see that here. I'm going to largely refer to them as shard one or shard two, but the reason it says shard -80 and 80- is, as Deepthi mentioned, there's a concept of a keyspace, which is the logical database, and each keyspace has a 64-bit key range. Each record gets a keyspace ID, which is what determines which shard it lives on. That's normally represented with hex digits, so it would be 16 digits on either side of the dash. The shorthand notation -80 means 16 zeros up through a 7 followed by 15 F's, because the range doesn't include the upper number; and the next one, 80-, is an 8 followed by 15 zeros up through 16 F's. So it's just splitting that key range in half. And there's a link here if, after the fact, you want to look at the demo, what it actually does and how it works. And at the end, when we get into the Q&A, if you have questions or you want to look at something, we can peek in a little closer at the Vitess cluster.
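How a row's keyspace ID gets computed is defined by a vindex in the vschema. As a minimal, hedged sketch, a sharded vschema for the customer keyspace might look like this, assuming xxhash on the customer_id column (the names follow the demo; treat the exact invocation as illustrative):

```sh
# Declare the keyspace sharded and map customer_id to a keyspace ID
# via xxhash for both tables, so a customer's rows land on the same shard.
vtctldclient ApplyVSchema --vschema '{
  "sharded": true,
  "vindexes": { "hash": { "type": "xxhash" } },
  "tables": {
    "customer": { "column_vindexes": [{ "column": "customer_id", "name": "hash" }] },
    "corder":   { "column_vindexes": [{ "column": "customer_id", "name": "hash" }] }
  }
}' customer
```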
So, here's what we've set up ahead of time, just to save some time. I've scripted this so that you don't have to watch me fat-finger some commands and then hunt down where I've made the typo and spend a lot of time on that. We started up a Vitess cluster; it has all those components that Deepthi already talked about. And we have our single unsharded commerce database, but with three MySQL instances: the primary and two replicas. So again, that's a pretty typical installation wherever you're running, where you have a primary and one or more replicas. You use the replicas for high availability, for additional durability, for backups, and many different things. The only difference in your case, typically, as in that previous slide with an RDS instance, is that the initial commerce database is not going to be part of Vitess. In that case, you would start up an external vttablet, and that vttablet will interface with your on-prem instance, or whatever it is, your RDS instance from the previous slide. That is what allows that instance to effectively become part of the Vitess cluster, and allows you to then migrate your data from that external instance into a Vitess cluster. So all we have now is that unsharded keyspace, and we have our corder, customer, and product tables. You can see the schema here; they're using auto-increment columns, which is pretty typical. We'll see at the end that we actually leverage a Vitess feature called sequences that allows us to continue to use the tables in the same way (we'll sketch what that looks like below). Once a table is sharded, you can't use auto-increment directly anymore, because you could have duplicate IDs across your shards. Vitess sequences give you that same behavior once we shard the tables. Then we add a foreign key. As Deepthi mentioned, one of the big new features in Vitess 18 is foreign key support, and what that really means is that we've moved the management and implementation of that up into the VTGate layer. That's what gives you that single logical view of your entire Vitess cluster, and it allows you as a user, but more importantly your applications, to work with Vitess and your now-sharded databases as if it were still just a single MySQL instance. Then, to demonstrate this actually working, we try to add an orphaned record into the corder table, where there's no customer for that ID, and we get the expected foreign key constraint failure. And just to show what data we currently have in these tables: there's not a lot. We've got a few customers, each customer has one order, and there are just a couple of products. Now we're going to spin up a second keyspace. This is the customer keyspace, and that's going to start up vttablets and mysqlds. So now we've got two shards, and in total there are actually going to be six mysqld processes, because we've got one primary and two replicas for each shard. That's what you see starting up here. You'll notice that each tablet has an identifier that includes the cell, which is a logical concept in Vitess that kind of maps to an availability zone or, more generally, a failure domain, plus a unique ID within that particular cell. That will come into play, because when we start looking at the data, where it actually lives, and demonstrating that everything is replicating as we expect across these keyspaces, we can connect directly to the mysqld instances using the socket that each of them has.
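Here is the promised sketch of the sequences setup, following the pattern in the Vitess docs: the backing table lives in the unsharded commerce keyspace, and the sharded table's vschema points its auto-increment column at it (the starting values here are illustrative):

```sh
# A sequence is backed by a one-row table in an unsharded keyspace,
# marked with a special comment.
mysql -h 127.0.0.1 -P 15306 commerce -e "
  CREATE TABLE customer_seq (id INT, next_id BIGINT, cache BIGINT, PRIMARY KEY (id))
    COMMENT 'vitess_sequence';
  INSERT INTO customer_seq (id, next_id, cache) VALUES (0, 1000, 100);"

# In the sharded keyspace's vschema, the table then declares:
#   "auto_increment": { "column": "customer_id", "sequence": "commerce.customer_seq" }
# so VTGate hands out the IDs instead of mysqld's auto-increment.
```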
So now we can see here we're talking to VTGate. We've connected to the VTGate process, and if you're using the MySQL command line client, or any other tool, whether it be phpMyAdmin or MySQL Workbench, whatever it is, you can connect directly to VTGate just as if it were a MySQL instance. And you can see now we've got our two databases; those are our two keyspaces. Then you have the other typical ones that you would see when you're talking to a mysqld instance, information_schema and so on. So we have this cluster that's now spread across nine different mysqld instances inside Vitess, but as far as the application is concerned, it looks the same as it always did. It looks like they're connecting to that single mysqld instance, and they're able to query both keyspaces just as they normally would. Here you can see what that looks like in the cluster. Again, you can see the IDs here; those will come up shortly. So 301 is the primary for the -80 shard and 400 is the primary for the 80- shard. Then we're going to actually run a VReplication command. This is MoveTables, and this is what's going to migrate our data (a sketch of these commands follows below). It's going to migrate it from the commerce keyspace to the customer keyspace for these two tables. It starts up and goes into the copying state; that's where it initially starts copying all of the data in bulk. Once that completes, and we have very little data, so it was basically immediate, it goes into the replicating or running phase, and at that point it's just going to be replicating changes, right? The entire migration could take weeks or even months. If you have a terabyte of data, it could take a couple of days to do the initial data copy, and then you may be replicating changes for some time as you plan the final migration with your application teams and so on. So then we're replicating changes. The next thing you would typically do is run another VReplication-related command called VDiff. Now that we've copied this terabyte's worth of data, before we make any changes so that the application is actually talking to the Vitess database, we want to be sure it has all of the data that we expect. VDiff is a very handy tool to give you that confidence. It'll tell you whether there was any mismatch overall, and if so, what the mismatches were: if there was an extra row on the source, or if the records differed somehow between the two sides, you would see that there. There are safety mechanisms put in place. These are the routing rules, just to demonstrate them quickly. The basic idea is that, remember, these tables now live on two different databases, and this ensures that you can't accidentally query the wrong one. Even if you explicitly tried to query either of those two tables on the customer database, it would get routed in VTGate to the commerce side, because that's where all of our traffic is still going. We haven't switched anything; we haven't actually made any application-affecting changes. Now we're going to insert a new customer and a new order. We did this through VTGate, so this went to the commerce database. And now if we look at the customer database, this is where we're connecting directly to the MySQL instances. Remember, 301 was the primary for our -80 shard and 400 for the 80- shard.
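The sketch promised above: this is roughly the shape of the MoveTables and VDiff invocations from the Vitess docs (the workflow name is the demo's; verify the flags for your version):

```sh
# Start migrating the two customer-oriented tables from commerce to customer.
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer \
  create --source-keyspace commerce --tables 'customer,corder'

# Watch the workflow move from the copying state to running/replicating.
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer show

# Compare source and target row by row to confirm nothing is missing.
vtctldclient VDiff --workflow commerce2customer --target-keyspace customer create
vtctldclient VDiff --workflow commerce2customer --target-keyspace customer show last
```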
And here we can see our new records were replicated as expected from the commerce database over to the customer database. You can see it's pretty uneven right now between the shards; most of it is on the -80 shard. That can happen, of course, when you have a small data set, but as your data set grows, it will be pretty evenly split. I won't necessarily get into how that works, but there's a vindex concept, and by default we would typically use xxhash as the mechanism to generate that keyspace ID from your primary sharding columns; we're using customer ID for these tables. Okay, so now we've confirmed everything is working. You'd typically run VDiff again, because you can be in this state for a long period of time as you're building confidence and getting things in place. You might run the VDiff immediately after the copy phase ends, but then as you get to the point where you think you're ready to actually switch traffic and move your applications over, you can run it again just to make sure that everything is correct and there are no data mismatches. Then you can use the SwitchTraffic command, and what that does, as you might expect, is basically flip things. Now, anytime you're reading from the corder or customer table, no matter how you're doing it, it's actually going to get routed to the customer database. You can see that reflected here: basically all the traffic has been switched. And this is the routing rules again, just to show you that it's now flipped. Even if you explicitly connect and say select star from commerce.customer, it's actually going to get routed to the customer database. And just to demonstrate that replication, now we'll do a delete. By default, VReplication will set up replication going back in the other direction. You can opt out of that, but typically you would want it, because you want the ability to revert. Let's say you switch traffic and suddenly the application starts getting errors. You want to be able to cut the traffic back over and basically revert, so that things continue to work while you investigate what that issue may have been. So just to demonstrate that it's replicating back the other way, we delete the customer. Now we're connecting directly to the primary for the commerce keyspace, which is unsharded, just a single shard, and we can see that the new customer and its order are gone. So that's a combination of the reverse replication and the foreign key that we added. This is where automatically setting up that reverse replication flow comes in handy. Again, you may realize, okay, now we've moved everything over to Vitess, but there's some slight difference in behavior. There's not 100% compatibility yet with MySQL 8 and all of the query-related features and syntax it has. So let's say maybe you realize that some query is producing results that are slightly different, the output is a little different. Or maybe you didn't realize that Vitess doesn't support recursive CTEs, for example, which is a big one that we're working on now. Then you can reverse the traffic back, leave things running, and you're back to the starting point where you were before. You can take however long is needed to address that particular situation on the application side, or whatever it may be, and once you've done that, you can switch traffic again (the relevant commands are sketched below).
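Hedged sketches of those cutover commands, using the same workflow as above:

```sh
# Route reads and writes for the moved tables to the customer keyspace;
# reverse replication back to commerce is set up automatically.
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer switchtraffic

# Something's wrong? Route traffic back to commerce and investigate.
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer reversetraffic

# Happy? Tear down the workflow, the routing rules, and the source tables.
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer complete
```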
And you can switch back and forth as many times as you need, until finally everything is good, and then you can complete. These are all subcommands of MoveTables, and you can see the client binary here. When you complete, it's going to clean everything up. So the tables on the original commerce database will, by default, be cleaned up, along with all of the artifacts from our actual migration: the workflow itself, the routing rules that we looked at before, all that stuff will be cleaned up. And here you can see, remember I mentioned the Vitess sequences feature; in the demo, we added those as well. That's what allows us, remember we were using auto-increment columns, which is typical, to continue to have that same behavior on those tables even though the table is now sharded. VTGate will basically take over the management of those effectively auto-incrementing values. This just shows the cleanup, and that's it. We've got... let's switch back to the slides. Okay, so we've got not quite five minutes, but we've got two and a half minutes for questions, and hopefully answers. They can be about the demo, they can be about the slides. Anybody have any questions? Remember, the slides, which are already up on Sched, have a link to the demo as well, which is just a scripted-out version of what you would normally run, or what you would typically run as an operator; it just saved us some time. On the slides, it looked like it said CDC. I'm assuming you mean change data capture, is that correct? Yeah, sorry, there are so many acronyms and initialisms. My question is, in some environments, say if we have a pipeline that's reading and consuming that stream, and the pipeline itself goes down, then you're at risk of it using up your database storage. What is the solution for that? Do you have to move the CDC data out of the database itself into some other S3 buckets, or how do you deal with that type of data? So that's a pretty common issue, and we handle it through resuming, effectively. That was one of the things that I mentioned on the VReplication slide. Remember, VTGate is the component that gives you this single logical view of your entire data set, spread across however many MySQL instances and shards. There's an RPC that VTGate has, which leverages VReplication, which allows you to say: I want all of the changes. Or, from the beginning, you just say, I want everything from the beginning for these tables. It'll give you a snapshot, effectively, of all those tables and their state, and then start replicating. But each event that gets replicated in that CDC stream also has metadata, like the GTID, which effectively gives you a position in that stream as well. So if your consumer goes down for any period of time, you can resume from where you left off using that metadata that was in the event stream. We are actually out of time. So there are some resources... okay, we have one minute. We have the website link, the docs, tutorials. You can go to the source code on GitHub. Please do join our Slack workspace if you're interested in being part of the community: vitess.slack.com, should be easy to remember. And you can scan this QR code to provide feedback on our session. Thank you very much for attending, and we'll be right here to take more questions if there are any. All right, thank you.