Welcome, everyone, to the Vitess maintainer talk. My name is Deepthi Sangareddi; I'm the tech lead for Vitess, and I'm a software engineer at PlanetScale. Hello, I'm Kazimieras Salulis, a site reliability engineer on Vinted's databases team, and I'll be presenting the Vitess user story at Vinted. And I'm Florent Poinsard. I'm a maintainer of the Vitess project, and I'm also a software engineer at PlanetScale. Today we're going to start by presenting what Vitess is. We'll give a brief overview of the project, and then Kazimieras will talk to you about the Vinted user story: their adoption and how they're using Vitess. Finally, Deepthi will talk about the new and upcoming features of Vitess, and at the end we'll have some time for questions and answers.

All right, let's get started. What is Vitess? Vitess is a cloud-native, scalable, distributed database built around MySQL. In fact, it started in 2010 at YouTube as a scaling solution for MySQL. It was later donated by YouTube to the CNCF in 2018, and it became a graduated project a year later, in 2019. It is massively scalable because Vitess supports sharding, which allows you to partition your data across multiple primaries. And it is highly available, because whenever there is a failure on a primary, we detect it and repair it. Vitess is widely used in production by many, many companies, from small to extremely large. We have a few key adopters, like Slack, which runs 100% on Vitess: every time you send a Slack message, it goes through Vitess. We also have GitHub; they run all of their issues and pull requests on Vitess, at a little under a million QPS on average. We also have Vinted, which Kazimieras will talk about soon; they do about 2.2 million QPS. And finally, there are PlanetScale's database servers, running approximately 10,000 different Vitess clusters in production.
Of course, we're an open source project. We have about 15 maintainers working on the project, and over the last year we had a little more than 250 contributors, of which 115 were code contributors. All of those came from 47 companies, and the code contributors came from a little more than 20 companies.

Before I move on to the more technical part of the talk, I should introduce four key terms for Vitess. The first one is a keyspace. A keyspace is basically the same as a MySQL logical database. So you can have a "user" keyspace, and inside that keyspace you're going to have a bunch of tables related to user data: users, user metadata, et cetera. Then we have a shard. A shard is basically a subset of that keyspace, and you can have one or more shards per keyspace. Every shard is composed of one primary and one or more replicas. We have the VSchema. The VSchema is a specification of how you want to shard a specific table; it is user-defined and flexible. And then we have the Vindex, which is used inside the VSchema. A Vindex is roughly the equivalent of a MySQL index, but for sharding: it maps column values to shards.

Here's a diagram of the architecture of Vitess. On the right-hand side, we have shards 1, 2, 3, through N. As I said before, those are composed of one primary and one or more replicas. If we look at the primary, for example, we have mysqld and VTTablet. mysqld is the MySQL instance process; that's where the tables are stored. Attached to it as a sidecar, we have VTTablet. VTTablet sends all the queries down to mysqld, and it helps manage mysqld as well. VTTablet is connected to VTGate, which is the central component here, and communicates with it over gRPC. VTGate is the most user-facing component.
It connects to your application using gRPC or the MySQL protocol. It receives a query, parses and interprets it, and then sends it down to the proper keyspace and shard. In yellow, we have the control plane. vtctld is the CLI tool to administer Vitess. VTOrc is the orchestration tool; this is what allows us to detect a failure and repair it in the cluster. And finally we have VTAdmin, which is the administration UI of Vitess. In red, finally, we have the topology servers. Those are usually etcd or ZooKeeper; I think that's all we support, maybe more, but usually it's etcd.

So why would you want to use Vitess compared to vanilla MySQL? We're trying to be as compatible as possible with MySQL, so we add query support with each release. We have resharding, which allows you to repartition your data into as many shards as you want. We have materialization, which is similar to MySQL materialized views, except we update the view in real time. We have cluster management: different tools that allow you to manage your cluster. We have non-blocking online schema changes and seamless backup and recovery operations. We also have query consolidation: say you have a sudden spike of identical queries; we send that query only once to MySQL, get the result, and respond to all the pending queries at the same time from VTGate, so as not to overload MySQL. And then we have automatic failure detection and repair, thanks to VTOrc. Now I'll pass it to Kazimieras to talk about the Vinted user story.

Hello again. I will present how and why we use Vitess clusters at Vinted. Let's start with what we do. The Vinted application allows people to sell secondhand fashion easily and safely.
With over 80 million users in 19 markets across Europe and North America, we aim to make secondhand fashion the preferred choice worldwide. I'm very excited to present here in Paris, because of all the countries, our French user community is the biggest. That's why a few years back we decided to launch our second brand here, Vinted Go, which helps us take better control of shipping packages and reduce the climate impact of shipping. Currently Vinted Go operates in over 90 cities in France, with over 1,500 pickup and drop-off points. And whenever a user lists an item to sell on the Vinted marketplace, sends a message, or ships a package, all of that information is stored in one of our Vitess clusters.

We are currently running Vitess version 11 with some backports, mostly bug fixes or operational improvements. All the clusters run on bare metal, and all the hosts are managed by Chef. For cluster-level operations like monitoring, backups, and provisioning, we write workflows using Temporal. As Vitess is cloud native, we would like to move in that direction, and as a first step we're moving the VTGates to Kubernetes. Currently we have over 80 clusters, and these clusters run over 200 shards, with each shard consisting of eight VTTablets, each paired with a MySQL instance. They're spread across two regions: in every shard, four VTTablets run in one region and four in another. All of this allows us to serve over 1.2 million queries per second from primaries and about 1 million from replicas. We have about 30 terabytes of data, and we run 5,000 VStreams to export data changes from Vitess to other systems like our data warehouse and search clusters.

So, why Vitess? One of the features I like in Vitess is the throttling API. When developers need to backfill some data or change a lot of data in the database, they can write a job which first asks the database: are you healthy enough?
Then it writes a small batch and asks again. This lets the job throttle itself if the database approaches its limits, and it helps us avoid outages if we accidentally try to write more data than the cluster can sustain.

Before we moved to Vitess, we had multiple MySQL clusters that were sharded from the application side, but we saw that we needed a better solution. When we started testing Vitess, we built a benchmarking tool that allowed us to capture our workload from the application. We had a feature switch so that when we sent a query to the database, we would also send the same query to Kafka, along with some metadata. From that metadata we would group the queries from the same request into batches, and we could replay them against a test environment: say, an environment migrated to Vitess, and compare how it went. Our tests showed that Vitess added just 2 milliseconds of latency per query, which is very acceptable given all the additional features we get. So we switched to Vitess. We initially adopted version 8 and later upgraded to version 11, using the same benchmarking tool to validate the upgrade.

Another very nice feature of Vitess is moving data inside a cluster, and even importing from external MySQL clusters. It's called VReplication, and we used VDiff for verifying the data move. It not only lets you move the data easily; after the cutover, it can replicate data back in case something goes wrong, for example if we provisioned too small a cluster on the other side. We can switch back to the previous configuration without losing any data.

But the primary reason we chose Vitess is that it allows horizontal sharding on top of MySQL. When you have one MySQL cluster with only one table left that you cannot move out, and it's already too big, migrations take multiple weeks, backups take very long, and query latency starts to rise because of the load. This is where Vitess comes to help.
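The throttled backfill pattern described above can be sketched as follows. This is an illustrative Python sketch, not Vinted's actual job code; the `is_healthy` callback stands in for whatever health or throttler check the database exposes.

```python
import time

def backfill(rows, write_batch, is_healthy, batch_size=100, backoff=0.0):
    """Write rows in small batches, checking a health signal before each
    batch. `is_healthy` is a stand-in for the real throttler check; when
    it says no, the job waits instead of piling more load on the cluster."""
    written = 0
    for start in range(0, len(rows), batch_size):
        # Wait until the database reports it can absorb more writes.
        while not is_healthy():
            time.sleep(backoff)
        write_batch(rows[start:start + batch_size])
        written += min(batch_size, len(rows) - start)
    return written
```

The key property is that load is applied in bounded increments, with a check between every increment, so the job backs off the moment the cluster stops keeping up.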
We initially moved our clusters into Vitess as they were: every cluster moved to one keyspace with one shard. Later on, we started sharding the most loaded ones horizontally. This required some more tooling. The first thing we did was enable testing in our CI/CD pipeline against a horizontally sharded Vitess, so developers could just state in the code that certain tables are horizontally sharded, and the tests would run on an actually sharded Vitess and tell them if anything fails, because query compatibility on sharded keyspaces is a bit different. There are other things to watch out for too, like cross-shard transactions or cross-shard joins, which are more complicated. The other issue is that tests do not catch all the queries. So we built our own query logging solution that collects all the unique queries, and we can run the vtexplain tool on them. Similar to MySQL's EXPLAIN, vtexplain lets you see how Vitess would execute a query, and whether it fails or succeeds. Running it on these unique queries is pretty much enough to understand whether your code will work when switched to horizontal sharding.

As for the benefit we got: there is a historical screenshot from our Grafana from when we switched one of our most loaded keyspaces to horizontal sharding, showing how latencies dropped on queries going to primaries. The magic behind this was simply that with Vitess horizontal sharding, four times more hardware was serving this keyspace.

So that was a very short overview of what we do with Vitess, but you can read the story in much more detail on our blog, vinted.engineering. We have multiple blog posts: Vinted's Vitess Voyage, how we migrated from MySQL to Vitess, how our CI/CD pipeline works, and we hope to have more blog posts about Vitess in the future. Thank you.

We'll move on to the new and upcoming features. We released Vitess version 18 in November of last year and Vitess version 19 earlier this month, in the first week of March.
I'll cover what's new in these two latest releases and then talk a little about what we plan to do in future releases. Let's do query serving first. This is always a hot topic, because MySQL is a moving target: they keep adding new syntax and new features, and we have to keep up with those. Among the things added in the last two releases is basic support for SELECTs using common table expressions. We've added experimental foreign key support. Most Vitess users don't actually use foreign keys, because at the scale at which people run Vitess it's simply not practical to use foreign key constraints; there is a perceptible performance hit. But there are smaller users of Vitess who would like to keep foreign key functionality while still adopting Vitess. So this is still experimental, but it looks good and people can try it out. The other thing we added was support for views across shards. This is something you just can't do in MySQL: views are local to a given MySQL server. But with Vitess you can create views that are cross-shard and managed by Vitess, and queries against those views work even when the underlying data has to come from multiple shards. We've added better support for unions, derived tables, and subqueries, and we also revamped our benchmarking website. This is the Vitess subproject we call "arewefastyet", at benchmark.vitess.io. We run a certain number of benchmarks every day, and we also measure every new release against the previous release to make sure there are no regressions in query performance. These include OLTP and OLAP workloads, which are Sysbench, and also TPC-C workloads. We have a new UI for the website, and it's much more usable than the previous version. We've added VSchema validations. As Florent mentioned, the VSchema is how you specify to Vitess how you want to do horizontal sharding, and it's done per table.
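To make the VSchema idea concrete, here is a minimal example of what a sharded keyspace's VSchema can look like: one table sharded by a hash Vindex on its id column. This is an illustrative fragment; the table and column names are made up.

```json
{
  "sharded": true,
  "vindexes": {
    "hash": { "type": "hash" }
  },
  "tables": {
    "users": {
      "column_vindexes": [
        { "column": "user_id", "name": "hash" }
      ]
    }
  }
}
```

Here the `hash` Vindex maps each `user_id` value to a shard, so VTGate knows where every row of `users` lives.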
Previously, it was possible to have errors in your VSchema and not find out until you actually started running queries; now we do some validations up front. We've added some MySQL syntax extensions. We do this periodically to improve usability for people using Vitess: through VTGate you can execute some MySQL-like commands which are not actually MySQL commands. One of them, the new one, is VEXPLAIN. So the vtexplain tool that Vinted used to check query compatibility before they did their horizontal sharding, you can now also use against a running Vitess cluster through VTGate. VEXPLAIN will tell you how a query is going to be executed: which shards it will go to, whether it is cross-shard, which Vindex will be used, and so on. We've also added support for DELETEs and UPDATEs with joins.

Moving on to other parts of Vitess: the CLI, migrations, and so on. We've completely rewritten the Vitess CLI to use Cobra, and we reimplemented all of the flags using Viper. Previously we were using the built-in Golang flag library, which led to flag pollution: if different Vitess binaries shared the same Golang package, they would all get all the flags, whether or not they were relevant. With Viper, we've been able to clean all of that up. We also get the nice benefit of auto-generating the reference documentation for the CLI and the flags, and the flag reference docs actually make sense, because whatever you see in the docs are all valid flags for those binaries. The other thing Viper lets us do is dynamic reloading of configuration. Right now only a very small subset of the flags can be reloaded dynamically, but this allows us to add more flags to that set over time, which means you don't need to restart the process and you don't need to send it a SIGHUP. When the config file changes, it will automatically be reloaded.
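The dynamic-reload behavior described above can be sketched in a few lines. This is an illustrative Python sketch of the general idea (detect that the config file changed and re-parse it on the fly), not how Viper or Vitess actually implement it; the class and method names are made up.

```python
import os

class ReloadableConfig:
    """Reload settings when the config file changes on disk, so the
    process never needs a restart or a SIGHUP to pick up new values."""
    def __init__(self, path, parse):
        self.path, self.parse = path, parse
        self.mtime = None
        self.values = {}
        self.refresh()

    def refresh(self):
        mtime = os.stat(self.path).st_mtime
        if mtime != self.mtime:   # file changed since we last read it
            self.mtime = mtime
            with open(self.path) as f:
                self.values = self.parse(f.read())

    def get(self, key):
        self.refresh()            # cheap stat() on each read
        return self.values.get(key)
```

Real implementations typically use filesystem notifications rather than polling a timestamp, but the effect is the same: readers always see the current contents of the file.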
Vitess comes with a Kubernetes operator, and in the most recent release we started letting the operator manage MySQL minor version upgrades. It is still considered too risky to do a major MySQL version upgrade purely through the operator, so this is minor releases only. For example, MySQL 5.7 to 8.0 is probably not a great idea as an in-place upgrade on a running Vitess cluster, but an 8.0 minor version bump is generally safe: you just change the image tag, and the operator will manage a rollout of all of the components with the new MySQL version.

We did a security audit of Vitess last year; the results were published by the CNCF as a blog post, and some recommendations came out of it. The high-priority issues were fixed right away, but there were some medium-priority issues that we fixed in the last couple of releases. We've also added support for incremental backups and point-in-time recovery. Point-in-time recovery is a Vitess feature that has been around for four or five years now, but it depended on being able to read the binlogs from a binlog server, which was yet another component you needed to deploy, make highly available, and all those things. That was complicated enough that most people would not use the feature. We have reimplemented it so it does not require a separate binlog server: we can just use the binary log files from MySQL to do point-in-time recovery.

The other thing I want to talk about is near-zero-downtime migration cutovers. There are various operations in Vitess where it is possible to have downtime. If your primary goes down and you have to fail over to a replica, you have a certain amount of downtime.
It might be seconds, it might be a minute. But you also have planned operations: when you're doing maintenance on your cluster, you want to be able to fail over from a primary to a replica MySQL so you can do upgrades and other types of maintenance on your primary. Those planned failovers we want to be zero downtime. Vitess has functionality for buffering write traffic to the primary, and after the failover, once you have a new primary, that traffic is sent to the new primary. Reads still work, and you can still run replica queries while this operation is happening, with no interruption. For writes, there is a pause, and then the queries are reprocessed, so you don't return errors to the client or the application.

We've also implemented this buffering functionality for data migrations. If you're using Vitess MoveTables to move data from your existing MySQL into a Vitess-managed cluster, there is a point in time where you've finished all the copying and you want to cut traffic over to the new cluster. Previously there would be between 10 and 30 seconds of downtime during that cutover; now that we've implemented buffering, you don't have that, even during those cutovers. The same applies to resharding: as Kazimieras mentioned, when you reshard, it's possible to switch back and forth and keep the old cluster and the new cluster in sync. The same works for importing data, or for doing vertical sharding where you're migrating tables from one keyspace to another; you can go back and forth. But each time there would be a short amount of downtime, and with buffering we have made that as low as possible.

Next up: performance and reliability. We've implemented more efficient connection pooling.
Previously, we ended up doing basically first-in, first-out connection pooling, where all of the connections would get used. That meant that when you initially started up, it would take some time for things to warm up, because you had to establish all those connections to MySQL and fill the pool. We now have much more efficient connection pooling, where existing connections that are free can be reused much faster instead of having to fill the pool. You may not even fill the pool if you don't need to. That improves startup times for clients, and it's more efficient in terms of memory as well.

We've implemented faster hashing for sharding. Hash is the most popular function people use for their sharding key, and whenever you access data, we first have to compute the hash, which tells us where a particular row lives. This is on the common path, so faster hashing means all queries get faster. We have faster in-memory aggregations: with cross-shard queries, we sometimes end up fetching data from multiple shards and then, in VTGate, computing a sum or a count or an average or some other aggregate in memory, and all of those have been made faster. The end result is that when we ran the benchmarks on version 19, the most recent release, it was faster on all the benchmarks compared to the previous release.

We've also made improvements to online DDL cutovers. Online schema changes are one of the areas where Vitess really shines, because this is a problem the MySQL community has grappled with for many years, and many people have built tools to make it easier. In Vitess we have a Vitess-native way of doing online schema changes, and we've been making performance improvements to make sure we don't overload the database when we're attempting to do a cutover for an online schema change.
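The connection-pooling change described above, reusing the most recently freed connection instead of cycling through the whole pool, can be sketched as follows. This is an illustrative Python sketch, not Vitess's actual implementation.

```python
import collections

class Pool:
    """Prefer the most recently freed connection (LIFO), so a small set
    of warm connections gets reused and the pool only grows on demand."""
    def __init__(self, connect):
        self.connect = connect           # factory for new connections
        self.idle = collections.deque()  # most recently freed on the right
        self.created = 0

    def get(self):
        if self.idle:
            return self.idle.pop()       # reuse a warm connection (LIFO)
        self.created += 1
        return self.connect()            # grow only when nothing is free

    def put(self, conn):
        self.idle.append(conn)
```

With FIFO reuse, every connection in the pool eventually gets touched even under light load; with LIFO reuse, a light workload keeps hitting the same few warm connections and never pays to establish the rest.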
And we've also implemented faster cleanup of the artifacts that online DDL creates. The other thing we did, which I think is a major win, concerns the topology server. Whether it's etcd or ZooKeeper, the topology server, where Vitess stores its metadata, tends to be a hotspot sometimes, because all of the Vitess components read data from it, set watches on keys, and poll it periodically to reload the topology data. There are many places in Vitess where we say "fetch all the shards from the topo," and previously we would get them one at a time. What we are able to do now is say: here is the prefix key, fetch all the shards in one call. This leads to an order-of-magnitude reduction in the number of network calls we make to the topo and the load we put on it. We've also made functional and reliability improvements to the incremental backup feature.

Okay, so that was the previous releases. What's coming up? We have started working on multi-table deletes and updates. We will be publishing fewer and smaller Docker images: over time the Docker images have grown, and we want to slim them down. The way we're going to do that is to stop shipping MySQL binaries in the Vitess Docker images; instead, you can run any published MySQL Docker image alongside Vitess. We are in the process of implementing Vindex hints. A Vindex hint is where you tell VTGate which Vindex to use for a query, because sometimes, if there are multiple Vindexes on the same table, it may pick a less efficient one. We are adding support for more functions in cross-shard queries. We are also adding new metrics in various parts of Vitess; for example, we have a whole bunch of new metrics for throttling. We will be adding more support for CTEs: deletes and updates using common table expressions, and recursive CTEs, those are all planned.
And we also plan to spend more time on performance optimizations.

We have a bunch of resources. There's the website, where we have documentation and getting-started tutorials; the website covers the gamut from people who are just getting started to people who have been running Vitess for many years, and there is new documentation around the new features. The source code is on GitHub, and we have a Slack workspace; there is a link on our website to join it. And there are a couple of blog posts worth reading. One was by Slack, on how they migrated everything, all of their data, to Vitess. Another is from Square, now Block, on how they migrated the Square Cash app to Vitess and also did multiple rounds of resharding. That's everything we had, so it's time for questions.

Yes, thank you. In the Vinted presentation, you mentioned something in your architecture about eight, I'm not sure, eight nodes per shard. Can you explain why eight? I'm a newbie. Does it mean one primary and seven replicas, because you have a lot of reads?

Yes, we run one primary and seven replicas, but two replicas are reserved for exporting data to the data warehouse and for ad hoc queries from developers if they need to investigate something. So effectively six serve production, and two extra handle longer workloads.

You talked about a failover time of up to one minute. How do you detect a failure of the primary, and what needs to be done during failover, so I can understand this one-minute delay?

We have a component called VTOrc, which monitors all of the VTTablets. VTTablet is the sidecar instance that manages MySQL. VTOrc polls all of the MySQLs, essentially every second, and if three consecutive pings fail, it will check whether the replicas are able to talk to the primary, because it is possible that VTOrc itself is network-partitioned from the primary.
The primary may be fine, but VTOrc is not able to reach it. So VTOrc does the failure detection, which takes up to five seconds after your MySQL actually goes down, and then it does a failover to one of the replicas. Typically it doesn't take one minute; typically we see failover times of around 15 or 20 seconds. But sometimes it may take longer: some replicas may not be eligible for promotion, depending on how you have configured the cluster. Maybe you have distributed it across regions and you want to keep your primary in a certain region, and it's possible that the replica that is eligible for promotion is not fully up to date. Then you have to find something else that is up to date and replicate all the pending or missing transactions from there, which is why I said it could take up to a minute. Mostly we don't see it taking that long, as long as the replicas are reasonably up to date.

Okay, hi. You were talking about point-in-time recovery utilizing the MySQL binlogs inside Vitess, so to speak, without using a binlog server. The question I have is: where are those binlogs kept, and to what size can they reasonably grow before you have to rotate them, allowing for which window of time, without breaking major functions of the Vitess service itself?

Right. Typically, when Vitess is managing MySQL, there is a specified directory where everything is stored; all the MySQL files live there. At the MySQL level, you specify a binlog retention period. That could be three days, that could be seven days. Depending on the amount of disk you are able to provision for the VTTablet, you would set that, and that is as far back as you can go; you will not be able to go beyond it. If you can afford to provision a huge disk, then you can have a 30-day retention and go back up to 30 days.
What we have seen is that people typically run with seven days of binlogs, and sometimes even shorter, depending on the volume of their updates, because some people just do frequent updates. If they have something like a modified-at or last-access date, they produce billions of binlog entries and simply can't afford to keep them for 30 days. So all of that is configurable.

Hey, thanks for the talk. You mentioned that Vinted is running 80 clusters. If the technology scales horizontally, why so many clusters? Do you do it for maintenance purposes?

We effectively run one cluster per separate service that we have. We do not allow separate services to connect to the same Vitess cluster, so the databases stay separated.

Hello, thank you very much for the talk. A question: do you support both MySQL 8 and MySQL 5.7 databases, where we could try different features?

Vitess does support, well, we have just ended MySQL 5.7 support, because MySQL 5.7 itself is EOL. So technically we really only support MySQL 8 at this point, but we are keeping a certain amount of MySQL 5.7 support so that people can migrate. It is still possible to put Vitess in front of an existing MySQL 5.7 database and migrate all of the data into a MySQL 8 database.

And one last question: is there any plan to make it work with the Aurora database in AWS?

We don't currently have any plans to make it work with Aurora. The stance when it comes to something like RDS or Aurora or MariaDB, all of which people have actually run Vitess with in the past, is that it is very difficult for us maintainers to maintain that many flavors. If there is interest from the community and someone really wants to run it, they will need to try it out, and we can help them, but it's just too much for the maintainer team to maintain so many different flavors. There are people who ran it with RDS for a long time, and that worked.
I noticed in your architecture diagram there was a little bit about big data. Does that query differently? Can you explain more about how that works?

In Vitess you typically configure timeouts for queries, and those timeouts don't apply if you are running in a specific mode. When you're doing big data or analytics, you can specify that you want to run in OLAP mode, and then we don't time out the queries; they can run for a long time. It's also possible to provision a different type of replica, with more resources if necessary, and execute OLAP-type queries against it.

I think we are out of time, so we are happy to chat offstage. Thank you, everyone, for attending. This was really great, and I loved all the questions as well. Thank you.