So, good morning, happy to be in front of you on this beautiful Saturday morning. I'm going to be talking about OLTP or OLAP, why not both? In the context of the open source project Vitess. Before I start my talk, I just want to get a show of hands. How many of you are familiar with what Vitess is? A handful, okay. And how many of you are generally familiar with relational databases? All of you, okay, awesome. Great. Also, I want to mention that Vitess is mainly an OLTP system, but what I'm going to show you in the later part of the talk are some new features which also make it useful for OLAP. Most of my talk is going to be about Vitess and showing you various aspects of Vitess.

So let's start out by talking a little bit about me. My name is Jiten Vaidya. I am the CEO of PlanetScale. PlanetScale is a company that my co-founder, Sugu Sougoumarane, and I started about a year and a half ago to make it easy for enterprises to use Vitess. We both worked together many, many years ago at Informix. Then we worked together at YouTube from 2007 to 2012, which is where Vitess was born. And now this is the third time that we are working together. I have worked at various companies such as Google, YouTube, Dropbox, and the US Digital Service, mostly as a backend engineer. One interesting fact about me, well, I think that it's interesting, is that I have also written an Android app. I really don't have a lot of design sense, so I consider that quite an achievement. Many, many years ago, in 1988, I graduated from IIT Bombay. The usual reaction from a crowd like you to that is "I was probably not even born then." My email is jiten@planetscale.com. Please send me email. And my Twitter handle is yet another empiricist.

So without further ado, let's jump into Vitess. What is Vitess? It's an open source project that was founded at YouTube in 2010. It's a sharding middleware system for MySQL that sits between your applications and your database clusters and gives your applications the view that they are talking to a single humongous relational database. It completely abstracts away the sharding: the application doesn't need to know that it's dealing with shards at all. This keeps the velocity of future development on your application high, as high as though you were working with a single database. This is very important. It also natively supports cloud; we will talk a little bit about that in further slides. And it automates many functions for the database administrator. Basically, when you have a handful of databases, you can afford to treat your databases as pets. But when you have clusters of databases, tens or hundreds or even thousands of databases, you can't treat them as pets. You need to treat them as cattle, and a lot of the automation around the care and feeding of database servers is taken care of by Vitess for you. It's massively scalable: in a single keyspace at YouTube, we had, I think, about 35,000 hosts. I'll talk about that a little bit more later. And it's very, very highly available. That's what Vitess is.

So let's talk about a few of the features and capabilities. It has a full MySQL parser built into it, which enables transparent routing to shards. So this ability I talked about, of giving your application the view that it's talking to a single humongous database and not a sharded database, is possible because we have a full parser.
We also protect MySQL instances from toxic queries using that parser, by adding LIMIT clauses to your queries and by doing a lot of other things. Again, I will talk about that when we get to the architecture diagram. And unlimited horizontal scaling: we know at least one company that runs 1,024 shards, and we don't see any reason why you wouldn't be able to run, say, 2,048 shards. Massive horizontal scale is possible.

Configurable sharding and resharding workflows. I think this is one of the coolest things about Vitess. There are other systems that allow you to shard your data, but those systems typically don't give you any control over how you want to shard your data. In contrast, Vitess allows you to specify which column in a particular table you want to use as your sharding column. And depending on what type of column it is, it allows you to specify a sharding function that makes sense for that particular column. This gives you tremendous flexibility in terms of data locality; you have a lot of control over where your data goes. What this also means is that if you have an exotic use case, you can write about 40 to 50 lines of Go (Vitess is written in Go) and define your own sharding function, apply it to a column of your choice, and all the other workflows around sharding, for example the query routing and so on, as well as the resharding workflows, all of that works seamlessly with your custom way of sharding. (I'll show a rough sketch of this idea in a moment.) Again, an extremely powerful feature. Companies are using this for GDPR, for data locality.

It's built for cloud. When we started, YouTube's databases ran in YouTube's own data centers, and then, because of a security incident, we were asked to migrate them from YouTube's data centers into Google data centers, which are orchestrated by Google's orchestration framework called Borg. Borg, as you all know, is the predecessor to Kubernetes. Because we had to do this migration, we went through a period of about eight to nine months. It was a very, very painful period, but we lived through it. What it means is that at the end of it, Vitess was truly cloud native. Now, this phrase "cloud native" gets thrown around quite a bit. What exactly does it mean when you say that something is cloud native? I think in the context of databases, or in the context of stateful workloads in general, cloud native means that your distributed system should be cognizant of the fact that your masters, or your pods in general, are not long-lived. Typically in the database world, your masters are very, very long-lived. If a master goes down, somebody gets paged and they take a look at it. Database masters run on exotic hardware. None of that applies in the Kubernetes or containerized world: the master is going to run on a pod like any other pod, the pod might get evicted, and your distributed system needs to deal with all of that. So everything that we needed to make databases run well on Borg, we built into Vitess: fast master failovers, a great service discovery story, health checks and excellent observability. All of this was built into Vitess because we needed to run in Borg, and that's why it runs very well in Kubernetes now.

A few stats about Vitess. The overall point that I'm trying to make here is that Vitess is very popular and growing more popular day by day. A lot of companies are using it, but it's not just how many people are using it, it's also who is using it.
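To make that sharding-function idea concrete, here is a rough, self-contained sketch. This is not the actual Vitess Vindex plugin interface (that lives in the Vitess source tree and is what your 40 to 50 lines of Go would implement); the function names and the two-shard layout below are just my illustration of the core job: turn the value in the sharding column into a keyspace ID and pick the shard whose range covers it.

```go
// A conceptual sketch of what a "sharding function" does, NOT the actual
// Vitess Vindex plugin interface. It maps a sharding-column value to a
// keyspace ID and then picks the shard whose range covers that ID, the same
// way the built-in hash vindex works in spirit. Names are illustrative only.
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// keyspaceID reduces a sharding-column value to an 8-byte keyspace ID.
func keyspaceID(shardingValue string) uint64 {
	sum := sha256.Sum256([]byte(shardingValue))
	return binary.BigEndian.Uint64(sum[:8])
}

// shard is a half-open range of the keyspace ID space, named the way Vitess
// names shards: "-80" covers 0x00... up to 0x80..., "80-" covers the rest.
type shard struct {
	name       string
	start, end uint64 // end == 0 means "to the top of the range"
}

var shards = []shard{
	{name: "-80", start: 0x0000000000000000, end: 0x8000000000000000},
	{name: "80-", start: 0x8000000000000000, end: 0},
}

// pickShard returns the shard whose range contains the keyspace ID.
func pickShard(kid uint64) string {
	for _, s := range shards {
		if kid >= s.start && (s.end == 0 || kid < s.end) {
			return s.name
		}
	}
	return ""
}

func main() {
	for _, cid := range []string{"2514", "1", "42"} {
		kid := keyspaceID(cid)
		fmt.Printf("cid=%s -> keyspace id=%016x -> shard %s\n", cid, kid, pickShard(kid))
	}
}
```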
So who is using it? YouTube; Slack, where about 40% of Slack's production traffic runs on Vitess; JD.com, Jingdong, which is the second largest retailer in China, they have tens of thousands of nodes running on Vitess; Pinterest runs their ads databases on Vitess. HubSpot is very interesting because they run about 700 production databases on Vitess, but none of them are multi-shard databases, they're all single-shard databases. They still run them under Vitess because they want to run them in Kubernetes, and they have been doing this for the last two and a half years. A lot of people think that running stateful workloads in Kubernetes is an unsolved problem. Talk to somebody at HubSpot and they will tell you that they have been doing it for the last two years using Vitess. GitHub; Square's Cash App. There is a little funny screenshot of a tweet by Jack Dorsey, who is the CEO of Square and also Twitter, from when the Cash App became number one, and Sugu, my co-founder, replies as a joke: okay, Cash App uses Vitess, YouTube uses Vitess, we need to go after Burger King now.

So what's the problem that Vitess solves, fundamentally? It's that single-node databases don't scale. You start out by adding memory, you go from spinning disk to SSDs to NVMe, you start adding CPUs, you have this exotic machine that costs about $100,000 running this huge database, and still you realize that six to seven months from now you are going to run out of capacity. What are your options? A lot of people, many years ago, decided to go to NoSQL for exactly these reasons, because they needed horizontal scalability. But Vitess allows you to horizontally scale MySQL databases with relational semantics, which otherwise would not have been possible. So the solution is basically: you shard with Vitess. Vitess is the layer in between, so that your app server doesn't have to be aware that it's talking to a sharded database. Vitess is the middleware; your app server talks to Vitess, and Vitess talks to the underlying database clusters and does the right thing.

So how does it do it? Here is where we are going to spend a lot of time, on this slide. This is the architecture of Vitess. Those N shards there constitute what we call a keyspace. A keyspace in Vitess terminology is basically a database; a keyspace with a single shard is identical to a database. Otherwise, when you have N shards like this, each shard has an identical schema, and each shard is basically your standard MySQL cluster with one master and multiple replicas, which are all replicating from the master using standard MySQL binary log replication. Vitess supports both statement-based replication and row-based replication, but we have now deprecated SBR and said that RBR is the way to go. That's how it's done in the industry anyway.

Two main components of Vitess are part of the serving path. One is VTGate, which is a stateless proxy, and the second is VTTablet, which is a minder process that sits next to every MySQL. So for each mysqld you get a VTTablet, and you get N of these VTGates. They are stateless; you run as many as you want depending on how many app servers you have, and you typically put them behind a load balancer. VTGate supports the MySQL binary protocol as well as gRPC, so your app servers typically just use the MySQL binary protocol and connect to VTGate. VTGate has a full MySQL parser built into it.
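Because VTGate speaks the MySQL binary protocol, an app server can talk to it with an ordinary MySQL driver; no Vitess-specific client library is needed. A minimal sketch, assuming a keyspace called userdb and placeholder host, port, and credentials:

```go
// A minimal sketch of an app server talking to VTGate over the plain MySQL
// protocol. Host, port, credentials, keyspace and table names are
// placeholders; the point is only that an ordinary MySQL driver works.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // any MySQL-protocol driver will do
)

func main() {
	// "userdb" here stands in for whatever your keyspace is called.
	db, err := sql.Open("mysql", "appuser:apppass@tcp(vtgate.example.internal:15306)/userdb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// To the application this looks like one big MySQL database; VTGate
	// parses the query and routes it to the right shard(s) underneath.
	rows, err := db.Query("SELECT user_id, name FROM user WHERE user_id = ?", 2514)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var id int64
		var name string
		if err := rows.Scan(&id, &name); err != nil {
			log.Fatal(err)
		}
		fmt.Println(id, name)
	}
}
```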
So let's say that this is a keyspace with N shards, and let's say it's sharded on user ID; let's call it userdb, and say the sharding key is user_id. Say a query comes in: select * from user where user_id = 2514, or something like that. VTGate parses that query, looks at the WHERE clause, sees that it has user_id equal to some number, realizes that this keyspace is sharded on user_id, and says, fine, I'll be able to figure out which shard the query should go to. It looks at the number, applies the sharding function to it, figures out which shard's range that keyspace ID falls into, and says, okay, this needs to go to shard number three, and sends the query there. Now, it does not send the query directly to mysqld. VTGate talks to VTTablet using gRPC. VTTablet also has a full parser built into it. Why do we parse the query again there? Because we need to make a determination about whether this is a toxic query or not. If we think that this query is going to return millions of rows, we actually put a LIMIT clause on the query; we rewrite the query and then send it to mysqld. Typically VTTablet and mysqld run on the same machine and talk to each other over a UNIX domain socket, so that communication is quite fast. VTTablet sends the query to mysqld, gets the results back, sends them back to VTGate, and VTGate sends them back to the app server. What happens if your query does not have a WHERE clause? In that case, VTGate does a scatter-gather: it sends the query to all shards in parallel, the queries run on the mysqlds, and the results come back.

Vitess also supports read-after-write consistency as well as eventual consistency. But we intentionally decided that the choice of whether you want read-after-write consistency or eventual consistency for your reads should be left to the application; we should not do something clever in the Vitess layer for that. So how do you send queries where you can live with eventual consistency? First, do you all understand the difference between read-after-write consistency and eventual consistency? Who does not? Okay, let's talk about what that is. The best example is this: let's say that you have an app which allows other people to browse your profile and allows you to make changes to your own profile. When you are making changes to your profile, you want any change that you make to be immediately reflected in what you are seeing. That's the use case for read-after-write consistency. But if I am browsing your profile, it doesn't matter if it's a few milliseconds or even a few seconds stale. That's the use case for eventual consistency. So what Vitess allows you to do is this: let's say this keyspace is called userdb. If your application uses the database name userdb, then all your reads go to the master and you get read-after-write consistency. If your app uses userdb@replica as the database name, VTGate round-robins those reads across the replicas, however many replicas you have, and you get eventually consistent reads. (The short example below shows both forms.)
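Concretely, the choice shows up only in how the app names the database it targets. Something along these lines, assuming the keyspace is called userdb (the same @replica form can also go directly into the connection string as the database name):

```sql
-- Default target: the master. Reads here are read-after-write consistent.
USE `userdb`;
SELECT name FROM user WHERE user_id = 2514;

-- Target the replicas instead: VTGate round-robins these reads across them,
-- so they are eventually consistent but cheap to scale out.
USE `userdb@replica`;
SELECT name FROM user WHERE user_id = 2514;
```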
So let me talk about this huge keyspace that we used to have at YouTube. It had 256 shards, so that N was 256, which means 256 masters. Each master had between 80 and 120 replicas, distributed across 20 data centers all around the world. So that's roughly a hundred times 256: 25,000 to 30,000 odd mysqlds distributed across 20 data centers. And the reason we did this was that for a lot of YouTube traffic, you could live with eventual consistency. In our particular topology, all the masters lived in Mountain View and the replicas lived everywhere else. So in our example, if somebody was making changes to their own profile, anywhere in the world, the writes would go all the way to Mountain View, and then the reads for that particular part of the app would also come from Mountain View. But for the part of the app where somebody was browsing somebody else's information, those reads were load balanced across the replicas running in your local data center, so they didn't need to go all the way to Mountain View. This is a very useful property to have.

Now you might ask, what are these replicas and what are these "big data" replicas? The replicas are the ones where Vitess cares about the replication lag between master and replica, because these replicas are candidates for failing over to and becoming the master, and these are the replicas you serve eventually consistent reads from. Then what is the other set of replicas? Those are the replicas against which you run your batch jobs. You typically have long-running jobs, and these are typically OLAP workloads, online analytical processing as opposed to online transaction processing. This set of replicas is also used for a bunch of administrative stuff: we take backups using these replicas, we use them for resharding, and a whole bunch of other things. So those are the big data replicas. The name is not very descriptive, but for historical reasons this set of replicas is called rdonly replicas. So these are called replica and these are called rdonly. And if you remember the addressing from before: the big data app would say userdb@rdonly, and VTGate takes that and says, okay, this needs to go to one of the big data replicas, and load balances it across the rdonly replicas.

So this is more or less the architecture of Vitess. There is one important thing that I want to tell you about: all of this is underpinned by a topology server. What's a topology server? The topology server is a quorum, basically, of either ZooKeeper, etcd, or Consul. And what gets stored there is the cluster topology: the IP address and port number for each mysqld and each VTTablet, for example, that constitutes the topology for a particular keyspace. Also, all the information about how a particular keyspace is sharded is stored there in the topology server. We will see this during the demo; I'll be showing you some VSchemas. Vitess extends the relational semantics: the way you have a schema for a database, Vitess allows you to have a VSchema that describes how a particular keyspace is sharded. It's just a JSON file that you can read, and that JSON file is saved in the topology server for each keyspace. Each of the VTGates (you might have hundreds of VTGates or tens of VTGates, depending on how many app servers you have) subscribes to the topology server at various nodes for various pieces of information.
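For the userdb example above, a minimal VSchema would look roughly like this; the field names follow the Vitess VSchema JSON format as I understand it, though the exact shape can vary a bit between versions:

```json
{
  "sharded": true,
  "vindexes": {
    "hash": { "type": "hash" }
  },
  "tables": {
    "user": {
      "column_vindexes": [
        { "column": "user_id", "name": "hash" }
      ]
    }
  }
}
```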
So any changes that happen to the topology get propagated to the VTGates, but what VTGate also does is read the current topology and the current sharding scheme and cache them. So your query serving path, app server to VTGate to VTTablet to MySQL and back out, does not involve the topology server at all. If your topology server goes down, your app does not see errors; you can still continue to serve traffic while the topology server is down. But if you needed to, for any reason, fail over from a master to one of the replicas and make that the master, now your cluster topology has changed, and that means you need the topology server; otherwise writes to that particular shard will start failing. But it's a resilient design where the topology server is not needed in the query serving path. vtctld is the Vitess control plane. It's a long-running process. It exposes a web UI, and there are also a bunch of command-line tools that talk to the API server embedded in vtctld. All the commands that you can perform using the web UI, you can also perform using those command-line tools. So I think that's more or less the architecture of Vitess. Almost everything we are going to do in the demo and so on will keep coming back to that diagram, so keep this diagram in mind.

So how do you start with Vitess? There is your big fat main DB. You think that you're going to run out of capacity pretty soon, so you start out by putting Vitess in front of main DB. Typically what you end up doing is double writes or double reads while you are making sure that everything is working as expected. So your app servers might be connecting directly to main DB and also through Vitess while you are doing some testing. But in general, the moment you put Vitess in front of your single main DB, you start getting the advantages of connection pooling and deadlines.

One thing that I forgot to mention, let me go back and quickly explain it: what does VTTablet do for mysqld? I only mentioned rewriting queries to add limits, but it does a whole bunch of other things. For example, it does connection pooling between VTTablet and mysqld. mysqld doesn't run very well when the number of connections is really, really high. So VTTablet has pools of connections, and it multiplexes all the queries coming into VTTablet over that pool of connections to mysqld. That's one thing. The second thing it does is put timeouts on every query and every transaction going through it. A long-running query that is consuming a lot of CPU just gets killed after, say, half a second. All of this is configurable, but this property is very useful for OLTP systems. Then there are things like hot row protection. A lot of this functionality got into VTTablet every time there was a YouTube outage. One of the times YouTube went down was because an extremely popular YouTube personality put something out and said, hey, like this video of mine, and that video made it onto the front page, and millions of people were clicking the like button on it. That finally ends up being one single row in one shard in one keyspace, with everybody's "update count = count + 1" kind of query hitting that database one after another. All the connection pools were consumed by this query, and nothing else could get through, because if 200 queries in parallel are trying to update the same row, underneath, in mysqld, that row is a single lock, so all of them end up waiting on the same lock. So what we decided to do is serialize that in VTTablet and only let one query through at a time (the sketch below shows the idea). And the reason we can do that accurately is because we actually have a parser: we know that it's an identical query coming in, trying to hit the same row.
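Here is a toy sketch of that serialization idea, just to show the shape of it; this is my illustration of the concept, not VTTablet's actual hot-row protection code:

```go
// A toy sketch of the hot-row protection idea: updates that target the same
// row are serialized so only one hits MySQL at a time, while updates to
// different rows still run in parallel. Illustration of the concept only,
// not VTTablet's actual implementation.
package main

import (
	"fmt"
	"sync"
)

type hotRowLimiter struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex // one lock per "row key", e.g. "video_like_count:12345"
}

func newHotRowLimiter() *hotRowLimiter {
	return &hotRowLimiter{locks: make(map[string]*sync.Mutex)}
}

// Do runs fn while holding the per-row lock, so concurrent callers for the
// same rowKey queue up here instead of piling onto the same row lock in MySQL.
func (h *hotRowLimiter) Do(rowKey string, fn func()) {
	h.mu.Lock()
	l, ok := h.locks[rowKey]
	if !ok {
		l = &sync.Mutex{}
		h.locks[rowKey] = l
	}
	h.mu.Unlock()

	l.Lock()
	defer l.Unlock()
	fn()
}

func main() {
	limiter := newHotRowLimiter()
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// All five goroutines target the same row, so they run one at a time.
			limiter.Do("video_like_count:12345", func() {
				fmt.Println("update count = count + 1 for worker", i)
			})
		}(i)
	}
	wg.Wait()
}
```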
So VTTablet is quite important: even a single-shard system works much better under Vitess than otherwise. So, as we said, you have a single main DB. The moment you put Vitess in front of it, you get connection pooling, you get deadlines, you get hot row protection, you get row count limits, and you can even blacklist queries. Then you say, hey, I want to scale reads, and so you start a bunch of replicas. And as I said before, if you now address them as mainDB@replica, Vitess does the job of load balancing your eventually consistent reads across your replicas. So you get replica routing and load balancing, and you get master promotion from replica to master.

Vitess thinks about master promotion in two different ways: planned reparent and unplanned reparent. Planned reparent is of course the case when everything is going well; you know that you need to do some administrative work on the master and you say, okay, let me make this replica the master now. And Vitess handles this beautifully. The way Vitess does it is that VTGate starts buffering all the writes going to that master. Then the VTTablet associated with the old master waits until all the transactions which are in flight have finished. Once that is done, the VTTablet associated with the new master waits until the replication logs have been completely applied and materialized on the new master. Once that happens, it puts the new master into read-write and the old master into read-only, and then updates the entry in the topology server saying that, hey, for this particular keyspace and this particular shard, the master is now such-and-such rather than the other one. All the VTGates have subscribed to that event, so they all get notified that this planned reparent is done, and all the transactions they have been buffering for the last few milliseconds can now start going to the new master. So the application doesn't experience any errors at all, just slightly increased latency, and things just work. That's planned reparent.

Of course, the nub is what happens when there is an unplanned failover. What happens if there is a hardware failure? What happens if there is a network partition? What happens, basically, if for any unplanned reason your master is not available? What we do is we typically use another open source tool called Orchestrator, which is monitoring the cluster topology and has logic in it to, A, recognize that a master is unavailable; B, figure out which replica is now the best one to make the master; and C, reset the replication topology so that the remaining replicas replicate from the new master. So Orchestrator kicks in, does these things, and then there is an integration between Orchestrator and Vitess.
Through that integration, Orchestrator tells Vitess: hey, for this particular shard of this particular keyspace, this is now the master. And don't worry about doing the usual dance that you do for a planned reparent, because this is an unplanned reparent; I have already reset the replication topology and so on, just start sending the writes here and the reads to these other ones. So that's how master promotion works: Orchestrator handles the unplanned case, and Vitess itself takes care of the planned case.

All right, so the next step that people typically do is vertical resharding. What do we mean by vertical resharding? Vertical resharding is just moving a set of tables from one database into another without the application knowing that anything has changed. You're not horizontally resharding; you're not distributing the data in a single table across multiple shards. You are just taking a subset of tables from one database and moving them into another database, without the application having to change. And the reason that you typically need vertical resharding is that that large main DB we saw probably has a few tables with billions of rows, but there are also a few reference tables that typically have tens of thousands of rows. There is no reason to shard the tables with tens of thousands of rows; the only tables you want to shard are the tables with billions of rows. So you move those reference tables with fewer rows into DB2, and keep the tables that you will potentially want to shard in DB1. That's the next step, vertical resharding. Again, there is full support for this. I will describe how resharding works in general in Vitess on the next slide, where we talk about horizontal resharding.

So horizontal resharding is where DB1 has now been split into three shards: S1, S2, S3. How does Vitess reshard while continuing to serve traffic? What Vitess does is start consuming the binary replication logs from the old master, parse them, apply the sharding logic, and split them into two replication streams: one goes to the first new master and the second goes to the second new master, while your application continues to write to the old master in real time. The full sequence is this: first, we stop one of the big data (rdonly) replicas from replicating from the old master at a known GTID. Then we copy the data inside that replica into the new masters. This is a still copy; it might take a few hours, a few days, or a few weeks. While this is happening, the application is continuing to write to the old master, and some of the data that you have copied might have been deleted or updated, and new rows might have been inserted, and so on. But we know the GTID as of which the copy was done. Once that copy finishes, we start the split replication, replicating continuously from that GTID onward as the application continues to write to the old master. So this is a cool trick; this is how we do resharding. The application doesn't need to know that the things it is writing to the old master are also being replicated to the new masters. Once you are happy, we also give you tools to do a row-by-row comparison between the old data and the new data, and then finally you can fail over, so that VTGate starts writing directly to the two new masters rather than the one old master. That's how resharding works.
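The splitting step can be pictured like this: a conceptual sketch of routing row changes from the old master's stream to whichever new shard owns each row, reusing the toy hash from the earlier sketch. The real VReplication code obviously does much more (GTID tracking, transactionality, throttling); this only shows the routing idea.

```go
// A conceptual sketch of the stream-splitting step in resharding: row changes
// read from the old master's binlog are routed to whichever new shard owns the
// row's keyspace ID. Illustration only, not the actual VReplication code.
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// rowChange stands in for one row event parsed out of the binlog.
type rowChange struct {
	shardingKey string // value of the sharding column, e.g. the user_id
	statement   string // the change to apply on the destination shard
}

func keyspaceID(v string) uint64 {
	sum := sha256.Sum256([]byte(v))
	return binary.BigEndian.Uint64(sum[:8])
}

// splitStream fans one ordered stream of changes out into two destination
// streams, "-80" and "80-", based on each row's keyspace ID.
func splitStream(changes []rowChange) map[string][]rowChange {
	out := map[string][]rowChange{"-80": nil, "80-": nil}
	for _, c := range changes {
		if keyspaceID(c.shardingKey) < 0x8000000000000000 {
			out["-80"] = append(out["-80"], c)
		} else {
			out["80-"] = append(out["80-"], c)
		}
	}
	return out
}

func main() {
	changes := []rowChange{
		{shardingKey: "2514", statement: "UPDATE user SET name='a' WHERE user_id=2514"},
		{shardingKey: "42", statement: "INSERT INTO user(user_id, name) VALUES(42, 'b')"},
	}
	for shard, stream := range splitStream(changes) {
		fmt.Println("shard", shard, "gets", len(stream), "changes")
	}
}
```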
We use the same technique for horizontal resharding as well as vertical resharding, and this particular functionality is called VReplication. Initially we were using this functionality only for resharding, as I described, but now we have taken that functionality, pushed it one level down, and made it composable. That's the demo that I'm going to show you when we get there.

In the fullness of time, this is how your database looks as you grow. You have multiple zones, or multiple cells. The diagram that I showed you did not show this dimension of cells. A zone, or a cell, in Vitess is basically a failure domain. If you're running in your own data centers, it could be a rack. If you're running in a cloud provider like Amazon, it could be an availability zone, or you can treat a region as a cell: us-east-1, us-east-2, us-east-3 could be cells, or just US East and US West could be cells. It just depends on what you think of as a failure domain. Vitess recognizes that a particular replica is running in a particular cell, and it can load balance based on local versus remote. Again, a very useful property that can be used for architecting around GDPR data locality requirements. We will soon be doing a blog post about how you can use Vitess's flexible sharding plus the ability to recognize cells to architect GDPR-compliant database clusters.

So now I'm going to show you the demo. I wish I could have done this live, but this is going to be mostly screenshot based. Let's set up the demo by talking about a use case. Let's say that we have a marketplace. You have products, you have customers and you have merchants, and you have orders. Each order has an order ID, which is the ID for the order; a customer ID, because the order is placed by a customer; a merchant, because the order is placed with a merchant; and it's for a particular product, so it has a PID. So this is the schema of our marketplace (I'll sketch it out in SQL below). And as you can imagine, there are probably 10 more minutes left, okay. So this is the schema. We model this using three keyspaces, two sharded and one unsharded. We say that there are hundreds of millions of customers, so we are going to shard them. As I said, the customer table is sharded on CID, and we decided that we are going to keep the orders with the customers, so we shard the orders table on CID as well, so that the orders associated with a particular customer live in the same shard. Merchants are also sharded, because maybe we have tens of millions or more of merchants, but we have sharded merchants on name. The reason that I have chosen IDs for one and name for the other is just to show you that one is a numeric column and the other is a VARCHAR column, and different sharding functions apply. And then the products: there are tens of thousands of products, or maybe 100,000, so there is no need to shard them. They're in their own keyspace, which is unsharded.

So let's go take a look at how this looks. This is our application, and this is how these keyspaces look in the Vitess control panel. Customer has two shards, merchant has two shards, product has one shard. If I click on one of them, you will see the serving shards.
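Here is a rough SQL sketch of that marketplace schema. Only the keys mentioned in the talk (order ID, customer ID, merchant name, product ID, and the product description) come from the demo itself; the exact types and the other columns are my guesses.

```sql
-- Rough sketch of the demo schema; non-key columns and types are assumptions.
CREATE TABLE product (
  pid BIGINT NOT NULL,           -- unsharded keyspace: product
  description VARCHAR(255),
  PRIMARY KEY (pid)
);

CREATE TABLE customer (
  cid BIGINT NOT NULL,           -- sharded keyspace: customer, sharded on cid
  name VARCHAR(255),
  PRIMARY KEY (cid)
);

CREATE TABLE orders (
  oid BIGINT NOT NULL,           -- kept in the customer keyspace, sharded on cid
  cid BIGINT NOT NULL,           -- so a customer's orders land on the same shard
  mname VARCHAR(255),
  pid BIGINT,
  PRIMARY KEY (oid)
);

CREATE TABLE merchant (
  mname VARCHAR(255) NOT NULL,   -- sharded keyspace: merchant, sharded on mname
  category VARCHAR(255),
  PRIMARY KEY (mname)
);
```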
The point that I want to make here is how the shards are named. They are named "-80" and "80-". The reason is that the full keyspace ID range runs from 00 to FF, and it has been split in half: "-80" covers 00 up to 80, and "80-" covers 80 up to FF. That's how the shards are named. If you click on one of these, you will see that for this particular demo we just have a master, no replicas and no rdonly replicas, but all the replicas associated with a particular shard would show up here. This is just to show the observability: if you clicked on the status link here, this is what you get, all kinds of statistics, data about QPS.

And this is how the vindex for the sharded customer keyspace looks. This is the VSchema. We are saying that it's a sharded keyspace, that there is a vindex of type hash, and that we are applying the hash function to the cid of the customer table. We are also applying the hash function to the cid of the orders table. That's how these two tables are sharded. Vitess also allows you to have what we call sequences. In a sharded system your auto-increment doesn't work, because the values generated on each shard would collide. So Vitess completely takes care of this for you: you define a particular column as an auto-increment, associate it with a separate table in another keyspace or in the same keyspace, and Vitess manages the unique sequences for you. That's what this auto_increment block and that auto_increment block are for. And keep in mind, there is a customer sequence table and an order sequence table, and we will see them defined somewhere else. This is the unsharded product keyspace, which has the customer sequence table, the order sequence table, as well as the product table, which is not sharded. And here we did not say sharded equals true, so that tells Vitess that this is not a sharded keyspace. Similarly, this is the merchant keyspace. I wanted to point out that the column is mname, which is not a numeric column, so this time we are using an md5 vindex of type unicode_loose_md5. That's the sharding function. I just wanted to show you that depending on the column you can have different functions: there are six predefined functions like this, and then you can write your own.

So this is our app. This is my classy app, which will allow you to place orders and so on. Now I'm going to add some data into it: some customers, some merchants, some products and so on. This is how it looks. The data for the customers has been distributed across the two shards, the orders associated with a particular customer have also shown up in the same shard, and then there are two merchants, which are in two different shards. Now, if I ran a query such as select * from product, this is what shows up. I would like to draw your attention to this: select * from product is the query sent by the app, and select pid, description from product limit 1001 is the query that VTTablet sent to mysqld. So it's a rewritten query, and the result is here: keyboard and monitor. So what has happened is that it did a scatter-gather to both the product shards. Sorry, it's not a scatter-gather; product is a single shard, so it just sent it to the one product shard and gave you the results. But if I say select * from customer, then it does do a scatter-gather to both shards.
So underneath, VTGate has run two queries, one on customer/-80 and one on customer/80-. Now, if we go here, where you actually gave a particular cid, VTGate looked at cid = 1, figured out that the shard for that particular cid is -80, and only sent a single query rather than doing a scatter-gather. Now here is a slightly trickier case: a join. Because customer and orders are in the same shard, that join on orders was sent to both shards and everything came back. But this is where the trickiness comes in: this next one is also a join over product, because you want the product description. So those two queries were sent, and then for every row returned, VTGate needed to send a query to the product keyspace to get the description. So that one query from the app translated into five different queries to produce this result.

So what we are going to do is start VReplication, where the product table is replicated into all the shards of the customer keyspace. And once that is done, this is how it looks: there is now a product table in each customer shard, into which everything from product is replicated. Any new product that gets added is replicated in using VReplication; any updates and any deletes get replicated as well. Because of this, if you add a new mouse, you can see that the mouse showed up here and got replicated into these two tables as well. And this product table has now been declared a reference table, so that Vitess knows it's being replicated into other shards; you need to change the VSchema and tell Vitess about this. Now, if I run a query like this, you get an error saying that there is an ambiguous reference: I don't know which product table you're talking about. So what you can do is apply routing rules and say that this particular product table can resolve to product.product or to customer.product. One minute, okay. So anyway, you tell Vitess about this routing, and then if you run this query, that whole join is correctly resolved in a single shard rather than having to go out to the product keyspace, because you have told Vitess how to route. So this is how VReplication works. It's really useful, and you can do many different things with it.

I'm just going to go through the last couple of slides. Anyway, I think we are at the end of our time. I more or less got through everything that I wanted to show you. You can also use this VReplication facility to do aggregates and so on, and that's how you use it for OLAP queries. And let's open it up for questions. Do we have time for questions or not? No time for questions. Thank you for coming. If you want to ask me questions, please find me, or is somebody else coming here to give the next talk? Okay. Yeah. We don't have time for questions, but please find me outside. I'll be here. Thank you. Thank you, Jiten.