Alright, let's get started. Today I am going to be talking about Vitess. I actually have some Vitess stickers, so if you are interested you can come and collect them from me later. My name is Sugu, and I now have a new title: CTO of PlanetScale Data, a company we started recently whose sole purpose is to support the Vitess project. My co-founder Jiten is also sitting here. Many of you probably know me as a software engineer from YouTube. I was at YouTube for over 11 years, and that is where I co-created this project, around 2010, and I have been working on it since then. One disclaimer, or warning: I may accidentally regress into thinking that I am still at YouTube, so if that happens, wake me up and remind me that is not the case anymore.
I am going to talk about a couple of problems that have been growing in the MySQL industry. Whenever I say MySQL, treat it as the base class: it covers MariaDB and all the other variants of MySQL, and sometimes you can even abstract it out to any RDBMS. The first problem is that a MySQL instance runs on only one box. When we started the Vitess project, about 8 years ago now, we thought we were solving this problem for big companies like the Facebooks and Twitters, because they are the ones who have a lot of data and need to shard everything. But over the years we have noticed that even tiny companies are beginning to outgrow their single MySQL instance in almost no time. We should have seen this coming: people have been talking about data exploding, big data and all that, and now it is really affecting the choices people make about storage solutions.
Often they look at MySQL and think: is this going to carry me all the way if my business suddenly grows in leaps and bounds, or if I suddenly need to store a lot more data? Will I be able to continue working with MySQL? Some of them are actually looking at alternatives: what else is out there if it is not MySQL? I will talk about some of the choices I have seen people make. I think we need to fix this problem, and Vitess is trying to help there.
The other big trend, obviously, is that the cloud is here, and most RDBMSs have been left behind, mainly because culturally or traditionally they were built to run on a single machine, where they have the whole machine and all of its resources to themselves. In this new container world you are supposed to live within the smaller box you are given, and those containers do not stay up for 6 months. They go down any time they like. Cloud environments like to move your instance all over the place, and you do not have the whole box anymore: sometimes somebody else starts doing a lot of I/O and you are short on bandwidth. All of this changes the way your application has to work with the database. You cannot rely on config files and static IPs to connect to the database anymore; you need a more dynamic, discoverable API that applications use to connect to the databases.
The other player that you are now hearing more about, or will hear more about, is Kubernetes, which is pretty much being seen as the future operating system of distributed applications. There is actually no solution there right now. Some people are working on operators and the like to run RDBMSs in Kubernetes, but right now there is no good solution even if you just have a single MySQL instance and want to move it there.
So these are some of the problems. The cloud problem aggravates the previous problem of transactional data exploding, and many people almost feel stuck with MySQL: what do I do with this thing? Fear not, there are ways out, and that is what I am here to talk about. Here are some of the solutions people have used when they find they have outgrown their single-instance MySQL.
By far the most common one is application sharding, and it is now becoming the least favorite option. First, it requires your application to be rewritten, which means additional engineering effort. The second problem, which is even worse, is that you have to live with it for the rest of your life. That is the problem companies are beginning to feel, and some of these companies are not really core technology companies; they are a retail shop, or a phone application, and so on. They do this application-level sharding and now they have a debt, a burden, that they have to carry. Sometimes a shard outgrows a single instance, and then they have to reshard, and they do not even know how to do that. So there are various challenges with application sharding.
Some people have said: forget all this, let us go to NoSQL. At least it will scale; we will deal with every other problem that comes up with NoSQL, but at least we will not be stuck in one box. There are also some paid solutions, which I will not say much about since this is an open source conference. They are not all perfect either; that is all I will say for now, and you can come and talk to me later about what solutions there are and what problems they have. There is also a newer category called NewSQL, which I am sure you have all heard of. Cockroach, TiDB and YugaByte are some of the new companies building a scalable SQL solution from the ground up.
The main drawback is really a time problem: they are all new. They are just beginning to get their first versions of SQL running. MySQL has been tuned for something like 20 years now, and these products are approximately where MySQL was 20 years ago. It will be a while before they get there. If they have enough financial backing they will eventually get there, but right now it may not be immediately possible to go into production with them. That is just my opinion, obviously; these companies are here, and they will probably help you go to production to the extent that they can. The other problem is that some of these companies have an open source offering and a commercial offering, and we do not yet know whether you can just take the open source version and run with it in production. There is not enough data to know whether that is possible, or whether any examples exist. So those are some of the things I see with NewSQL, but it is definitely something to watch.
At this point, if I have to make a data decision today, the way I see it is that these are all inferior solutions to what MySQL can do for you. If you look at what MySQL can do for you functionally and you move away from that to one of these, you are actually downgrading in terms of functionality. And why are you doing it? Only because you have to scale. So the obvious question is: why not both? That is basically what Vitess tries to do: it gives you all the functionality and power of MySQL, and at the same time you can keep scaling indefinitely. The reason Vitess can give you both is that underneath, it just runs MySQL. If you have a MySQL, you deploy Vitess on top of it, and afterwards you can reshard your database without the application knowing much about it. And how big can we get with MySQL? Pretty big.
We call it massive scale, which is pretty large. Actually, I was talking to Yoshinori earlier. Yoshinori is from Facebook, and he defines that as medium scale. But we call it massive scale because we can go to tens of thousands of nodes. The cool part is that it is open source, and we have been working on it for 8 years now, so a lot of time and effort has gone into making the project production viable. You can take what is available in Vitess, take it to production, scale it to millions of QPS, and it will run for you, no problem. For the longest time YouTube was the only company using Vitess, but now we have a decent number of companies, and they are running at very large scale. Many of them run 6-digit QPS in their instances, and many of them rely completely on Vitess for all their data, which is super exciting.
Which companies are using Vitess? Here is a list. There are actually more; these are the companies that have given us permission to use their logos. There is a name you may not recognize, JD.com, and that is because it is a Chinese portal. It turns out they are the second largest portal in China after Alibaba, and they have completely migrated their business over to Vitess. They run over 300 keyspaces, or databases, and again they comfortably run 6-digit QPS. Then you have Slack, which has made a major commitment in terms of both engineering effort and operations, and they have contributed a lot to the project too. They are in the process of migrating to Vitess. Slack has given plenty of talks about it, and we even have Guido here, who was working on that migration. There are a few others, such as HubSpot and Square. Square, for example, is using Vitess for their Square Cash application, and they are already sharded.
They started with a single shard, then deployed Vitess and started resharding it. But the more important thing here is that there are some names you will not recognize. These are actually pretty small companies, but they have huge data. Many of them again run pretty large QPS, and this is an indication of what is coming, of how the industry is changing. It is not just the YouTubes anymore that require massive scalability. There are probably some small companies that I have never heard of, and they probably went down some of the routes that I showed before. But I think there are a lot more companies coming up that are going to need big data and are going to wonder: can I go with MySQL, will it scale for me when I need it to? At least Vitess is there to take care of them when that situation comes.
There is actually a third kind of need that we are beginning to see. Many of the big companies, like Pinterest and Twitter, already have sharded solutions, but their secondary databases are beginning to grow and need sharding too. Many of them are saying, I do not want to go through this application sharding exercise anymore, and some of them are coming to Vitess to see if they can deploy Vitess on top of their MySQL instances and do the sharding there.
Alright. So what does Vitess do? This slide usually takes me an hour to cover if I drill down into everything Vitess can do; in 8 years of development you can build a lot of things. I have categorized it into 3 parts. The first thing Vitess will do is make your MySQL happier. If you are running tens of thousands of nodes, simple tasks that you would usually expect a human to do should not be done by a human, because when problems appear at that scale they are very hard to manage.
So a lot of that query mediation, making sure that queries do not run for too long and that transactions do not run for too long, is built into the proxy that Vitess has. If there is a sudden overload, Vitess can take care of throttling it, returning errors to the application that say: I am overloaded, I am not accepting any more transactions. All of those things are built into Vitess.
The other thing is actually an interesting story. In 2010 we originally built Vitess for bare metal; we were running in YouTube's own data centers, and at some point we had to migrate to Google's Borg cloud. How many of you have worked at Google before? Oh, there you go. Oh, Mark himself. So if you write an application that runs on Google's cloud, it is pretty much guaranteed that the application cannot be open sourced, because the Google ecosystem is so unique and there are so many internal dependencies. We had the interesting reverse problem: we had an open source project that ran on bare metal and we had to make it run in Google's cloud. How did we do that? We built an abstraction layer for every Google cloud feature we had to integrate with, and that saved the open source project. We fought for it to remain open source. We said we would do whatever it takes to build these abstractions so that Vitess runs in the cloud, but we want to keep this as an open source project. It was kind of hard, but we succeeded, and it paid off. It paid off because when Kubernetes came, we were ready for it. We said: oh, cloud, no problem, we just build these other pieces, put them together, and we are ready for the cloud.
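Editor's illustration: the overload behaviour described at the start of this part of the talk (refusing new transactions with an error instead of letting them pile up) can be pictured with a small limiter like the one below. This is only a sketch under stated assumptions. The names `TransactionLimiter` and `OverloadedError` and the `max_in_flight` parameter are hypothetical and are not taken from Vitess, whose proxy is not shown in the talk.

```python
import threading


class OverloadedError(Exception):
    """Raised when the limiter refuses new work instead of queueing it."""


class TransactionLimiter:
    """Hypothetical limiter: caps the number of concurrently open transactions."""

    def __init__(self, max_in_flight):
        self._max = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def begin(self):
        # Reject immediately when at capacity, so the application sees a clear
        # error rather than an ever-growing backlog in front of the database.
        with self._lock:
            if self._in_flight >= self._max:
                raise OverloadedError("overloaded: not accepting more transactions")
            self._in_flight += 1

    def end(self):
        # Release one slot when a transaction commits or rolls back.
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1

    @property
    def in_flight(self):
        with self._lock:
            return self._in_flight
```

A caller would wrap each transaction in `begin()` and `end()` and treat `OverloadedError` as a signal to back off and retry later. Failing fast is the design choice being illustrated: the database keeps serving the work it already accepted.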
Right now many of the companies I showed you actually run on Kubernetes, run on the cloud, run on Docker. For the longest time people were wondering, can MySQL run on Docker? Some of these people do huge QPS on Docker. GKE, AWS, Kubernetes on AWS, Mesos, Azure: take all of these and combine them in different ways, and there is probably a configuration there that Vitess runs on.
And finally, the icing on the cake is that it is indefinitely scalable, so you do not have to worry about that part. And you can say indefinitely scalable, but the important part is that you do not have to rewrite your entire application to scale it, because for all the queries that you would send to MySQL and now send to Vitess, it will figure out how to break them up, send them to the different parts, and get you the results back as if it were a unified database. Yes? I will answer that question on my next slide. The other part is that once you have sharded your system, if you later realize you are running out of capacity and want to reshard, there are workflows that will do this for you with the application noticing absolutely no change at all. YouTube does a major resharding every 2 to 3 months. We used to make big announcements beforehand; now nobody tells anyone anything, they just go and reshard, and there are usually no issues.
So this is the Vitess architecture. The most important part of this architecture is VTGate. VTGate speaks the MySQL protocol, and the app server, instead of connecting to MySQL, just connects to VTGate and nothing else. After that it treats the VTGate it is connected to as one giant database, except that underneath it is actually a bunch of smaller databases. I could spend a lot of time on this architecture; we really, really like it.
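Editor's illustration: the idea that the application talks to a single endpoint while the data lives in several smaller databases, and that resharding does not change how the application calls it, can be sketched with a toy in-memory router. Everything here is an assumption made for illustration: the `Router` class, the `keyspace_id` hash and the boundary scheme are hypothetical, and the `reshard` method simply copies rows in-process, whereas the talk describes online workflows whose mechanics are not covered.

```python
import hashlib
from bisect import bisect_right


def keyspace_id(sharding_key):
    """Hash a sharding key to an integer in [0, 256). Illustrative only."""
    digest = hashlib.sha256(str(sharding_key).encode("utf-8")).digest()
    return digest[0]


class Router:
    """Toy stand-in for a routing proxy: callers never pick a shard themselves."""

    def __init__(self, boundaries):
        # boundaries=[128] means two shards covering [0, 128) and [128, 256).
        self.boundaries = sorted(boundaries)
        self.shards = [dict() for _ in range(len(self.boundaries) + 1)]

    def _shard_index(self, key):
        return bisect_right(self.boundaries, keyspace_id(key))

    def put(self, key, row):
        self.shards[self._shard_index(key)][key] = row

    def get(self, key):
        # Single-shard read: routed by the hash of the key.
        return self.shards[self._shard_index(key)].get(key)

    def scan(self):
        # Scatter-gather: ask every shard and combine the results.
        rows = []
        for shard in self.shards:
            rows.extend(shard.values())
        return rows

    def reshard(self, new_boundaries):
        # Redistribute all rows under the new ranges; put/get/scan are unchanged.
        old_rows = [(k, v) for shard in self.shards for k, v in shard.items()]
        self.boundaries = sorted(new_boundaries)
        self.shards = [dict() for _ in range(len(self.boundaries) + 1)]
        for k, v in old_rows:
            self.put(k, v)
```

For example, after `router = Router([128])` and a series of `router.put(user_id, row)` calls, calling `router.reshard([64, 128, 192])` moves the same rows onto four shards, and `router.get(user_id)` keeps working with no change on the caller's side.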
Each and every one of these pieces plays a very important role; for example, the lock server is what makes it possible for Vitess to run so comfortably in the cloud. I am going to rush, because I will soon be running out of time and I wanted to cover a couple more things. The pluggable architecture I have already covered. The most important thing we had to take care of while making this architecture pluggable was to not compromise on performance. In spite of all this pluggability, Vitess latency is still pretty low: it adds about 2 milliseconds of total overhead to your database, but then you get this indefinite scalability.
I also wanted to cover how we manage the project, because that is actually the best part. The reason I keep wanting to go back and work on this project is that we have maintained very high standards, to the extent that some people get irritated, especially when we start insisting on unit tests and strict code coverage requirements. They say, do I have to do all this? I do not care: people are going to take this thing and run it in production. Every piece of code you write that is going to serve a query has to be tested and covered, and if needed you have to write end-to-end tests as well. We have followed this discipline from day one and have been very strict about it. All coding standards are enforced, we have all kinds of CI tools that make sure you do not break those things, and if your code is not readable we do not even take your pull request.
But over time that has brought some really high quality contributions to the project. And this is the big news that is coming: CNCF is going to accept Vitess as one of its projects. It is pretty much a done deal, and the official announcement is happening tomorrow, so keep watching the various blog posts. It took the CNCF a long time to accept Vitess, almost a year, and the reason is that they could not find a project similar to Vitess. They could not find anything else to compare it with, because in reality there is not one right now. I do not know why, but they just could not. So they did a lot of due diligence: they interviewed many users, they looked at maturity, they looked at contributors, and finally they said, yes, it is right, there is no project like Vitess, but it is awesome and it really does what it is supposed to do. After all that due diligence we are now reaching the end, and tomorrow we will be accepted into CNCF, which means Vitess will get to benefit from all the cool stuff that CNCF has. Personally, the reason I am excited is that until today Vitess was always seen as a YouTube project, but now it is becoming a truly community-owned project. I am hoping this will attract more contributors and more attention in terms of adoption.
In terms of roadmap, Vitess is not perfect, so there are still things that need to be done. If you are a good open source person, if you know query engines and things like that, we would welcome your contributions. The one place where Vitess falls short is some cross-shard queries, where it says, sorry, I cannot handle this. It does most of them but not all of them: if you do some complex aggregation with ordering and sorting, it says, please rewrite your query in such a way that I can do it. So that, help with migration tools, and configurability are the areas we think we need to focus on.
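Editor's illustration: to see why cross-shard ordering and aggregation need extra work in the routing layer, consider the simplest cases. If every shard returns its rows already sorted, the router can merge the streams; if every shard returns per-group counts, the router can add them up. The sketch below shows only those two easy cases, using hypothetical helper names (`merge_ordered`, `merge_counts`). It is not how Vitess plans queries, and the harder combinations mentioned in the talk are exactly the ones a sketch like this does not cover.

```python
import heapq
from itertools import islice


def merge_ordered(shard_results, key, limit=None):
    """Merge per-shard row lists that are each already sorted by `key`.

    Equivalent in spirit to ORDER BY ... LIMIT over the union of the shards,
    assuming every shard applied the same ORDER BY locally.
    """
    merged = heapq.merge(*shard_results, key=key)
    return list(islice(merged, limit))


def merge_counts(shard_counts):
    """Combine per-shard GROUP BY counts by summing the counts for each group."""
    totals = {}
    for counts in shard_counts:
        for group, n in counts.items():
            totals[group] = totals.get(group, 0) + n
    return totals
```

For instance, `merge_ordered([shard_a_rows, shard_b_rows], key=lambda r: r["views"], limit=3)` yields the three rows with the smallest `views` across both shards without re-sorting everything. Aggregates such as a distinct count cannot be combined by simple addition like this, which gives a sense of why some complex queries have to be rewritten.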
In conclusion, the main message is: if you are hesitating about using MySQL because it cannot scale, do not worry about it, we have it covered, and there are plenty of examples of people who have scaled with Vitess to really, really large scales. I was telling somebody else today: if your engineer came and told you, I want to store 100 terabytes of data, what would you tell them? Is there any way you can prune it? Is there a way you can reduce it? No. Say: do not worry, bring it on, we will take care of it. That is the kind of attitude we should have when people come to us to store data. Running in the cloud is also not a problem. And finally, help build Vitess, because it is an awesome tool. That is it. Thank you very much.