Cool. So what the hell does this title even mean? Storage is stateful; what is stateless storage? Hopefully, by the time I am done, I will have successfully defended this title. That is my goal. Just to introduce myself: I am Sugu, the co-creator of Vitess. Show of hands: how many of you have heard of Vitess? Okay, things are getting better. And, I guess the other way, how many of you don't know anything about Vitess? Okay, you're all honest, because A plus B equals approximately the total number of people in the room. Okay. So that means I will spend slightly more time covering what Vitess is. I actually have a lot of content; I could go for about five hours, but I'm told I have only 40 minutes, so I'll try to rush. At the end, if we have time, I will do a full detailed demo; if we are out of time, I'll crunch it up and show it in a small format. So what is Vitess? There is some history behind Vitess, which I will cover. But if you want to know quickly what you can do with Vitess today: the first thing is that Vitess, as far as I know, is the only cloud native database in the world. It is the only system that can confidently say you can run it on Kubernetes without losing data. That is what Vitess is, and I'll talk about why that is the case, and also why other storage systems have not started making this claim. There are storage systems that are beginning to defend it, but no one else can come out and boldly say, "we are a cloud native database." Only Vitess is able to say that. The other property of Vitess is that it is massively scalable. Massively means really, really massive; we'll cover some of that. And highly available. Highly available in Vitess land means about five nines of availability should be comfortable. Five nines is the gold standard.
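As a quick aside, the nines translate into yearly downtime budgets you can work out in one line. This is my own illustrative arithmetic, not something from the talk, written as SQL since everything else in this talk happens at a MySQL prompt:

```sql
-- Yearly downtime budget at a given availability:
-- seconds_in_year * (1 - availability).
SELECT 365 * 24 * 3600 * (1 - 0.99999)  AS five_nines_budget_seconds,  -- about 315 s/year
       365 * 24 * 3600 * (1 - 0.999999) AS six_nines_budget_seconds;   -- about 31.5 s/year
```

At six nines, a single 30-second failover per year consumes almost the entire budget, which is why five nines is treated as the practical ceiling.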
The reason is that six nines of availability theoretically runs into your response latency, which doesn't make sense: if you take five milliseconds to respond, and you add all of that up, it is impossible to get six nines of availability. And Vitess is based on MySQL. What that means is that it uses MySQL underneath, and it also speaks the MySQL protocol, so as far as the application is concerned, it thinks it is talking to MySQL. How does it do that? Basically, through sharding. Under the covers, it can take your single MySQL instance, break it up into really, really small parts, and then scale indefinitely and massively. How it does that, we'll try to cover some of. Vitess is a CNCF project. How many of you know about CNCF? Okay, cool. CNCF is the Cloud Native Computing Foundation; it is essentially a foundation that was started to support Kubernetes. How many of you know Kubernetes? Okay, all of you know Kubernetes. So we don't know whether Kubernetes founded CNCF or CNCF founded Kubernetes; they are kind of synonymous. But after the foundation started taking off, CNCF started taking on other projects that are cloud native, and Vitess was actually the first storage project to be accepted into the CNCF. As a matter of fact, it was a big controversy, because people said, oh, databases are not cloud native, why would we take a storage system like Vitess into a cloud native foundation? Then they went into an argument about what cloud native means and all that. So, short story long, Vitess is now not just a CNCF project. CNCF has tiers, or levels, of project success, and a graduated project in CNCF means it has reached the highest level. I don't know how many projects CNCF has, but Vitess was the eighth project to reach graduation.
For example, CNCF holds conferences all over the world, and the conference has been growing; the last one was in San Diego, and 12,000 people attended. That's how big CNCF is, how big Kubernetes is. So it's all pretty exciting for all of us. Okay, these are some of the stats. The most exciting part of the stats is who uses Vitess; at the end of the day, that is the real testimony to the success of the project. Somebody actually came and told me: all these other storage systems have these comparison charts, I can do ACID, I can do this, I can do durability across data centers, etc., etc. Why does Vitess not have those charts? I don't know. I guess I don't need charts, because people are using it. If people were not using it, maybe I would want to show charts. But then I thought about it and said, you know what, we should go ask these people, the adopters, why they are using Vitess in spite of the fact that we don't have a comparison chart. If you look at the adopters, it's a pretty impressive list. What I would say is that these companies are the technology leaders; they kind of dictate what is going to happen in the future. If they do something some way, everybody goes and listens to how they are doing it and wants to repeat it. That's the exciting part about Vitess: it has a pretty impressive list of adopters. I mean, look at Slack, GitHub, JD. I'll talk about some of these companies. So here's the first one; each gets about one slide. Slack. Do any of you know about Slack? Yes. Slack is a company that is experiencing hypergrowth. They found out about Vitess about three years ago and started studying it. They were actually at a crossroads: they had sharded their application, and they were not happy with it, for a couple of reasons.
One is that it was really, really hard to manage the shards, and the other is that their business needs were changing, and they said, this sharding scheme is not going to work for us anymore. That's when they studied Vitess; they were trying to figure out, should we build something ourselves, or should we buy? And then somebody inside Slack said, have you heard of Vitess? You should go talk to them. So they talked to us. When I went and presented to them, I told them, you know what, why not both? You should build and buy. That's what you get with Vitess, because you buy into Vitess for free; you're not paying any money, because it's open source software. And if there's something missing in Vitess, you can build it, because it's an open source project. They liked that story, so they started learning Vitess, and they are now one of the main contributors to the project. This is a quote from their principal engineer, and essentially it is a testimony to how passionate people are about Vitess, how much they love it, and how much they trust it. Actually, there's another quote, from Demmer, saying that we don't have any short-term plans of changing this, ever, because we believe that Vitess is going to meet all our needs, which is pretty exciting. And how many of you have heard of JD.com? Yes, the Paytm person has heard of JD.com; there are a few others. JD.com is kind of the Amazon of China. They have 67 billion dollars in revenue, and they use Vitess. How they came about it is kind of funny, but since I'm short of time, maybe I'll tell you the story in the hallway. They are what I would call massive scale: 67 billion dollars in revenue, one of the largest retail portals in China. I don't know how many of you have seen storage QPS numbers; try to imagine what would be a really, really high QPS.
Try to make a guess, and I will tell you how much they are doing, and then we'll see if your guess matched. Try to guess how much JD.com did as QPS. They have about 4,000 keyspaces. A keyspace is essentially a logical database, but it can be sharded; even if it's sharded into 16 or 32 shards, it's still one keyspace. They have about 4,000 keyspaces, and the number came out to 35 million QPS. That is what they did on their Diwali, which is called Singles' Day, on the 11th of November, and they are all on Kubernetes, which is the exciting point. How many of you have heard of Nozzle? None, because it's actually a tiny startup. You can understand somebody like JD using Vitess, but why would a tiny startup use Vitess? Their story is: they are a startup, but what they wanted was the benefit of Kubernetes. Why did they want Kubernetes? Because they wanted flexibility. They were shopping for a place to run their software. They talked to Azure, and Azure gave them a really awesome deal, a huge amount of credits; Azure does that for you, you should know how to negotiate. So they went to Azure and were running there, but guess what happens? GKE finds out, GCP finds out: we can give you a better deal. So they said, oh, okay. And because they were all in on Kubernetes, within one hour they just packed their bags and moved to GKE. That's what you can do if you are all in on Kubernetes: no vendor lock-in, and complete portability. And that is the reason why you should run your software fully cloud native, including storage. Right now there are not many storage systems that can do this, but there are huge benefits in moving to it, and Vitess is leading the wave; let's hope it remains that way. So, I have now used "cloud native" a few times. What is cloud native?
A lot of people claim to be cloud native, but there is actually a clear definition of what cloud native is, and how it applies to storage is what I am going to cover now. For that we have to go back in history, into the early YouTube days, because there are two reasons why Vitess is cloud native: there is a historical reason, and there is a technical reason. The historical reason goes back into YouTube history. After Google bought us, we were still on bare metal; we actually built Vitess on bare metal, and there was no cloud. We are talking 2010; the cloud did not exist then, and everything was fine. Our own data centers, our state in our hands, everything completely fine. Perfect. Not really. Things were on fire, outages every other day, and almost every time it was the database, because that is what somehow causes outages in systems. So my co-creator Mike and I said, we need to do something about this. Let's take ourselves out of this rotation and figure out how to leap ahead of all these storage problems we are having. That is essentially how Vitess was born. We deployed it, launched it, and things actually became slightly happier. But in 2013, Snowden happened. How many of you remember the Snowden story? Yeah. Snowden happened, all US companies got spooked, and Google said: you can't have your data outside Google anymore; you need to move it inside Google.
So we had a very aggressive project to migrate everything into Google very, very quickly. Until then we were quite happy; we didn't know what cloud was, we didn't know what this Borg thing was. Google has this thing called Borg, which is their internal cloud, and when we went and looked at their APIs and what you needed to do to run inside Google, you essentially had to build a system for that platform, because Google is an ecosystem: they have load balancers, they have custom APIs for everything. So how do you move something like Vitess into Google? That is one part of the story. The other part of the story is that if you are writing software inside Google, you are only supposed to write the stateless part of your application. If you want to store your data somewhere, Google gives you APIs; you just call those APIs and it will store the data for you, and they were very fixed APIs. You are not allowed to store anything in a file; Google doesn't allow that. There is something called Colossus, which is essentially a blob store where you can put your data; there is Bigtable, which all of you have heard of; and there were a couple of other systems. But there was no relational database you could store your data on, and Google said you need to move all your data there anyway. So that's when we made two difficult calls. One was to keep Vitess open source in spite of making it work inside Google, so we built layers and layers of abstraction and did that.
The second thing we did is what made Vitess cloud native. Because you have these APIs, we decided to run Vitess as if it were a stateless application. What does that mean? The way you write a stateless application is: when you launch your application in Google, you have a file system and you can write to it, but as soon as Google reschedules your pod, all your data is lost; it will wipe your data and restart your process somewhere else. So we figured out a way to make Vitess run in that environment using ephemeral local storage. How did we do that? By having a master with a large number of replicas. I don't know how many of you have heard of semi-sync replication in MySQL; we use that, which means that if you commit data, it is guaranteed that at least one other replica has it. Using that, we managed to port the entire YouTube traffic into Borg, probably a few million QPS and a huge amount of writes, and we never lost data. So we were, I think, the first storage engine that ran as a stateless application in Borg. I called that stateless storage, and that is my defense. I mean, it is better than serverless; what is serverless, anyway? At least stateless storage now makes sense: the application is stateless, but we still store your data for you. Then in 2015, Google announced Kubernetes, and we looked at the features: hey, we can run in this system, because we run on Borg, Kubernetes has the same features, we know how to run on Kubernetes. So we announced, before Kubernetes 1.0 even came out, that Vitess was ready for Kubernetes. And guess what, people actually believed us. They said, oh, you think this can run in Kubernetes? Can we try? We said, yeah, sure, start running it, because in reality we were running YouTube like that, so we had the confidence. The first adopter was a company called Stitch Labs, and they went into production in 2016; by now that is by far the longest-running Vitess workload on Kubernetes. Then HubSpot came, and they
went on Kubernetes. JD came, and they showed that not only can you run on Kubernetes, you can run at massive scale, as you saw from the QPS and some of the other things they have done. And finally Nozzle came and showed that the advantage of running on Kubernetes is portability. So we have a nice and beautiful story for Vitess. While all that was happening, there is this person called Kelsey Hightower. He is somewhat of a celebrity in the Kubernetes and CNCF world; he is the oracle of Kubernetes, he predicts things, he is the evangelist for Kubernetes. We had been running all this, and then Kelsey, and look at the date here, the date is 2019, last year, says: do not run databases on Kubernetes, you will regret it. And I am like, Kelsey, we know how to run them. But he has a point. He says: if you take plain MySQL or Postgres and run it on Kubernetes, you will regret it; you will lose data, or you will lose uptime, because those softwares are not cloud native, and if you try to run them on Kubernetes you are going to have trouble. The reason goes back to the Borg days: Google does not respect data written locally by the application. It considers it ephemeral: if I reschedule you, I will wipe your data. That same rule exists in Kubernetes. So if you took MySQL and ran it in a system like Borg or Kubernetes, you would have a lot of trouble. I am going to quickly go through a few scenarios; I usually spend more time on this, but I want to jump ahead and see if I can cover the demo. Obviously, if you took MySQL and ran it on local storage, as I just said, that is not recommended, because if you get rescheduled, you have lost your data. So this is not a viable configuration. But it is a viable configuration if you are doing testing: if you are running tests, your pod is short-lived; bring it up all in one, run it, finish your test, and throw it away. It is really nice for that. The other approach is to use a mounted volume. If you use a mounted volume and your pod gets rescheduled, the new instance comes up and reconnects to that mounted
volume. But there are some problems. One problem is that if MySQL crashes and you reschedule it somewhere else, it needs to do crash recovery, and who knows how long that is going to take; there are crash recoveries that sometimes take an hour. The other problem is that these databases are tuned for local disk, and if you suddenly put a mounted volume there, the latency is different, the performance characteristics are different, so your application may not perform the same way. Those are the two problems with this approach. The third scenario is the more common way people run MySQL: I have a master and a bunch of replicas; the master goes down, you fail over. But the problem with failover in Kubernetes is that the pod is not a first-class citizen. There is something called the StatefulSet, and there is nothing stateful about the StatefulSet; the StatefulSet is a stateless way of running pods. They initially wanted to call it cattle, and I do not know if you have heard that argument: they did not like the pets-versus-cattle argument, because somebody started arguing that cattle are also pets, so they went with StatefulSet. A StatefulSet basically allows you to address each pod by its name; that is all "stateful" means there. So you can run in a StatefulSet, but Kubernetes will not distinguish a master from a replica; as far as Kubernetes is concerned, they are all the same. You could say, okay, I will always run pod 0 as my master, but the problem there is that if pod 0 goes down and you re-parent to a replica, you have to go tell the application: it is not pod 0 anymore, it is pod 1. What that means is that you cannot run this without an orchestration layer that actually watches these changes; if there is a re-parent, it has to inform the application. So, short story long, if you run MySQL on Kubernetes, you need to build these layers, without which you cannot run on Kubernetes. Essentially, that is what Vitess built: these impedance
mismatch layers in the middle, which make sure that if you run MySQL within Kubernetes, Vitess performs all the orchestration needed. And then there are other problems with the cloud that are not really good for people migrating from on-prem solutions. The lifecycle, for example: when we were at YouTube, our master uptime used to be like six months; it is not unheard of to have a master that is up for six months. But in the cloud world, you will be lucky if your master is up for a week; sometimes the master goes down multiple times a day. The data size: you are used to seeing 10 or 15 terabytes of data; Kubernetes does not like that, because when it reschedules a pod, you cannot just take 10 terabytes and move it around, so it likes you to run smaller instance sizes. IPs get reused; these are not things that an on-prem solution likes. And then there is the topology server: you have to make sure you do not overload it, because those are not really high-QPS systems. All of these issues we had to resolve when we moved Vitess to Borg, so when we moved to Kubernetes, these things were automatically taken care of. Just being able to say "I can functionally run on Kubernetes" is not sufficient; you have to have sensitivity to these issues and make sure you can handle these types of environment changes. All right. And then, finally, after arguing with Kelsey, he has now changed his standpoint. Now he says: do not run your MySQL or Postgres directly on Kubernetes, but you can use orchestration systems, and if you use them, it is safe to run them. So this is the Vitess architecture, and there are three principles we used in this architecture. One is simplicity, which means that every component that is there is necessary and must be there, and no more than that. The other one is loose coupling, which means that no single component is directly dependent on another component, which means
that in this system, things can go down and come up independent of each other, and the system will tie itself back together as you bring them back up. The third one is survivability, which means there is no single component in this system that we cannot afford to lose; we can afford to lose any component, and the system will know what to do about it, wait until that component is resurrected, and continue to work. Based on these principles we built Vitess. In reality, we did not start with these principles; when we eventually got it working, we figured out that these are the three principles we ended up with. Now I am speaking as if we designed it from the ground up to be that way, but anyway. Essentially, the app servers connect to these VTGates. The VTGates are all stateless, which means they can come up and go down as needed; you can add more as needed, or shrink them back. When an app server connects to a VTGate, it thinks it is connected to one humongous database, but in reality it is a cluster of multiple databases in the back end. And there are these VTTablets: there is one VTTablet per MySQL, and it is essentially a minder of MySQL. It maintains connection pools, proxies queries into MySQL, and does housekeeping work like taking backups, restores, and everything. When a VTTablet comes up, it tells the topology server, "I exist," and the topology server says, okay, noted. The VTGates watch the topology, and as soon as they discover a new VTTablet, they connect to it and say, okay, start serving queries. There is a lot more to it than just this, but this is essentially the principle on which Vitess was built. And now there is the demo. I am actually debating whether I should do a live demo or screenshots. Live demo. Okay, let us see if it works. Okay, actually there are a couple more slides I should cover here. So here is a simple schema. All of you are familiar with databases? Yeah. It is a very simple application; it is a
marketplace: there are customers coming and buying products from merchants. So there is a customer table, there is a product table, and there is a merchant table, and when a customer places an order, the order table has foreign keys: one back to the customer (the person who ordered), one to the product (what they ordered), and one to the merchant (who they ordered it from). Simple. You would think this schema is easy to scale. It is not. The reason is, if this grows into billions of customers, how do you shard this system? You can shard by customer, obviously. There are not many products, so product can stay unsharded in a separate database. Merchant may or may not be sharded, but in a really, really huge marketplace, you may need to shard the merchant also. But where do you put orders? Do you put them with the customer, or do you put them with the merchant? So far we have always said that in a sharded system you have to make a choice: you have to choose which relationship is stronger, and in this case I am implying that orders are most strongly associated with customers, so we are going to put the order with the customer. I am going to show you a system that is sharded this way, and we are going to talk about the challenges. So this is essentially the setup: there is a product database, which is unsharded; customer is sharded; and merchant is sharded on its own. Obviously this yellow line is going to be a problem, and this purple line is also going to be a problem; we will talk about all those problems. So here is my e-commerce application. It is very sophisticated, if it comes up; if it does not come up, we go to slides. Am I connected to the internet? Yes, it came up. Let me make the font bigger. Is this readable? Maybe not; this is too big; okay, this is readable. So here is my e-commerce application. This is basically a pretend e-commerce application: you have to pretend that you are a customer, and then you
have to execute the SQL statement to create yourself; you have to pretend that you are placing an order, so you go and say "insert into order". It is cool because you actually see what Vitess does under the covers, but it would not be cool for a real user: "you want to place an order? Please send me your select statement." So I am going to make this font bigger. Let us see; I have a readme here. Okay, I am going to first load data into this thing, so let us look at what is in the data. This is a bunch of insert statements: insert into customer, insert into merchant, insert into product; basically priming the database. These are statements that do not say anything about sharding; they are written as if it were a single database. So I am going to run this directly against MySQL. Oops, I have an auto-completion problem. Okay, so I am going to say: mysql, please execute these statements against the cluster that I just created. If you look at this cluster, there is a product database, which is unsharded; customer is a sharded cluster with two shards; and merchant has two shards. That is the setup here. Run this; it says data is loaded. Refresh, and voila, data is loaded. What this application does is snapshot the database before every operation and then show you what has changed; anything that is highlighted reddish is what it has discovered as new. So yeah, it is loaded. As you can see, some rows went to the left shard of customer and some rows went to the right shard. This is something that Vitess can figure out for you: you just send your inserts, send your selects, and Vitess will know where to route that query. That is the first superpower of Vitess. If I say "select star from product", it says: oh, I know where product is; product is in the product database, so I am going to send it there. This right-hand-side green window shows what Vitess did with your query. It says: I sent your query to the product
database, because I know that product is in the product database; and on the left side is the result. If I say "select star from customer", it says: customer is a sharded database; if you say select star from customer, I have to collect all rows from all shards. So it takes the select star from customer, scatters it to all shards, gathers the results, and sends them back together. If I say "where cid = 1", it does something else: it says, oh, I know that customer ID 1 is in the left shard, so I will just send it there. Customer 8 goes to the other shard, because the shard ranges are hexadecimal, and 8 hashes past the middle. All right, now let us make it do something cooler. This is a join: it says, get me all the customers and their orders; do a join of customer with order. What should happen? It should be purely a scatter. The reason is that, the way the system is sharded, orders live with their customers: you can see that for customer ID 1, all their orders are within that shard itself. So if you give this query to Vitess, it says: yeah, I know that the orders live together with their customer, so I will just do a simple scatter. But if you did something crazy, something that is relationally meaningless, for example did this join on oid instead of cid, it would do something else. Oh, it did not like it; oid does not exist there. Anyway, never mind. It would do a scatter query and a nested-loop join, do all that stuff, and give you a relationally accurate answer that has no practical meaning. We can try that query again later, but I want to get to the more exciting part, so we will come back to it if we have time. Now I am going to do an even more complicated query. This query says: I want to know the product names that were ordered by the customer. So you are joining customer with order, finding the product ID of the order, and then going to the product table to fetch the product name. So let us see what
Vitess does with this. It says: oh, I know that customer and order are local to each other, so I am going to break this query into two parts. One part does the join of customer and order. Oh, I have five minutes left, so I am going to go really, really fast. And then, for each of the rows that come back, it needs to go look up the product, so it does nested-loop joins against the product table. But this is painful, right? What if you have a million customers? For each row you got, you have to do a round trip against the product database. So what Vitess allows you to say is: hey, I know product is a small table; what if we materialize it in every shard of customer? Then it will work fine. So I am going to execute that command. Let us see. Run it, refresh: it is materialized. Now if I run the query, the name of the table is different, because it is a copy that lives in the customer keyspace. This is the query, and you can see it is now doing a simple scatter. And the more important thing is: if you go and add a product, say insert into product, as soon as you insert into product, everything is immediately replicated into the target. So this is a true materialized view, which means that when you change the source table, the target table is kept up to date, which means your queries will continue to work with the efficient local join. The same thing can be done where a merchant wants to join with order; that is also a problem. I will skip the steps that do not work and show you what works. Let us see. We are going to materialize the order table into the merchant's database, but the difference is that merchant is also sharded: the source is sharded, the target is sharded, and not only that, the target is sharded by a different key. The source is sharded by customer ID; the target is sharded by merchant name. Now let me refresh, and there you go, it is materialized. And then, even now, let us say here is a query
that changes the merchant name at the source. I am going to change order ID 1, which is this one, Monoprice; I am going to change it to Newegg. Let us see what happens if we do that. Boom. Here the row changed, but in the target, the row moved. So it is a relationally consistent view of the source, which means you can rely on the accuracy of the target data. There is one more example, which I do not have time to go into. The way these materializations happen is by saying: materialize this table, and the materialization is expressed as a select statement. I want this table to be materialized as "select a, b, c" and put into the other table. But the obvious question is (two minutes, and I am almost done): what about aggregations? Can I do a materialization using a count star? Can you do a sum? The answer is yes; this works for those expressions also. If I had time I would show you that, but there are other demos where I have shown this working. So, I am out of time. It was great talking to you; if you have any questions, I have 30 seconds to answer them. Yes? [Audience: transactions could be pretty expensive, given the data is spread across various machines. Which sort of traffic is preferable, OLTP? Does Vitess support OLTP transactions?] Vitess is built for OLTP, mainly OLTP; people that come for OLAP, we tell them to go elsewhere. [Audience: then how will you handle those joins, if, say, the merchant table were so large that you cannot materialize it?] Yeah, there are some OLAP-style queries that Vitess can do, and not all joins are OLAP; there are joins you have to do for OLTP systems, so those are still needed, still required. [Audience: but then the read time would increase, in the sense that you have to take your query to different shards?] Correct, and that problem is solved by materialization, because OLTP transactions also
need these joins: if you have to join with a merchant, you would otherwise have to do multiple round trips, and materialization lets you avoid that. So there is a big gap between OLTP and OLAP; you can call it OLTAP or something, and Vitess takes you pretty close in that area. Only for the last part, the pure OLAP queries, should you export into a columnar store; a row-based store is not recommended for that type of query.
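To make the materialization idea from the demo concrete: the materialization is expressed as a select statement that Vitess keeps continuously applied from the source keyspace to the target keyspace (this is what keeps the target relationally consistent as source rows change or move). Here is a sketch of the aggregation case mentioned above; the table and column names (`orders`, `merchant_name`, `price`) are my assumptions standing in for the demo schema, and the exact way this query is handed to the Materialize workflow depends on the Vitess version:

```sql
-- Hypothetical materialization expression: a per-merchant rollup, maintained
-- continuously from the customer-sharded orders table into a target keyspace
-- that is sharded by merchant_name. Vitess keeps the target in sync as source
-- rows are inserted, updated, or re-keyed.
SELECT merchant_name,
       COUNT(*)   AS order_count,
       SUM(price) AS total_revenue
FROM orders
GROUP BY merchant_name;
```

Because the target stays consistent with the source, an OLTP query that needs per-merchant totals can do a cheap local read in the merchant keyspace instead of scattering a join across all customer shards.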