Good morning, everybody. Just one more minute and we'll get it kicked off. Actually, it's 8:02. Kevin, let's get going here. In terms of the agenda, we've got two things that we want to do this morning. First of all, we've got Kevin from TiKV, who's going to be presenting. And then we've also got the follow-up item of the KubeCon sessions. So it's getting kind of hot and heavy on getting those done. And so we're going to make sure that everybody's signed up and we set up some groups to start collaborating. I think that the next session that we do in two weeks, up to KubeCon, is going to be dedicated to just reviewing the sessions and collaborating a bit more on them together. So don't expect a presenter next week. Cool. Sounds good. Should I just get started then? Are you there? Sorry about that. Can you guys hear me now? Yeah, I can. I can. So I was just doing kind of the intro agenda. I don't know what actually came across because it looks like my Wi-Fi was flaky. Did you guys hear the intro or did you miss all that? I heard it all. What was that? I heard it all. Okay, great. So let's go ahead and kick it off. You've got about 30 minutes and then we've got to get to the next agenda item. Okay, sounds good. Thank you, Clint. So my name is Kevin Xu. I'm from the company PingCAP. At the company I'm head of U.S. strategy and operations. And both myself and Ed Huang, who is our co-founder and CTO, will be doing this presentation and taking your questions afterwards about TiKV. Ed is actually dialing in from Beijing. I think it's like 11 PM over there. Ed, do you want to say hi real quick to everybody? Hi, hi, everybody. Yeah. Hi. Great. All right, so I will just launch into TiKV, which is an open source distributed transactional key value store. This is, again, one of the projects that PingCAP, the company, is making right now. And let's get started. All right. So a quick agenda for our presentation today before we dive into the whole thing.
We'll do a really quick history of the company and the product, then a pretty detailed technical walkthrough of TiKV. I will also do a quick use case summary from one of our largest adopters, a company called Ele.me, which literally means "are you hungry?" in Chinese. And it's actually one of the largest food delivery platforms in China. They were recently in the news for being purchased by Alibaba for $9.5 billion or something like that. And they exclusively use TiKV to handle their storage. And I will also do a quick demo of how TiKV works inside the whole TiDB ecosystem. And then both Ed and I will take your questions. And I know these Zoom signals get dropped pretty frequently. So if at any point I start trailing off and you start missing what I'm saying, please just shout at me, and I'm happy to rewind and restart. All right. So a quick history of the company. It was founded in April 2015 by three infrastructure engineers, Ed being one of them, Max and Dylan being the other two. And we built the TiDB platform. "Ti" just stands for titanium. There are a few main components in the TiDB platform. One is TiDB itself, which is a stateless SQL layer that is compatible with the MySQL protocol. The other, of course, is the focus of this presentation, which is TiKV — "Ti key value" is essentially what it stands for. It's a distributed transactional key value store. We also have a project called TiSpark, which is a Spark plugin that allows a lot of our users to run more complex OLAP queries directly on top of TiKV. And another component is called Placement Driver, which is kind of this meta-store component cluster that does a lot of the auto-scheduling as well as load balancing and works very closely with TiKV. And everything that we have been doing so far was open source from day one. We open sourced TiKV on April 1st, 2016. And the whole project's current version is at 2.0 RC4.
Since we did start open source from day one, we've been very well received and pretty popular in the community. The TiDB repo itself has more than 12,000 stars. TiKV itself is closing in on 3,000 stars. We also have quite a few contributors, not just folks who are working for us, but people from really all over the world. There are also a few — I would call them institutional contributors — that I want to highlight real quick. Companies like Samsung; like Tencent Cloud, which is one of the larger cloud providers in China. Also Ele.me, the use case that I'll be highlighting, is contributing to the TiKV project, as well as companies like Mobike, which is one of the largest bike sharing companies, and also Toutiao, which is a popular news aggregator app in China. And of course, we are popular, I think, because we are trying to solve a pretty pressing pain point that a lot of organizations and companies are looking at. As we accumulate just more and more data — it doesn't really matter if you're strictly in tech or not; all of our lives are getting digitized every single day — companies and organizations need a unifying distributed storage layer that can support really key features like strong consistency, ACID compliance, easy horizontal scalability that can theoretically go to infinity, and also, of course, a cloud native architecture, as more and more organizations are moving in that direction. And of course, hopefully it's open sourced as well. So we actually got our initial inspiration from Google Spanner — the Spanner paper, which I'm sure many of you are familiar with. Unfortunately, Spanner was not open sourced and still isn't, but TiKV is. And we really imagined and designed TiKV to be a fundamental building block that could simplify building other systems — more interesting, useful systems — on top of it.
And so far, we've built TiDB and TiSpark ourselves. And Toutiao — which is this news aggregator app again that's actually valued at $20 billion in China, so I probably shouldn't call them a startup anymore — uses TiKV as the metadata service for their own kind of S3 architecture. And Ele.me, which I'll go into a little bit later on, made their own Redis protocol layer. But essentially, with TiKV being the building block, we really do think the sky is the limit in terms of what you can build on top of it. So a quick overview of how TiKV looks inside the TiDB ecosystem. As you can see, it's quite literally the center of the entire ecosystem. You have TiDB on the left — that, again, is the stateless SQL layer — that can talk directly to TiKV, which is where the data is actually stored. And you have Spark on the right, doing OLAP queries leveraging Spark, which also talks to TiKV directly, where the data is stored. And TiKV works with what we call the Placement Driver cluster to do a lot of the metadata shuffling and organizing. Now let's dive deeper into the TiKV architecture. We've always imagined it to be a standalone distributed key value store. So right now, any client really can use gRPC to directly connect and communicate with TiKV clusters. gRPC, of course, is a CNCF project, so we are heavy users of that. It has two APIs on the top. One is a transactional KV API. Another one is a coprocessor API that we've designed to push down SQL logic into the TiKV instances to be processed. We also apply the Raft consensus protocol to do our replication, to provide strong consistency as well as high availability. And underneath, each TiKV instance — you can imagine every single one of them — essentially also has its own RocksDB instance that takes care of the storage. And this is just one example of how we build TiDB on top of TiKV, TiDB being the MySQL-compatible SQL layer.
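The range-based sharding described here — keys partitioned into contiguous regions, each region backed by one Raft group — can be sketched in a few lines. This is a hypothetical toy model for illustration only; `RegionRouter` and its names are invented here and are not TiKV's actual client API:

```python
# Toy sketch (not the real TiKV API): how a range-partitioned key space
# maps any key to the region (Raft group) whose range contains it.
import bisect

class RegionRouter:
    """Routes a key to the region whose key range contains it."""

    def __init__(self, split_keys, region_ids):
        # split_keys are the sorted region boundaries; region i covers
        # keys in [split_keys[i-1], split_keys[i]).
        self.split_keys = split_keys
        self.region_ids = region_ids

    def locate(self, key):
        # bisect_right finds which range slot the key falls into.
        idx = bisect.bisect_right(self.split_keys, key)
        return self.region_ids[idx]

router = RegionRouter(split_keys=[b"g", b"p"], region_ids=[1, 2, 3])
print(router.locate(b"apple"))   # region 1: keys < "g"
print(router.locate(b"melon"))   # region 2: "g" <= key < "p"
print(router.locate(b"zebra"))   # region 3: keys >= "p"
```

In the real system the Placement Driver, not the client, holds this routing table, and regions move and split at runtime; the lookup idea is the same.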
But you can really plug in all kinds of different ORMs or other database connectors onto it. And if you see the little color blocks in the TiKV nodes, those blocks are essentially different Raft groups. And we've chunked them into what we call regions, which is a concept that I will probably mention a few more times later in the presentation. And each color — if they're the same color, whether it's light blue or green — they are the same Raft group distributed among multiple TiKV nodes. And that's how it works. And as you can see, they're quite balanced. And that is actually done dynamically by TiKV, and I will talk about that a little bit later as well. And here is another example, a distributed object storage use case where TiKV becomes the metadata service for a set of blob storage instances. And in this case, TiKV is essentially kind of like a giant map that maps blobs directly to their appropriate or connected blob storage instances. And this is actually how Toutiao is using TiKV for their own metadata. Again, Raft is in play as well. So a quick summary of the technical highlights of TiKV. As I mentioned, it does scheduling and auto-balancing, working with the Placement Driver cluster. It has this multi-Raft implementation — multiple Raft groups inside one TiKV instance. It also does dynamic range-based partitioning using splitting and merging, and this is how we resolve hotspot issues; I'll go into that in a little bit as well. For transactions, of course, we use a two-phase commit with optimistic locking. And the whole project, for those of you who are curious, is written in Rust. So no garbage collection pause time, and no runtime cost either. And this is a set of benchmarks — the YCSB benchmark — that we literally ran last night for the purpose of this presentation. So you guys are the first ones to see this result. It's very fresh off the press.
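The optimistic two-phase commit mentioned here can be illustrated with a toy model: reads take no locks, writes are buffered, and at commit time the transaction aborts if anything it read has changed. This is a sketch of the general technique only — TiKV's real implementation is a Percolator-style protocol, and every name below is invented for illustration:

```python
# Toy optimistic transaction: validate-then-write, no locks on reads.
class Store:
    def __init__(self):
        self.data = {}      # key -> value
        self.version = {}   # key -> commit counter, bumped on each write

    def get(self, key):
        return self.data.get(key), self.version.get(key, 0)

class OptimisticTxn:
    def __init__(self, store):
        self.store = store
        self.read_set = {}   # key -> version observed at read time
        self.write_set = {}  # buffered writes, applied only on commit

    def get(self, key):
        value, ver = self.store.get(key)
        self.read_set[key] = ver
        return value

    def put(self, key, value):
        self.write_set[key] = value

    def commit(self):
        # Phase 1 (validate): abort if anything we read has moved on.
        for key, seen_ver in self.read_set.items():
            if self.store.version.get(key, 0) != seen_ver:
                return False
        # Phase 2 (commit): apply buffered writes atomically.
        for key, value in self.write_set.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1
        return True

store = Store()
t1 = OptimisticTxn(store)
t1.get("balance")
t1.put("balance", 100)
ok1 = t1.commit()
print(ok1)  # True: no conflicting writer

t2 = OptimisticTxn(store)
t2.get("balance")            # observes version 1
t3 = OptimisticTxn(store)
t3.put("balance", 50)
t3.commit()                  # commits first, bumping the version
t2.put("balance", 150)
ok2 = t2.commit()
print(ok2)  # False: t2's read was invalidated, so it aborts
```

The optimistic bet is that conflicts are rare, so skipping locks on the read path is a net win; the loser simply retries.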
This is the environment and the hardware that we used to run this particular benchmark test. And as you can see, both the insert TPS and the read QPS — operations per second — are quite good. And one thing I want to mention for this test, too, is that we used the standard three-TiKV-node deployment. It's kind of like our default. But, you know, in any large in-production setting, usually you have many, many more TiKV nodes distributed to handle workloads. So the performance in a real-world setting should probably be even better than this. And this is a brief comparison between TiKV and other popular NoSQL databases that are out there. I know Mongo announced that they're going to be ACID compliant as well, but the thing is still in beta, so no one's really sure yet if it is. So I'm giving them a maybe on that for now. And this is a quick graph that explains how we do split and merge dynamically. So again, this is one of the features where TiKV, working with the Placement Driver, can do this dynamically. And kind of the metric we use — or the configuration we use — to do splitting is: if a region size exceeds 96 megabytes, which is our default value (of course, this is something you can change depending on your needs), then a split will happen to avoid a region being too large or forming potential hotspots down the line. And for merging, it kind of works the same way. If a region is less than 10 megabytes — again, a configuration you can change yourself — then we will find an adjacent region and merge them to, you know, make the system more efficient. And in terms of how we do dynamic hotspot scheduling, you can see these two nodes: actually all the workload is going directly to one machine, while the other machine is not doing a whole lot.
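The split/merge policy just described boils down to two size thresholds. A minimal sketch, using the 96 MB and 10 MB defaults quoted in the talk — the function name is made up for illustration, and PD's real scheduler is of course far more involved:

```python
# Sketch of the region split/merge decision, with the talk's defaults.
SPLIT_THRESHOLD_MB = 96   # split a region larger than this (configurable)
MERGE_THRESHOLD_MB = 10   # merge a region smaller than this (configurable)

def plan_action(region_size_mb):
    if region_size_mb > SPLIT_THRESHOLD_MB:
        return "split"   # divide it to avoid oversized regions / hotspots
    if region_size_mb < MERGE_THRESHOLD_MB:
        return "merge"   # fold it into an adjacent region
    return "keep"

print(plan_action(120))  # split
print(plan_action(5))    # merge
print(plan_action(50))   # keep
```

Keeping regions within this size band is what makes them cheap to move: a region is the unit of both replication and scheduling.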
And the blue blocks here denote the leader node — or the leader element — of a given Raft group, which is of course the element that serves up all the data to the application. So one machine is getting all of the work; the other one is not doing much. And we have a dynamic system where we can do Raft leader transfer almost automatically. So in the resulting setup, we've essentially switched the leader from one machine to another for region B. And now the workload is more balanced and the hotspot is avoided. And of course, we've always designed TiKV to be cloud native from day one — to work very well with the Kubernetes ecosystem, and to be easily deployed on all the public cloud vendors as well as in private cloud settings. Right now, the product is on Tencent Cloud and UCloud, two of the larger cloud vendors in China, and we are working on our AWS integration right now. And we will move slowly but surely onto all the other cloud vendors as well. Our local deployment is containerized using Docker Compose, and I will actually demonstrate that in a little bit near the end of the presentation. And we are working on our Kubernetes integration with a thing called TiDB Operator that we are working on right now. And we use a lot of other cloud native projects — many of them hosted by the CNCF — to help, you know, boost the performance and also the user experience of the entire TiKV deployment. So we use gRPC, like I mentioned; you can see Grafana; and we are actually the maintainers of the Prometheus and gRPC implementations in Rust. So we have contributed those implementations back to the community as well and are very actively developing those two things. So with all that, you know, you might be wondering who is actually using this thing. Turns out quite a few companies right now have already deployed TiKV in production.
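The leader-transfer idea can be sketched as a small simulation: count how many region leaders each store holds, and move leadership off the hottest store until the load is even. This is purely illustrative — the function and the balancing rule are assumptions for the sketch, not PD's actual scheduling code:

```python
# Toy hotspot balancing via Raft leader transfer: even out the number
# of region leaders across stores, since leaders serve all the reads
# and writes for their region.
from collections import Counter

def balance_leaders(leader_of, stores):
    """leader_of: dict region -> store id. Returns a rebalanced copy."""
    leader_of = dict(leader_of)
    while True:
        load = Counter({s: 0 for s in stores})
        load.update(leader_of.values())
        hot = max(load, key=load.get)
        cold = min(load, key=load.get)
        if load[hot] - load[cold] <= 1:
            return leader_of  # spread of at most one: balanced enough
        # Transfer one region's leadership from the hot store to the cold.
        region = next(r for r, s in leader_of.items() if s == hot)
        leader_of[region] = cold

# All three leaders start on store 1; store 2 is idle.
balanced = balance_leaders({"region-A": 1, "region-B": 1, "region-C": 1},
                           stores=[1, 2])
print(balanced)
```

The key point from the talk: only leadership moves, not the data, because the follower replicas are already on the other machine.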
We released this thing as a 1.0 only last October — October of 2017 — so it's only been about six to seven months. And in the APAC region it's already getting a lot of adoption: more than 40 companies are using this in production, either TiKV by itself or along with other components of the TiDB platform. And the industries range from, like I said, food delivery to e-commerce to live streaming to media to fintech — all kinds of different companies. And of course, bike sharing and things like that. So now I'll do a really quick dive into how Ele.me uses TiKV. Like I mentioned, it's a food delivery platform, kind of similar to your DoorDash or your Postmates. Acquired by Alibaba, it is currently serving about 260 million users, and the bulk of its data is in key value format. Before they adopted TiKV, they just used a hodgepodge of solutions, from Mongo to Cassandra to MySQL to Redis and so on and so forth, to make it work. But they were looking for a single unifying storage layer. So they tested out TiKV. And right now they've deployed TiKV in, you know, 10-plus clusters spread out across four different data centers — more than 100 nodes, more than 10 TB worth of data in their TiKV clusters right now — and they're handling about 80% of their platform's traffic, which is quite a lot. And the most interesting thing, of course, that we thought was really cool was that they built their own Redis proxy on top of TiKV, because that's what they needed. And that really showed the versatility of TiKV as a standalone project, where you can build really whatever you need on top of it if you need a distributed key value store. And the performance metrics here are what they decided to release when they were testing one of their services on TiKV, which we want to share with you guys as well.
So with that said, I'm going to do a really quick demo to kind of show you how TiKV works in action. I did make my tribute to the demo god this morning, so hopefully everything will go smoothly. And to give a quick context to this demo: what I will do is launch a TiDB cluster — and kind of show you how that works, which is very simple using Docker Compose — and then also launch a MySQL instance and a Spark, or TiSpark, SQL instance on top of this cluster, which is of course undergirded by TiKV. And we will just do some real-time analytical queries on top of it to see how it enables kind of this hybrid, you know, real-time data warehouse experience. All right. So I've already git cloned our Docker Compose repo for TiDB. So let's just launch this thing. And... everything is done. And before we write some queries, I want to show you guys a couple of things that are really cool. One is that we use Grafana as the monitoring service for a TiDB cluster, and we've defaulted that to port 3000. And the username and password for this is just admin/admin — very secure. And here you can see how you can monitor a TiDB cluster. So there's the overview — not really that interesting because nothing's really happening. But if you go to TiKV, you can check out the cluster. You can see the available size, the storage size, the capacity, all things like that. You can also look at the Placement Driver cluster to see what's going on with the entire cluster. Again, not a whole lot is happening, but you can definitely play around with this. I play around with this all day. Sorry to interrupt — this is just deployed on your local machine? Yes, this is just deployed onto my local machine. So, you know, whatever the numbers are here probably correspond to that. And another cool tool that I'll show you real quick — this is something we made ourselves — is a visualization tool of the whole cluster. So that's defaulted onto port 8010.
This is a thing called tidb-vision. It works. Okay, so here you can see basically the three TiKV nodes. Right now it's mostly empty, but if we look a little bit deeper into it, you can see these gray blocks — these are essentially the Raft region follower nodes — and then the dark green ones are the leader nodes. And once there's more interesting stuff going on, there will actually be lines kind of going back and forth between these nodes to denote the communication between them. So again, something that I find rather mesmerizing — I could play with it all day, so you guys can do that as well. All right, so going back to the terminal: we've launched our TiDB cluster locally on my laptop. So now I am going to first get Spark going. So let me just do a few commands here, and then we will launch our Spark shell. So we'll give that a minute. And we will also launch MySQL. And again, all these instructions and commands we have on GitHub, and I'll share that link with you afterwards as well. So we've launched MySQL. And as you can see, it's directly connected to the current version of TiDB that I just cloned onto my local machine. TiDB as a whole can essentially serve as a MySQL slave right now. So you can keep on writing your MySQL code however you want and enjoy the scalability that TiDB brings to you underneath it. Let's see if — okay, so Spark is up and running. So now I just need to import TiSpark, which is a plugin that we made to leverage Spark on top of TiKV. All right. And the table — or the sample data — that we'll be playing around with is in here, so let me just show you what databases there are. We will be playing around with this TPC-H 001 database, which is just a set of sample data that we play around with. So we do "use tpch_001". So now we're using that; let's point our TiSpark instance at it as well. All right, so that's good.
So now both of these instances — TiSpark and MySQL — should be seeing the same data. So let's see what the tables are. There are a bunch of tables in here; one of them is called nation. Let's just see what's in there. So, select star from nation. And you see a giant list of countries with some comments and things like that. And let's see if the Spark side sees the same thing. So let's do spark.sql, select star from nation, show the first 20. And you see the same list of countries. All right. And we can of course run queries of any level of complexity in here. I have one queued up here that is particularly gnarly, with a few GROUP BYs and ORDER BYs and conditionals in it, on the MySQL side. This is coming from the lineitem table. So this gives us some kind of data. And let's make sure that the Spark side sees the same thing. So this is the Spark equivalent of that. Okay, we'll do that. This might take a couple of seconds extra. And you see the same data here — over here it's just formatted a little bit differently. Okay. Now let's say we want to change something in this data set. Again, this data is all stored in TiKV nodes, right? So let's do an update of the nation table: set the nation key equal to 222. And who do we pick on today? Where the nation key equals — who's number 2? Brazil. All right, so let's change Brazil's key to 222. Oops, I think I made a mistake there. Sorry about that. Oh, that's n_name, not nation name — Brazil. All right. So if we do select star from nation again, Brazil is now changed to 222. Now let's see if the Spark side sees the same thing. So spark.sql, select star from nation, show 20. And voilà, Brazil is also 222, because they're drawing from the same data source, and you can do basically simultaneous transactional and analytical processing on the same live data set, enabled by TiDB. So that is my demo, and to just wrap things up real quick: again, we always wanted TiKV to stand on its own.
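The point of the demo — one storage layer, two frontends, no ETL in between — can be reduced to a toy model: both the "SQL" and the "Spark" readers consult the same store, so an update made through one is instantly visible to the other. Everything here is invented for illustration and stands in for TiDB and TiSpark sharing TiKV:

```python
# Shared store standing in for TiKV; two frontends read the same data.
store = {("nation", "BRAZIL"): {"n_nationkey": 2}}

def sql_update(new_key):
    # Transactional write path (stand-in for the TiDB/MySQL frontend).
    store[("nation", "BRAZIL")]["n_nationkey"] = new_key

def sql_select():
    return store[("nation", "BRAZIL")]["n_nationkey"]

def spark_select():
    # Analytics path (stand-in for TiSpark): same storage, not a replica.
    return store[("nation", "BRAZIL")]["n_nationkey"]

sql_update(222)        # the demo's UPDATE through the SQL layer
print(sql_select())    # 222
print(spark_select())  # 222 -- same live data, no ETL in between
```

In the real system the interesting part is that this sharing holds under concurrent transactions, thanks to the Raft replication and transaction layers sketched earlier.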
That's why we are presenting this project to the CNCF for your consideration, to potentially host it as an incubating project — to really make it stand on its own. Right now it has more than 40 in-production deployments already, and probably more coming down the pike. And with this contribution, we really hope that it could not just stand on its own but gain more and more features that we know the community wants and a lot of our adopters want in the future. Like drivers for other languages — right now we only have a Go client for TiDB and a Java client for TiSpark, but, you know, Ele.me sort of made a proxy for Redis, and maybe they'll open source it in the future, who knows. We want column family support — that's already started in the most recent pull request, but it is of course very far from being finished — and other useful features, as well as even more programming paradigm support beyond key value. All right, with that, thank you again for giving us the opportunity to talk about TiKV. Contact us, either Ed or myself, anytime, and of course we're happy to take your questions right now. Kevin, definitely a fantastic job on putting that together. Thank you for presenting. Of course. Any questions out there for Kevin? Yeah, I have a question. Very nice demo, nice talk. So you had a slide that mentioned you're using etcd, and etcd itself is a distributed key value store — it has support for transactions, it's based on Raft and gRPC. How are you using etcd, and what are you doing on top of etcd? Thanks. Okay. Hi, this is Ed, and I'm the CTO of PingCAP. We use etcd as the embedded metadata store in Placement Driver, because, you know, etcd is not scalable — it's only one Raft group in one etcd deployment — but TiKV uses a multi-Raft model. So we store the metadata and the placement info in Placement Driver, which is stored in embedded etcd. So that's how we use etcd inside TiKV. I see. Thank you.
What's the predominant way that TiDB is being deployed today? Right — is it just a normal deployment to an OS? Is it on top of Kubernetes? Like, what are you guys seeing right now? Right now, the in-production deployment tool that we're using is an Ansible deployment, which can be used on really any cloud service, whether, you know, Tencent Cloud or AWS over here. And we are still working towards having TiKV deployment be fully integrated with, you know, the entire tool set that's available in Kubernetes. So the TiDB Operator that I mentioned is one of those projects that we're working on right now. And, you know, hopefully, if this project were to be accepted, then, you know, that could be accelerated as well. So we're definitely looking to get that going as soon as possible. How are you thinking about that operator? Like, what's the scope of that in terms of the quote-unquote day-two operations? Because I can see it being beneficial for getting it up and running, and just replacing Ansible for day one. But are you guys thinking about the scale-out capabilities of the database, and leveraging the operator to do more than just day-one activities? Ed, do you want to take that? I think you're more familiar with operators. Operator, okay. Yeah, the whole idea of TiDB Operator is from the etcd operator. It just handles the deployment, scale-out, and auto-failover for the whole TiDB project, not only for the TiKV part. But it is in our plan to, you know, extract the TiKV deployment operational knowledge into another operator project. So it's kind of like the etcd operator — the TiKV version of the etcd operator. But it is not open sourced yet. We are working on it now, and maybe in the future we will open source it. Yeah. I had a quick question. It's Quinton here.
Could you speak a little about the differences — or just a comparison — against CockroachDB, which appears to be, superficially at least, quite similar in goals and aspirations? There are a couple of points on that, and, you know, Ed, you can chime in and fill in more details as well. So, in terms of the architecture of CockroachDB compared to what we have made here: in TiKV, our key value part is completely separate and therefore can be, you know, moved around and plugged into different systems on top — which is kind of this larger point that hopefully we made during this presentation. Whereas, to my knowledge, everything that Cockroach does is in one deployment. And, you know, there are advantages to that and advantages to the way we're doing it. We think the way we're doing it — having TiKV separate — is more scalable, is more flexible, and it's even easier to debug if you are an administrator of this system for a given company. And the use case that we are hitting right now in terms of our in-production, you know, use cases for our customers is actually this hybrid transactional and analytical processing database experience — to give people this experience of real-time analytics support, which I don't think CockroachDB does. They are still more strictly an OLTP scalable solution, to my knowledge. So, you know, I think we both drew our inspiration from the Google Spanner paper back in the day, but we definitely have a lot of differences between us. And, you know, there's this more superficial compatibility difference: we are compatible with MySQL and Cockroach is compatible with Postgres. So, you know, different people kind of use us in different ways. I don't know, Ed, if you have anything to add to that. Yeah, it's good.
And on the other hand, we have a different transaction model from CockroachDB — the whole ACID transaction implementation is different from CockroachDB's — but on the user side, it's just like Kevin said, yeah. Thanks, that's a great answer. Thank you. All right, fantastic. I think we've got maybe one more minute for questions for Kevin and team. Anyone else have any questions? All right, Kevin, thank you for presenting, and thanks for answering questions. It was a great summary and overview of TiKV — really cool stuff. All right, thank you so much for having us. I really appreciate it. All right, everybody, so on to the next topic. We've got KubeCon coming up pretty quick, and we've got our two confirmed sessions for KubeCon. The first is an introduction session from the SWG, and the second is a more advanced session. In previous meetings that we've had, we decided on topics for each of the sessions. The first topic that we have in the intro session is covering the landscape that the team has kind of worked on over the past year. And we haven't, you know, readdressed it in the last six months, because we've just been reviewing different projects in the ecosystem. But I think we want to touch a little bit on that storage landscape and what we're seeing — you know, what are the high-level categories that we can fit some of these projects that we've been hearing about into. And then we wanted to open that up to a panel discussion after that. So that's the intro session that we're trying to plan. And then there's an advanced session where we've asked the TOC to give us feedback on what they want from us. So what's — what's essentially the definition of cloud native, and how does storage fit into that? And what does the TOC need from the SWG — you know, what's that charter going to be for us for the next six months, or whatever they really want from us. And so, two sessions.
The first session is what I'd like to get people together to plan. In previous meetings, we had five folks that signed up to be a part of those planning sessions. So, Sid, I think you stepped up, as well as Ben Hindman and Steve from Red Hat. Is there anybody else on this call that would like to help us out in creating the intro session for KubeCon and be a part of that? Quinn, I think I saw your name pop up on the intro. Are you actively planning to be a part of that session? Yes — sorry, I was struggling to find the unmute button. Yes, I would be happy to help you guys if you need help, and contribute where necessary. Okay. I'd also be interested in helping out — Jesse Brown. All right. Hey, I'm happy to help too. Great. Okay, so lots of folks — it looks like we've got seven or eight. So I think that the next step here is to send out an invite — well, a poll — to get some available times for everybody over the next two weeks. I feel like we've got probably a couple to a few sessions that we'll try to organize between us. And we'll make some progress on putting together a handful of slides for that intro, and then figure out who can be on that panel. So, any comments or concerns about that, or does anybody have any other ideas for what we may need to do to prepare? Okay, cool. All right, so the intro session looks like we're covered. I think that at the next SWG meeting that we have in two weeks, hopefully we'll have made enough progress to discuss it with the team. Sorry about the background noise. And then we'll go from there. And following that is going to be the KubeCon advanced session. I don't have anything else on the agenda today to discuss. Does anybody want to bring anything up to chat about for that second session? Do we have folks from the TOC confirmed yet? I have not gotten the TOC folks to confirm.
So I'm going to reach out to them again. In the background, what's been happening is that Camille's been talking with folks on the SWG about what they are interested in, and, you know, how the SWG is doing and what they want to see it turn into, etc. And so we're hoping to get feedback from Camille on that. So I think at a minimum we have Camille talking to us, and hopefully other folks on the TOC. So I'll update everybody on the status of that, and if we need to adjust plans, then I'll make sure I send out a message to the group. Cool. Sounds good. Thanks. All right, folks. So we'll give you guys back about 20 minutes of your day. Thank you all for making it. Great demo, by the way.