Hi, everyone. Nice to see you here. Today I will talk about how we use Rust to build a distributed transactional key-value database. Before we start, let me introduce myself. My name is Tang Liu, a Chinese guy, as you can see. I'm the chief architect at PingCAP, and I have been building the distributed databases TiDB and TiKV. I'm also an open source lover and have developed some open source projects like LedisDB, go-mysql, go-mysql-elasticsearch, the rust-prometheus client, et cetera. OK, that's all about me.

This is today's agenda. First, I will give you a brief introduction about why we want to build a distributed transactional key-value database and the problems that poses for us. Then I will show you the layers we chose for the database. And at last, I will show you how we combine them all into our database.

OK, let's begin. When we set out to build a distributed transactional key-value database, we found we had to face many, many problems and conquer many, many challenges. We must guarantee that our database keeps data consistent, is scalable and stable, has high performance, and has high availability. Of course, we must make our database ACID compliant. And there are many other problems we need to conquer. So what can we do? As a traditional Chinese saying goes, a tall building rises from a solid foundation. There are many, many things to conquer, but we can build the database from the bottom to the top, layer by layer.

But first, we need to choose a language to build it with. It's Rust. Everyone here is very familiar with it, so I won't talk about it any more.

So let's start to build our database from scratch. Because we are a database, the first problem is how we can save our data on the local machine. And because we are a key-value database, we choose a key-value storage engine.
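To make that idea concrete, here is a minimal, std-only sketch of what a pluggable key-value engine interface might look like. The trait and type names here are purely illustrative, not TiKV's actual code; a real engine like RocksDB would sit behind such an interface.

```rust
use std::collections::BTreeMap;

// Illustrative engine trait: any local key-value storage engine
// (RocksDB, LevelDB, an in-memory map) could implement it.
trait KvEngine {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn delete(&mut self, key: &[u8]);
}

// In-memory stand-in for a real engine, good enough for the sketch.
struct MemEngine {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl MemEngine {
    fn new() -> Self {
        MemEngine { map: BTreeMap::new() }
    }
}

impl KvEngine for MemEngine {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.map.insert(key, value);
    }
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
    fn delete(&mut self, key: &[u8]) {
        self.map.remove(key);
    }
}

fn main() {
    let mut engine = MemEngine::new();
    engine.put(b"hello".to_vec(), b"world".to_vec());
    assert_eq!(engine.get(b"hello"), Some(b"world".to_vec()));
    engine.delete(b"hello");
    assert_eq!(engine.get(b"hello"), None);
}
```

The upper layers only need get/put/delete (plus range scans in practice), which is why swapping the engine underneath is feasible.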
There are many key-value engines in the world, like RocksDB, LevelDB, WiredTiger, et cetera. You can even build one by yourself. But here, we choose RocksDB. Why? RocksDB uses an LSM tree. As you can see, the LSM tree has high performance for writes, especially random writes, and still has good performance for reads. And the RocksDB team itself has done lots of optimization to speed RocksDB up, and has provided lots of features, like ingesting data files and delete-range, to let users build their services easily on top of RocksDB. By the way, another reason is that RocksDB is very stable, and it is used in MyRocks and MongoRocks. So we decided to choose RocksDB. Here is the project: because RocksDB is a C++ library, we wrote our own Rust wrapper for it. You can have a try if you want to use RocksDB in your Rust project.

RocksDB is cool, but it can only save your data on one local machine. If that one machine crashes, your data is lost. So we need to find a way to replicate our data onto multiple machines to guarantee data safety. Of course, we must also guarantee data consistency. How can we do that? Here, we use Raft. Anybody here know Raft? Oh yeah, that's cool. Raft is a distributed consensus algorithm based on a replicated log. When the client wants to write something to the service, we first append the command to the Raft log, then use the Raft algorithm to replicate the log entry to multiple machines. After a majority of the nodes accept the log entry, we can consider it committed, apply it to the state machine, and then return the result to the client. Using Raft helps us keep data consistency and do the replication automatically.

Raft is good, but one Raft group can't hold all your data on one set of machines when your data grows huge. So what can we do here? When the data grows large, we can split our data into multiple Raft groups.
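As a rough illustration of how split key ranges can be routed to Raft groups, here is a hedged, std-only sketch. The `Router` type and its methods are made up for this example; they are not TiKV's real routing code.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

// Illustrative Multi-Raft routing sketch: the whole key space is
// split into ranges, and each range maps to one Raft group.
struct Router {
    // start key of each range -> Raft group id
    regions: BTreeMap<Vec<u8>, u64>,
}

impl Router {
    fn new() -> Self {
        // One initial range covering the whole key space.
        let mut regions = BTreeMap::new();
        regions.insert(Vec::new(), 1);
        Router { regions }
    }

    // Find the Raft group owning `key`: the range with the
    // largest start key that is <= key.
    fn group_for(&self, key: &[u8]) -> u64 {
        *self
            .regions
            .range::<[u8], _>((Bound::Unbounded, Bound::Included(key)))
            .next_back()
            .map(|(_, id)| id)
            .unwrap() // the empty start key always matches
    }

    // Split a range at `split_key`; the right half becomes `new_id`.
    fn split(&mut self, split_key: Vec<u8>, new_id: u64) {
        self.regions.insert(split_key, new_id);
    }
}

fn main() {
    let mut router = Router::new();
    assert_eq!(router.group_for(b"apple"), 1);
    router.split(b"m".to_vec(), 2);
    assert_eq!(router.group_for(b"apple"), 1);
    assert_eq!(router.group_for(b"zebra"), 2);
}
```

Splitting is just inserting a new start key, which is one reason an ordered key space makes horizontal scaling straightforward.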
Here, you can see that we treat our whole data set as an infinite ordered key-value map, and split it into different ranges. Each range corresponds to one Raft group. We use this Multi-Raft design to support data scalability. So how does it work? For example, if we now have three nodes A, B, and C, and we add a new node D, and we want to move some data from C to D, how can we do that? Luckily, we can use Raft's internal membership change algorithm. We can use a Raft membership change, a config change "add node", to directly add a new replica of region 2 on node D. Then we can use a config change "remove node" to remove the replica from C. As you can see, using Raft we easily transfer the data from node C to node D and achieve horizontal data scalability. Very easy, that's cool. And we built our own Raft implementation here. It's very stable, and if you want to use a consensus algorithm in your service, you can have a try; you can even use it in your production environment.

Now we have Multi-Raft to support horizontal data scalability. But another problem remains: distributed transactions. Let's consider the following case, a bank transfer. You want to transfer some money from account A to account B, but A and B are kept on different nodes, in different Raft groups. So we need to solve the problem of keeping data consistent across multiple nodes, across multiple Raft groups. That's easy: we can use traditional two-phase commit, 2PC. But as you can see, plain 2PC has some problems. So here we use an optimized two-phase commit inspired by Google's Percolator. It has multi-version concurrency control, provides snapshot isolation, and supports ACID transactions. I won't talk more about distributed transactions here because it's not central to this topic.
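Although the transaction details are skipped here, the MVCC part just mentioned can be sketched in a few lines. This is a toy illustration, not TiKV's actual key encoding: each user key keeps multiple timestamped versions, and a snapshot read at timestamp `ts` sees the newest version committed at or before `ts`.

```rust
use std::collections::BTreeMap;

// Toy MVCC store: (key, commit timestamp) -> value.
struct MvccStore {
    versions: BTreeMap<(Vec<u8>, u64), Vec<u8>>,
}

impl MvccStore {
    fn new() -> Self {
        MvccStore { versions: BTreeMap::new() }
    }

    fn write(&mut self, key: &[u8], commit_ts: u64, value: Vec<u8>) {
        self.versions.insert((key.to_vec(), commit_ts), value);
    }

    // Snapshot read: newest version with commit_ts <= ts.
    fn read(&self, key: &[u8], ts: u64) -> Option<&Vec<u8>> {
        self.versions
            .range((key.to_vec(), 0)..=(key.to_vec(), ts))
            .next_back()
            .map(|(_, v)| v)
    }
}

fn main() {
    let mut store = MvccStore::new();
    store.write(b"a", 10, b"v1".to_vec());
    store.write(b"a", 20, b"v2".to_vec());
    // A reader with snapshot ts = 15 still sees v1.
    assert_eq!(store.read(b"a", 15), Some(&b"v1".to_vec()));
    assert_eq!(store.read(b"a", 25), Some(&b"v2".to_vec()));
    assert_eq!(store.read(b"a", 5), None);
}
```

Because writers add new versions instead of overwriting, readers at an older timestamp never block on writers, which is what gives us snapshot isolation.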
So I will skip it. If you're interested, you can search for the Percolator paper and learn more. Now things are getting better. But another problem is that because we are a distributed database, we have many machines and many services, and each service needs to communicate with the others through the network. How do we do this? Here we choose gRPC. gRPC is an RPC framework developed by Google and is used in many famous projects, like Kubernetes and etcd. gRPC supports many languages: you can use Java, C++, Golang, PHP, Python, Lua, et cetera. And gRPC is based on HTTP/2, so you get the benefits of HTTP/2 directly. So we decided to use gRPC. Google's official gRPC library provides four kinds of API: unary, client streaming, server streaming, and duplex streaming. They are very powerful. But unluckily, these APIs are asynchronous, and as you know, writing asynchronous callback code is a nightmare; you can even get stuck in callback hell. Luckily, in Rust we can use futures. Here is a simple example: when we call the unary API, it returns a future, and we can wait on the future and poll it later. So we use Rust futures to provide a synchronous-looking API that is easy to use. And we built our own gRPC wrapper; if you want to use gRPC in your own service to build your RPC framework, please have a try.

Next, keeping the database healthy. Things work so far, but how can we guarantee that our database keeps working well? We can use monitoring to check whether our database is healthy or not. Currently Prometheus is the most famous monitoring system in the world, so we decided to use it. Prometheus provides four metric types: counter, gauge, histogram, and summary. Here we mostly care about three of them: counter, gauge, and histogram. Here is a simple example.
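A counter is the simplest of the three: a monotonically increasing value that the Prometheus server scrapes periodically. This std-only sketch shows the idea; it deliberately mimics, but is not, the rust-prometheus crate's API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Std-only sketch of what a Prometheus counter boils down to:
// an atomic value that only ever goes up.
struct Counter {
    name: &'static str,
    value: AtomicU64,
}

impl Counter {
    fn new(name: &'static str) -> Self {
        Counter { name, value: AtomicU64::new(0) }
    }
    fn inc(&self) {
        self.value.fetch_add(1, Ordering::Relaxed);
    }
    fn get(&self) -> u64 {
        self.value.load(Ordering::Relaxed)
    }
    // Roughly what one line of a /metrics scrape would look like.
    fn expose(&self) -> String {
        format!("{} {}", self.name, self.get())
    }
}

fn main() {
    let requests = Counter::new("requests_total");
    requests.inc();
    requests.inc();
    assert_eq!(requests.get(), 2);
    assert_eq!(requests.expose(), "requests_total 2");
}
```

Gauges and histograms follow the same pattern, just with values that can go down (gauge) or fall into buckets (histogram).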
And you can use register_counter to create a Prometheus counter and increment it. We also built our own Prometheus client in Rust; you can have a try. This Rust client is already listed in the official Prometheus documentation, so do give it a try.

Now, finally, let's get to testing. In my opinion, testing is the most important thing in building a distributed database. Because we are building a database, we need customers to trust us, and we must ensure that the data in our database cannot be lost and cannot be corrupted. So how can we do this? I think the only way is to do more testing. We need to do many, many kinds of testing: traditional unit testing, integration testing, and even chaos testing, and many more. I won't cover them all here, but I will mention fail-point injection testing. Our fail point is inspired by FreeBSD's failpoints. You can see the example here: we inject a fail point in the function foo and trigger the failure from outside, so when the program runs and enters the function, it panics. That's cool. This is our Rust fail-point implementation; if you want to inject some failures into a service and do some funny things, you can have a try.

So that's all the layers. Now let's combine them together. This is the whole architecture, from the bottom to the top: RocksDB to save the data; Raft for replication and horizontal data scalability; MVCC and the distributed transaction API on top of that; gRPC for communication; and Prometheus for monitoring. That's cool. So this is the whole thing: TiKV. TiKV is a distributed transactional key-value database and, of course, it's written in Rust. TiKV has many users, and it is used in many production environments.
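The fail-point mechanism mentioned a moment ago can be sketched without any external crates. This std-only toy mimics the idea only; the real fail-rs crate uses a `fail_point!` macro and environment-variable configuration, and the names here are invented for illustration.

```rust
use std::collections::HashSet;
use std::sync::{Mutex, OnceLock};

// Global registry of activated fail points (toy version).
fn registry() -> &'static Mutex<HashSet<&'static str>> {
    static REG: OnceLock<Mutex<HashSet<&'static str>>> = OnceLock::new();
    REG.get_or_init(|| Mutex::new(HashSet::new()))
}

// A test activates a fail point by name from outside the code under test.
fn activate(name: &'static str) {
    registry().lock().unwrap().insert(name);
}

// Inside production code: fails only when the named point is activated.
fn fail_point(name: &str) -> Result<(), String> {
    if registry().lock().unwrap().contains(name) {
        Err(format!("fail point `{}` triggered", name))
    } else {
        Ok(())
    }
}

// Example function under test, with an injected failure site.
fn write_to_disk() -> Result<(), String> {
    fail_point("before_write")?;
    Ok(()) // ... the real write would happen here
}

fn main() {
    assert!(write_to_disk().is_ok());
    activate("before_write"); // the test flips the switch
    assert!(write_to_disk().is_err());
}
```

The point is that the failure site lives in normal code paths but stays inert until a test turns it on, so you can exercise error handling that is nearly impossible to reach otherwise.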
And as far as I know, one TiKV cluster in production is now deployed on about 140 machines. And from then until now, no one has complained to us that their data was lost. That's cool. But this is not our final goal. Here I will come back to a question I didn't ask you at first: why do we want to build a distributed transactional key-value database? It's not only for TiKV; we aim higher. First, we want to build a distributed relational database. This is TiDB: we built a MySQL layer on top of TiKV to provide a distributed online transactional processing solution. You can use TiDB almost like using MySQL directly, with high compatibility. And we also want to build a distributed analytical database. This is TiSpark: we can run Apache Spark on top of TiKV directly, and we provide a distributed online analytical processing solution. So where do we want to go from here? We want to build a hybrid transactional and analytical processing database. This is our big future. As you can see, we have TiDB to provide OLTP and TiSpark to provide OLAP, and if you want a transactional key-value store, you can use TiKV directly. This is the whole picture, and this is our final goal. Our ultimate goal is: use one database for all your data. Yes, that's all. Thank you. Oh, by the way, we are hiring in China; if you're interested, please pay attention to us.

Oh, you asked me how we handle deadlock in a distributed transaction? Deadlock. Because our transaction model is an optimistic transaction model. Say transaction A wants to touch keys A and B, and transaction B also wants to touch A and B. Before committing, we first sort all the keys, like A then B, and then commit in that order. Because the commit keys are sorted, there can't be a deadlock. Yes, all the keys are sorted before commit. Yes. You mentioned that you support snapshot isolation. Yes.
Do you have plans to extend that to serializable snapshot isolation? Yes, that's a good question. You asked: we already support snapshot isolation, so how can we support serializable snapshot isolation, SSI? As you can see, we support MVCC, and our MVCC gives us only SI. So if we want to support SSI, we must let the user write SQL like SELECT ... FOR UPDATE explicitly. Maybe that's the only way we can do it for now.

You're using the C wrapper for gRPC, but someone has written a pure Rust implementation. What is your experience and what are your plans? Oh, you asked: we are using a C wrapper, but another person, sorry, I don't know his name, but I know he is the author of mio, has been building a pure Rust implementation. Why do we still choose the C wrapper? Because when we wanted to use gRPC about one year ago, there was no stable pure Rust implementation; the only stable implementation was the C core library, so we had to use it. But later, when the pure Rust implementation matures, we will try to use it. That's our plan.

Which SQL standards do you support? Or is SQL not possible; what query languages can I use to query data? Oh yes, you asked which SQL dialect we support. In TiDB, we built a MySQL layer on top of TiKV, so we support the SQL dialect of MySQL. We do support joins, but some features MySQL itself doesn't have, like ones supported only in PostgreSQL, we don't support either, and we don't support stored procedures. As for SQL functions, most of the common SQL functions are supported. So you can move your business from MySQL to TiDB directly.

Yes, you asked whether TiKV provides an API for transactions. Of course; TiKV must provide the transactional API. As a user, you can use it, sorry, like this.
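The usage being described can be sketched roughly like the following. This is a purely illustrative mock of a begin/get/set/commit transactional interface, not the real TiKV client API.

```rust
use std::collections::HashMap;

// Toy store with a buffered-write transaction, for illustration only.
struct Store {
    data: HashMap<Vec<u8>, Vec<u8>>,
}

struct Txn<'a> {
    store: &'a mut Store,
    writes: HashMap<Vec<u8>, Vec<u8>>,
}

impl Store {
    fn new() -> Self {
        Store { data: HashMap::new() }
    }
    fn begin(&mut self) -> Txn<'_> {
        Txn { store: self, writes: HashMap::new() }
    }
}

impl<'a> Txn<'a> {
    // Reads see the transaction's own buffered writes first.
    fn get(&self, key: &[u8]) -> Option<&Vec<u8>> {
        self.writes.get(key).or_else(|| self.store.data.get(key))
    }
    fn set(&mut self, key: &[u8], value: &[u8]) {
        self.writes.insert(key.to_vec(), value.to_vec());
    }
    // Commit publishes the buffered writes; a real implementation
    // would run the two-phase commit protocol here.
    fn commit(self) {
        for (k, v) in self.writes {
            self.store.data.insert(k, v);
        }
    }
}

fn main() {
    let mut store = Store::new();
    let mut txn = store.begin();
    txn.set(b"a", b"1");
    assert_eq!(txn.get(b"a"), Some(&b"1".to_vec()));
    txn.commit();
    assert_eq!(store.data.get(&b"a".to_vec()), Some(&b"1".to_vec()));
}
```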
A user can do something like this: you call begin to start a transaction, then you call get, then you call set, and then you commit the transaction. This is all supported in TiKV.

But can someone else listen to your transactions, like a separate service, like a binlog tailer? Oh, you asked whether we support a binlog or not. For instance, something to take offline: we might need more of an audit trail, or some debugging of what's going on; you could have a centralized node that gets told everything that's going on. It doesn't need to be highly available; it's more for debugging purposes and so on. Sorry, I still can't understand. So you might want tooling that will listen to, say, there's been a transaction, and you want some other tool to be made aware that there's been a transaction inside of the key-value store, like external watching of the transaction log. Oh, like the MySQL binlog, yes? Yeah. We have a binlog implementation, but not in TiKV; it's in TiDB. Because all the SQL goes through TiDB, TiDB can record everything in our binlog implementation. But this binlog format is different from MySQL's. And you can use the binlog to synchronize all your data from TiDB to other services, like syncing to Kafka or Elasticsearch or even another TiDB.

Do folks do predicate push-down or anything for scaling out? Push-down? Yes, that's a good question. I didn't mention push-down here because it's not central to this topic. I didn't mention that TiKV provides not only the transactional API but also a coprocessor API. The coprocessor is the same concept as in HBase: you can push some of your logic down to TiKV directly, do the calculation there, and then return only the result.
We already do this, but I didn't cover it here.