Using a distributed key value store. The word I really want to emphasize in that title is "using", because our talk today is about how you would actually use a distributed key value store in your application. We're Nick and Andy. We're engineers at PingCAP, where we both work on TiKV, which is an example of a key value store, as we'll tell you a bit later in the talk. The talk is going to be in three parts. First of all, we're going to talk very quickly about what a key value store actually is, look at some example KV stores and their features, and discuss how you would choose one of them for your application. After that, well, we have to pick one of those to actually talk about for the rest of the talk, and we're going to pick TiKV. Now, that is obviously a biased choice, but hopefully it will be an appropriate choice for some of you, and the more general principles will apply to other KV stores too. Then the bulk of the talk is about how your application communicates with a distributed KV store, in this case TiKV. So to start off, I'm going to hand over to Andy, who is going to talk about how to choose a KV store.

So, I'm Andy, thanks for the introduction. In this section I'm here to talk to you about what a KV store is, what KV stores are available, and their pros and cons. So to begin with, what is a KV store? You might already have been using KV stores in your programming language. For example, the dictionary in Python, or the map in other programming languages, are KV stores. This example in Python is a dictionary that maps the key k1 to value v1 and the key k2 to value v2. KV stores allow you to look up a value by its corresponding key. For instance, querying this dictionary with key k1 returns the value v1. We call this operation get. A KV store also has put and delete methods to modify data.
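The get, put, and delete operations just described map directly onto Python's built-in dictionary, which is the simplest possible in-memory KV store:

```python
# The three basic KV operations on a plain Python dict.
store = {"k1": "v1", "k2": "v2"}

value = store.get("k1")   # get: look up a value by key
store["k3"] = "v3"        # put: insert or update a pair
del store["k2"]           # delete: remove a pair

print(value)          # v1
print(sorted(store))  # ['k1', 'k3']
```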
In the real world, a KV store often refers to a database that persists gigabytes to terabytes of data in the form of key value pairs, rather than an in-memory dictionary like Python's. But once you search for KV stores on Google, you will be drowned in too many KV stores which may only have subtle differences, and you may struggle to figure out which one best fits your needs. So I'll pick some of the most common ones and talk about their pros and cons and the kinds of scenarios they are best for.

You may have heard of RocksDB. It is built by Facebook, and it is a high-performance embedded KV store. What embedded means is that it's a library: it doesn't provide a standalone binary, and it doesn't have a network interface or protocol. You have to write a program on top of it and talk to it with function calls in C++. It is high-performance because it has been tuned by many talented engineers with decades of combined effort; they've done many optimizations on disk I/O and its internal data structures. What RocksDB is not good at is that it runs on a single machine, which makes it not scalable and not fault tolerant, because when its only machine fails, everything fails. So RocksDB is best for prototyping or running single-node applications. That means, in particular, cases where the data fits on a single machine, which is approximately up to a few terabytes. And for prototyping, we usually don't care about fault tolerance, so RocksDB is enough.

etcd is a strongly consistent distributed KV store. It is a widely used and tested implementation of the Raft algorithm, so it guarantees strong consistency as well as persistence. The naive Raft algorithm doesn't do well on data at large scale, which makes etcd become unstable with data larger than eight gigabytes. So etcd is good for simple KV scenarios that require strong consistency and strong persistence. These scenarios are usually metadata storage or service discovery.

Redis is a widely used in-memory distributed database.
It features built-in types, built-in replication, low latency, and high throughput. But it doesn't guarantee that data is consistent or persistent, so the data stored in Redis is usually data you can afford to lose. Redis is usually used as a cache for an underlying data store like MySQL or MongoDB, so when the data is lost, you can always get it back from the underlying database.

HBase is an open source implementation of Google Bigtable. It is built on top of HDFS and has good integration with the Hadoop ecosystem, especially out-of-the-box integration with Flink and Spark. But HBase is often criticized for its high latency and its stop-the-world garbage collection pauses. Besides, HBase only supports single-row transactions, so you have to use other components to achieve multi-row transactions. In general, HBase is very suitable for applications built around the HDFS ecosystem.

Turning to MongoDB, it is quite different from the others mentioned before. It doesn't store KV pairs; rather, it stores documents. In other words, MongoDB is a big array of JSON objects. MongoDB is distributed by sharding on fields of the documents. These simple sharding rules make it hard to dynamically balance row traffic, so hot shards are always a problem when using MongoDB. Besides, similar to Redis, MongoDB doesn't guarantee strong consistency or persistence. So MongoDB is usually used to store semi-structured data, like user info and user history, where it is acceptable to lose data. In addition, because of its heavy memory requirements, it can be expensive to use MongoDB in large real products. So generally, MongoDB is best for prototyping, because it's quite easy to get everything set up and running, and for prototyping, write hotspots and memory usage are not yet a problem. All right.
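The Redis-in-front-of-a-database pattern mentioned above is usually called cache-aside: reads try the cache first, fall back to the durable store on a miss, and repopulate the cache. A minimal sketch, using plain dicts as stand-ins for Redis and the underlying database:

```python
# Cache-aside sketch: dicts stand in for Redis (cache) and MySQL (database).
database = {"user:1": "Alice"}  # durable source of truth
cache = {}                      # fast, but may lose its contents at any time

def read(key):
    if key in cache:              # cache hit: cheap
        return cache[key]
    value = database.get(key)     # cache miss: go to the database
    if value is not None:
        cache[key] = value        # repopulate the cache for next time
    return value

print(read("user:1"))  # Alice (miss: fetched from the database)
cache.clear()          # simulate Redis losing its data
print(read("user:1"))  # Alice (recovered from the database again)
```

This is why losing the cache is acceptable: the database remains the source of truth, and the cache refills itself on demand.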
So, TiKV is another key value store, but it's one we're going to go into in a little more detail, because we're going to use it as, well, effectively a case study for the rest of this talk. TiKV is distributed, which means rather than running on a single server, it runs across a cluster of servers, or nodes. Because of that, and because of the way it's designed, it's scalable: if you need to store more data or you need to access that data more quickly, you can just add nodes to the cluster and you'll be able to store more and access more. It's also fault tolerant. So if you lose some of those nodes, or if you've got your nodes across multiple data centers and you lose an entire data center, then you should still have access to your data.

TiKV is transactional. This is a concept that is much more familiar from the traditional SQL world of databases. Key value stores have historically not supported transactions and had a kind of eventual-consistency guarantee, whereas TiKV supports explicit transactions and gives you the kind of ACID guarantees that make working with transactional databases so much nicer. And it has a somewhat rich key value API. What I mean by that is it doesn't just offer the dictionary or hash map get/set kind of API; there's a whole bunch of operations on ranges of values, batches of operations, various kinds of scans, and so forth.

TiKV is battle tested. TiKV started its life as the back end, I guess, of TiDB, which is a NewSQL database, if you like. So that's a distributed, transactional SQL database, and TiKV provides the distributed part of that, really, but without the SQL layer, which is provided elsewhere. So TiKV, by itself or as part of TiDB, is used by a fairly large community of users. Some of those are huge: for example, JD Cloud, Shopee, Meituan, Zhihu. And Zhihu are happy for us to talk about some numbers.
The last time we got some numbers from them, they had around 200 TiKV nodes, and they were getting up to about 100 million reads per second from their database, on two trillion rows, which is about 320 terabytes of data. So that's a pretty big cluster. And the biggest cluster that we know about is around 1,600 nodes. So when we say that TiKV scales horizontally, it can scale pretty big.

TiKV is being actively developed at the moment. We're working on performance in particular: recently we made some fairly major changes to our transaction protocol, which will hopefully give big improvements to read and write latency. We're implementing new features; for example, we just added a compare-and-swap, or CAS, operation. And one of the more exciting things we're working on at the moment is TiKV's coprocessor. The coprocessor is a component of TiKV which lets us run computation on what would otherwise just be data storage nodes. This came from TiDB, where for things like aggregation operations, you can get a huge performance win by running part of those operations on the nodes where the data is actually stored. Now, the current interface to the coprocessor is not very useful for other users, but the work that is ongoing at the moment is to make the coprocessor interfaces pluggable. And that means that whatever your use case, you hopefully will be able to run some of your computation on the TiKV nodes, close to the data, and get, well, hopefully huge performance benefits for you as well.

And last but by no means least, it's really important to note that the TiKV project is entirely open source and governed in the open. Last year we graduated as a CNCF project, and the various clients that Andy's gonna talk about in the next section are open source as well. So I'm gonna give a little bit more background so you can get an idea of what the client is actually doing behind the scenes, and I want to talk about the architecture of a TiKV cluster.
So here's the basic idea. You've got a whole bunch of TiKV nodes, and the data is sharded into regions; "region" is just a synonym for "shard" in this case. So the data is sharded into regions, and then the regions are spread across nodes using Raft, which ensures reliable replication of the data. The client is a somewhat active participant in interactions with the TiKV cluster, as I'll show on the next slide, and the client communicates with the TiKV nodes via gRPC.

We also have what we call the placement driver nodes, and these are responsible basically for coordination. What that means is that regions that get too big will be split, regions that get too small will be merged, and regions will be moved if nodes are getting too much or too little traffic, to ensure that the cluster is running optimally. The other important function that the placement driver nodes provide is timestamps. Our transaction protocol requires timestamps, and the PD cluster is essentially a timestamp oracle.

So here is that transaction protocol. Actually, TiKV supports various flavors of transaction protocol; this is just one example, to give you an idea of what's going on. Now, when I say that the client is kind of an active participant, that's because the transaction protocol is a collaborative protocol. The client is responsible for its end of the protocol, and it's only if both the client and the TiKV nodes all do what they're meant to that you get the consistency properties that you want from the transaction protocol. So I'll just go through this quickly. When the user starts a transaction, the first thing the client does is talk to the PD nodes to get a start timestamp. Then it's gonna build up the transaction: for reads it's gonna actually talk to the TiKV nodes, and for writes it's gonna buffer those locally. It can also cache the reads, in case the user reads the same key multiple times.
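To make the region-based sharding described above concrete, here's a toy sketch of how a client might route a key to the node serving its region. The region boundaries and node names are invented for illustration; in real TiKV, the placement driver tracks and serves this mapping, and the client caches it.

```python
import bisect

# Toy routing table: each region covers [start, next_start) in key order.
# These boundaries and node names are made up for illustration only.
REGION_STARTS = [b"", b"g", b"p"]
REGION_NODES = ["node-1", "node-2", "node-3"]

def route(key: bytes) -> str:
    """Find the region whose key range contains `key`; return its node."""
    i = bisect.bisect_right(REGION_STARTS, key) - 1
    return REGION_NODES[i]

print(route(b"apple"))  # node-1
print(route(b"kiwi"))   # node-2
print(route(b"zebra"))  # node-3
```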
Then, once we've built up the transaction, the first thing the client does is start the pre-write phase. In the pre-write phase, the client contacts all the TiKV nodes that it wants to write to and makes sure that the transaction can be written. At the end of this phase, it's basically got a guarantee from every node that if it commits, then that commit will succeed. So at this point, once it's got all those pre-write responses, the client knows that the commit is guaranteed to succeed. It then has to get another timestamp from PD, and then it can finally send its commit message. And it only has to send that to one node, the primary node of the transaction. Then it can return success to the user, and later on, in its own time, it's gonna commit all the other nodes in the transaction.

The reason I'm going into this kind of detail here is just so you can see that the interaction between the client and the TiKV server is non-trivial. This is not just the interface to a dictionary like Andy showed at the start of the talk; it's a bit more complicated than that, because we're in this distributed situation. And just to really make this point, this is the protocol buffer specification for a pre-write request, which is something the client would send to a TiKV node. As you can see, there's a lot of detail there. So in the next section, we're gonna see how the TiKV client abstracts all that away for you and just gives you a much nicer API, so that in your application you can concentrate on the business logic and avoid all this lower-level logic of transaction protocols and timestamps and regions and so on and so forth. So I'm gonna hand over to Andy now, who's gonna talk about all that good stuff.

Thanks, Nick. In this section I will talk about how to use TiKV. The TiKV client is a set of libraries in multiple languages: Rust, Java, Python, C++ and Go.
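Before moving on to the client APIs, the commit flow Nick just walked through (start timestamp, buffered writes, prewrite locks, then commit) can be sketched as a toy single-process simulation. All the class and function names here are invented for illustration; the real Percolator-style protocol in TiKV is far more involved.

```python
# Toy simulation of the two-phase commit flow described above.
# Not the real TiKV protocol: just the shape of the steps.

class Oracle:
    """Stand-in for PD's timestamp oracle: monotonically increasing ints."""
    def __init__(self):
        self.ts = 0
    def next(self):
        self.ts += 1
        return self.ts

class Node:
    """Stand-in for a TiKV node holding part of the data."""
    def __init__(self):
        self.data = {}     # key -> (value, commit_ts)
        self.locks = set()
    def prewrite(self, key):
        if key in self.locks:
            return False   # a conflicting transaction holds a lock
        self.locks.add(key)
        return True
    def commit(self, key, value, commit_ts):
        self.data[key] = (value, commit_ts)
        self.locks.discard(key)

def run_txn(oracle, nodes, writes):
    start_ts = oracle.next()          # 1. start timestamp from the oracle
    buffered = dict(writes)           # 2. writes are buffered locally
    # 3. prewrite: lock every key; if any lock fails, abort.
    #    (start_ts would drive conflict checks in the real protocol;
    #    a real implementation would also roll back locks on abort.)
    if not all(nodes[k].prewrite(k) for k in buffered):
        return None
    commit_ts = oracle.next()         # 4. commit timestamp from the oracle
    for k, v in buffered.items():     # 5. commit every key
        nodes[k].commit(k, v, commit_ts)
    return commit_ts

oracle = Oracle()
nodes = {"k1": Node(), "k2": Node()}
ts = run_txn(oracle, nodes, {"k1": "v1", "k2": "v2"})
print(ts)                      # 2
print(nodes["k1"].data["k1"])  # ('v1', 2)
```

In the real protocol, only the primary key needs the commit message before success is returned; the secondaries are committed lazily afterwards, which this toy glosses over by committing everything at once.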
You can use any of those that fits your needs. In the following, I will show you examples using the Python client; the clients in other languages are similar. Before that, we download the Python client with pip using this command.

The TiKV client supports two different interfaces, rawKV and transactional. They cannot be used together, so you will need to choose one of them at the beginning. RawKV is usually faster and has less overhead than transactions, and therefore has lower latency. RawKV still guarantees strong consistency, but it only supports single-row operations. The transaction interface adds an extra interior MVCC layer on top of rawKV; therefore, it supports multi-row atomicity and snapshot isolation. This can be quite useful in OLTP scenarios.

So let's talk about rawKV first. The rawKV client provides a get method for retrieving a value by key, a scan method for retrieving a bunch of values by a range of keys, a put method for updating and inserting, and a delete method for deleting. Here is a brief example of using rawKV. First, connect to TiKV with the server IP. Then, insert a KV pair by calling the put method. Then we can use the get method with k1, which will return the value v1 that we inserted before. The scan method returns a list of key value pairs in the key range from k1 to the end. The scan limit is required to be stated explicitly, to avoid flooding the TiKV server with an unexpectedly large result.

In transaction mode, the methods on the client class move onto a new transaction class. By calling a begin method on the client, choosing either optimistic or pessimistic, a new transaction is created. Transactions have extra methods such as get_for_update and lock_keys, which are features for pessimistic transactions. That is a little bit out of scope for today, so you can find a description in the documentation.
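Since the real client needs a running cluster, here is a minimal in-memory stand-in that mirrors the shape of the calls just described: the rawKV methods (put, get, delete, scan with a mandatory limit) and a transaction whose writes buffer locally until commit. The class and method names follow the talk's description, not necessarily the exact `tikv_client` API.

```python
# In-memory stand-ins for the two interfaces described above. Illustrative
# sketches only: the real tikv_client talks to a cluster over gRPC, and its
# exact class and method names may differ.

class RawStub:
    """Mimics the rawKV interface: get, put, delete, scan-with-limit."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)

    def scan(self, start, limit):
        # Up to `limit` pairs with key >= start, in key order; the limit
        # is mandatory, as in the talk, to bound the result size.
        pairs = sorted((k, v) for k, v in self._data.items() if k >= start)
        return pairs[:limit]

class TxnStub:
    """Mimics the transactional interface: writes buffer locally and only
    become visible to others when commit() is called."""
    def __init__(self, store):
        self._store = store
        self._buffer = {}

    def put(self, key, value):
        self._buffer[key] = value

    def get(self, key):
        # Reads see this transaction's own buffered writes first.
        if key in self._buffer:
            return self._buffer[key]
        return self._store._data.get(key)

    def commit(self):
        # All-or-nothing in this toy: every buffered write lands at once.
        self._store._data.update(self._buffer)
        self._buffer = {}

raw = RawStub()
raw.put(b"k1", b"v1")
raw.put(b"k2", b"v2")
print(raw.get(b"k1"))       # b'v1'
print(raw.scan(b"k1", 10))  # [(b'k1', b'v1'), (b'k2', b'v2')]

txn = TxnStub(raw)
txn.put(b"k3", b"v3")
print(raw.get(b"k3"))       # None -- not visible before commit
txn.commit()
print(raw.get(b"k3"))       # b'v3' -- visible after commit
```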
Here's a brief example of using the transactional interface. As with rawKV, connect the client to the TiKV server by specifying the IP. Then, we begin a transaction. Put the KV pair k1, v1 as usual, and note that other clients will not see it until commit. Here we get the value by key. Finally, commit the transaction, and then all the operations become observable by other clients. Another property of transactions is that all the operations between begin and commit are guaranteed to all succeed or all fail.

That's the end of the talk, so thanks for your attention. Just to quickly go over what we talked about: we talked about how to choose a key value store. We covered RocksDB, etcd, Redis, HBase, and MongoDB, and then we dived a tiny bit deeper into TiKV, which is a distributed and transactional key value store. And then we talked about how your application would communicate with TiKV, using either the raw or the transactional interface of TiKV, via the Python client.

So, if you want to learn more, as I said earlier, TiKV and all its clients are open source projects, and the best place to find out more is to go straight to the source, which is on GitHub. You can go to any of the TiKV or client repos and find out more about how to install, use, build, or even contribute to the different projects. If you'd rather chat with other humans, then the TiKV working group Slack is probably the best place to go. And if you'd rather read stuff in HTML format, then we have a website as well, at tikv.org. So thanks again for sticking around to the end, and I hope you got something out of the talk. Thanks very much.