Hello everyone, I'm Tanex, and I work at PingCAP. PingCAP is the company behind TiDB; I think some of you might have heard about it. TiDB is an open-source distributed HTAP database inspired by Google's F1 and Spanner papers. PingCAP also develops the TiKV and Chaos Mesh projects; TiKV and Chaos Mesh have been donated to the CNCF, and last month TiKV graduated from the CNCF. I'm the tech lead of PingCAP Cloud and a core developer of TiDB Operator, and I'm currently leading the TiDB Cloud DBaaS development.

To run TiDB on Kubernetes, we started the TiDB Operator project about four years ago. In those early days, databases, and especially distributed databases, were not well supported on Kubernetes. During the past few years we have gained a lot of experience running high-performance databases on Kubernetes, so in this talk I will share our experience running high-performance distributed databases on Kubernetes. First, I will explain why we chose to run TiDB on Kubernetes from the beginning; then I will share our performance best practices on Kubernetes.

Kubernetes has become the de facto container orchestration tool. It is the distributed OS of the tech world: it runs everywhere, in IDCs, in private clouds, on the major public clouds, even on your desktop and, I would say, on IoT devices. More and more applications are now delivered as Kubernetes packages, and it has even become a programming model: Kubernetes-oriented programming. In fact, some engineers call themselves Kubernetes YAML programmers, which is very funny.

In our opinion, distributed databases and Kubernetes are a perfect combination. First, Kubernetes provides standard abstractions for easy scaling. The workload unit on Kubernetes is the Pod, and to run a cluster of Pods we can scale them with StatefulSets and Deployments; the Pods they deploy can be distributed across all nodes, so it is very natural to run distributed applications on Kubernetes. Another benefit is that we can distribute configuration to all nodes with ConfigMaps. And nowadays more and more applications have their own operators: with Kubernetes's powerful extensibility, we can program against Kubernetes using the operator pattern to automate database operations. The operator runs in the background forever to automate operating the database, and we also need the operator to handle TiDB failover to keep the database highly available.

First, let's take a quick look at the TiDB system. TiDB is composed of three core components: PD, TiKV, and TiDB. PD stores the cluster metadata; it also schedules and scatters the data stored in TiKV. TiKV is the storage layer: it splits the data into 96 MB chunks that we call Regions, and each Region is replicated with the Raft algorithm, with three replicas. TiDB is the SQL layer, the stateless computing layer of the cluster: it handles MySQL client requests and translates them into KV API and distributed coprocessor API calls to TiKV.
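To make the operator pattern concrete, here is a minimal sketch of what a cluster declaration looks like with TiDB Operator's TidbCluster custom resource; the version tag and storage sizes are made-up examples, not recommendations:

```yaml
# Minimal sketch of a TidbCluster custom resource managed by TiDB Operator.
# The version and storage sizes below are illustrative, not recommendations.
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic
spec:
  version: v5.0.1            # hypothetical TiDB version tag
  pd:
    baseImage: pingcap/pd
    replicas: 3              # PD: cluster metadata and scheduling
    requests:
      storage: "10Gi"
  tikv:
    baseImage: pingcap/tikv
    replicas: 3              # TiKV: storage layer, Raft with 3 replicas per Region
    requests:
      storage: "500Gi"
  tidb:
    baseImage: pingcap/tidb
    replicas: 2              # TiDB: stateless SQL layer
```

The operator watches this one declaration and creates and repairs the underlying StatefulSets, Pods, and volumes for us.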
I believe most of you are very familiar with the Kubernetes architecture; I put it here only for comparison with TiDB, so I will just skip it.

After seeing the TiDB and Kubernetes architectures, we can tell that both are very complicated, and you might ask why we run such a complicated database on Kubernetes: when these two complicated systems meet, will they become over-complicated? Well, the answer is yes and no, but I will not answer it directly here; I will come back to it in a later slide. Another question raised by many users is whether TiDB on Kubernetes will have poor performance, because Kubernetes has very complicated abstractions: container virtualization and the overlay network. Yes, these virtualization layers do affect TiDB's performance, and especially the overlay network, which in theory increases latency for TiDB. So let's see how we can improve database performance on Kubernetes.

Cloud computing mostly deals with three aspects: networking, storage, and compute, and Kubernetes and containerization techniques handle the same three aspects with their own abstractions. Let's discuss these three aspects one by one.

First, networking. Kubernetes is famous for its complicated networking, and in a distributed database one query becomes several RPCs, so the extra networking layers can add significant performance overhead. We ran a detailed benchmark on GKE to measure the overhead of Kubernetes networking, and we found the overall performance decrease is about 7%, and that was with the VPC-native network plugin provided by the public cloud vendor. So as a best practice, it is better to run database pods on dedicated nodes if you are already using VMs, because the VM already virtualizes the network once, and we do not want to introduce another virtualization layer in Kubernetes. Besides that, we can use the host network for database pods if possible; this is actually the mode we run on public cloud. There we choose the right instance type and run only one database pod per node, so we can enable the host network without port conflicts, and we gained much better performance with this setup. I will show a small sketch of it right after the storage part.

Next, storage. Kubernetes's storage abstraction is the persistent volume, the PV. In the beginning, Kubernetes did not support local persistent volumes; it only supported network-disk-based persistent volumes such as AWS EBS and GCE persistent disks. From the TiDB architecture overview in the previous slides, we saw that in a TiDB system each piece of data has three replicas, and a network disk usually has at least two replicas of its own. So if we run TiDB on network disks, each piece of data is physically replicated at least six times, which introduces a lot of extra latency for the database. Luckily, Kubernetes introduced the local PV, which became beta in 1.10. In fact, the local PV is an ideal solution for distributed databases, without that replication overhead.
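Here is that host-network sketch; the dedicated=tikv label and taint and the image tag are hypothetical names for illustration:

```yaml
# Sketch: a database pod using the host network on a dedicated, tainted node.
# The dedicated=tikv label/taint and the image tag are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: tikv-0
spec:
  hostNetwork: true                    # bypass the overlay network entirely
  dnsPolicy: ClusterFirstWithHostNet   # keep cluster DNS resolution working
  nodeSelector:
    dedicated: tikv                    # schedule only onto the dedicated database nodes
  tolerations:
  - key: dedicated
    value: tikv
    effect: NoSchedule                 # the nodes are tainted to keep other workloads off
  containers:
  - name: tikv
    image: pingcap/tikv:v5.0.1
```

Because we run only one database pod per node, host networking causes no port conflicts; with TiDB Operator this kind of setting is applied through the cluster spec rather than hand-written pods.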
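And on the storage side, here is a minimal local PV sketch; the node name and mount path are placeholders, and in practice you would typically use a static local-volume provisioner rather than writing PVs by hand:

```yaml
# Sketch: a StorageClass plus one hand-written local PV pinned to a single node.
# node-1 and /mnt/disks/ssd0 are placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # local PVs have no dynamic provisioner
volumeBindingMode: WaitForFirstConsumer     # delay binding until the pod is scheduled
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: tikv-pv-node1
spec:
  capacity:
    storage: 500Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd0                   # the local SSD mount point on the node
  nodeAffinity:                             # a local PV is tied to exactly one node
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: ["node-1"]
```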
Now, on public cloud, as many of you might know, the local disk is ephemeral storage: it is wiped out when the instance is terminated or some unexpected problem happens to the VM. So you might worry whether this is suitable for database storage. From our experience, we can actually use these local disks, since distributed databases usually have fault tolerance for node failures. With this assumption, we can even use aggressive mount options: nobarrier, discard, and noatime. These three mount options are usually not recommended; for example, Red Hat's documentation states that the negative performance impact of write barriers is negligible, about 3%, so the benefits of write barriers outweigh the performance benefit of disabling them, and they especially recommend keeping barriers enabled in VMs. But after some benchmarks on GCP with GKE, we found that with the nobarrier option enabled, the local disk gains quite a bit of performance compared to running with barriers. So do these three options compromise data safety? In our experience, no: when the failure scenarios they guard against actually happen, the VM usually misbehaves badly enough that it has to be terminated anyway, so enabling these three options does not add any extra risk of data loss.

Now let's talk about compute. As we all know, Kubernetes has QoS, the quality of service classes: BestEffort, Burstable, and Guaranteed. In the cgroup overview we can see that different QoS classes sit at different levels of the cgroup hierarchy: the Guaranteed pods sit above the Burstable and BestEffort pods, which means the CPU weight of the pods differs. The cpu.shares value is computed from the CPU request of the pod, so the best practice is to set a proper CPU request. However, we should be aware that cpu.shares works hierarchically: even with the same cpu.shares value, a pod at the Guaranteed level will gain more CPU time. So in any case, we should set a proper CPU request for our database pods.

Another part of CPU management is the CFS quota and the CFS period. If your application sets a CPU limit, it is translated into cpu.cfs_quota_us. It is usually recommended that mission-critical applications be configured with the Guaranteed QoS class, that is, with the CPU limit and the CPU request set to the same value. However, after setting a CPU limit for the TiDB pods, we found that performance decreased: even when CPU usage was far below the CPU limit, the pods were still throttled; in fact, there is a known issue about this. So the best practice is: do not set a CPU limit for database pods. This can be achieved by not setting the CPU limit at all, or by disabling the CFS quota globally in the kubelet.

We can also tune cpu.cfs_period_us, which helps improve performance for IO-bound applications. In our TiDB cluster, PD is not a CPU-intensive application, so we tune it to a small value; the default cpu.cfs_period_us is 100 milliseconds, and we configure smaller values to get better latency. For CPU-intensive applications like TiDB and TiKV, we set it to a larger value, because frequent context switching would harm performance.

Another parameter of CPU management is the cpuset, which Kubernetes supports through the CPU manager policy. By default the CPU manager policy is set to none, which means the CPUs are shared by all the different applications, and for CPU-intensive applications like TiKV, switching a task from one CPU to another harms performance. With the CPU manager policy set to static, we can use the Guaranteed QoS class to make sure that specific CPUs are used only by the TiKV pod. As we stated, it is not recommended to set a CPU limit, but with the static CPU manager policy the CFS quota does not apply to these statically assigned CPUs: the TiKV pod will use them exclusively, and Kubernetes will not schedule other pods onto these specific CPUs. So in practice we should enable the static CPU manager policy on bare-metal machines; note that on public cloud the VM's CPUs are already virtualized, so enabling the static CPU manager policy makes no real difference there, and it is only useful for bare-metal machines. We also recommend configuring the Guaranteed QoS class with integer CPU requests, because with that configuration and the static CPU manager policy, Kubernetes will exclusively assign specific CPUs to the pod. A small sketch of these CPU settings follows.
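To pull these CPU settings together, here is a rough sketch; the resource numbers, the image tag, and the split between the pod spec and the kubelet configuration file are illustrative assumptions, not our exact production values:

```yaml
# Sketch: CPU settings for a database pod; all numbers are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: tikv-0
spec:
  containers:
  - name: tikv
    image: pingcap/tikv:v5.0.1
    resources:
      requests:
        cpu: "16"        # cpu.shares is derived from the request, so set it properly
        memory: 32Gi
      limits:
        memory: 32Gi     # keep a memory limit, but no CPU limit -> no CFS throttling
---
# Sketch: the matching kubelet configuration on the database nodes.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuCFSQuota: false         # option 1: disable the CFS quota globally on these nodes
# cpuCFSQuotaPeriod: 20ms  # option 2: keep the quota but tune the period instead
#                          # (needs the CustomCPUCFSQuotaPeriod feature gate);
#                          # small on PD nodes, larger on TiDB/TiKV nodes
cpuManagerPolicy: static   # bare metal only: pins exclusive CPUs to Guaranteed
                           # pods that have integer CPU requests
```

Note that the pod above is Burstable because it has no CPU limit; to actually get exclusive CPUs from the static policy, the pod would instead need Guaranteed QoS with equal integer CPU requests and limits, which is fine in that case since the CFS quota does not apply to statically assigned CPUs.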
These kernel parameter tunings are learned from our users' production deployments. Different databases have different requirements for the kernel parameters; they are listed here in the reference. And since the database is the core part of your application, you should treat it specially: configure dedicated nodes or node pools to run the databases, not shared with your applications.

TiDB is an open-source project, and we provide many tools to help users manage TiDB on their own. But from the previous slides we can see that running high-performance distributed databases on Kubernetes is complicated and very challenging; there are many pitfalls in the containerized world, especially if you want to run TiDB on Kubernetes. So, returning to the complication problem mentioned in the previous slides: we now provide a fully managed TiDB service, TiDB Cloud. We provide expertise and 24/7 support, so you don't have to worry about database maintenance; you can just focus on your business and let the actual experts manage your databases in the cloud. TiDB Cloud provides well-tuned, high-performance, MySQL-compatible distributed databases. When your business grows, you can scale the TiDB cluster with a few clicks on the web console. You can also use TiDB Cloud for HTAP workloads, that is, hybrid transactional and analytical processing. You can now apply for a free trial at this web page. Thank you.