Hello everyone, I'm Yuping from PingCAP. Today, Shunning and I are going to share how we brought elasticity and resilient multi-tenancy to TiKV in the last year. We will cover our approach and its benefits. This sharing consists of four parts. First, I will briefly introduce what TiKV is. Then, I will describe why we need multi-tenancy and what challenges it brings. In the third section, Shunning and I will share our solutions to address these challenges. Finally, we will share some thoughts on the future.

Okay, let's start with what TiKV is. TiKV is a graduated project of the CNCF. It is a highly scalable, low-latency, and easy-to-use distributed key-value database that provides both raw and ACID transactional key-value APIs. It serves as the storage layer of TiDB, a transactional database widely used for online serving workloads, as well as the metadata storage system for object storage services. TiKV supports ACID transactions, replicates logs using Raft, and uses range partitioning, with regions that split and merge dynamically. It also supports a coprocessor for SQL operator pushdown. If you would like to learn more about TiKV, please visit tikv.org.

Okay, now let's see why we need multi-tenancy. Multi-tenancy is required to address several issues on large TiKV clusters. Firstly, there are a number of different services on the cluster, and multi-tenancy is needed to isolate them and guarantee the QoS of different groups of services. Secondly, compared to running multiple TiKV clusters, multi-tenancy can share some infrastructure, such as control planes, metrics, and alerts, to reduce costs and simplify cluster maintenance. Finally, for some customers, multi-tenancy meets their requirement to use both transactional KV and RawKV in the same cluster.

Multi-tenancy presents significant challenges that must be addressed to ensure the elastic and resilient operation of TiKV. Firstly, TiKV must be able to handle data volumes ranging from GBs to PBs. Additionally, TiKV must be able to support millions of tenants in a single cluster, each with its own unique storage and processing requirements. Secondly, multi-tenancy requires isolating failures and reducing their blast radius. With multiple services running on the same cluster, a single failure can potentially affect many tenants, resulting in a large blast radius. To mitigate this risk, TiKV must provide strong isolation mechanisms that prevent failures in one tenant from affecting others. Additionally, TiKV must be able to recover quickly and without data loss in the event of fatal errors and disasters.

Now, let's discuss our solution for achieving elasticity and resilience. Our first solution is to use keyspaces to isolate different tenants. To isolate data between different tenants, we use keyspaces together with specific key modes. Regarding the TiDB mode, we use the key prefix m for TiDB metadata. For table rows, we use the key format t{table_id}_r{row_id}, with the corresponding column values as the value. In the case of index rows, we use the key format t{table_id}_i{index_id}{index_column_values}{row_id}, with no corresponding value. Moving on to the TiKV mode, we use the key prefix x for transactional keys and the prefix r for RawKV keys. Then, to isolate different tenants, we use the keyspace to prefix user keys. Previously, a key was encoded as the mode prefix followed by the user key and a timestamp. Now, the keys of RawKV, transactional KV, and TiDB are encoded as the mode prefix, the keyspace, the user key, and the timestamp. The keyspace is a fixed length of 3 bytes in network byte order, allowing a maximum of about 16.7 million keyspace IDs. The timestamp of RawKV entries is used to implement the change data capture feature, indicating what data was changed and when.
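To make the layout concrete, here is a minimal sketch in Go of the encoding just described: a one-byte mode prefix, a 3-byte keyspace ID in network (big-endian) byte order, the user key, and a trailing timestamp. The function and constant names are illustrative assumptions, not TiKV's actual implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Mode prefixes as described above: 'x' for transactional KV and
// 'r' for RawKV ('m' and 't' are reserved for TiDB metadata and tables).
const (
	ModeTxnKV byte = 'x'
	ModeRawKV byte = 'r'
)

// encodeKey lays out: mode prefix | 3-byte keyspace ID (big-endian) |
// user key | 8-byte timestamp. Illustrative only.
func encodeKey(mode byte, keyspaceID uint32, userKey []byte, ts uint64) ([]byte, error) {
	if keyspaceID >= 1<<24 { // 3 bytes allow at most ~16.7 million keyspaces
		return nil, fmt.Errorf("keyspace ID %d out of range", keyspaceID)
	}
	buf := make([]byte, 0, 1+3+len(userKey)+8)
	buf = append(buf, mode)
	buf = append(buf, byte(keyspaceID>>16), byte(keyspaceID>>8), byte(keyspaceID))
	buf = append(buf, userKey...)
	var tsBuf [8]byte
	binary.BigEndian.PutUint64(tsBuf[:], ts)
	return append(buf, tsBuf[:]...), nil
}

func main() {
	key, err := encodeKey(ModeRawKV, 42, []byte("user_key"), 418123456789)
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", key) // 72 00 00 2a ... ('r', then keyspace 42)
}
```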
To manage keyspaces, we added keyspace management capabilities to PD, which include allocation of a keyspace ID by keyspace name, creation of a new keyspace with its keyspace meta, updating of the keyspace meta and configuration, management of the keyspace lifecycle, and retrieval of the keyspace meta. Finally, this diagram provides an overview of the data isolation achieved through keyspaces.
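As a rough illustration of those capabilities, the sketch below models a keyspace manager as a Go interface. The type names, lifecycle states, and method signatures are hypothetical, chosen only to mirror the list above; PD exposes the equivalent operations through its own API.

```go
package keyspace

import "context"

// State models a simple keyspace lifecycle. These states are
// illustrative, not PD's exact definitions.
type State int

const (
	StateEnabled State = iota
	StateDisabled
	StateArchived
)

// Meta is the metadata tracked for each keyspace.
type Meta struct {
	ID     uint32            // 3-byte keyspace ID, always < 1<<24
	Name   string            // unique keyspace name
	State  State             // lifecycle state
	Config map[string]string // per-keyspace configuration
}

// Manager mirrors the capabilities listed above (hypothetical signatures).
type Manager interface {
	// Create allocates a keyspace ID for the given name and stores its meta.
	Create(ctx context.Context, name string, config map[string]string) (*Meta, error)
	// GetByName retrieves the keyspace meta by keyspace name.
	GetByName(ctx context.Context, name string) (*Meta, error)
	// UpdateConfig updates the configuration in the keyspace meta.
	UpdateConfig(ctx context.Context, name string, config map[string]string) (*Meta, error)
	// UpdateState drives the keyspace through its lifecycle,
	// e.g. Enabled -> Disabled -> Archived.
	UpdateState(ctx context.Context, name string, state State) (*Meta, error)
}
```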
In the next section, Shunning will introduce the microservices related to PD. Shunning, over to you.

Okay, thanks. I want to talk about the microservices in the Placement Driver. The Placement Driver (PD) is a critical component of TiKV. Microservices are an architectural style that structures an application as a collection of loosely coupled services, which are small and independent in nature and can be deployed independently without affecting the whole system. Now, let's dive into the details of how this architecture is being used in the Placement Driver.

As we know, TiKV is a distributed key-value storage system that is designed to be scaled by adding more TiKV nodes to the cluster. However, the Placement Driver can become a scalability bottleneck in very large clusters. For example, some users have told us that TiKV can be scaled out by adding TiKV nodes, but the Placement Driver cannot. The Placement Driver provides many services, and a single component has too many duties to handle. Users are also concerned about the stability and high availability of the Placement Driver. To address these challenges, we have split the Placement Driver into multiple services in TiDB Cloud, which allows for better scalability, fault tolerance, and performance.

First, let's look at the services. Many services were provided by the Placement Driver, including the metadata service, the Timestamp Oracle (TSO) service, the scheduling service, and the allocation service. All of them were combined into a single component. This made the Placement Driver a single point of failure in the TiKV system, which was a big concern for users.

Let's look at the table. The TSO service is responsible for generating and assigning timestamps to TiKV transactions to ensure data consistency. It is very critical and requires more resources when there are many transactions in the cluster. The metadata service in the Placement Driver is responsible for managing the cluster's metadata, including node information, region information, and other system information. It is also critical: other services rely on it, and it requires resources to handle things such as the heartbeat requests from TiKV. Another service is the allocation service. It is much the same as the Timestamp Oracle service, but it is not required on the user side; it is internal, so its resource requirements are lower. Finally, there is the scheduling service, which is responsible for scheduling TiKV resources, including balancing the region distribution across nodes, but it does not affect TiKV transactions or availability. So some services were not critical to TiKV, yet they were all combined into a single component, which made it difficult to scale out the critical functions.

The microservice architecture in the Placement Driver has allowed us to isolate each critical service into a separate service. Here is an overview of the microservices in the Placement Driver. As the picture shows, PD has been split into multiple services, each responsible for a specific function. These are the services we already introduced. It should be noted that there is also a storage service here. The storage service can be an etcd cluster; it is responsible for storing the data of the other services in PD. The other services can be regarded as stateless, and through the etcd API, all the other services share the same high-availability mechanism. Next, I will mainly introduce the TSO service and the metadata service.

The TSO service in TiKV is responsible for allocating monotonically increasing timestamps for all the transactions in the cluster. This ensures that transactions are executed in a consistent order. The TSO service is an important part of a distributed system like TiKV, as it ensures that all the nodes in the cluster have a consistent view of time. After we split out the TSO service, as the picture shows, etcd is used as the underlying storage engine to store the TSO metadata, ensuring fault tolerance and high availability. The TSO service requires two or more replicas, where one is designated as the primary and the others are secondaries. The primary TSO node is responsible for performing the actual TSO allocation and persisting the metadata to etcd, while the secondary TSO nodes watch the lease key of the primary TSO node in etcd, ready to take over if the primary TSO node fails. The TSO nodes also use etcd's election mechanism to ensure that only one primary TSO node provides the TSO service.

The TSO service is critical to the cluster and requires high performance to sustain the necessary throughput. Separating it from the Placement Driver keeps it from being impacted by other services running in the Placement Driver. Moreover, resource contention is reduced, since the TSO service only uses its own resources, and the runtime overhead is reduced. As a result, performance is improved. In addition to splitting out the TSO service, we also partition it for different tenants, as shown in the picture. Each tenant is assigned to a group, and each group has its own TSO service, with its own metadata stored in etcd, to provide the TSO service for that group. This partitioning allows us to achieve scalability and reduce the risk of failure: if one group's TSO service fails, it won't affect other tenant groups.
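To make the primary election and failover concrete, here is a minimal sketch using etcd's official Go client and its concurrency package, which provides lease-based elections. The endpoint, election prefix, node name, and TTL are assumptions for illustration; the real TSO service's election details may differ.

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// A lease-backed session: if this node crashes, its election key
	// expires after the TTL, letting a secondary win the next campaign.
	sess, err := concurrency.NewSession(cli, concurrency.WithTTL(5))
	if err != nil {
		log.Fatal(err)
	}
	defer sess.Close()

	// All TSO nodes of one tenant group campaign on the same prefix,
	// so at most one primary exists per group.
	election := concurrency.NewElection(sess, "/tso/group-1/primary")

	// Campaign blocks until this node becomes the primary.
	if err := election.Campaign(context.Background(), "node-1"); err != nil {
		log.Fatal(err)
	}
	log.Println("became primary TSO node; start allocating timestamps")
}
```

A secondary runs the same campaign and blocks; because the primary's election key is tied to a lease, a crashed primary loses the key after the TTL, and one of the waiting secondaries takes over automatically.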
Another important service is the metadata service. The metadata service in the Placement Driver collects metadata for TiKV through the heartbeat messages. The information includes data about the TiKV nodes, such as IP addresses, available disk space, and status, as well as the region information. TiKV clients rely on the metadata from the Placement Driver to route requests to the proper TiKV nodes, and clients also cache the metadata to reduce accesses to the Placement Driver. The metadata service was also a single point of failure and a bottleneck for the TiKV cluster as it grew bigger. By splitting it into a microservice, we can improve its operability and scale the metadata service for each tenant. With this approach, we can evolve it into a more scalable service and reduce the impact on the whole cluster.

As with the TSO service, the metadata service in PD relies on etcd for persisting data. TiKV nodes send heartbeats to update the metadata, and the Placement Driver uses that metadata to route client requests to the TiKV nodes. We also introduce a component called the metadata informer, which can list and watch the region and store metadata updates for each tenant. Each tenant can use the metadata informer on the client side to maintain a local cache. So even if the metadata service fails, for a short time the cache can still provide the routing information needed to access the data. Moreover, during a failure, the metadata informer can also fetch the region information from the TiKV nodes directly, so it can keep the metadata up to date. In this way, the availability of the metadata service can be guaranteed. That's all for my part. Thanks.

Okay, thanks. Now we come to the last section, the future outlook. As we move forward, we aim to provide fine-grained control of resources for tenants, including CPU, memory, disk capacity, and throughput. We also plan to offer services based on multi-tenancy that provide a virtual cluster for every tenant. This will enable users to start or stop a cluster in seconds and to scale in and out automatically. These improvements will enable more efficient and flexible use of TiKV, as well as making it more accessible to a wider range of users.

In conclusion, we have discussed the key aspects of TiKV, the significance of multi-tenancy, our solutions to overcome its challenges, and future possibilities. Thank you for your attention. Goodbye.