Hello, everyone. Today's topic is Make TiDB 10x Faster and HTAP-able. This talk is brought to you by Xiaoyu Ma and Li Chunpei; we are from PingCAP. So why make TiDB HTAP-able? Consider the case where you want to analyze online data in real time. Usually we use different types of databases for transactional processing (TP) and analytical processing (AP) workloads, because the workloads are totally different, and so the design goals of the databases are totally different. That means we run two different types of databases, and we need to constantly move data from the TP side to the AP side with ETL. The data movement itself is expensive, slow, and hard to maintain. Usually people set up a cron-style job that runs constantly, maybe every hour or every day, and that job is responsible for moving data between the different systems. This is not simple: you need to constantly watch the job to prevent failures, and you need to check whether the data stays consistent during the move. That becomes especially hard when the data volume is large. And it is not only a question of complexity. It also means that when you get a report, that report might be based on yesterday's data; you have lost data freshness. If you want fresh data, just like if you want fresh fruit, the best way is to consume it directly from the source. So why are the systems separated? Why can't we combine the two systems into one? Then we could eliminate the data movement and read the data directly from the source. The reason is that databases with different goals use different designs, especially different storage formats. For a transactional processing system, we usually use a row format. What is a row format? In a row format, for each single row, you store its columns side by side. Consider this kind of query: SELECT * FROM emp WHERE id = 7658. In a row-format system, you can seek directly to the start of that row and do a sequential read.
Then you get all the data of that row. Row format is very good for this kind of workload: you read a few rows, process them, and store them back. That is the typical pattern for a transactional processing system. Now consider another example: SELECT AVG(age) FROM emp. That looks like a reporting query; you want the average age of your employees. For that kind of query, a column format is more suitable than a row format. What is a column format? A column format stores the values of each individual column tightly, side by side, instead of row by row. So when you answer the average-age query, you just seek directly to the start of the age column and do a sequential read; then you have all the age data. You do an aggregation and answer the query. You never touch the columns you don't need, which makes this a much more efficient way to read the data than the row format. But if you want to answer a point query like WHERE id = 7658, which picks out a single row, then in a column format you need to seek three times, fetch each individual column value of that row, and stitch them back together before you have the row. That means the column format is not very suitable for transactional processing. The other problem to consider is workload interference. TP workloads need to be very stable: stable low latency and very high transaction rates. A reporting query may have low QPS, but it consumes a large amount of compute resources. So if you process those two kinds of workloads in a single system, the analytical workloads can badly interfere with your transactional workloads, and transactional workloads are fragile. You don't want that to happen. So how do we make TiDB HTAP-able?
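The row-versus-column trade-off described above can be sketched with a toy example. This is an illustration only, not TiDB code; the table contents and dictionary-based "stores" are invented for the demo.

```python
# Toy illustration of row vs. column storage layouts and why each
# suits a different access pattern. Not real storage-engine code.

rows = [
    (7369, "SMITH", 22),
    (7566, "JONES", 35),
    (7658, "SCOTT", 41),
]

# Row format: each record's columns sit side by side.
row_store = {r[0]: r for r in rows}

# Column format: each column's values sit side by side.
col_store = {
    "id":   [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "age":  [r[2] for r in rows],
}

# Point query (SELECT * FROM emp WHERE id = 7658):
# one lookup in the row store...
point_row = row_store[7658]

# ...versus touching all three columns and stitching them together
# in the column store.
idx = col_store["id"].index(7658)
point_col = (col_store["id"][idx], col_store["name"][idx], col_store["age"][idx])

# Aggregation (SELECT AVG(age) FROM emp): one sequential read of a
# single column in the column store; no other column is touched.
avg_age = sum(col_store["age"]) / len(col_store["age"])
```

The point query is a single lookup in the row store but a three-column reassembly in the column store, while the aggregation reads only the `age` column, which is exactly the asymmetry described in the talk.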
In the new version 4.0, we introduced a new component named TiFlash into the TiDB system. What is TiFlash? TiFlash is a real-time, updatable columnar storage engine. Its code base is partially based on ClickHouse, a very famous open-source project built by Yandex. TiFlash receives data from TiKV as a Raft learner. Some of you might know that we use Raft as the consensus algorithm to replicate data in the TiDB system. In TiDB, we keep several replicas of each piece of data, and those replicas are maintained by Raft: some are leader replicas and some are follower replicas. A TiFlash replica takes the learner role, which means it does not vote; that prevents TiFlash from interfering with the stability of the TiDB system. The two storage engines together make TiDB an HTAP database. We can also access the data via the CBO: the TiDB optimizer is a cost-based optimizer, and it can choose between the column format and the row format based on statistics. It will choose the path that provides better performance, lower cost, or lower resource consumption. Thank you, Xiaoyu. Next, I'm going to talk about the architecture of TiDB. This is the architecture: we have TiDB as the computation layer and TiKV as the storage layer. The data in TiKV is divided into regions, and each region covers a contiguous key range. Each region has multiple replicas, and they are kept consistent with the Raft replication protocol. What we add here is a new part, the TiFlash cluster, which contains multiple nodes. You can see that we draw a dashed line between the TiFlash nodes and the TiKV nodes, which means that the data replication inside TiKV will not be affected if one or more TiFlash nodes are down.
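To make the cost-based choice between the two engines concrete, here is a deliberately simplified cost model. The function names, the seek constant, and the linear cost formulas are all hypothetical; the real TiDB optimizer uses much richer statistics, but the shape of the decision is the same: whole rows from the row store versus only the needed columns, one seek each, from the column store.

```python
# Hypothetical cost-model sketch (NOT the TiDB optimizer): pick the
# storage engine whose estimated scan cost is lower for a query shape.

def estimate_cost(engine, rows, cols_read, total_cols, seek_cost=100):
    if engine == "tikv":
        # Row store: one seek, then every column of every scanned row.
        return seek_cost + rows * total_cols
    if engine == "tiflash":
        # Column store: one seek per column read, then only those columns.
        return cols_read * (seek_cost + rows)
    raise ValueError(engine)

def choose_engine(rows, cols_read, total_cols):
    costs = {e: estimate_cost(e, rows, cols_read, total_cols)
             for e in ("tikv", "tiflash")}
    return min(costs, key=costs.get)
```

Under this toy model, a point query reading one whole row favors the row store, while a million-row scan of one column out of three favors the columnar replica, matching the intuition from the row-versus-column discussion earlier.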
Next, I'll talk about how we built real-time updatable columnar storage. We designed a columnar storage engine named Delta Tree. The key idea is that we partition the data by primary key into segments. The design goal of Delta Tree is to avoid a multi-way merge when we scan in batches. When new data arrives, it is appended to the delta space. To optimize read performance, the delta space is sorted and indexed. Periodically, the delta space is compacted into the stable space for better read performance. In this picture, we want to show why Delta Tree can offer better read performance than an LSM tree. On the left-hand side, when you want to read a range from an LSM tree, you need to read data from all the levels of the LSM tree and do a multi-way merge. That operation is heavy, and the read amplification is high. In Delta Tree, you only need to merge data from the stable space and the delta space. On top of that, we use a B+ tree to index the segment information, so if you want to read data for a certain range, you only need to touch a few segments. During a read, the delta space and the stable space are merged. Compared with an LSM tree, this is a two-way merge, which has much lower overhead than a multi-way merge. Besides, the stable space is stored in columnar format, which offers much better read performance during scans. Next, I'm going to talk about how we achieve Raft-based HTAP. We introduce the learner role into the Raft protocol. All the TiFlash nodes are learners. By learners, we mean nodes that do not participate in Raft leader elections, nor do they count toward the quorum during data writes. As a result, writes in TiKV do not need to wait for TiFlash, and TiKV works normally even if TiFlash nodes die.
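The two-way merge of the delta space and the stable space described above can be sketched in a few lines. This is a conceptual toy, not Delta Tree code: the data, the dict-based deduplication, and the `scan` helper are invented for illustration, and both spaces are modeled as sorted lists of (primary key, value) pairs.

```python
# Toy sketch of a Delta-Tree-style read: the stable space holds sorted,
# compacted data; the delta space holds recent sorted writes. A scan is
# a single two-way merge, unlike an LSM tree where every level joins a
# multi-way merge. Not real TiFlash code.

import heapq

stable = [(1, "a"), (3, "b"), (5, "c"), (7, "d")]   # sorted by primary key
delta  = [(2, "x"), (5, "c2")]                       # sorted recent writes

def two_way_merge(stable, delta):
    merged = {}
    # heapq.merge streams both sorted inputs in key order; on a key tie
    # the stable entry comes first, so the newer delta value wins below.
    for key, value in heapq.merge(stable, delta, key=lambda t: t[0]):
        merged[key] = value
    return sorted(merged.items())

def scan(stable, delta, lo, hi):
    # Range read: two-way merge, then keep keys in [lo, hi].
    return [(k, v) for k, v in two_way_merge(stable, delta) if lo <= k <= hi]
```

Note how the update to key 5 in the delta space overrides the older stable value during the merge, which is how recent writes stay visible without rewriting the compacted columnar data.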
The replication between TiKV nodes and TiFlash nodes is direct; there is no intermediate channel between them. This replication is very efficient, and the latency is usually in milliseconds. We also rely on the automatic load balancing and fault tolerance features built into TiDB to make the data replication highly available. This slide is an illustration of what I just talked about. Compared with ETL, the replication between TiKV and TiFlash is very efficient: during ETL you usually need to copy the data to a staging area and then copy it again into the data warehouse, which causes a lot of data copying. Another thing people usually ask is how we achieve consistent reads. Because replication from TiKV to TiFlash is asynchronous, we need to guarantee consistency at read time. How do we achieve this? The idea is pretty simple: we use the learner read algorithm and consult the leader about the replication progress before we return data to the client. This actually guarantees strongly consistent reads. For example, consider the case where at time t0 there is a write to a TiKV node, and later at time t1 there is a read at a TiFlash node. How do we make sure that the read at t1 is strongly consistent and can return the data written at t0? During the read, the learner first talks to the leader and asks about the leader's replication progress. In this case, it learns that the leader is at index 4. Before the learner returns to the client, it waits until its own replication reaches index 4, and only then does it return the data. Combined with timestamps and MVCC, this actually gives us snapshot isolation in TiFlash as well. Next, I'm going to talk about performance. Here is an example of benchmark results on the OnTime dataset.
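Before moving on to the numbers, the learner-read idea just described can be sketched as follows. All class and method names here are hypothetical, and single-threaded dictionaries stand in for Raft logs; the point is only the ordering guarantee: ask the leader for its commit index, wait until the learner's applied index catches up, then serve the read.

```python
# Minimal sketch of the learner-read protocol (hypothetical names, not
# TiFlash code): a read at the learner first fetches the leader's commit
# index, then waits until local apply reaches that index, so every write
# committed before the read started is visible.

class Leader:
    def __init__(self):
        self.commit_index = 0
        self.log = {}                       # index -> (key, value)

    def write(self, key, value):
        self.commit_index += 1
        self.log[self.commit_index] = (key, value)

class Learner:
    def __init__(self, leader):
        self.leader = leader
        self.applied_index = 0
        self.data = {}

    def replicate_once(self):
        # Apply the next pending log entry, if any (async replication).
        nxt = self.applied_index + 1
        if nxt in self.leader.log:
            key, value = self.leader.log[nxt]
            self.data[key] = value
            self.applied_index = nxt

    def learner_read(self, key):
        read_index = self.leader.commit_index   # ask leader for progress
        while self.applied_index < read_index:  # wait until caught up
            self.replicate_once()
        return self.data.get(key)
```

In the t0/t1 example from the talk, the write at t0 bumps the leader's commit index, so a read at t1 cannot return until the learner has applied that entry, which is what makes the read strongly consistent despite asynchronous replication.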
The OnTime dataset is airline flight data collected since the 1980s; it contains flight numbers, departure times, and landing times. You can see from the results that TiDB plus TiFlash offers very good performance compared with the other solutions, some of which are built specifically for analytics and big data processing. These queries are multi-dimensional analytics queries, for which TiDB with TiFlash is a good fit. If you want to know what the queries look like, you can read about them in our blog. Here is an example of the performance improvement we saw from an early adopter of TiFlash, an internet company called Xiaohongshu. They are a TiDB user and have TiDB running in production. They selected around 400 queries from their production system and migrated them to TiFlash. From this graph, we can see a 3x to 10x speedup, and for some long-running queries the speedup can be up to 20x. I also want to call out that this result is based on an older version of TiFlash; since then we have made a lot of progress and improvements, and we can expect much better performance with the newer version. Thank you.