We're going to talk about PolarDB in general. PolarDB is a database architecture for the cloud. I used to work for Oracle on MySQL; now I work for Alibaba Cloud. We provide different databases for users in the cloud: you can use MySQL, you can use MariaDB, you can use Postgres, you can use SQL Server, and so on. We want to build a database architecture that takes advantage of the cloud. When you are in the cloud, you can use special hardware, because we control the machines, so we can use all the infrastructure in the cloud to support users in different ways, with auto scaling, automatic configuration, and things like that.

The basic principle of many of these cloud-based databases (Amazon has Aurora, Microsoft has Hyperscale, and so on) is that they split the compute from the storage, because you want to scale these separately. If you need more storage, you don't necessarily need to add more CPUs for compute; if you need more processing, you don't necessarily need to add a full machine with more storage; and so on. This is the basic shape of the PolarDB architecture. There is a proxy that does load balancing, failover, and things like that in front of the database. The database has a layer that hides the distributed storage system: the database looks like it is using local storage, but it is actually using a storage system specially optimized for databases, which is called PolarStore.

Starting from the bottom, PolarStore is a distributed storage system. It has multiple chunks where the data is stored, and all these chunks are synchronized using ParallelRaft, which is a variant of the Raft protocol that was developed as part of this project. All the databases have a special library that uses this underlying storage. The storage is accessed over an RDMA network, which is a way of making remote memory look, in a sense, like local memory to the computer. This needs special hardware to get both low latency and high throughput. You have both Optane disks and SSDs in these chunk servers: the Optane disks give you very low latency, while at the same time you can get high throughput from the SSDs, and there is some logic within the storage for combining the two. Using special hardware means that your data centers need to support this architecture. In the beginning it was only some data centers in China that supported this; now Hong Kong also supports it, and the data center in Singapore will also get this hardware and be able to run PolarDB. As I said, there is a special library on the database side that is used to communicate with PolarStore. It runs purely in user space, so it has less overhead than a traditional file system, and it makes it easy to port your database to this architecture.

As I said, one of the goals here is to be able to scale storage and compute independently. In a traditional MySQL system, you will have multiple replicas in addition to the master, and for each replica you add local storage. Here you have shared storage, so when you add a replica you only add compute resources; the storage stays the same. You can see that the cost savings get higher the more replicas you need, compared to using a lot of servers with local storage.
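To make the chunk idea concrete, here is a minimal sketch in Go of how a chunk-based volume might map a byte offset in a database file to a chunk and its replica set. The chunk size (10 GB) and three-way replication follow the published PolarFS description, but all names and the lookup code here are illustrative assumptions, not PolarStore's actual interface.

```go
package main

import "fmt"

// Sketch of a chunk-based distributed volume: a database file offset
// is mapped to a chunk, and each chunk has a set of replicas on
// different chunk servers, kept in sync by ParallelRaft.
const chunkSize = 10 << 30 // assumed 10 GB chunks

type Chunk struct {
	ID       int
	Replicas []string // chunk servers holding copies of this chunk
}

type Volume struct {
	Chunks []Chunk
}

// locate returns the chunk and the offset within it for a volume offset.
func (v *Volume) locate(off int64) (Chunk, int64) {
	return v.Chunks[off/chunkSize], off % chunkSize
}

func main() {
	vol := Volume{Chunks: []Chunk{
		{ID: 0, Replicas: []string{"cs-1", "cs-2", "cs-3"}},
		{ID: 1, Replicas: []string{"cs-4", "cs-5", "cs-6"}},
	}}
	c, rel := vol.locate(12 << 30) // offset 12 GB falls in chunk 1
	fmt.Printf("chunk %d at %v, offset %d\n", c.ID, c.Replicas, rel)
}
```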
Replication in PolarDB is also physical, compared to the logical replication used in MySQL. In MySQL, with logical replication, you have the binlog, which is what is actually sent to the slaves. You will also write the redo log, which is the physical log that is used to keep the data consistent: if the computer crashes, for example, you need the redo log to recover it into a consistent state. In PolarDB, you just have the data and the redo log, and the slave gets the data and the redo log from the shared storage. You get much less writing to disk than in the traditional MySQL replication case, where you have to force both the binlog and the redo log to disk.

This is especially an advantage for DDL operations. With traditional logical replication, if you want to add a column, for example, or do some other table operation, it will first run on the master, and only when the operation is completed is it logged to the binlog. That's when it is replicated to the slave, and then the slave has to execute the same operation. While the DDL change is executing on the slave, the replication stream behind it is blocked: nothing more will happen on this table until it completes. But since we are using shared storage here, it's only the master that actually updates the data files. When the data files are updated and the operation is completed, you just update some metadata, and the slave is ready to use the new version of the table. So by using physical replication, you avoid the latency issues you have with traditional replication. This slide uses ADD COLUMN as the example; ADD COLUMN is instant in MySQL 8.0, but there are other table operations that still have this characteristic. Also, PolarDB is currently 5.6 based, but we are working on porting it to 8.0, which we will probably release later this year.

Some more details here. With traditional MySQL execution, you have a set of transactions, and each transaction needs to be flushed to the redo log before you can report back to the client that the transaction has been committed. To do this efficiently, you use group commit: you group transactions together, do one write, and then report multiple transactions as completed at the same time, instead of one write per transaction. In PolarDB, we share the redo log, and the slave will parse the redo log, hash on the page ID, and divide the records between different workers, so that multiple workers apply the redo log in parallel. That is really necessary, because on the master side there are multiple threads actually updating the database, so if you use a single thread on the slave side, you will have problems keeping up. We know this from MySQL replication, where that was the situation until they added some parallelization. What they can do today is: as long as transactions were in the same group commit, you know that they are not dependent on each other, so you can actually apply them in parallel. But a transaction from the next group commit cannot be applied before all transactions in the previous group have completed, in order to ensure that they are applied in the right order. Here we can have a much finer granularity of parallelism, because as long as the changes are not on the same page, they can be applied in parallel.
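To illustrate the dispatch scheme just described, here is a minimal sketch in Go, with invented types and numbers: redo records are hashed on page ID onto a fixed pool of workers, so records for the same page always land on the same worker and are applied in log order, while records for different pages proceed in parallel. This shows the idea only, not PolarDB's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

type redoRecord struct {
	pageID uint64
	lsn    uint64
	// ... payload describing the physical page change
}

func applyRedo(records []redoRecord, nWorkers int) {
	queues := make([]chan redoRecord, nWorkers)
	var wg sync.WaitGroup
	for i := range queues {
		queues[i] = make(chan redoRecord, 64)
		wg.Add(1)
		go func(q chan redoRecord) {
			defer wg.Done()
			for rec := range q {
				// Apply the change to the buffer-pool page (stubbed).
				fmt.Printf("apply lsn=%d to page %d\n", rec.lsn, rec.pageID)
			}
		}(queues[i])
	}
	for _, rec := range records {
		// Same page -> same worker, so per-page ordering is preserved.
		queues[rec.pageID%uint64(nWorkers)] <- rec
	}
	for _, q := range queues {
		close(q)
	}
	wg.Wait()
}

func main() {
	applyRedo([]redoRecord{
		{pageID: 7, lsn: 100}, {pageID: 3, lsn: 101}, {pageID: 7, lsn: 102},
	}, 4)
}
```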
The only redo log that is actually applied on the replica is that for pages that are in the buffer pool, i.e. cached, because we don't want to read in pages just to apply the log: by the time we actually need such a page, it may already have been flushed to disk by the master, so we save the work of both of us applying the log to the page. So what happens if you read what we call a page from the past? We say that the state of this replica is the last log record that has been applied on this replica. So if we read in a page from disk which has an older state, we actually have to find the redo log that has been cached on the replica and apply it before we allow the application to read this page.

That also means redo log can pile up on the replica, unapplied because the pages are not cached, so we need some way to garbage collect this log. The primary will send the checkpoint LSN, that is, the log sequence number such that all changes before it have been flushed to disk (that is what checkpointing does: it flushes to disk every page modified up to a certain point in time). The primary communicates this to all the replicas, so they can garbage collect the old redo log.

MySQL uses multi-version concurrency control, so there will be cases where you need to read an old version of a row. To read the old version, you use the undo log, which is a logical log that you read to revert changes until you get back to the old version. In order to be able to do that on the replica, you must make sure that the master does not purge undo log that you might still need. So all the replicas tell the master: this is the oldest read view that I still support; make sure not to purge undo log records that are newer than this. This communication is done over the network, not over the shared storage. It is not something that is inherent in physical replication; it is the shared storage that makes it necessary, because what the primary does to the storage affects the replicas.

So what happens if you read a page from the future? I mean, you have applied up to T4 and then you read something from disk where T5 has already been applied. The answer is that you never will do that; we prevent it, because with the physical redo log there is no way of going back to an older physical version of a page. You have to distinguish the logical version of a row, which you use for multi-version concurrency control, from the actual physical page structures that the redo log reflects. So we make sure, by telling the primary what snapshot we are currently on, that it does not flush any pages that are newer than this to disk. That's actually a change we had to make in InnoDB: to hold back flushing until the replicas have applied the redo log. Note that the replicas are applying redo log all the time, so there are usually no issues with delaying this.
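Putting the two LSN rules together, here is a minimal sketch in Go, with invented names: a replica-side check that applies cached redo before serving a "page from the past", and a primary-side gate that holds back flushing so that no replica can ever read a "page from the future".

```go
package main

import "fmt"

type replica struct {
	appliedLSN uint64
	cachedRedo map[uint64][]uint64 // pageID -> LSNs of unapplied cached redo
}

// readPage brings a page read from shared storage up to the replica's
// state before serving it ("page from the past").
func (r *replica) readPage(pageID, pageLSN uint64) {
	if pageLSN < r.appliedLSN {
		for _, lsn := range r.cachedRedo[pageID] {
			if lsn > pageLSN {
				fmt.Printf("apply cached redo lsn=%d to page %d\n", lsn, pageID)
			}
		}
	}
}

// canFlush is the primary-side gate: hold back flushing pages that are
// newer than the slowest replica's applied LSN ("page from the future").
func canFlush(pageLSN uint64, replicaAppliedLSNs []uint64) bool {
	for _, l := range replicaAppliedLSNs {
		if pageLSN > l {
			return false
		}
	}
	return true
}

func main() {
	r := replica{appliedLSN: 500, cachedRedo: map[uint64][]uint64{9: {420, 480}}}
	r.readPage(9, 400) // old page: cached redo at 420 and 480 is applied first

	fmt.Println(canFlush(510, []uint64{500, 530})) // false: a replica is only at 500
	fmt.Println(canFlush(490, []uint64{500, 530})) // true: all replicas have passed 490
}
```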
There is only a single master supported. The proxy will split the traffic: it sends the writes to the master and the reads to the replicas, and it does load balancing between the read replicas and things like that. Should the master go down, it will make sure to reroute requests to the new master, which will be chosen from one of these replicas.

We also support read-your-own-writes: if an application first updates something, commits, and then does a select, it will actually see its own updates. When you do the update, the log sequence number at that time is returned to the proxy, and when the read is sent, the replica is informed about this log sequence number, so it makes sure to apply enough redo log to be able to return a version of the row that is consistent with the update the application previously performed.

In the future, what we are working on now is to support multiple masters, so you can actually scale your updates beyond what one master is able to support. You will also be able to use multiple availability zones for your database. So that's all. There are some more questions. How does conflict detection happen for multiple masters? Okay, so I don't have an answer to that. I did not work on that part, and it was actually not me who was going to present this, so I don't really know much about the future plans and how the conflict detection will be performed.
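As a recap of the read-your-own-writes mechanism described above, here is a minimal sketch in Go of the proxy-side bookkeeping, with invented names: after each commit the proxy records the LSN the master returned for that session, and a later read is only served by a replica that has applied redo at least up to that LSN. (In the talk's description the replica itself is told the LSN and applies enough redo before answering; this sketch shows the same constraint as a routing decision.)

```go
package main

import "fmt"

type proxy struct {
	sessionLSN map[string]uint64 // per-session high-water mark from writes
}

// onWriteAck records the LSN the master returned when a session committed.
func (p *proxy) onWriteAck(session string, masterLSN uint64) {
	if masterLSN > p.sessionLSN[session] {
		p.sessionLSN[session] = masterLSN
	}
}

// routeRead picks a replica that has caught up to the session's LSN,
// so the session always sees its own updates.
func (p *proxy) routeRead(session string, replicaLSNs map[string]uint64) string {
	need := p.sessionLSN[session]
	for name, applied := range replicaLSNs {
		if applied >= need {
			return name // this replica can serve a consistent read
		}
	}
	return "master" // fall back (or wait for a replica to catch up)
}

func main() {
	p := proxy{sessionLSN: map[string]uint64{}}
	p.onWriteAck("s1", 1200) // commit on the master returned LSN 1200
	fmt.Println(p.routeRead("s1", map[string]uint64{"r1": 1100, "r2": 1250}))
}
```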