 I will not talk about progress is good or progress is good, I will not talk about which database is better, because if I tell you progress is better or MariaDB is better, it will of course start another framework, like VIM and VMAX, I do not want to start a framework today, this I will not tell you, but today I will cover some internal structure, the comparison of the internal structure of MariaDB and progressDB and I will tell you why this difference in internal structure affect our usage on this database, if luckily if I have time I will talk about the replication, how MariaDB and progress difference in their way of replication. Let's start now, B3, B3, I want to ask how many people haven't heard about this, keep this work, one only, not that, B3, before I go deep into the data structure of progress and MariaDB, I want to let all of people remember B3 first, B3 is a data structure of F3, nowadays relational database, why, because in modern database it is quite often we have billions of records in a database and we want to perform a very quick search by their primary key to locate the record and this we need a dictionary structure to support our operation and B3 is our, is the data structure to support this operation, why, because B3 supports concurrent update, for example, if I want to update record one, the database will perform page log on this record, on this data block, if there is another user, he want to update record five, the database will perform page log for this data block, this, it means that if different user, they are using different data block, they can perform concurrent operation to maximize the performance, secondly, B3 is a shadow tree, although our example, the example we use here, one data block store only, only free record, however, in modern database, the function factor is about at least 1000, between 1000 to 10,000, it means that if you have a B3 of 4, you can store 1 billion, 1 billion record with 4, this level, it means that to locate, even if, even your database have 1 billion record in a table, you need to travel for, you need to travel 1, 2, 3, 4, 4 steps to locate the exact record, which is quite high, which is quite efficient, when to pay attention to one technical keyword, balanced, I use balanced, not even distributed, balanced means that in a tree, every, every new look, they are in the same depth level, the definition is just that only, but balanced is not precisely equal to even distributed, for example, some, some new look maybe, maybe have a lot of record, but some new look may have just a small amount of record, I will not go too deep into this topic, but I want to, I want you to, if you have interest, I want you to Google it yourself, and then after that, everyone remember what is B3, I will introduce you, in this organized table, as he organized the table, table, the table structure, although, although most users do not, do not need to take care of what exactly the data structure is using by the database, I must, I must, what, I must say that quickly, in no matter in progress or in, in very deep database, they use, they, they use B3 to, to store their, the structure, the only difference is, by very deep database, they store the, they, they store the record directly in the, in the lift loop, but progress, they store the data in a, in another heap area, I will show you some slides to talk about it, here is a sample data, and you can imagine is user, users table, here is an example of MariaDB, the whole structure is a beta stream, you can see that the lift loop, the road data is stored in lift loop, and in an organized way, 1, 2, 3, 4, 5, in a very organized way, here is how many, very deep database, they store the actual data, however, in progress, this one is the beta stream, in the lift loop, only the, only the primary key is stored, and, and besides the primary key, the location, the location of the record is stored, you can see in, in the heap area, in the heap area, the data in unorganized is quite, is, you can see that is in a quite random way, but you, you will want to ask why, I, I will show you some use case to tell you why, why this structure, why this structure affect, affect our, our usage. MariaDB, almost, almost all texts for teach about MariaDB, they will teach you using auto, auto, auto increment as, as the primary key, but does anyone wonder why we should use auto increment instead of natural key or UUID, because, because of the internal data structure, if we use auto increment, and because, because the record is stored in a, in a, in an orderly race, potentially, if we use auto increment such monotonic increasing sequence, all the, all the new record will go to, will go to that data block, it means that if you perform in continuously insert, the database just create a new record, sorry, create a new data block for new record, create a new data block for new record, the insert performance is fast. And, another point is, in Maria, in Maria database, the secondary index, they, they store, they store the primary key, they store the primary key on the nift node. It means that if you use auto increment as, as the primary key, your secondary index will have minimum size because integer have only four bytes, but if you store natural key, for example, Peter, Tom, Alexander, you will have a large, you will have a larger secondary key, if you have any secondary key. Pogress, in Pogress, although many, many people, they switch from MySQL, MySQL on MariaDB to Pogress, they will continuously use, use auto increment or some sequence as the, as the primary key, but many people in Pogress, they will use natural key or UUID, why? Because in, in the role of Pogress, use concurrent, concurrency is our major, is our, is our major concern. If we use natural key and use UUID, usually it is in an unordered way. It means that all insert, all update, it will scattered among the, it will scattered among the B3. It means that different flag, they are unlikely to hit the, to hit the same different in the same time. It means that, so the system concurrency will be higher, but what it can be done that we translate to, it means that if we install better hardware, Pogress will usually scale up linearly. While, MariaDB or MySQL, they will, they will have some, they will, they will have some limitation of, of the scaling, scalability. Why? Because if you use, if you use MariaDB and you use in that organized table, all the, all the new record go to, go to this data, data block, right? It means that all the record, all the, all the insert, they must wait for a page log in that data block. And the, the, the update performance, the, the physical update performance of this data block will affect your, will affect your insert performance. No matter, no matter, no matter how many, how many habits you insert, you have, you, you have in your, in your system, the performance of this data block is, is the, is the final bottleneck. But in Pogress, because, because we use hip table, hip organized table and we use UUID, the, the insert is, the insert is scattered among this area and also scattered this area. If your, if your hard disk have high performance on random write, your, your concurrency will be, will be very, very high. Next, in Pogress SQL, in Pogress SQL, the secondary index they are not storing the primary key. Instead, it is storing just the location of the hip area. It means that, no matter, no matter your primary key is auto, is an integer or is a natural key, for example, Alexander the Great, it does not matter. It is, it is not related to the size of the secondary key. Okay, I want to talk about MariaDB range scan. For example, it's a common situation. Your boss asked that, Triton, I want to, I want to have a, have a sum up of the purchase, purchase order of last financial year. In, in MariaDB, how can you achieve this report? Usually, we that record can have millions of records. How can we do it in efficient way? We can, we can buy some means by, by, by guest or by, by some P record to, to locate the, the first rack, the first auto increment rack, the primary key of this financial year stuck. And also the, the primary key of this financial year end. And then we perform range scan. For example, the, the, the, the record with ID between 1, 1, 2, 3, 4 and between 2, 2,000. In MariaDB, the range scan is very high performance. I will show you later, but in progress, the range scan is, is no performance. I will show you later. For example, if I want to perform range scan on the primary key on MariaDB, for example, I want to perform range scan from ID 4 to 6. How can, how can we achieve it? How MariaDB handle it? It will, it will travel from the wood note and then, and then try to find out the number 4 record in this note. And then he just need to, the MariaDB just need to perform a sequential scan in, sequential scan in the hard disk to 4, 5, until he reach the last record, number 6. And then it's finished. Everyone knows we copy a large file is much faster. We copy 1,035 because, because hard disk is optimized for a large sequential scan instead of a, a big number of range random scan. This MariaDB have a very high performance because it involves a big range scan, a big sequential scan on hard disk only. However, how about in progress? If the, if the ring, if progress, ask them, think that your range scan involves only a tiny amount of record, but he will, he will only perform, he will look up the, he will look up the bt to look, to locate all the location of, of the, of the hip, of, of the hip state, for example, 4, 5, 6, and then random scan, random, random look up, sorry, random IO on, on this, this and this hip state, which is low. If there is a large amount of random IO, this, if in progress, if you, if the progress thing, you are performing a range scan on a large amount of record, it will directly give up this bt structure and perform a total full, full table scan on hip area. And this, if your, if your table involves 1000 years of record and you need to only farm out one, one financial years of record, it will quite slow because all records in it is involved. And this usually in, in, in the role of progress, if you want to use progress and you need to some, perform some reporting, usually you have two options. The first option is you build up master and master slave reputation and you perform the reporting on a slave, on a slave, or you perform some time-based, time-based data partitioning. It means that, for example, in, in our, in our, in our example, you perform data, you perform data partitioning and store the, and store different financial years records into different tables. But it involves a lot of tasks and it may affect your program. It may involve extra cooking in your, in your system. But I will talk about record update. In Maria, in MariaDB, because the record is stored in the NIF node, and because the NIF node is in a, in a, in an ordered way of structure, the new version of the, of the record must be stored in the exact location of, of the original record. Otherwise, you have to perform data, you have, if, if the, if the data block do not have enough free space, you marry their, marry their base, have to perform pay-speed. I would, I wouldn't go too deep about the, how, how pay, pay-speed or page merge affect the performance of MariaDB. But you can, you can Google, you can, I, I, I, I truly suggest it, suggest you use to Google it. And to avoid pay-speed and page merge, and if you are DBA and concerning our power performance, you should, you should read more about field factor, table field factor. It affects both, both my MariaDB and PogreSQL. However, in PogreSQL, because the record is stored in HIP area, and in a, in an unordered way, if there is any update, the PogreSQL just need to farm, just need to farm our, our fee, fee space in the, in the HIP area and then update the beach, and then update the beachy pointer to, to the new location. This, this, and this all. And this usually, for a large update system, PogreSQL will usually have a higher performance. Before I go to the next section, I would like to talk about more, about update, to maintain a high performance and also for the, for the beauty of benchmark. I'm not saying something right. Both MariaDB and PogreSQL, they do not clean up the old version of the record immediately. Otherwise, the performance will be slow. And in MariaDB, in order to cam the, in order to cam the fee space back to their base and keep them, and clean up the old record, both Maria, MariaDB and PogreSQL, they use some back, some after, after the task. In MariaDB, it's called back end purge fact. And in PogreSQL, it's called record. And I must, if you are not, if you are not DBA and if you do not know what is it, do not do, I said, I said again, do not up, do not change this configuration. Otherwise, you will have some serious effect. For example, data loss or slow performance. Luckily, I have time about, to talk a little bit about replication. How many time now? What time now? Oh, it's time up. Give me five minutes more. What is replication? In movie slot, everything can go down. It will go down. And sometimes they will put up some stack in the server rack to play for the server to not go down. But being a responsible engineer, you should not use this way. And we build a replication to relay all the, all the change in the master and the slave to keep the data between master and slave in a synchronized way. Even the master go down, we can immediately switch all the traffic go to the slave to remain the service online. This is the principle of replication. In replication, there is no major way. First statement, there is the replication. It means that all statement happening in the master is passed to the, to the, is sent to the slave and the slave just, just replay that replication. This strategy is used by MariaDB, MySQL. However, there is another way of replication called from, from based replication. It means that after you have run a statement, the database will record which, which area in the, in your heart this is updated. And then the database just send the updated area, the new version of the updated area, send it to the, send it to the slave. And the slave just use this, use the new data to read. Shit. Sorry. Sorry. Sorry. I apologize. And the slave, and the database uses new data to replace the old data in the heart disk to keep the data synchronized. Which one is better? It depends. In statement, in statement based replication, usually it has lower bandwidth requirement because the record you update one record or you update one million record, the database only send the compressed, the compressed statement between, from, from master to slave. And it, the database, the bandwidth requirement is usually, is lower. Usually, why is important? You can imagine if you are Facebook, if you are Uber, you have data, you have data center across the ocean, your bandwidth, your, your bandwidth charge is high and you, they have to minimum the, the bandwidth between different data center. However, what is the advantage of block based replication? It's no matter, if there is, even there is some big, big, big, big, even there is some data corruption between, between the network or there is some data corruption in, in your slave storage. When there is our, when, when there is update on master, such, such data corruption on, on your slave will be replaced by new data. And this, the, the, the data corruption in slave will not be replicated. If, as long as there is continuously update in, in, in the, in the, in the master. And this, if you, if you are bank, you need to make your data be 100% correct. Usually, usually block based replication is your first choice because, because your, your error will not be accumulated in, in the system. In conclusion, I will not tell you which database is best, which, which database is good. However, I, I do hope you to Google it yourself to, to assess your requirement. I, do you, do you want, do you want to maintain minimum architecture and do, and do the reporting on the master incentive instead of state? Do you prefer UI D or your, your application layer have mandatory requirement to use auto increment? Do you need to have application across the ocean? Do you need to maintain high, high data consistency or your data must be absolutely correct? And lastly, do you need to write complexity code? Currently, currently, my, my SQL and progress, they, their SQL standard is only, only remain in SQL 1999. However, however, progress is SQL standard is about 2008. It means that if you write complexity code, you need to write when you, you want to use window function for, for analysis. Sorry, in my way DB, you can't, you have to do it yourself. That's all for my today's presentation. I wish you, after this presentation, you can Google it yourself to further, to further learn about the difference between two database. Thank you. About the UUID natural key, if you use UUID as the primary key in your table, yes, the problems you mentioned will exist. But a lot of users, what they do is they flip the bits in the sense that UUID is seconds, hours, months and years, right? So you can actually write a stored procedure which will flip the bits. Oh, okay. And hence, when you do that, the years come to the beginning. And that allows you to have better indexes. Yes. And is a built-in feature in Oracle. It's called in-vend primary key. I do not accept it. Now, MySQL is providing that with a function called UUID to binary. So maybe that is a solution for that problem. The other thing about marking records which have been updated or deleted. Yes. That inodb calls it delete marking. It is not just for benchmarking purpose. The reason is that a lot of times, the way DMLs work is that they delete a record. But within a few seconds, they come back and insert the same record against. Okay. So it would be nice if that record totally went away and a new record came in. But inodb does something even better where they essentially delete marker record, which they say they have a bit that says this has been deleted marked in the future. But within a certain time before your push threat comes in, if an insert happens, it doesn't have to do the full insert. It just needs to go and mark that record. It is a joke only. It has a delete line. There's a purpose for it. That's what I was trying to say. And yeah, I really like your presentation. Thank you. Any questions? You're running late. We're actually even into the next presenter already. If you have any questions, please take it offline.