Good afternoon, everyone. This is Mayank Prasad. I am working as a developer in the InnoDB development team in MySQL, and today I'll be talking about what's new in InnoDB in MySQL 8.0. This is the safe harbor statement; it says that whatever I'm presenting here is for informational purposes only.

This is the agenda of today's talk: first the InnoDB storage engine and how it fits into the MySQL architecture, then we will jump to the latest features we have delivered in MySQL 8.0 and how these features have helped us improve performance and scalability, and then questions, if you have any. So let's go.

This is the MySQL server architecture. The orange box you see in the center is the MySQL server, which has multiple components, and there are multiple clients and connectors through which applications connect to the MySQL server. Inside the MySQL server we have the pluggable storage engine API. Basically, anybody can implement this storage engine API and plug their own storage engine into MySQL. We have multiple implementations of this API, such as the Archive storage engine, the Memory storage engine, and so on, and one of them is InnoDB, which is the default storage engine for MySQL. InnoDB is the component responsible for taking the application data and storing it on disk, and for reading the data back from disk and presenting it to the user or the application whenever it is asked for.

So let's go to the new features in MySQL 8.0. The InnoDB storage engine is a transactional storage engine; as I said, it has been the default storage engine since MySQL 5.5, and it is ACID compliant. It provides row-level locking, crash recovery, MVCC (multi-version concurrency control), and it also has foreign keys for referential integrity. And as I said, different storage engines are used for different purposes, and InnoDB is the default one in MySQL.

So let's go to the new features. For MySQL 8.0 we have been working for quite some time; we take feedback from the community and from customers, understand their pain points, and then decide on the features we should put into InnoDB. In 8.0 we have worked on more than 60 worklogs, or features, and each of these features has its own importance and complexity, but because of the 25-minute constraint I have tried to pick the most important ones, the ones most significant to customers. Let's see.

The first important thing we have done in MySQL 8.0 is the new data dictionary. The data dictionary holds the metadata of the tables. Before this feature, the data dictionary was spread across multiple places: every table in MySQL had an FRM file on disk, and these FRM files kept the metadata of the table, such as the column names and column types; and then InnoDB, the storage engine, had its own copy of the data dictionary. So whenever a DDL changed the data dictionary, both of these places had to be updated and kept in sync, which was a very difficult and tedious task, and it was not crash safe either. I have put a link here which lists the issues a user could run into because of this. So in 8.0 we came up with a single source of truth, a single place where the dictionary information is stored: the data dictionary (DD) tables. These DD tables are part of the data dictionary tablespace, which is stored in the mysql schema.
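As a quick aside, the dictionary tables themselves are hidden, but the InnoDB INFORMATION_SCHEMA views expose what they contain; a minimal sketch, assuming a schema named test:

SELECT NAME, SPACE, ROW_FORMAT
  FROM INFORMATION_SCHEMA.INNODB_TABLES
 WHERE NAME LIKE 'test/%';   -- lists the tables the new dictionary knows about in that schema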
So now, because we have a single source of truth, there is no need to keep multiple places in sync, and there are no more FRM files on the disk. And because we have a single place, we have introduced a new locking mechanism, metadata locking on the dictionary, under which this dictionary information is updated.

Somebody might have a question, right? I have a table in 5.7, I have a database in 5.7, and you say you have a new dictionary in 8.0, so how does the upgrade work? We have made sure the upgrade is smooth: when the user upgrades from 5.7 to 8.0, the information is ported into these DD tables, and at the end of the upgrade the legacy dictionary tables are dropped. So we have a single data dictionary, and MySQL itself runs with that.

Apart from that, we have introduced something called serialized dictionary information (SDI). This is basically a redundant copy of the dictionary which we embed inside the IBD file itself; the main purpose is to make the IBD file self-descriptive. Whenever we do a DDL which causes the dictionary data to be updated, this information is updated as well. In the tablespace file we have specific pages where we store this information, and how it can be used I will explain in a couple of slides.

We have also introduced an ibd2sdi tool. Using this tool, a user can see what data dictionary information is stored inside an IBD file. Let's see how we can use this ibd2sdi tool in practice. Say I have created a table test.t1; it has two columns, id and name, and id is the primary key. I run this ibd2sdi tool on the table's tablespace file, which is t1.ibd, and this is what I get: this is a table whose name is t1, it was created in MySQL 8.0.19, the time when it was created, the last time it was altered, it belongs to schema test, it has a column named id, and so on.

So where can we use this SDI information? Let's say we have a MySQL server, but some unfortunate incident caused us to lose the DD tablespace, which has the data dictionary information. Now we are left with an IBD file, but we don't know the metadata of this file. Because the SDI information is embedded in the IBD file itself, we can use this tool to get the metadata, the SDI information, out of the IBD file. Then, using this SDI information, we can import the tablespace file into another running MySQL server and use that table metadata together with the table data. The import using this SDI is work in progress; we're working on that.
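To make the example concrete, here is a minimal sketch; the data directory path is an assumption, adjust it for your installation:

CREATE TABLE test.t1 (id INT PRIMARY KEY, name VARCHAR(64));

-- then, from the shell, point the tool at the table's tablespace file:
--   ibd2sdi /var/lib/mysql/test/t1.ibd
-- it prints the SDI as JSON: table name, MySQL version, creation time, columns, indexes, and so on.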
The next thing we introduced in MySQL is the DDL log table. This is an internal table in MySQL; it resides in the DD tablespace, and user DML and DDL are not allowed on it. The main purpose of this table is to keep a record, to keep track, of the physical operations happening on the disk while a DDL is running. Let's say you are creating a table, so a new tablespace file is being created on the disk; the DDL log table records that this file has been created on the disk. So the purpose of this DDL log table is to track all the physical operations happening on the disk while the DDL is running.

If there is a crash of the MySQL server during the DDL, we are not able to roll back these physical operations by themselves, simply because we don't have any information about them. But with the help of this DDL log table, this information is persisted on the disk. When we restart the server, we consult the DDL log table and we see, oh, there is an orphan entry left, an orphan physical file left on the disk, and we are supposed to get rid of it. So after the server restarts, we have a MySQL server without any orphan files left behind.

The next thing we have is instant DDL. ADD COLUMN was a very big pain point from the customer's point of view. If a customer has a very big table and wants to add a column to it, then because of the row format dependency in InnoDB, each and every row has the number of columns embedded in its header. So if the number of columns changes, each and every row has to be modified, which means that when we do ADD COLUMN we basically rebuild the table completely. If the table is big, it takes a lot of time; the bigger the table, the more time it takes. And because we are doing a full table rebuild, it also generates a lot of I/O. So this was very painful for customers. What we did in 8.0 is we came up with ALGORITHM=INSTANT. Now, instead of going to each and every row and modifying the metadata in the row header, we keep some information in the data dictionary, and using this information we present the row data to the user in the format it should have after the ADD COLUMN has been done. It saves time, because no row is touched; it saves space, because no copy of the table is made; and it saves resources, because there is not much I/O happening.

Let me give you a pictorial view of how this works. Say we have a table with multiple rows, and on this table we run ALTER TABLE ... ADD COLUMN with ALGORITHM=INSTANT. When this happens, we just store some metadata in the data dictionary and we do not touch any row of the table. That's how it is instant: irrespective of the size of the table, whether it is very big or very small, it takes the same amount of time; it is an instant ADD COLUMN.

We have a few limitations in this instant ADD COLUMN implementation. As of now, a column can only be added at the end, so only a trailing column can be added instantly. Only the DYNAMIC, COMPACT, and REDUNDANT row formats are supported. If the table has a full-text index, instant ADD COLUMN is not supported there. And tables residing in the data dictionary tablespace are not supported, which is obvious because they are not user tables. We are working to improve this and to get rid of these limitations, so stay tuned; we might have something coming up very soon.
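To make this concrete, here is a minimal sketch of the syntax; the table and column names are just examples:

ALTER TABLE test.t1 ADD COLUMN created_at DATETIME, ALGORITHM=INSTANT;
-- because ALGORITHM=INSTANT is requested explicitly, the statement fails with an error
-- instead of silently rebuilding the table if the instant path cannot be used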
Similarly to ADD COLUMN, we have RENAME COLUMN, which had the same dependency: renaming a column used to cause the table to be rebuilt, which doesn't make much sense, because renaming a column is just changing the name of the column, which, as I said, is data dictionary information. So we implemented that in 8.0 as well, and now RENAME COLUMN is done with ALGORITHM=INSTANT. Our aim is to make more and more DDL instant, so stay tuned; something new might be coming in the future.

On the security front, we have more encryption implemented in 8.0. The redo logs and undo logs, which contain user data, can be encrypted now; we have introduced two global variables, innodb_redo_log_encrypt and innodb_undo_log_encrypt, and using them we can encrypt these logs. We have also implemented encryption for general tablespaces: a tablespace can be created encrypted, or at a later point in time it can be made encrypted with a single SQL command. And the doublewrite buffer, which is nothing but a mechanism to make sure there are no torn writes on the disk, is also encrypted now. So if we have a table which is encrypted, and we are flushing a page of that table to the disk, the copy of that page going through the doublewrite buffer is also encrypted.

Then the undo log; we have made an improvement here as well. Earlier, the undo logs were part of the system tablespace, so as the undo logs grow, they take more and more space in the system tablespace, and it is not possible to shrink the system tablespace once it has grown very large. What we did in 8.0 is we moved the undo logs out of the system tablespace. They now have their own files on the disk, and we provided a SQL interface to manage the undo tablespaces. If a user wants more undo tablespaces, they can create them with CREATE UNDO TABLESPACE, and if they want to drop existing undo tablespaces that are no longer used, they can drop them. The truncation of undo tablespaces also happens automatically behind the scenes. By default there are two undo tablespaces, and while the server is running at least two undo tablespaces must exist.

The next thing we introduced in 8.0 is the dedicated server mode. This is for those customers who have the resources but are not sure what values the buffer pool size and the redo log file size should be set to, because the redo log file size and the buffer pool size play a very important role in the system throughput, the TPS. What we did here is: if the user sets innodb_dedicated_server = ON, MySQL automatically calculates the best sizes for the buffer pool and the redo log files based on the physical memory available, and then uses them. There is an algorithm to calculate these values, one for innodb_buffer_pool_size and one for innodb_log_file_size: it looks at the detected physical memory and sets the values accordingly.
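Here is a small sketch of the SQL and configuration surface for the features just mentioned; the tablespace and file names are placeholders, and some of these statements require recent 8.0 releases:

-- encrypt the redo and undo logs
SET GLOBAL innodb_redo_log_encrypt = ON;
SET GLOBAL innodb_undo_log_encrypt = ON;

-- encrypt an existing general tablespace with a single statement
ALTER TABLESPACE my_general_ts ENCRYPTION = 'Y';

-- manage undo tablespaces through SQL
CREATE UNDO TABLESPACE undo_003 ADD DATAFILE 'undo_003.ibu';
ALTER UNDO TABLESPACE undo_003 SET INACTIVE;   -- an undo tablespace must be inactive before it can be dropped
DROP UNDO TABLESPACE undo_003;

-- and in my.cnf, let the server size the buffer pool and redo log itself:
-- [mysqld]
-- innodb_dedicated_server = ON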
A very important thing we introduced in MySQL 8.0 is the parallel scan of a table. Earlier, when a table was scanned, each and every row was read by a single thread, which could take a long time. What we did in 8.0 is divide the entire range of nodes of the index tree into multiple chunks, with one thread per chunk reading the data, so the scan happens in parallel. We have a configuration variable, innodb_parallel_read_threads, which can be set from 1 to 256 with a default of 4; that many threads are used when the scan is being done. As of now it works only for the clustered index.

We already have a few scenarios where we use this parallel scan. One of them is when we do a SELECT COUNT(*) to report the number of rows in the table. Another is the ADD INDEX mechanism, in which the clustered index is scanned; there too we use the parallel scan. Let me explain on the next slide how we use it in ADD INDEX. Whenever we are doing an ADD INDEX, we scan through the clustered index entries, write those results out to temporary sort files, sort them based on the key of the index that is being added, and then insert these records into the new index tree. Earlier all of this was done serially; now the clustered index is scanned in parallel, in batches, into the temporary sort files, and only the final step, loading the new tree, is done in a single thread. Work is in progress on that part as well, so that sorting these records and building the new tree can also be done in parallel.
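For example, assuming a large table test.t1, you can raise the per-session thread count before a full scan; the variable name is real, the value is just an example:

SET SESSION innodb_parallel_read_threads = 16;   -- default 4, maximum 256
SELECT COUNT(*) FROM test.t1;                    -- the clustered index is scanned in parallel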
The next feature is CATS, contention-aware transaction scheduling. This came out of research done at the University of Michigan. The idea is this: earlier in InnoDB, when transactions were waiting for a lock, the lock was granted first come, first served, to whichever transaction asked for it first. With CATS, we grant the lock to the transaction on which the most other transactions are waiting. Say there is a transaction T1 on which five transactions are waiting and a transaction T2 on which six transactions are waiting; T1 came first and T2 is behind it, but T2 will get the lock, so that more transactions can finish sooner and the system throughput improves. The analogy I give is a cab driver and a bus driver. Say a cab driver with one passenger and a bus driver with 70 passengers both stop for a coffee on the way. The cab driver asks for the coffee first, and the bus driver is standing behind him. First come, first served, the cab driver would get the coffee first; but with CATS, because the bus driver is carrying 70 passengers, the bus driver gets his coffee first and the cab driver gets his later, so that 70 people can get moving instead of one. There is nothing for the user to configure; internally, as contention grows, InnoDB automatically switches to CATS to improve the system throughput. And here are the performance numbers we got from the CATS implementation.

Right, so next is BLOB storage. The way the old BLOB storage design worked, if you wanted to access the last byte on the last BLOB page, you had to go to the clustered index record, get the reference, go to the first BLOB page, and scan through the entire BLOB to reach the last page. When I say scan through the entire BLOB, that means you are bringing all the BLOB pages into the buffer pool just to follow the chain to the next page. This was very I/O consuming. What we introduced in 8.0 are BLOB index pages in between; it is an indirection we have introduced. These BLOB index pages store index entries for the BLOB data pages: they say, for this particular BLOB, the data from this offset to this offset is stored on such and such page. So from the clustered index record we take the reference to the BLOB index page, we go to that page, read the entry, and we see that for this particular offset the data is stored in BLOB page number XYZ. We bring only that page into the buffer pool, so only the BLOB index page and that one data page come into the buffer pool, and the I/O is much less.

These are the performance numbers we got after this change in the storage architecture of BLOB data: for 500 threads we have gone from 10,000 per second to 110,000 per second. I've also put the blog link here, just in case somebody wants more details about this feature.

Similarly, we also changed the design of compressed BLOBs, the way we store them on the disk. Earlier, when we stored a compressed BLOB on the disk, we took the entire BLOB data, compressed it together, and stored it as one stream. The problem is that if you want to modify the last byte on the last page, or any byte on any page, you have to bring the entire stream into the buffer pool, uncompress it, go to that particular place, modify it, recompress it, and store it again; only then can you do it. So we did something similar to before: we introduced a zLOB index in between. We divide the entire BLOB data into multiple chunks, and each chunk is compressed separately. In the zLOB index pages we store the information about which offset range of the BLOB data is stored in which particular compressed chunk. Whenever there is a request to read or modify the data at a particular offset, we go and bring only that small chunk of compressed data into the buffer pool, uncompress it, modify it, recompress it, and store it. These are the performance numbers we got after this change to the way we store compressed BLOBs, and I have also put the blog link here, where you can go and see the details of this feature.

The second-to-last feature I want to talk about is the lock-free redo log. This is the write-ahead log design: whenever a transaction is running, before we can flush the data pages to the disk, the redo log has to be flushed. This redo log is generated by the many mini-transactions (mtrs) that are running, and earlier each mini-transaction had to take a lock on the log buffer before it could write its data into the log buffer. The problem is that there are many transactions running, each of them creating multiple mtrs, and each mtr was waiting on this single mutex, the log_sys mutex, to write into the log buffer. It was a single bottleneck point. So what we did is we made it lock-free. The way it works now is that each mtr knows how much data it is going to write into the log buffer, and when it comes to writing, we allocate a specific region of the log buffer to that mtr. Every mtr has its own region assigned and writes into its own region, so no mtr interferes with any other mtr. Once the data is in the log buffer, it is written to the file system cache by a log writer thread and then flushed to the disk by a log flusher thread, and if there is an mtr or a transaction waiting for its data to be persisted on the disk, there is a log notifier thread which announces that everything up to this LSN has been persisted on the disk, so the transaction can commit.
This is the benchmark, the performance results we got: for 256 connections we have gone from 60,000 TPS to 240,000 TPS with this feature.

The last feature I wanted to talk about is clone. Clone is a simple and very efficient way to create a MySQL replica; it is a physical snapshot of a running MySQL server, and it can be used to set up replication, or in a Group Replication cluster, when you want to add a new node, it can be very useful. The way it works: we have two servers, a donor server from which the data is to be cloned and a recipient server to which the data is to be cloned, and both of them must have the clone plugin installed. Then there is a client through which we connect to the recipient server and run the CLONE SQL statement, passing the information about the host, the donor, from which the data has to be cloned. The clone plugin then connects to the donor, takes all the data from the donor server, and transfers it over the network to the recipient server. This data includes everything: the IBD files, the redo logs, everything. The recipient server flushes everything to the disk, restarts, and runs the regular recovery mechanism, and once the recipient server has restarted, it has exactly the data that was on the donor server; it is an exact replica of the donor server now. In the clone we do not copy the configuration parameters, because things like the port might be different on the recipient and the donor server, but there are some configurations which must match, like the page size: the page size must be the same on both the recipient and the donor. They must also be on the same MySQL version; cross-version cloning is not supported. And we do not clone the binary logs. As of now, clone is supported only for InnoDB tables; if you have tables in any other storage engine, they are cloned as empty tables. Concurrent DDL on the donor was not supported initially, but last year, I believe, we delivered the feature that supports concurrent DDL on the donor server while the clone is going on. And yeah, that's all I had. This is the link for the blogs where the MySQL developers write about the features we are developing, and you can get more details about the new features that are coming.
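To wrap up with something concrete, here is a minimal sketch of the clone workflow described above; the host, user, and password are placeholders, the plugin file name is the Linux one, and it assumes the clone user on the donor has BACKUP_ADMIN and the user on the recipient has CLONE_ADMIN:

-- on both the donor and the recipient
INSTALL PLUGIN clone SONAME 'mysql_clone.so';

-- on the recipient
SET GLOBAL clone_valid_donor_list = 'donor-host:3306';
CLONE INSTANCE FROM 'clone_user'@'donor-host':3306 IDENTIFIED BY 'password';
-- the recipient then restarts with the cloned data and comes back up as a physical copy of the donor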