So, 8.0 has a lot of features; if you're still on 5.7, which has reached end of life, we're really trying to create pressure to move to 8.0. There's also a new release model in place since July last year. This was a big ask from many of the big customers: in 8.0 we had been doing a continuous release model where we include new features as well as stability and bug fixes together as one, and those bleeding-edge features can sometimes cause pain because they bring unpredictability and other side effects. So we have now moved to what we call an innovation release, where new, more experimental features can be tried out. These are still production grade, but the support for them is short term, a new one every quarter, which means that if you're using 8.1 and you want bug fixes, you go to 8.2, 8.3 or 8.4. Alongside that, the long-term support (LTS) release only includes stability and bug fixes, which means the next release, coming this quarter in April, is 8.4, and 8.4 will become the next LTS. When 8.4 comes out, by right, 9.0 innovation should start. This will carry on: every two years, with a release every quarter, 9.7 would be the next LTS, then 10.0 would start, and so on and so forth, if the release schedule goes according to plan. The exception is 8.0: 8.4 is the first LTS and does not follow this rule, because 8.0 will sunset in 2026, while 8.4, coming out this April, will sunset in 2032, so you still enjoy the eight years of support.

With that, these are the upgrade and downgrade paths that are possible. You can do an in-place upgrade from an innovation release to the next innovation release of the same major version, from one LTS to the next LTS, and from an LTS to releases within the same version; those are the supported combinations. The upgrade and downgrade paths are all shown here, except that when it comes to downgrade, you can only downgrade within the same major LTS release: the file formats are kept fixed so that you can move up or down between them. Anything else in between, you have to use either async replication or a dump and load to export and import. So expect this from 8.4 onwards.

I just want to cover a few features quickly. MySQL Shell allows you to check for upgrade issues using the upgrade checker utility (util.checkForServerUpgrade()). Once you run it, you can check the target release, from 5.7 to an 8.0 release, or even beyond 8.0. Let me see whether this is running. Sorry, this is a PDF file, so my other file shows it.

Why upgrade to 8.0? Better concurrency. The redo log architecture has been revamped: it used to be a single synchronization choke point, and we have made it so that the log consumers are non-blocking. As you can see here, we have scaled up the performance of the redo log and its consumers. The other advantage is that the redo log is now dynamic: you can change the size of your redo log while the server is running. My colleague Mayank will share later about the dynamically resizable redo logs in version 8.0.30 and beyond. So this has improved concurrency for the transactional side of the database.
This is one of the key reasons why upgrading to 8.0 is encouraged. The other thing: the undo format has changed. This used to be a blocker in 5.7: you could only choose how many undo tablespaces you needed, at a fixed size, when you created the database, and after that you could not change it, increase it, or decrease it. In 8.0 we revamped the whole undo architecture: up to 127 undo tablespaces, each with 128 rollback segments. If your database is doing a lot of batch work and a lot of read transactions, you can increase or decrease the undo tablespaces dynamically, which helps improve database availability: you size it according to what you need, and that is now possible. There are also advanced features like undo truncation; these are handled automatically and have been improved in the later versions of 8.0.

Resource groups are another improvement, whereby you can set priorities for different types of workload in the database. For example, you want your batch workload to run at a low priority but you want to maximize priority for your online transaction workload. With resource groups, you can set the affinity in such a way that your transactional workload gets higher performance while your batch still runs at a lower priority. This is tunable but restricted to certain OSes: it is well implemented on Linux, less so on the other OSes listed here. Priorities can be set, and they affect the CPU affinity of the affected threads. This is another reason to go to 8.0.

Tablespace truncation has also been improved. We used to have a problem with the adaptive hash index, in the sense that deletes and truncates would slow down as the number of objects increased. With the new fix, no matter how many tables you create, the time to truncate has improved dramatically, up to about 250 times depending on the number of tables you have.

The other thing we have improved is invisible columns. When you have a table with no primary key, an invisible column is the way to go, because it allows you to create a primary key based on the invisible column without exposing that column to the application. That way, even though your application's table does not have a primary key, it can still be adapted, for example transformed into a high availability deployment, which requires every table to have a primary key. Invisible indexes are very useful for soft-deleting an index, or for preparing an index ahead of a new application deployment without impacting existing queries. Or, if you have analytics and batch jobs and you don't want those indexes impacting your existing workloads, what you can do is override the optimizer switch for the session of the batch or reporting job to expose the indexes, so those jobs can actually make use of them. The other thing is generated invisible primary keys (GIPK), very useful when you have legacy tables with no primary key: using invisible columns, this generates a primary key for the table on its own, so you can now add HA for those legacy applications.
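As a rough illustration of the invisible column, invisible index, and GIPK features just described, here is a minimal SQL sketch. The table, column, and index names are hypothetical, and it assumes a server recent enough to have these features (invisible columns in 8.0.23, GIPK in 8.0.30).

```sql
-- Hypothetical legacy table with no primary key: add one on an invisible
-- column, so SELECT * output does not change for the existing application.
ALTER TABLE legacy_orders
  ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT INVISIBLE,
  ADD PRIMARY KEY (id);

-- Generated invisible primary keys: tables created without an explicit PK
-- automatically get an invisible my_row_id primary key column.
SET SESSION sql_generate_invisible_primary_key = ON;
CREATE TABLE legacy_events (msg VARCHAR(100));

-- Invisible index: present and maintained, but ignored by the optimizer,
-- so existing query plans are unaffected...
CREATE INDEX idx_created_at ON legacy_orders (created_at) INVISIBLE;

-- ...except in the batch/reporting session that explicitly opts in.
SET SESSION optimizer_switch = 'use_invisible_indexes=on';
```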
Quickly, on security: roles have been added. If you have used other databases you know that roles are very important, not just for ease of use in the application but also for compliance, because compliance is often about the ease of managing your application permissions. It is easier to use roles: define once, assign many times, and this simplifies access management. 8.0 onwards has matured with roles in the database.

The other thing we added to improve security is dynamic privileges, where privileges can now be tailored to specific tasks without giving the user the SUPER privilege, which tends to be too wide a permission and prone to abuse. This lets a task be performed in a fine-grained way: enough to do the job and that's it, no excess permissions, which lowers the risk.

The other thing that's very important: typically, once all connections are used up, you are effectively locked out of the database. With a new feature contributed by Facebook, you can now have a separate out-of-band administrative connection that does not count towards max connections. With this, even if the sessions are full, the DBA can go in and manage the database, killing off processes that aren't needed, and this connection can be secured with its own encryption certificate, separate from the rest.

Then, in the 8.2 release, we restricted the use of TLS such that only cipher suites that provide perfect forward secrecy and encrypt data using AES are allowed. So if you run the database and connect to it, you will sometimes see a warning saying the cipher will be deprecated. When that reaches the de-support stage in a later release, 8.4 or beyond, you will not be able to use ciphers that are not on the allowed list for TLS 1.2 or 1.3. This is to prevent unsafe use of ciphers that are no longer supported.

On HA, very simply, we have added geographic redundancy: it doesn't have to be one replica cluster, it can be more than one, and the other end can even be a single node. We also added AdminAPI locking for cluster operations: if an operation holds an exclusive lock, other admin commands have to wait; if it holds a shared lock, no exclusive commands can run. This has improved cluster consistency and the predictability of cluster operations. Also improving security, inter-cluster and intra-cluster channels can now be encrypted, and the cluster admin user can now be protected using passwordless authentication. These are improvements in 8.0.33 itself. My colleague will talk later about read scale-out and transparent database routing.

The clone plugin has been available for copying databases since 8.0.17. You can use it to provision a database by cloning, or to copy a database downstream for dev/test purposes. Cloning is very fast because everything is parallelized and it's auto-resumable.
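A minimal sketch of using the clone plugin just described to copy a running instance. Host names, user names, and passwords here are hypothetical placeholders, and the user running CLONE INSTANCE on the recipient needs the CLONE_ADMIN privilege.

```sql
-- On both donor and recipient (8.0.17+): load the clone plugin.
INSTALL PLUGIN clone SONAME 'mysql_clone.so';

-- On the donor: a user the recipient will connect as (hypothetical names).
CREATE USER 'clone_user'@'%' IDENTIFIED BY 'clone_pwd';
GRANT BACKUP_ADMIN ON *.* TO 'clone_user'@'%';

-- On the recipient: allow that donor and pull a full copy of the instance.
SET GLOBAL clone_valid_donor_list = 'donor-host:3306';
CLONE INSTANCE FROM 'clone_user'@'donor-host':3306 IDENTIFIED BY 'clone_pwd';

-- Progress is observable while the parallel, resumable copy runs.
SELECT stage, state, estimate, data
  FROM performance_schema.clone_progress;
```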
This is the benchmark we've done against a few other tools: MySQL Shell dump & load is the preferred tool compared to mysqldump, so you should be using it today. And the innovation releases introduced, instead of dumping to a file, the ability to copy from one database to another target database, on-prem or in the cloud, using MySQL Shell copy, without a storage layer in between. The source instance streams the data through memory, so you can copy your data from one instance to another, on-prem or to a cloud database for that matter. For a cloud database, certain restrictions apply so that compatibility is maintained for the cloud database service you're targeting. So this works both on-prem and in the cloud. On developer productivity, there is the VS Code extension and also the MySQL REST Service; these are now in preview mode and you can download them. Coming to the end of the talk: please stay for the other MySQL sessions by my colleagues as well. Thank you for your time.

Thank you for a remarkably precise 15 minutes, that's really good. Is Mayank in the room? Hang around, David, maybe there are some questions. Do you have any questions for David before he flees the scene? You have a minute or two while Mayank sets up. This is great; I've run into the out-of-band problem, TLS compliance issues, and the lack of role-based access control, so these are all, to my mind, very exciting features. Any questions for David? Nope. Alright, thank you very much.

One, two, okay. Mayank is a consulting member of technical staff for Oracle India and is going to talk to us about the InnoDB engine, dynamic redo logs, and instant DDL. That sounds impressive.

Good morning everyone, can everyone hear me? The last row? So, about me: my name is Mayank, I come from Bangalore, India, and I work in the InnoDB storage engine development team in MySQL at Oracle. I've been part of MySQL since 2011, and today I'll be talking about a couple of features in the InnoDB storage engine. This is the safe harbor statement, which says whatever I'm presenting here is for informational purposes. And this is the agenda of today's talk: I'll talk about the configurable redo logs, which we recently implemented in 8.0.30, and instant add and drop column in MySQL. I'll go into the details of these features and how they are implemented, and then I'll take questions if you have any.

So, configurable redo logs. Redo logs are the write-ahead logs of InnoDB. Write-ahead logging is a mechanism in which, before we write the tablespace pages to disk, we write the information into a log and then flush that log to disk. Why do we do that?
Because when we say a transaction is committed, the changes have to be persisted on disk, and if for each and every modification we started persisting tablespace pages to disk, we would be bottlenecked on I/O. So what write-ahead logging does is: instead of writing the entire page to disk at transaction commit, we write the information into a log and persist that log to disk. It saves I/O and TPS goes up. This write-ahead logging is implemented as the redo log in InnoDB, and it is a circular buffer on disk: it keeps rotating and keeps truncating at the tail when the redo records are not needed anymore.

If you have a very large redo log, it saves a lot of page flushes while transactions are committing, but at the same time, if a server crash happens, recovery will be longer, because you have a bigger redo log to apply and that takes time. If you have a smaller redo log, recovery is faster, but you have to flush a lot of pages to disk while transactions are happening, so TPS goes down. What should the optimum size of the redo log be? That's based on your workload and trial and error; you decide what redo log size you want and set the configuration variable innodb_log_file_size. The problem is that innodb_log_file_size is not dynamic: every time you set it you have to restart the server for it to take effect. So there was a request to make it dynamic, without the requirement of restarting the server, and that's what we did in 8.0.30.

The redo log is still a circular buffer on disk, but instead of one file we now have multiple files making up the redo log, and instead of talking about a file size we talk about innodb_redo_log_capacity: we say this is the redo log capacity I want in my system, and based on that we automatically calculate the size of each file and the number of files to create on disk. Now if you look in the data directory, there is a dedicated folder, #innodb_redo, in which we keep all the redo log files. There are 32 redo log files; some are suffixed with _tmp and some are not. The files suffixed with _tmp are free to use, meaning they are empty files; the files not suffixed with _tmp contain the redo records which the system needs for consistency. Whenever the system needs more redo log, one of these _tmp files will be used to store the new redo records being generated.

If I lay all these redo log files out in a single line, this is what the entire redo log looks like. This is the checkpoint LSN: anything before the checkpoint LSN is not needed in the redo log for consistency, because that information has already been persisted to disk; anything after the checkpoint LSN is needed from a consistency point of view, because that information has not yet gone to the tablespace pages on disk. The white boxes are the _tmp files, which are not in use.
Now, if we need more capacity to store redo records, those files can be reused. So these are the empty files, these are the files which are still needed, and these are the files that contain redo records which are no longer needed; what we do is delete those and add empty files to the end of the ring, and now the redo log has room to store new redo records. If there is a scenario in which the redo log is getting filled up, all the files are in use but the checkpoint LSN is not moving forward, the DBA will get a message in the error log saying that the free capacity is going down and it is time to increase the capacity, otherwise the system will start throttling transactions.

We also have observability: in performance_schema, the innodb_redo_log_files table shows which redo log files currently hold redo records, and for every file the start LSN and end LSN (an LSN is a log sequence number). So here, 5 files contain generated redo records; the remaining 27 files have the _tmp suffix and, as I said before, are ready to be used again.

As I said, we can do this resizing dynamically. Let's say I want to make the redo log capacity 200 MB: the command is SET GLOBAL innodb_redo_log_capacity = 200 MB, and then the resizing happens. If you want to upsize, that is, increase the capacity, it is instantaneous, because we don't have to do anything; whenever we create a new file, its size simply follows the new capacity. But when we want to resize down, say from 200 MB to 100 MB, we might already have existing files with redo records that cannot be truncated yet, so that operation happens in the background. Then how does the user know whether the operation has finished? For that we have the status variable Innodb_redo_log_resize_status: if the status says "Resizing down", the resize is still happening behind the scenes; if the value is "OK", the resize is done. And there is another status variable, Innodb_redo_log_capacity_resized, which tells you the current redo log capacity of the system. If that capacity is too small, you can dynamically increase it; if it is too big, you can dynamically decrease it.
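Putting those pieces together, a minimal sketch of the commands involved, using the 200 MB figure from the example above:

```sql
-- Set the total redo log capacity (8.0.30+); upsizing is immediate,
-- downsizing continues in the background.
SET GLOBAL innodb_redo_log_capacity = 200 * 1024 * 1024;  -- 200 MiB

-- Check whether a resize is still in progress and what the current capacity is.
SHOW GLOBAL STATUS LIKE 'Innodb_redo_log_resize_status';
SHOW GLOBAL STATUS LIKE 'Innodb_redo_log_capacity_resized';

-- Which redo files exist and the LSN range each one covers.
SELECT FILE_NAME, START_LSN, END_LSN
  FROM performance_schema.innodb_redo_log_files;
```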
Now I will talk about instant add and drop column, which we introduced in MySQL 8.0. Before we go into the feature in detail, it is very important to understand the row format of InnoDB pages on disk. A typical InnoDB page has two dummy records, infimum and supremum, and the other records are connected in a singly linked list: infimum is the head of the list, supremum is always the tail, and the user records are in between. Every record has two parts: record metadata and record data. The record data is what the user inserted; the record metadata is what allows us to interpret the record the user inserted. The record metadata may contain the null bitmap for nullable columns, the lengths of the variable-length fields, and some other information.

The record metadata itself has two parts: a variable-size row header and a fixed-size row header. The fixed-size row header can be of two types, six bytes or five bytes: if the row format is REDUNDANT, the fixed-size row header is always six bytes; if the row format is COMPACT, DYNAMIC, or COMPRESSED, it is always five bytes. Those are the four row formats InnoDB supports. The variable-size row header also has two parts: the null bitmap and the var-length field lengths. The var-length field lengths describe the columns that are variable length: say a column is VARCHAR(60), but the user inserted 10 characters; the actual size of that column on disk is 10 characters, and we store that information here. Then we have the null bitmap: a table may have a number of nullable columns, and for all the columns whose value is NULL in a particular row, we just set a bit to one; we don't reserve any space for that column in the actual data. (This is not the case in the REDUNDANT row format, where we actually do keep space for such columns.)

So, with this disk format, every record carries information about the columns present in the table, which means that if you want to add or drop a column, it is obvious that every record in the table has to be modified, and that can take a lot of time. If your table is huge, you keep going through it and modifying each and every record; it is costly in time and disk resources, the table is locked while the rebuild is happening, and it is a problem for replication too: an ADD COLUMN DDL takes X amount of time on the source, and then the same thing happens again on the replica, another X amount of time. So there was a requirement from customers and users to make this instant. The idea is: is it possible to update only the metadata when we do an instant add or drop, instead of modifying each and every row? If so, there is no table rebuild, replication benefits because we are not running the long operation twice, once on the source and once on the replica, and no table lock or rebuild is required.

How did we implement this? In MySQL 8.0.12 we came up with a solution that could add a column to a table, but with limitations: only ADD COLUMN was supported, not DROP COLUMN, and the column could only be added at the end of the table, as the last column, never first or in between. Then we came up with a new design in which we can add a column at any position, first, in the middle, or last, and we can drop a column from any position. This design takes care of both add and drop, and the syntax is the same; AFTER and FIRST are now supported, which they were not before, and DROP is supported. How did we do that? We introduced something called row versioning.
Whenever we instantly add or drop a column, we bump up the table's version, saying the version of the table is now X or Y or Z, and this version is stamped on the row itself on disk, saying that this particular row belongs to this version. With that information we are able to interpret any record stored on disk; I'll explain how in the next slides. But where do we store this version? As I said, the version is stamped on each and every row, so where exactly? In the REDUNDANT row format we have the 6-byte header with its different fields, and the last field is the info bits. One bit there was available and unused, so we used it to indicate that, if the bit value is 1, the fixed-size row header is prefixed with a 1-byte version. Similarly, the 5-byte row header has its own fields, and one bit in its info bits was available, which we used: if that bit is 1, it indicates that a version byte is prefixed to the 5-byte row header. So with this information, looking at any row on disk: if that bit in the fixed-size row header is 0, what sits just before it is the variable-size row header which I described in the earlier slides; if the bit is 1, there is a version byte in between the two. This is the information stored on disk now.

So we now store the record version for each and every record on disk, and we added two new pieces of metadata for each column: "version added", the version in which an instantly added column was added, and "version dropped", the version in which the column was dropped. With these three pieces of information it is easy to interpret any row on disk. How? Say I want to fetch a row: any column whose "version dropped" is greater than 0 has been dropped, so we don't need to read it; we can ignore that column's value if it is present on disk. Any column whose "version added" is greater than the record's version has no value stored in the row, so we read the default value of that column instead.

Let me explain with an example. Say I create a table with 4 columns, C1, C2, C3, and C4, and I insert a row, so VC1, VC2, VC3, VC4 are the values of those columns. Now I drop column C3, so C3 is gone, and I add a column C5 with the default value "C5 default". The version increased to 1 when I dropped C3 and to 2 when I added C5. Looking at the metadata: for C1, C2, C3, C4 the version added is 0, because they already existed when the table was created; for C3 the version dropped is 1, because that is the version in which C3 was dropped; and for C5 the version added is 2, the version in which C5 was added, along with the default value of column C5. Now, if I want to read this particular row from disk with a SELECT *, what should be presented to the user? We can see the record version is 0: nothing is stamped on the row, the bit is 0, not 1, so the record is in version 0, while the current version of the table is 2, the latest one. C3 is present on disk because the row is from version 0, and C5 is not present on disk because the row was inserted
in version 0. With this information, what do we do when we read the row? We read the value of C1, which is present; we read the value of C2; we do not read the value of C3, we ignore the value stored on disk for it, because C3 is now dropped; we read the value of C4; and we also want the value of C5, which is not present on disk, so we read the default value of C5. When the row is presented to the user, it has the values of columns C1, C2, C4, and C5. That is how a row stored in version 0 is transformed to version 2.

Now let me walk through the entire flow with an example. Say we have a table with 2 rows, R1 and R2, and neither has any version, because we have not done any instant add or drop. I run a DML and update R2 to R2'; still no version, because no instant add or drop has been done. I insert R3, which is created with no row version. Now I do an instant add or drop column: we don't touch anything in the rows, we just add some metadata to the dictionary tables; no row on disk is modified. Now, if I insert R4, it is added with version 1, because we have already done an instant change, and if I modify R3, it becomes R3' in version 1. So now we have two rows, R1 and R2', with no version, and R4 and R3' with version 1. When this data is fetched, we read the rows from the table and the metadata from the data dictionary, transform R1 and R2' to version 1, and present them to the user in the latest version, which is version 1; for R4 and R3' nothing needs to be done, because they are already in version 1.

The instant algorithm is now the default: if you add or drop a column without specifying ALGORITHM at all, by default we try to do everything instantly, and if instant is not possible we fall back to ALGORITHM=INPLACE or COPY. When we run OPTIMIZE TABLE, or any other ALTER TABLE that causes a table rebuild, we are going to modify each and every row anyway, so the instant metadata is reset. The row version I mentioned, now stamped on every row, has a limit, which is 64 for now. We realized that add and drop column is not a very frequent operation in the real world, so 64 should be a sufficient limit. After 64, if you try to add or drop a column with ALGORITHM=INSTANT, you get an error saying the maximum row version has been reached, please try ALGORITHM=COPY or INPLACE. If ALGORITHM=INSTANT is not specified, it tries to do everything instantly by default, but once the limit of 64 is reached it falls back to ALGORITHM=COPY behind the scenes, and the instant metadata is reset again.

Now, compatibility. Is there any issue upgrading from any previous version to the latest? No transformation of rows is needed on disk, so the upgrade works as it always has. We might have rows inserted before the first implementation of instant add, rows inserted after the first implementation, rows from before the new
instant add implementation, or rows after the new implementation; all of these are handled correctly behind the scenes, and as I said, once the table is rebuilt, by OPTIMIZE TABLE or an ALTER TABLE that causes a rebuild, all this instant metadata is reset.

Export and import: when we export the tablespace of a table that has had instant add or drop columns, the instant information is written to the .cfg file, so for import the .cfg file is a must if the table has instant metadata. Some combinations are not supported: no instant drops in the source table and none in the target is allowed anyway; if the source has instant add/drop and the target does not, but the target has all the columns present in the source, the import is allowed; if the target has instant columns that the source does not, it is not allowed, because we cannot tell which column was added or dropped in which version. In short, the import is allowed only when the instant metadata matches exactly.

And observability: we have two tables, information_schema.INNODB_TABLES and information_schema.INNODB_COLUMNS, to give some visibility. How? Say I create a table t1 with a single column c1 INT. N_COLS is four, because we internally generate three more columns: the row ID, the transaction ID, and the roll pointer; the fourth one is c1. TOTAL_ROW_VERSIONS is zero, because no instant add or drop has been done yet. The INSTANT_COLS field was used to give information about instantly added columns in the previous implementation; we have kept it for backward compatibility. Now if I look at the metadata of column c1, HAS_DEFAULT is 0 and DEFAULT_VALUE is NULL; had c1 been added instantly, HAS_DEFAULT would be 1 and the default value would be recorded. Now let's add a column c2 with ALGORITHM=INSTANT. N_COLS becomes five, because a new column has been added, and the row version is bumped to one. If I look at the metadata of c2, HAS_DEFAULT is 1, along with the default value of column c2. And this is what I tried on my development machine: I have a table with 8.3 million rows, and an ADD COLUMN with ALGORITHM=COPY took 29 seconds; with ALGORITHM=INSTANT it was done instantly.
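As a rough sketch of the statements behind that comparison and the observability tables just mentioned, assuming a hypothetical table t1 in a schema named test, on a server recent enough for instant DROP COLUMN (8.0.29+):

```sql
-- Add/drop columns without rebuilding the table (INSTANT is the default algorithm).
ALTER TABLE t1 ADD COLUMN c2 INT DEFAULT 42, ALGORITHM = INSTANT;
ALTER TABLE t1 DROP COLUMN c2, ALGORITHM = INSTANT;

-- How many row versions the table has accumulated (the limit is 64).
SELECT NAME, N_COLS, TOTAL_ROW_VERSIONS
  FROM information_schema.INNODB_TABLES
 WHERE NAME = 'test/t1';

-- Per-column instant metadata: HAS_DEFAULT / DEFAULT_VALUE for instantly added columns.
SELECT c.NAME, c.HAS_DEFAULT, c.DEFAULT_VALUE
  FROM information_schema.INNODB_COLUMNS c
  JOIN information_schema.INNODB_TABLES t ON t.TABLE_ID = c.TABLE_ID
 WHERE t.NAME = 'test/t1';

-- A rebuild resets the instant metadata and the row-version counter.
OPTIMIZE TABLE t1;
```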
There are a couple of limitations. We do not support instant add and drop with compressed tables, which we realized are not used very much. We do not support instant add and drop for tables that have a full-text index. We do not support instant add for the DD tablespaces; the tables in the DD tablespace are the dictionary tables, and users are not supposed to modify those anyway. And there is no support for temporary tables; temporary tables are not supposed to be very big, so those are not supported. I have put links to the two blog posts where I talk about these features in detail. And that's it, thank you.

Is Rafa, our next speaker, in the room? Oh, time for Q&A without the next speaker doing setup. Do we have any questions for Mayank? Please hang on, we have a stream, so microphone please. Something I maybe did not catch from your presentation: with the column add/drop feature, it looks like your table will tend to bloat, because you store old values of deleted columns, so I guess you need some cleaner or compaction process for your data; do you implement such a thing? No; what we keep is only the table metadata saying that this column was dropped in this version. The reason we keep it is that we might already have rows which contain the value of that column, but when we fetch those rows, we should not read that column's value. So it's the cleanup afterwards: if you've done a bunch of instant adds, or instant drops more usually, at the time you do the drop, the data remains on disk? Yes. And then you do an OPTIMIZE TABLE, is that correct? Yes, the rows remain as they are; if you want to do a cleanup, you do OPTIMIZE TABLE, which cleans up everything. That was my question too, because there's a compliance issue, particularly for data processors who have to be able to issue deletion certificates to customers; there has to be a way of removing data. Yes, for that we have OPTIMIZE TABLE; that command will clean up everything. My second question on that: you mentioned there's a limit of 64 row versions. Does OPTIMIZE TABLE reset that limit? Yes, it does; it gets reset to zero. Wow. After that you can just keep on doing instant add and drop; it's not that you're going to add hundreds of columns, but you're not going to get into a dead end you can't escape, there's always a way out. Yes. It's like a maintenance window, and that's beautiful.

Is our next speaker in the room? Ah, please set up now, sir. Right, we still have two minutes; any other questions for Mayank? No? We don't have any more in the room. Thank you so much.

Okay, so next up we have Rafa, founder of Gornark. If you'll just put that away, we can make a change, which I think is going to happen. Reasonable advice for you.

Good morning, everyone. My name is Rafa, and this session is going to be about different strategies for high availability in the MySQL world. We're going to see a few well-known options: we'll talk a little bit about InnoDB Cluster, a combination of ProxySQL with Orchestrator and Consul, a proxy layer with Galera, MaxScale, and our own in-house tool; a minute or two on each, and then a few questions.

A little bit about me. First of all, I'm a father of two, and I dedicate my time to them much more than to computers. I've been in data for many years, and with MySQL for around seven years. For the last year my own adventure has been under way, which is this: Gornark, a MySQL professional services and consulting company, and that's what I run.

So, let's go straight to the point. The first strategy is InnoDB Cluster, and I think it makes sense that it's the primary option, because it's the native solution from MySQL to address this type of requirement. This solution has automatic failover and works with virtually synchronous replication.
Before version 8.2 of MySQL Router, the way to get this HA in place was to provide, from the router, two different connection streams, one for reads and one for writes; I'll go into the architecture details in a moment. From Router 8.2 onwards that has changed, and now with one single connection stream you can achieve the same. I'm not going to go into super detailed things, first of all because we have a talk about this right after mine which will go into more detail than I will. But just at a glance, the way this works is: you set up your cluster, you have your primary, you have two replicas, or five, or however many nodes you decide to have, and then you put MySQL Router on top of it. Usually the most convenient way is to deploy MySQL Router in the application layer itself, as a sidecar or an attachment to the application, and it manages the connections to your cluster. It's worth mentioning that InnoDB Cluster offers the option of a ClusterSet, which is essentially ideal for DR, disaster recovery: you build your InnoDB Cluster in data center A, you build the same in data center B, and you interconnect the two asynchronously. It's important to mention that this replication is asynchronous for obvious reasons, otherwise the latency between the two clusters would hurt performance, and with that you get a proper disaster recovery plan.

Next one. Personally this is one of my favorites, to be honest: a combination of tools, ProxySQL plus Orchestrator, and Consul comes into the picture as well. The goal is the same; what we want to achieve is auto failover and fault tolerance, which together become the HA solution. This one adds auto topology discovery, offered by Orchestrator, which essentially means that as soon as a node joins your cluster, it is recognized by Orchestrator and then by ProxySQL too. It's important to highlight that this strategy is more suitable for asynchronous replication, the classical replication where you have a primary and a few read replicas serving read traffic. Here the client application connects to ProxySQL as the proxy layer, same as with MySQL Router before; you can use one single connection string or split connection strings, there are multiple ways to connect to ProxySQL depending on how you do the setup and what your requirements are, and from there it connects to your cluster. The thing is that, by itself, ProxySQL won't do auto topology discovery or auto-promote primaries for you, so if a primary crashes, a replica won't get promoted right away. That's why we add Orchestrator, and that's the component that decides what the status and the topology of your cluster is. This is the tool that checks whether your primary is healthy and whether your replicas are healthy, and if, for instance, the primary is not healthy anymore and not serving traffic properly, Orchestrator is the one that manages the promotion of a new primary and the demotion of the old primary to a replica, or discards it, or whatever you have configured. The thing is that Orchestrator, out of the box, doesn't talk to ProxySQL.
So we put Consul in the middle. Consul is going to be, let's say, the source of truth. Orchestrator talks to Consul, because it knows how to do that; that's implemented. It will say: hey, this is your primary, this is your replica; it saves everything in Consul and it stays there. And the good thing is that Consul, through consul-template, can talk to ProxySQL, so the cycle is completely closed. Now we have Orchestrator managing the topology and telling Consul what the current status of the topology is, and Consul is able to tell ProxySQL: this is the information I have, so please react, and make sure the write workload is pointing to the primary and the read workload is pointing to the replicas, or replicas plus primary, depending on the implementation you do. That's the key.

It's important to highlight that multi-data-center replication is supported too. It requires a slightly more complex configuration; in essence it's more or less the same, with the addition, as you can imagine, of an additional cluster and an additional Orchestrator setup: we add an Orchestrator cluster of three nodes and synchronize everything. It's a bit more complex, but it's doable and it works very well. Just a word of warning: so far, Orchestrator is not being maintained. From the original creator, the last commit in GitHub was from January 2021. It's forked by Percona, and there were some rumors that they would maybe maintain it; I have no idea, but the reality is that since that date there have been no pull requests, no changes in the code. So for now its future looks a little bit unknown, which would be a pity, because it's a really, really good tool. How are we doing with the time? No, we're fine.
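For context, a minimal sketch of what the ProxySQL side of such a setup typically looks like, issued against ProxySQL's admin interface. The hostgroup numbers, host names, and query rules are hypothetical, and in this architecture it would be Orchestrator plus consul-template rewriting the server entries on failover rather than a human:

```sql
-- On the ProxySQL admin interface (port 6032 by default): register the topology.
-- Hostgroup 10 = writer, hostgroup 20 = readers (numbers are arbitrary here).
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (10, 'db-primary', 3306);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, 'db-replica1', 3306);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, 'db-replica2', 3306);

-- Route writes to the writer hostgroup and plain SELECTs to the readers.
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 10, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (2, 1, '^SELECT', 20, 1);

-- Activate and persist the configuration.
LOAD MYSQL SERVERS TO RUNTIME;      SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME;  SAVE MYSQL QUERY RULES TO DISK;
```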
The next solution or strategy is a proxy layer plus Galera. I put HAProxy here because I already talked about ProxySQL before, just to be a little bit fair, but the offering is more or less the same. I would compare this one with InnoDB Cluster in terms of deployment; it works more or less the same. We put the proxy layer in front of the client applications, and it distributes the traffic to the different nodes. The important thing is that Galera works with virtually synchronous replication, so you can have a multi-primary cluster, and thanks to the proxy layer you decide how you want your traffic to flow: you can have multi-primary writes, which Galera indeed supports, or from the proxy layer you can decide to write to only one of the primaries, and that would be okay too. Having this, you ensure that if one primary goes down, you still have the other two primaries up and running; it's just a matter of the proxy layer updating, and it will write there. In HAProxy, for example, you can have all the primaries registered and it will automatically select an available one. It's a simple way to implement HA. Galera, I really like it, but I have to admit it can be a little bit tricky sometimes; it sometimes requires a bit of a change of mindset at the application layer, but it's not a big thing. So far, this doesn't work very well for a multi-data-center setup. It can be done as well, of course; after all, almost everything can be done, but it would require much more configuration and probably some custom coding to make the switchover properly, but it's there.

The next one is MaxScale. I'm not going to go into this one too much because I haven't worked with it a lot, but in essence what it offers is the same as before with Orchestrator and ProxySQL: a topology manager with auto failover and a proxy layer, all in one product, which is great. Again, you can connect with one single connection string and MaxScale will discriminate the traffic between reads and writes, or you can have different ports and different connection strings for writes and reads; it's up to you and your configuration, and it routes to the cluster. This can be asynchronous replication, so a primary and a couple of replicas, and it works with Galera as well. I didn't mention it at the very beginning, but this is the native solution from MariaDB to address HA in their ecosystem; it can be used with MySQL too. It's not widely used that way, but you can, that's the reality. The thing here is that MaxScale offers two different versions, one open source and one enterprise, and the open source future is a bit of an unknown for this product too. It is being maintained, I'm not saying it's discontinued, it's not like Orchestrator, but it's not clear what's going to happen in the coming months, whether they will continue offering the open source version or not; I don't know either.

The last solution I want to talk about is our own HA tool, something that has been built in-house. Essentially it covers the same ground as the other products: a full HA solution with fault tolerance, auto topology discovery, auto promotion on failing nodes, and that kind of thing, same as we discussed for the others. It is only for asynchronous replication so far. The traffic split is done through connection strings, so we are not offering one single connection string to distribute the traffic. And it's super important to mention that it's under active development and not production ready yet, although I'm looking forward to people testing it, so if you want, talk to me later. It's open source, of course; you have the GitHub project here, feel free to have a look.

Okay, so this is the end of the talk. If anyone has any questions; I think I may have been too fast, but I'm Spanish, that's quite common for me. Thank you; it is not often that I'm dealing with speakers who finish early. Okay, questions, anyone? Hang on. First of all, I want to thank you for your inspiring talk, and I have a question about the ProxySQL diagram among the solutions you presented. I noticed that the ProxySQL setup has two extra elements, Consul and Orchestrator. I don't know the functions of those two elements; can you explain more clearly their role in the system? And is it about both? Okay. So if I understand correctly, you want to know more about this part? Yes, about Consul and Orchestrator. Okay, is there anything specific you want to know about Consul and Orchestrator, or do you just want a general view of how it works? Just give me a general view. Yeah, sure, sure.
So essentially, Orchestrator by default has an option in the configuration to write the primary information directly into Consul. The information about the replicas requires you to build a small script to do it. In essence, what you end up with is an Orchestrator node that is capable of writing who the primary is and on which port, and who the replicas are, how many there are, and on which port they accept connections, usually 3306. That information goes into Consul. Usually a good approach is to have Consul installed locally next to your Orchestrator, so the latency is very good. Okay, thank you.

Okay, we have a stream, please use the mic. Okay, thank you so much. Do you provide automatic failover in your tool, given that it is built on asynchronous replication, and how do you provide full consistency in this architecture? So far, yes, we provide auto failover. The way we provide it can be customized a little bit, but the logic is that if a primary just stops accepting connections, or essentially times out from the checker, the tool decides which replica should be promoted. The decision is based on factors that can be set up by the client, but by default it checks which replica has zero replication lag and the same binary log coordinates or GTID coordinates, because you know that sometimes Seconds_Behind_Master is zero but there is still a replication issue; so we double-check that both parameters are okay, that the replica is in sync with the primary, and then that one is promoted. It's worth mentioning that this can be adjusted: the user can decide, no, I don't want that, because what happens if both replicas are slightly out of sync, then I don't have a primary anymore. So you can choose a different approach, which is an instant failover: it will just take one. You can set up priorities if you want to; if you don't, it will just go for the closest one and promote it, even if it loses some data; I don't care, it's available. So you can decide what you want to have. Is that an answer? Yeah, thank you. And the automatic failover, will it be driven by the tool itself, or...? By the tool itself, yes. Thank you.

We do still have lots of time, but it's okay to have a break too. Would there be any other questions for our current speaker? No? Oh, we have one. Okay. Hi. Amongst all the solutions you've just presented, I'm still not sure which one is best to pick; can you do a bit of comparison between them? Yeah, it makes sense, because there is no best, that's the real answer. It depends on the business case, it depends on what you want. Say I have a well-designed application; well-designed meaning, in this case, for example, that the transactions are small and properly contained, not huge inserts with thousands of rows in them, so I can afford virtually synchronous replication: then I can go with that solution, InnoDB Cluster. It's simple to implement and simple to maintain. I really like that solution; it's very efficient. If you're in the MariaDB ecosystem, you can go with Galera; it requires a little bit more setup, but just a little bit, and it works, it's very solid too.
Now, say I have an asynchronous cluster, because sometimes I have network latency, or I really don't care about having all the data in sync with my replicas all the time; I can afford that, it's okay. Then this solution, ProxySQL with Orchestrator and Consul, is super solid. The problem is the Orchestrator maintenance situation; that is a blocker right now. Still, if you have an asynchronous environment, I would go with it, because even if the tool is not being maintained, which is true, it works. About MaxScale, for MySQL I don't see it used very often. And about our own solution, I cannot recommend it yet because it's not production ready; it will be, but not yet. So it depends on what you need. I don't know if that really answers it: there is not one single best, but depending on what you need you will go for one or another, and that's usually going to be based mainly on what type of replication you have, according to your business needs. Okay. Anybody else? Nope, okay. Thank you so much. Thank you, everyone.

So, we now have about a 10-minute break. Sorry, I just forgot to mention something. Yeah, I still have time? The floor is yours. Thank you. I just wanted to mention that I brought some t-shirts like mine, so if you want one, just come to me and we can talk a little bit. Now I'm really done, thank you. Swag is the actual purpose of tech conferences. All right, so we now have a nine-minute break.

Hopefully we have been persuaded that we should operate HA; Hananto is going to tell us how to maximize it. All right, we are back on with a MySQL Master Principal Solution Engineer, Hananto; he will talk to us about achieving maximum high availability with MySQL. Take it away.

Okay, thank you, everyone. Allow me to present on maximum high availability with MySQL, to reach the maximum possible system uptime for a MySQL database. A little bit about me: I am an Oracle database professional, basically 16 years as an Oracle database enthusiast and five years as a MySQL database enthusiast, and I'm based in Singapore.

As we understand, MySQL can run on any server, including commodity servers; therefore our high availability solution does not use expensive hardware. We don't use shared storage, we use a shared-nothing architecture, meaning we don't need expensive hardware to run MySQL in high availability mode. Instead, we rely on database replication. As you know, database replication is a mechanism where every transaction committed on one node is also committed on another node through replication channels. Once we create a replication channel on a replica, there are two threads actually running: the first is the I/O thread, which pulls the transaction log, in the form of binlog events, from the source and puts it in the relay log; and the SQL thread reads the relay log and commits the same data on the replica server. This is the traditional way to create MySQL asynchronous replication: basically, we create a replication user on the primary and grant the REPLICATION SLAVE privilege to that user, then on the secondary node, the replica, we create a replication channel pointing to the primary using that replication user, and start replication.
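A minimal sketch of the traditional setup just described, assuming hypothetical host names and credentials and the modern CHANGE REPLICATION SOURCE TO syntax (8.0.23+):

```sql
-- On the primary: a dedicated replication user (names and password are placeholders).
CREATE USER 'repl'@'%' IDENTIFIED BY 'repl_password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

-- On the replica: point a replication channel at the primary and start it.
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'primary-host',
  SOURCE_PORT = 3306,
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = 'repl_password',
  SOURCE_AUTO_POSITION = 1;   -- assumes GTIDs are enabled on both servers

START REPLICA;
SHOW REPLICA STATUS;
```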
The story is not complete yet, because we want to provide an end-to-end database solution for high availability. We need to help the applications do connection failover as well, because replication means we have more than one database server running, and we need to make the connections from the applications transparent regardless of the database topology. So we came up with InnoDB ReplicaSet. InnoDB ReplicaSet is basically a modernization of traditional master-slave replication: we provide MySQL Router as a component for transparent connection failover from the applications, and we provide the AdminAPI, which lives in our tool MySQL Shell, so you can use MySQL Shell's AdminAPI to easily configure an InnoDB ReplicaSet. Again, InnoDB ReplicaSet is based on asynchronous replication, but we get automatic, transparent connection failover from the application through MySQL Router.

This slide shows how easy it is to create asynchronous replication using the AdminAPI, which we run through MySQL Shell. The configure replica set instance step configures the local instance: it prepares the system variables and makes sure they are set up correctly for replication, and if a system variable is not dynamic, the AdminAPI will automatically trigger a restart. Once we have done configure instance on all nodes, we create the primary node of the replica set simply by executing the create replica set AdminAPI command. We can use the AdminAPI to view the status of the replica set as well: just log in with MySQL Shell to one of the databases, especially the primary, and issue the status command. At this point we have one replica set with one database running as the primary, in read-write mode.

Then, adding additional replicas to expand or scale out the InnoDB ReplicaSet is very easy: we just run the AdminAPI add instance command. Using the auto recovery method, it checks the GTID set of the new node: if the GTID set is a subset of the replica set's GTID set, incremental recovery is executed automatically; if the GTID set is completely different, you don't have to do a backup and restore, because the AdminAPI adds the instance using MySQL Clone. The clone is executed automatically, and once it is done, it continues with incremental recovery until the state of the new node is the same as the InnoDB ReplicaSet. And we can view the status: as you see here, we now have two nodes, one node as the primary and the second node, which we just added, running in read-only mode as a secondary.

Again, MySQL Router comes in as a handy utility for connection transparency: the application just connects to MySQL Router as the database endpoint. If the application connects to the read-write port, it is connected to the primary; if it connects to the read-only port of MySQL Router, the connection is directed to the secondary node. Using MySQL Router is very simple: what we need to do is install the MySQL Router binaries, run the bootstrap, and start MySQL Router. So, to connect to the database, even though we have more than one server replicating in the ReplicaSet setup, we just need to connect to MySQL Router.
So when connecting to the database, even though we have more than one server replicating in the ReplicaSet, we just need to connect to MySQL Router. If we use 6446, the default read-write port of MySQL Router, we are connected to the primary node; if we connect to 6447, the Router's read-only port, we are connected to a secondary node. MySQL Router also has routing strategies: for primary connections the strategy is first-available, meaning connections are pointed at the one primary; for the read-only port the strategy is round-robin, so read-only connections are load-balanced across the secondary nodes in the InnoDB ReplicaSet.
Operating an InnoDB ReplicaSet is very easy as well. To do a switchover, we don't need to activate or fail over a VIP, for example; we just issue setPrimaryInstance() to flip the roles, so the primary becomes a secondary and a secondary becomes the primary. The application does not need to know about the change in database topology, because all the application does is connect to MySQL Router, and the Router redirects connections to either the primary or a secondary depending on which port the application uses.
We don't stop there. InnoDB ReplicaSet does not guarantee full consistency. To get strong data consistency as well as automatic failover, we use an additional plugin called Group Replication. Group Replication is the high availability engine that we use for our cluster, and it provides consistency, meaning zero RPO. In this diagram, whenever the application connects to the primary and makes a transaction that is to be committed, the primary sends the log to all the secondaries for a certification (quorum) process, and the transaction is committed only if a majority of the nodes certify it. Once quorum is reached and the majority of nodes have certified the transaction, each member eventually commits it to the InnoDB storage engine. We support up to nine members, and an odd number of members is preferred, because we want to retain a majority if one or two nodes go down. Multi-primary mode is also supported, with some limitations.
Group Replication provides automatic primary failover as well. Here we have two kinds of clients: read clients and write clients. Read clients read from a secondary node; write clients read and run transactions on the primary node. If the primary node goes down, Group Replication automatically appoints one of the surviving nodes as the new primary, and the write clients need to connect to that node. Is that practical? Not quite, because we are still missing one piece: application connection failover. That is why we come up with InnoDB Cluster. What is InnoDB Cluster? InnoDB Cluster is Group Replication integrated with MySQL Router and MySQL Shell. The MySQL Router here is the same Router we used for the ReplicaSet, and it provides connection transparency: if the primary node goes down and the primary role fails over to another node, the application does not need to know, because all it does is connect to the Router's read-write port, and the connection is pointed to the new primary node.
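As a sketch of the operational side described a moment ago, bootstrapping the Router once and then doing a planned switchover purely through the AdminAPI can look like this in MySQL Shell's Python mode. Hostnames and account names are invented.

```python
# Run inside MySQL Shell in Python mode (\py); hostnames are placeholders.
#
# MySQL Router is bootstrapped once from the operating system shell, e.g.:
#   mysqlrouter --bootstrap admin@node1.example.com:3306 --user=mysqlrouter
# After bootstrap it listens on 6446 (read-write) and 6447 (read-only) by
# default, so applications only ever connect to the Router.

shell.connect("admin@node1.example.com:3306")
rs = dba.get_replica_set()

# Planned switchover: promote node2; the old primary becomes a secondary and
# the Router redirects connections without any application change.
rs.set_primary_instance("node2.example.com:3306")
print(rs.status())
```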
Configuring the InnoDB Cluster is very easy because of the integration with MySQL Shell: we use the Shell's AdminAPI to configure each instance, create the cluster, and then add the secondary nodes. It is the same process as adding new nodes to a ReplicaSet; we don't need to do a backup and restore, because the AdminAPI itself triggers MySQL Clone when needed.
This is the high availability orchestration against a primary failure. Say we have a three-node InnoDB Cluster and the application is connected through MySQL Router. If the application connects to the Router's read-write port, it is connected to the primary. If the primary node fails, one of the secondaries automatically takes over the primary role, and the application does not need to know about the change in topology, because MySQL Router redirects the connections to the new primary. Once the failed node comes back, it heals automatically and rejoins the cluster: it performs incremental recovery or distributed recovery, and once that completes it is back online as a secondary node in the InnoDB Cluster. So it is really automated and highly available.
Starting from MySQL Router 8.2 we add another transparency feature: read-write splitting. The application no longer needs to choose whether to connect to the primary or to a secondary; it just connects to the read-write-splitting port, and depending on the statement, DDL and DML are executed on the primary while read-only queries are executed on a secondary node. So it is possible for an OLTP application to split reads from writes. The best practice for OLTP, with short queries and short transactions, is to keep all queries and transactions on the primary, but in certain designs we may want queries to run on a secondary. The secondary is not always up to date at exactly the same moment as the primary, and we need to understand that a query on a secondary may not see a new value that has already changed on the primary. To prevent that, we can set group_replication_consistency = BEFORE. With BEFORE, if a query arrives on a secondary and the data has already been changed and committed on the primary (the group has decided the change is committed) but the secondary has not yet applied it, the query waits until the data is fully applied on that secondary. That is good, because we don't want to read stale data that has already changed. We can set group_replication_consistency = BEFORE at the session level, at the global level, or persist it with SET PERSIST.
However, during operations we sometimes deal with replication lag on one or two secondaries due to server utilization and so on, so we need to monitor the replication lag. If the lag is very large, say 15 or 30 minutes, making a query that we route to a secondary wait that long is not acceptable for an OLTP application. What we can do is prevent the Router from routing read-only traffic to that secondary by using the MySQL Shell AdminAPI to set a tag: we can mark the badly lagging secondary as hidden.
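A rough sketch of those three pieces, namely creating the cluster, requesting read-your-writes consistency for a session, and hiding a lagging secondary from the Router with the `_hidden` tag, again in MySQL Shell Python mode with invented hostnames:

```python
# Run inside MySQL Shell in Python mode (\py); hostnames and the tag usage
# follow the pattern described above, with placeholder names.

shell.connect("admin@node1.example.com:3306")
cluster = dba.create_cluster("prodCluster")
cluster.add_instance("admin@node2.example.com:3306", {"recoveryMethod": "auto"})
cluster.add_instance("admin@node3.example.com:3306", {"recoveryMethod": "auto"})

# Read-your-writes for one session that may be routed to a secondary: queries
# wait until that secondary has applied the relevant transactions. Applications
# would normally set this on their own connections (or persist it globally).
session.run_sql("SET SESSION group_replication_consistency = 'BEFORE'")

# Hide a badly lagging secondary from MySQL Router so it stops receiving
# read-only traffic; clear the tag once the node has caught up.
cluster.set_instance_option("node3.example.com:3306", "tag:_hidden", True)
# ... later, after the lag is gone ...
cluster.set_instance_option("node3.example.com:3306", "tag:_hidden", False)

print(cluster.status())
```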
So we monitor this, and once the lagging server is able to catch up, we remove the tag so that MySQL Router starts routing read-only traffic to that server again.
To make sure the InnoDB Cluster can sustain long-running queries and heavy read load, we can also add InnoDB Cluster read replicas. A read replica does not join the Group Replication group itself; we configure MySQL Router so that read-only traffic can go to the read replicas, and between the read replicas and the InnoDB Cluster there is asynchronous replication, managed by the InnoDB Cluster itself. If we show the InnoDB Cluster status, the read replicas appear as child servers, attached to either the primary or a secondary, depending on what we want.
Some users also want delayed replication. What is delayed replication? It's simple: it is a read replica, but we decide to delay the commit point; for example, hold the last two hours and don't commit them yet. Why? Because if we have logical corruption, say somebody truncates the customer table, the truncate is replicated everywhere. To avoid a complete restoration of the InnoDB Cluster, it is good to have a delayed replica, because the data is still there: the truncate will only be executed on the delayed replica two hours later. So we can put the data back logically, exporting it from the delayed replica and loading it back into the InnoDB Cluster. Very simple. Similarly, if ransomware attacks the data inside the InnoDB Cluster, we may be able to use the delayed replica to recover the data logically.
Then we can expand the InnoDB Cluster to work across multiple sites. For example, with two sites, production and disaster recovery, we can use an InnoDB ClusterSet, and the AdminAPI supports ClusterSet too. We create one InnoDB Cluster and promote it into a ClusterSet, and after that we can create a replica cluster on the disaster recovery site. Very simple, very straightforward. We can use the disaster recovery site, the replica cluster, for read-only workloads like ETL and dashboarding that do not require the very latest data. And as before, if the primary node within a cluster fails, there is automatic failover of the primary within that cluster. To switch over to the disaster recovery site, the traditional way involves a lot of steps, but with ClusterSet it is quite simple: we just log in with MySQL Shell to one of the nodes and issue clusterSet.setPrimaryCluster() to say which cluster should become the next primary cluster, and the roles of the primary cluster and the replica cluster are switched. And the application? It does not need to know. All it needs is the Router IP and port, because connections from the application through MySQL Router are redirected to the primary cluster. If there is a full site failure, it's simple as well: we can force the promotion, with forcePrimaryCluster(), of the replica cluster on the disaster recovery site so that it becomes the new primary cluster. Again, the application does not need to change, because all it does is connect to MySQL Router.
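For illustration, the ClusterSet calls just mentioned look roughly like this in MySQL Shell Python mode; the hostnames and cluster names are placeholders, and the forced promotion is shown commented out because it is a last resort for a lost primary site.

```python
# Run inside MySQL Shell in Python mode (\py); hostnames and cluster names are
# placeholders, and option spellings may differ slightly between Shell versions.

shell.connect("admin@node1.example.com:3306")
cluster = dba.get_cluster("prodCluster")

# Promote the existing cluster to a ClusterSet, then build a replica cluster
# on the DR site (clone or incremental recovery is handled automatically).
cs = cluster.create_cluster_set("myClusterSet")
cs.create_replica_cluster("admin@dr-node1.example.com:3306", "drCluster",
                          {"recoveryMethod": "auto"})

# Planned role switch: the DR cluster becomes the primary cluster.
cs.set_primary_cluster("drCluster")

# Total site failure on the primary side: force the promotion instead.
# cs.force_primary_cluster("drCluster")

print(cs.status({"extended": 1}))
```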
For achieving maximum high availability we add one more piece of configuration: we split the networks, separating the data LAN from the network that Group Replication uses for the members of the InnoDB Cluster to communicate with each other. So we can have a network topology like this: MySQL Router and the application run on the application LAN, while the database side is split into two networks, one data LAN and one that I call the private LAN. We can configure group_replication_local_address and the advertised endpoints to use an additional network interface in the MySQL servers, pointing to that other network segment. This improves high availability performance and stability as well. In terms of an InnoDB ClusterSet, where we run multiple InnoDB Clusters as one, the Routers communicate between the production site and the replica site. As you see here, we have a three-node InnoDB Cluster in production, a three-node InnoDB Cluster as the replica cluster in the disaster recovery site, read replicas to help expand the cluster for heavy queries, for example, where the Router can round-robin those heavy queries across them, delayed replication on both sides, and so on. Additionally, we can use the MySQL Enterprise Thread Pool, which helps the cluster run more scalably and more stably.
So this is the summary. If you just want simple MySQL asynchronous replication, you can use InnoDB ReplicaSet: it is basically asynchronous replication plus MySQL Router for connection transparency. If you want zero RPO and automatic failover, choose InnoDB Cluster, because it uses Group Replication as the HA engine, with the MySQL Shell AdminAPI to easily provision, create and scale, and MySQL Router for connection transparency. And if you want to extend InnoDB Cluster to a multi-site deployment, create an InnoDB ClusterSet on top of the InnoDB Cluster and build a replica cluster in a remote data center. Thank you.
It's been a while since I've had to think about the details of multi-site, but yes, it's all familiar. Questions, anyone? Yes, thank you. "You mentioned that with an InnoDB ClusterSet, if the primary site fails, we need to go and force the other one. Is that a manual process, so a DBA needs to intervene, or can it be automated?" Yeah. Since InnoDB ClusterSet replication is based on asynchronous replication and we only have two sites (adding more sites is fine), we would need a witness to decide whether to force the quorum or force a primary cluster failover, and we don't have that witness: the witness is the DBA. We believe a disaster recovery decision is not a one-person decision, it is an IT decision, so it is very difficult to build automation for a ClusterSet failover. It is possible to build your own: put a monitor in the cloud or at a third site that watches the InnoDB Clusters in production and in disaster recovery, and if something happens it either alerts the DBA to do the forced switchover or runs some automation. It is possible, but it is not part of the product. Thank you.
Is that perhaps a bit of a line between HA and DR? They're not quite the same; are we crossing a line there from HA to DR? Yes, sure enough.
"Could you go back to the slide about dealing with replication lag?" Which one, sorry? "The slide about replication lag." Oh, replication lag, okay. "Assume I have a system with high write traffic, and at the same time I'm dealing with replication lag and my primary node fails. How does InnoDB Cluster provide data consistency for us when that incident happens?" Thank you, good question. Zero RPO, data consistency, is guaranteed in InnoDB Cluster: there is no data loss. Even if you have a three-node InnoDB Cluster with secondary one and secondary two both lagging, and suddenly the primary goes down, the data is still on secondary one and secondary two. That's not a problem, because any commit on the primary requires node two and node three to participate in the quorum as well, meaning they have already received the transactions. So there is no data loss in this case. Okay, thank you.
Right, I can probably do one more. Last call; one at the back. No one at the back ever asks questions, that's great. "Which slide? You mentioned..." Sorry, can't hear you. "Ransomware. You mentioned ransomware on one slide." Oh, ransomware, okay, yes, please. A ransomware attack is a cyber attack that does two things: first it steals your data, and then it breaks your system. In the database, if there is a ransomware attack, it steals your data and corrupts your data, either physically or logically. Physical corruption is fine because you have a backup on hand, and maybe replication to another zone. But logical corruption is difficult, because of replication: anything done to drop your critical tables is replicated across, and it is very difficult to recover from that kind of failure; suddenly everything is broken because of the ransomware. If you have delayed replication somewhere, and you protect that delayed replica in another network segment with firewalls and so on, it is possible that the data inside the delayed replica is not affected by the ransomware attack, because even though the transaction logs have already been received by the delayed replica, they are not committed yet; they are waiting for the next hour or two. So you can keep clean data over there. Okay, thank you. A critical question with ransomware is recovery time: recovering in minutes instead of days is really valuable, so that's cool. Thank you so much. You're welcome.
We now have a break, sponsored by TeniMot. We're back at 11; hope to see you then.
Thank you very much, everybody, it's a pleasure to be here. Unlike a lot of the other, more deeply technical sessions, I'm just going to talk about everything that's been going on with MySQL recently. As my Safe Harbor statement indicates, a lot of the things I'm going to talk about are in preview, in beta, or have only just come out.
So please, before you say "I'm going to download that and start using it," check the state of what's going on in our announcements, our blogs and everything else. To start with, I'd like to say that pretty much all of you have already interacted with MySQL today. If you have used a social media network, booked a car, rented a house or a hotel room, or something like that, odds are it has gone through a MySQL database. And as developers you probably know that we power most of the major open source platforms out there today, and we continue to do so; that's one of the things we're very proud of. I'd like to talk about the future direction of MySQL, the product you can download onto your laptop, as well as the product we're very proud to run on the major clouds, which is called HeatWave.
HeatWave is a managed MySQL service. It's the MySQL you're used to, the same interface, the same SQL, but it also has a number of interesting parts that, for the most part, can only exist in a public cloud. HeatWave itself, as we originally introduced it, is an in-memory columnar storage cluster that you attach to a MySQL server. It runs in OCI, the Oracle cloud, as well as in AWS and Azure. What we've added in the last couple of years, and one of the things I'll be talking about, is in-database machine learning: you can do machine learning without TensorFlow, without Python or R, right inside the MySQL process, and I'll talk about why that's important. We also have something we call Lakehouse, which means HeatWave is capable of reaching out into the object store, grabbing CSV files, Parquet files, Avro files and database backups, and incorporating them into that in-memory columnar data store. So you can have half a petabyte of data that you query in real time across 64 nodes attached to the one MySQL database you have up there; it gives MySQL an enormous amount of scalability. In addition to that, we've changed the release model, introduced some new developer tools, and introduced JavaScript into the engine. In short, I'd like to talk to you about how all of that impacts you and what's coming up, and it's a lot to cover.
So the first thing, as I alluded to, is that we've changed the way we release MySQL. A few years ago we went to what we called the continuous innovation model, to roll out new features more quickly than every couple of years with each new major version. That was great for social media companies and a lot of the cutting-edge companies, but we also have a lot of large enterprise and government customers, and they really don't like fast change. So we've tried to give everybody the best of both. We have what we call the innovation releases, and those are the little boxes you see rolling up. The innovation releases are the latest and greatest; we support each one until the next innovation release comes out, and we put all the new things in there. Some of the things I'll be talking about, such as the vector data type and JavaScript stored procedures, show up in these innovation releases.
And then every two years we come out with the long-term support (LTS) release, and you have to decide which one is right for you. Basically, while you're developing, when your application is still changing a lot, you should probably go with the innovation release so you're on the latest features. But when you go into production, when you're no longer changing your application, you don't want the database changing underneath you. For that we have the LTS release, and for its entire eight-year support cycle you get nothing but bug fixes and security patches, and none of the behavior changes. We won't introduce new keywords, we won't create new functions or change the way existing functions work. You may not get every performance improvement that goes into the innovation releases, but you get stability, and most people are happy with that trade-off.
Now, while you're doing that development, you may have heard Hananto talk about MySQL Shell, which is our new command-line client. It is very exciting, and we've integrated it into VS Code as a plugin. So you can use MySQL Shell, which replaces the old mysql command line, as a command-line client, or you can have it built into VS Code. You can have your regular SQL inline with your JavaScript or Python or C++, edit it with syntax highlighting, and still execute it against the database. The other thing you can do inside VS Code is that, if you're working against a cloud database, VS Code can create a bastion up in the cloud so you can get through to a database that has no public IP: using your cloud credentials, it logs into the cloud and creates a kind of invisible bastion that then connects to your MySQL instance. It's a much more secure way to develop from your laptop against a cloud instance. And once you're developing, you can also administer that server: you can see the server status, the network status, and everything going on inside it from your laptop, inside VS Code. So it's a full-fledged development platform, and the VS Code plugin is open source, as is MySQL Shell.
The next thing I'd like to talk about is a new feature. Again, this will soon be coming to the download versions of MySQL: you can write stored procedures in JavaScript. What that means is that these JavaScript stored programs become a native part of MySQL: you embed the JavaScript into your SQL, write the stored routine, and describe what you want it to do inside SQL, so you can combine the two very quickly. The way we do that is with Oracle's GraalVM. GraalVM is actually running inside the MySQL server process, and it has a lot of full-fledged components.
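For a feel of how this reads, here is a small sketch of a JavaScript stored function. The feature ships with MySQL builds that include the GraalVM-based multilingual engine (MySQL Enterprise and HeatWave, recent innovation releases), the function itself is invented for illustration, and the exact DDL details should be checked against the current documentation.

```python
# Sketch only: JavaScript stored programs need a MySQL build that ships the
# GraalVM-based multilingual engine (MySQL Enterprise / HeatWave, recent
# innovation releases). The function below is invented for illustration.
import mysql.connector

DDL = """
CREATE FUNCTION gcd_js(a INT, b INT) RETURNS INT
LANGUAGE JAVASCRIPT AS $$
  // Plain JavaScript, executed inside the server process via GraalVM.
  let x = Math.abs(a), y = Math.abs(b);
  while (y) { [x, y] = [y, x % y]; }
  return x;
$$
"""

cnx = mysql.connector.connect(host="127.0.0.1", user="root",
                              password="secret", database="test")
cur = cnx.cursor()
cur.execute("DROP FUNCTION IF EXISTS gcd_js")
cur.execute(DDL)

# Call it like any other stored function, straight from SQL.
cur.execute("SELECT gcd_js(12, 18)")
print(cur.fetchone())   # -> (6,)
cnx.close()
```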
So if the logic you need is easier to express in JavaScript, you can now run it right there in the server. Now, going from development into the cloud itself: MySQL HeatWave. Again, this is just our definition of the MySQL HeatWave database service. We are the people who build and maintain MySQL; whenever you see "MySQL HeatWave Database Service," that's the thing Oracle produces. Everybody else offers MySQL services too, because it's a GPL database; we take that same GPL database and use it to create our cloud offering. HeatWave is our version of MySQL that runs as a managed service: fully managed, so you don't have to do anything to administer it, back it up, upgrade versions or apply patches. We manage all of that for you, in OCI as well as on AWS and Azure, and it's always the latest version of MySQL. And it has all the things I've been talking about. HeatWave itself is this columnar in-memory analytics engine: you have your MySQL database, and then you can have from one to 64 in-memory nodes that you spin up on demand. If you've got a lot of data but you only do your analytic processing at the end of the week or the month, you don't need a massive instance sitting there all the time. In the cloud, you have your regular MySQL transactional database, and when you're going to do analytics you spin up one to 64 of these nodes, each of them substantial in size, so you can have half a petabyte of data in memory, in the cloud, and run your queries across it. And if that sounds fast, it's because of this: by our benchmarks, which you can read for yourself, these are standard TPC-H-style benchmarks using 500 terabytes of data, and these are the results we get across different database systems. It is seriously fast; we think it's the fastest way to do analytic processing.
Beyond those analytics, what else do we find people want to do with their cloud data? The first, obviously, is large-scale analytics that can be sped up on demand without spending a lot of money on a massive instance that mostly sits idle. The next is to leverage machine learning without having to move your data somewhere else and without having to hire a large team of data scientists. Nothing wrong with a large team of data scientists, but that shouldn't be the gateway to doing machine learning; we've tried to democratize the process. The next thing is to be able to use your data wherever it is: you can reach into object stores, as I said, with Lakehouse, and pull out structured data for analytics, your CSV files, your database backups, things like that. And, as I'll talk about in a couple of minutes, if you have data you want to do AI summarization on, you can read that too: unstructured data, not loaded into your schema, can be read into the database for use by RAG and other large-language-model features. And just to put this on a slide in case anybody missed it: it doesn't only run on one cloud.
HeatWave is developed at Oracle, but it is developed and optimized for multiple clouds.
So, a little bit about machine learning. We have AutoML, and the idea is this: there are a lot of steps involved, and the data scientists in the audience will recognize that machine learning is not effortless. Especially if you want to be able to explain the results you produce, you need to understand exactly what your code is doing and exactly where the data came from, and there are a lot of different ways to do that. If you're not familiar with machine learning, the way I always think of it is this: let's say we run a bank. A customer comes in, and based on everything you know about all your other customers, you want to know: should I give this person a loan? That's a classification problem, and machine learning can do it; as long as you have the data, we can show you how to do it relatively quickly. Once you've decided to give the loan, you ask: how much should the loan be for? That's a regression problem, and machine learning can do that too. Then you want to give them a schedule: how soon can they pay it back, and how much do they have to pay? That's a time-series forecasting problem, and machine learning can also do that. And once you've done all of that, you want to check: wait a minute, is all this information correct? Does anything look completely out of place? That's an anomaly detection problem. Maybe they put an extra zero on the value of their house, or not enough zeroes on their salary, something that doesn't fit with the rest of the data. Machine learning can tell you that as well. And finally, once you have a happy customer, you ask: what other kinds of products might they want? That's a recommendation problem, and that is also something it can do. And you can do all of this not just with the information inside your transactional database: if you have, say, a database backup with prior data from other loans you've given or other products, you can pull that into memory, into your HeatWave nodes, and process it as well. I realize this is a survey talk; if you want the details, there is deeper material and other speakers who cover it. But as I said, we're democratizing machine learning, and the way to try it is simple: all of the machine learning inside HeatWave is controlled by just these stored procedures. You learn their parameters and you can do everything I just described, as long as you have the data.
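To make the "just stored procedures" point concrete, here is a rough sketch of the AutoML calls for the bank example. This only works on HeatWave (the routines live in the sys schema); the schema, tables and target column are invented, and the option list is far from complete.

```python
# Sketch of HeatWave AutoML (cloud-only); the schema, tables and target column
# are invented, and the option list here is far from complete.
import mysql.connector

cnx = mysql.connector.connect(host="heatwave-db.example.com", user="admin",
                              password="secret", database="bank")
cur = cnx.cursor()

# Train a classifier on historical loan decisions; the model handle lands in
# the @model user variable and in the model catalog.
cur.execute("""
    CALL sys.ML_TRAIN('bank.loan_history', 'approved',
                      JSON_OBJECT('task', 'classification'), @model)
""")

# Load the model into the HeatWave cluster, then score a table of applicants.
cur.execute("CALL sys.ML_MODEL_LOAD(@model, NULL)")
cur.execute("""
    CALL sys.ML_PREDICT_TABLE('bank.new_applicants', @model,
                              'bank.applicant_predictions', NULL)
""")   # older releases take only the first three arguments here
cnx.close()
```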
Then, since we went ahead and built a machine learning engine inside HeatWave, we also built ML Autopilot: the system by which that machine learning engine watches your queries, watches the optimizer's decisions, and improves on them. You can also ask Autopilot: based on what you've seen of how I use my database, are there indexes I'm missing, or indexes I don't need? Autopilot can tell you that. Autopilot can also tell you, in the cloud, whether you would benefit from moving to a larger shape. Can I use more CPUs, or am I over-provisioned? Can I shrink down to a smaller shape and still get exactly the same performance? That's another benefit of having machine learning built in.
One of the things we're working on, and I'd encourage you to look at our cloud world tour, where our head of engineering has been doing talks extensively, is framed in terms of artificial intelligence. The basics are that, as I said, HeatWave has Lakehouse: it reaches out into object storage, so you can read unstructured data and then ask the generative AI that's linked to your HeatWave database to summarize it. You can also use that same linked generative AI as a RAG system, retrieval-augmented generation. It's a lot to summarize, but basically: if you have, say, your recent press releases, and you want to ask an LLM to generate an answer about a brand-new feature of MySQL, the LLM's training data may be years old, so it might say, "I don't know what you're talking about." Instead, you can point it at the PDFs or documents containing your recent press releases and say: using these, now answer the question. That's the basic idea; it's a little more complex than that, but that's retrieval-augmented generation. And you can do all of it in natural language: you ask the question in natural language, and the answer comes back in natural language as well. The key to joining all of this together is that we're introducing a new data type called a vector. A vector is a semantically meaningful numerical representation of a statement or a word, and with vectors you can do semantic search over documents: you return just the relevant parts of documents and use those to feed the LLM. You may be wondering: if the LLM does the work, why do I want this in the database? The reason is that if you already have a machine learning engine in your database, you're very capable of narrowing down the search space. You're not asking a completely open-ended question; you're using the machine learning engine to narrow down what you ask the LLM to do. And because it's a database, you can provide the data. So having all of this connected in one place is going to be very important, and we're going to build more in this area as time goes on.
So, just to wrap up, and then I'll try to take some questions. This is HeatWave. It is all based on the open-source database that you know and love, and it has several parts: it obviously does transactional processing, and it may be the thing that processes the most transactions in the world; it also does analytics up to half-petabyte scale; and it has a machine learning engine inside it, and that machine learning engine can also be... thank you very much. While the next speaker gets set up, we'll take some questions. Right, any questions?
"The question is about the cloud: you mentioned OCI, AWS and Azure. Is there a plan to support Google Cloud?" As of today, there is none. "The next question is about configuration: there are a lot of parameters to configure, so can the service make recommendations about that?"
Yeah, there is a lot it can recommend. As I said earlier, Autopilot will iterate and keep making recommendations: it does index suggestions and will evaluate your existing indexes, it evaluates and adapts the tuning of its own optimizer, and it will also tell you what shape is ideal, among a number of other things. Sure, one more question, of course. "Can we talk about what can only be changed if you restart? What about size, what can you scale?" There are a lot of things you can scale, and we've got engineers here who can talk about the specifics of what is easy for us. Scaling the number of nodes is a little bit challenging once you've already set up your system; however, changing the shape, being able to say I want to go from two processors to sixteen, that you can do, and things like that. "And what do we do on-premises?" Autopilot only works in HeatWave: Autopilot, AutoML and all the machine learning things are part of HeatWave. But yes, you can replicate your schema from on-premises into the cloud and use Autopilot on it, or restore part of your database into HeatWave, do whatever you want to do there, and then take the recommendations it gives you and apply them back on-premises. Thank you all.
We've got a bit of a gap to fill, so next up we have Persupong, who is a Principal MySQL Solution Engineer. Thank you very much. Hello, thank you very much. This session is about getting more out of MySQL, in particular working with relational data and documents together in one database, so that DBAs and developers can work with the same data. Before we had this capability, you needed one technology for MySQL and separate software for the document side: when the developer wanted document data alongside MySQL, you had to deploy the data twice, the relational part inside MySQL and the document part on the other technology, and then combine the results in the application. You would select the relational data and get a relational result, fetch the document data from the other store, and the application had to stitch the two together. With both in one database, you can do it in the same connection.
Once we put the document technology inside MySQL, we have the power of MySQL to access the data in relational tables and, at the same time, to store documents, so you can keep both kinds of data for the application in MySQL. You can include many collections of documents alongside the tables you already have, and it is simple: one system, with a single backup and restore, because there is only one database. This is the way to get results from both technologies together.
Some applications prefer to get their data in JSON format. You can run a SELECT against a traditional table and use the JSON_OBJECT() function: the result of the SELECT comes back as JSON documents, and the application can consume that JSON directly. And when you need to combine the traditional and document sides, you can send a single statement that joins the relational table and the document data, and you get data from both in one result. For example, one of our customers has a marketing table on the relational side and customer data captured as documents, and we can join the relational table with the table that keeps the documents like this. If we didn't have this capability, we would need to implement a separate tool to fetch the data from both technologies and then relate it to the other table.
There are also features for querying inside the documents. Sometimes we want, say, the top N by some value, but that value lives inside the JSON document: you can use SQL functions in the SELECT statement to extract the value from the document, order by it, and take the top N. You get the data from the table and compute the result with SQL. So we can combine both: access the relational data, reach into the document, and use one SQL statement to get the answer. This way we don't have to adopt and develop against a separate technology, and it stays easy even when the volume of data is high.
Another thing we have is MySQL Shell for VS Code. You can download the extension and work with SQL right alongside your code. Before we had this, you worked in the IDE to build the application, and whenever you needed to understand the structure of the data you had to switch to another tool. Now you can install the extension, browse the tables, and run SQL from inside VS Code: you understand the structure of the data, try out the statement you need, and come back to development mode. You can take the query you tested in SQL mode and use it in your application code, go back to look at the table again in SQL mode, adjust the structure if needed, and keep switching between the two.
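To make the JSON side of this concrete, here is a small sketch of the two kinds of statement described above: returning relational rows as JSON with JSON_OBJECT(), and reading rows out of a JSON column with JSON_TABLE(). The tables and columns are invented for the example.

```python
# Sketch: mixing relational rows and JSON documents in one MySQL database.
# Table and column names are invented for the example.
import mysql.connector

cnx = mysql.connector.connect(host="127.0.0.1", user="app",
                              password="secret", database="shop")
cur = cnx.cursor()

# 1) Return relational rows as JSON documents for an app that wants JSON.
cur.execute("""
    SELECT JSON_OBJECT('id', customer_id, 'name', name, 'city', city)
    FROM customers
    LIMIT 5
""")
for (doc,) in cur:
    print(doc)      # e.g. {"id": 1, "name": "Ana", "city": "Singapore"}

# 2) Reach inside a JSON column and join it with relational data in one
#    statement: JSON_TABLE() turns the JSON array in orders.items into rows.
cur.execute("""
    SELECT o.order_id, jt.sku, jt.qty
    FROM orders AS o,
         JSON_TABLE(o.items, '$[*]'
                    COLUMNS (sku VARCHAR(32) PATH '$.sku',
                             qty INT PATH '$.qty')) AS jt
    ORDER BY jt.qty DESC
    LIMIT 10
""")
print(cur.fetchall())
cnx.close()
```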
Before I finish, some information to share with you: on YouTube we have the MySQL channel, with a collection of videos you can learn from; you can pick whatever topic you'd like to study in short form. And we have mysql.com for access to any information about the MySQL software. Thank you.
So, has anybody here had to work with separate relational and document environments before? I've used that kind of split in the past, so I think that was exactly the point. Shall we keep going? That being the case, we have one more break coming up before lunch. Okay, next up we have the first of two talks on ClickHouse.
Hello everyone, thanks for joining the talk today. We're going to talk about ClickHouse, real-time analytics, and how you can leverage it alongside a MySQL solution. My name is Alken, I'm from Istanbul, Turkey, all the way, and you can connect with me if you're interested or have any questions. I've previously worked at a services company, and in my previous life I did a lot of operational work on the technical side. This is where I work: we're backed by funding, and we focus on infrastructure, ClickHouse, and support and services around it.
So what is ClickHouse anyway? Quick show of hands: are you going to learn about ClickHouse soon, or maybe you already know what's going on? Never heard of it? Never heard of ClickHouse before? Okay, we'll talk about that, and there are some recommendations I want to touch on at the end. And to break the ice, my question for today is: what is the term for the process of turning a sailing vessel towards the wind? Well, through the wind, because a sailing vessel cannot sail straight into it. Yes, it's called tacking: you never sail directly upwind, you work against the wind from side to side. Anyway, back to the water; you know what I'm talking about.
So, ClickHouse. There are a lot of resources about it, and the license is important: it is open source. It is intended for analytical rather than traditional transactional systems, for real-time analytics, and for doing much more beyond what a traditional system does. We'll get into that. The other key part is the storage model: it is a column-oriented store. The classic design, the one we're all familiar with, is a row-oriented relational database, and those are great technologies for transaction workloads.
Normally you have, say, a customer table, and you access rows. That has benefits for transactional work: consistency and availability, and the ACID properties come easily, which is exactly what you want for OLTP. When it comes to analytics, we have column-oriented systems, and they work faster and better for those workloads and for machine learning and AI. Each column is stored separately, and that is the point of the system. You still make a connection, like you do with MySQL, and you run a SELECT; it still returns the data. In a row-based system you index the table and use row IDs to access the data, reading whole records to do it; ClickHouse has a columnar layout, so it reads only the columns you actually ask for. So again, you may have a table with hundreds of columns, and you select just one of them. There are various ways to picture this, but the point is you only touch what you need. And this is another nice thing about ClickHouse and MySQL: ClickHouse supports the MySQL wire protocol, and its interface is very similar to MySQL. There are a lot of similarities in how you access a ClickHouse server, very much like MySQL, so you don't have to sit in front of a GUI; you can connect from a CLI and query it that way too. A lot of systems only give you a limited way in, which is not great for programming or automation, so ClickHouse gives you a proper command-line interface, like any other RDBMS.
So, we said analytics. When it comes to data that can be acted on immediately, it makes a huge difference, and leveraging that gives you a real competitive advantage in efficiency and customer experience, but of course only if you can act on it early enough. Take a gaming example: you play a game online, on a PlayStation or on a mobile, and the platform takes advantage of what is happening in the game, what the user is doing, to suggest that you buy something. I play Call of Duty on PlayStation, and every time I do anything it suggests something to buy; that is only possible with a system like this working in real time. There are many examples like that, whether it's crypto, an e-commerce site or an online app store: the latest suggestions are based on the last thing you just did, and those are only possible with a real-time analytics system.
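Coming back to the wire-protocol point made a moment ago, here is a quick sketch of what "talk to ClickHouse like it's MySQL" can look like. It assumes the server has the MySQL-compatibility interface enabled (the mysql_port setting, 9004 by default); the host and credentials are placeholders, and some MySQL drivers need small auth or SSL tweaks to connect.

```python
# Sketch: querying ClickHouse over its MySQL-compatible protocol. Assumes the
# server has the interface enabled (e.g. <mysql_port>9004</mysql_port> in the
# config); host and credentials are placeholders, and some MySQL drivers need
# auth/SSL tweaks to connect.
import pymysql

conn = pymysql.connect(host="clickhouse.example.com", port=9004,
                       user="default", password="")
with conn.cursor() as cur:
    cur.execute("SELECT version()")
    print(cur.fetchone())

    # Ordinary SQL, executed by ClickHouse's columnar engine.
    cur.execute("SELECT database, count() FROM system.tables GROUP BY database")
    for row in cur.fetchall():
        print(row)
conn.close()
```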
I used to run an e-commerce site, and we didn't have this kind of real-time pipeline at the time. What would happen is that the data would be collected online, and the next morning an ETL job would move it into the BI system with CDC to reshape it; then the reports would run and the suggestion engine would finally reach the user. Now we can make this possible in real time, genuinely real time: today's technology lets us get to sub-minute, even second-level latency.
ClickHouse is open source, so you can run it on your own server and try it, and when you do, you get a couple of things out of the box. One is efficient compression: say you have uncompressed data in MySQL; you dump it into ClickHouse and it takes a fraction of the space on disk, because it compresses automatically. This is one of the things that genuinely surprises people. We ask how much data they have in MySQL, they say about two terabytes across a lot of tables, and in ClickHouse it comes out at well under a terabyte with no effort; and if you spend a little time on the data types and column types, you can get even better compression ratios. Vectorized query execution is another one: the engine is heavily optimized for it, so you use the CPU efficiently, and CPU in the cloud is not cheap compared to storage and IO, so if you're running machine-learning or heavy query workloads, that CPU efficiency saves you money. Scalability is built in: sharding and replication, which are difficult to get out of the box elsewhere. You can customize sharding based on your key and split the workload into smaller pieces in parallel, which lets you scale, and there's an option for replication. You may not want to replicate a petabyte-scale dataset, but if there's a requirement, say for development or availability, replication is there. Very rich functionality: lots of built-in specialized functions that are heavy and costly in other systems, but that ClickHouse handles well when you run queries over a large dataset. It also has geospatial support; yesterday there was a nice PostGIS session for GIS, and ClickHouse has that too. And materialized views: these matter a lot when you have a dataset streaming in and you materialize exactly what you want to select, so that dashboards and user-facing queries hit a small, pre-aggregated result instead of the raw data; that kind of serving is only practical with a system like this. There's also a very extensive benchmark which I won't go into because we don't have much time: ClickHouse runs its own benchmark against other databases,
including the relational databases and the other OLAP systems, on every server, every version, every platform, so you can compare, and ClickHouse is ahead of the curve. With some of the commercial benchmarks you'd spend thousands of dollars of cloud credit just to run them; the numbers grow exponentially after a certain level of IOPS and storage allocation, so it's not feasible to reproduce them if you're running a business. If anyone wants to talk about benchmarks, find me afterwards.
ClickHouse has a table engine concept similar to MySQL: pluggable engines, with several built in. One of them is MergeTree, the most common one, and the MergeTree family has dedicated engines for certain behaviors, such as ReplacingMergeTree, which deduplicates rows in your tables. You define it just like in MySQL: you say ENGINE = MergeTree, and the table uses that engine. There is also an engine for object storage, so data can live in the cloud and you can create a table on top of it. And there are integration engines, which I'll show, for connecting to source tables: MySQL, PostgreSQL and other databases, the source systems from which you pull data into ClickHouse, plus connectors like ODBC, which can reach things like SQL Server, and Kafka, which I'll give some examples of. There is also an engine over an LSM-style store, the kind used at large hyperscalers like Facebook, so you can work with that data as well. In summary there is much more for integration than I can cover, and there are several integrations specifically for MySQL.
This is very high level, I'm not going into too much detail, but we can talk later. A customer table is not the best example for analytics; it's really a reference table of IDs and attributes rather than events. But say you have a customer table with a customer ID and some attributes, and you want to bring it into ClickHouse. You point at that table in MySQL: you connect to the source table directly from within the ClickHouse interface, and once you've created that connection you can also load the data simply by selecting from it. You effectively have a staging table, and you can get that table imported into ClickHouse in about three steps if you want to test something. And once you've copied the data, nothing touches the source: the source stays in production and you're not touching it, so you can play around and see what you can do with it. Just to give a quick example (this is on my laptop, by the way), I took that customer table, and the only thing I did was select it and insert it into a ClickHouse table, and it came out at about 30 megabytes. You get that compression essentially for free, before you even tune the formats. So think about what you can do in production.
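Here is a minimal sketch of that "point at MySQL, then copy" pattern, using the clickhouse-connect Python driver. The hosts, credentials and table layout are placeholders.

```python
# Sketch of the "point at MySQL, then copy" pattern described above, using the
# clickhouse-connect driver. Hosts, credentials and the table layout are
# placeholders.
import clickhouse_connect

ch = clickhouse_connect.get_client(host="localhost", username="default", password="")

# 1) Query the MySQL table in place via the mysql() table function (no copy).
print(ch.query(
    "SELECT count() FROM mysql('mysql-host:3306', 'shop', 'customer', "
    "'report_user', 'secret')").result_rows)

# 2) A local columnar table to hold the copy.
ch.command("""
    CREATE TABLE IF NOT EXISTS customer_local
    (
        customer_id UInt64,
        name        String,
        city        LowCardinality(String),
        created_at  DateTime
    )
    ENGINE = MergeTree
    ORDER BY (city, customer_id)
""")

# 3) Copy the data across; the source stays untouched and in production.
ch.command("""
    INSERT INTO customer_local
    SELECT customer_id, name, city, created_at
    FROM mysql('mysql-host:3306', 'shop', 'customer', 'report_user', 'secret')
""")

print(ch.query("SELECT count(), uniqExact(city) FROM customer_local").result_rows)
```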
So let's talk about the ideas for analytics. When you move data out of your OLTP platform, you offload your OLTP and gain something from it. For example, take customers, orders and items: you want to bring customers, orders and items together with multiple properties, the amount they spent, their activities, things like that. The other idea is a star schema: a customer table and the surrounding tables, which suits a ClickHouse implementation very well. And you can import historic data that is no longer used in production. The main idea of bringing in a system like this is to give your production environment a smaller footprint and faster serving for your OLTP. Time-series data and log data that doesn't really belong in MySQL can live here instead, and backups and disaster recovery become much easier when all that old data sits in ClickHouse rather than weighing down your primary. Those are the ideas, and they're all the same theme: don't keep in production what production doesn't need. Anything older than, say, five or seven years that you keep for compliance: I still want to be able to run some analytics on it, but I don't want it on my production system, so put it in ClickHouse as a data lake and archiving solution. And there's the whole stream of events: we collect so much, so many events, triggers and activities being captured, and how are we going to store them all? A ClickHouse system like this is super useful for that; there are links at the end of the slides showing how to store that information and turn it into useful activity.
For real-time pipelines there is a product called Debezium. I don't know if everyone has heard of it, but the Debezium connector is an amazing open-source framework that can connect to most databases. Debezium supports many source databases, and with it you can reach into the source database, receive the change information, and feed it somewhere else. What we do is take that and feed it into Kafka. Kafka is another good platform for streaming data, with its own properties for storing, replicating and retaining the data. Then you can either use an existing Kafka connector or use a sink connector; there are sink connectors for Kafka already, but we have one of our own that is more customizable for our needs. That gives you the high-level architecture of the pipeline, and it's something we work on both as open source tooling and as Kafka integration. Debezium is a good option because it can snapshot the existing source data: when you connect it to, say, a MySQL table that already has rows, it takes a snapshot and records its position, and once changes start arriving it knows exactly where they are and keeps pushing them towards Kafka. Kafka is a good staging area where you can slice the stream, take the parts you want, and hand them to a sink.
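On the ClickHouse side of a pipeline like that, the usual landing pattern is a Kafka engine table that consumes the topic plus a materialized view that drains it into a MergeTree table. The broker address, topic name and message layout below are invented, and a real Debezium payload is more nested and needs more mapping than this flat example.

```python
# Sketch of the ClickHouse end of a CDC pipeline (Kafka -> MergeTree).
# Broker address, topic name and message fields are placeholders, and a real
# Debezium payload is more nested than this flat JSONEachRow example.
import clickhouse_connect

ch = clickhouse_connect.get_client(host="localhost", username="default", password="")

# Consumer table: ClickHouse pulls messages from the topic through this engine.
ch.command("""
    CREATE TABLE IF NOT EXISTS orders_queue
    (
        order_id    UInt64,
        customer_id UInt64,
        amount      Decimal(12, 2),
        event_time  DateTime
    )
    ENGINE = Kafka
    SETTINGS kafka_broker_list = 'kafka:9092',
             kafka_topic_list  = 'shop.orders',
             kafka_group_name  = 'clickhouse-orders',
             kafka_format      = 'JSONEachRow'
""")

# Durable storage for analytics.
ch.command("""
    CREATE TABLE IF NOT EXISTS orders
    (
        order_id    UInt64,
        customer_id UInt64,
        amount      Decimal(12, 2),
        event_time  DateTime
    )
    ENGINE = MergeTree
    ORDER BY (event_time, order_id)
""")

# The materialized view is what actually drains the queue into MergeTree.
ch.command("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS orders_mv TO orders AS
    SELECT order_id, customer_id, amount, event_time
    FROM orders_queue
""")
```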
It could be anything: you can modify the customer data as you need and push it to Kafka, and this works very nicely if you set it up properly. Moreover, we turned this technique around: the idea is that the historical data sets come in, you make a connection from the databases you have on the other side, your OLTP or other storage, you stream into Kafka, and from Kafka you connect, archive, and from there it could be anything: go for analytics, keep the data, attach visualization and monitoring on top, and then you have it. This is something we have been working on and it will hopefully be open source by the end of Q2; one tool will be the sink connector and the other one will be this.

So how do I get started with ClickHouse locally, on your own PC? You don't need the cloud, you don't need credits; just install ClickHouse, then you can run the client or clickhouse-local, load some data and start playing around. I use brew on my Mac; you can just install ClickHouse, it is not difficult to set up. I think we are running out of time, so before I complete this talk I want to thank the FOSSASIA organizers and this event, and thank you all for coming. On how to contribute: you can get a link from here; there are a bunch of things you can do, around Kubernetes and more, and you don't have to do all of that, but you can also help by sharing and spreading the word with others. This is my first book; some of the concepts from today's talk are covered in it, JSON functions, replication, high availability, monitoring, and some of the application code; there are about 200 recipes. This is my upcoming book, with good coverage of database design and also of the scaling options and how you can get there. Thank you.

Thank you so much for the talk today. We are right before lunch, so the next session actually starts now; while the next speaker is setting up, let's do a little bit of Q&A. Do we have any questions? One question from the room was about the less desirable aspects of ClickHouse, the issues you run into, and the cost. I think one of the next talks is about ClickHouse Keeper. Originally ClickHouse relied on ZooKeeper for cluster management, so it did not have a built-in cluster management system; that was one thing. Then there is the tooling: because it is open source, the tooling is something we have been working on; one example was backups, where we built a backup suite. The other one was online schema changes: there is no such thing as online schema changes in the analytics world, but it is required, because the source changes and, when you build a system like that and stream data, the destination does not change, so how do you handle that? And why do we need it? Because if my use case is real-time or near real-time and I am carrying a delta that has not been applied, it is not real-time any more. Those are the things that come up, but out of the box ClickHouse does what it needs to do. Of course there are things that can be improved, but what is going on in the ClickHouse world in general is machine learning and AI integration.
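Going back to the getting-started point a little earlier: as a minimal sketch (not from the talk), once ClickHouse is installed you can point clickhouse-local at a file on your laptop and query it directly; the file name and columns below are made up.

-- Aggregate a local CSV without any server setup, e.g. run inside clickhouse-local
SELECT
    country,
    count() AS customers,
    sum(total_spent) AS total_spent
FROM file('customers.csv', 'CSVWithNames')
GROUP BY country
ORDER BY total_spent DESC;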
Any other questions? No? Unfortunately, between us and lunch there is one more talk, so from your point of view you now have 12 minutes to talk to us about coordination in distributed systems.

I work on ClickHouse, and that question is actually a good lead-in, because it is exactly what this talk is about. In the next 12 minutes I am talking about one particular component of the ClickHouse system, ClickHouse Keeper: what it is, why we built it, and how it is used in ClickHouse. I have 12 minutes, so let's go quickly.

As you know, any distributed system needs some kind of coordination, and Keeper is exactly that for ClickHouse. Because ClickHouse is distributed, we need some way for shared state to be kept consistent across the cluster. In this example, if you are a replica updating a shared value, you need to be aware of the other replicas, and before any operation you need to make sure that state is not being changed by another replica at the same time. That is basically what coordination means here. For this there is a well-known consensus protocol, Raft, which most modern coordination systems are built on; you may have seen it used by etcd, and ZooKeeper is built on a similar protocol.

You can ask why ClickHouse needs this at all. Because of its distributed nature there is no single primary node: you can connect to any of the replicas and run your operations, and because these are write operations they need to be replicated across the cluster somehow, and that is where Keeper comes into the picture. Beyond that, a lot of ClickHouse internals depend on it: when a write happens, a block number is assigned, and that block number is again kept in Keeper; merges are coordinated through Keeper, and mutations need something similar. Then there is distributed DDL, the ON CLUSTER queries: you run a statement on one node and it is executed across the whole cluster, and that also goes through Keeper, because the DDL queue itself is stored there and every node picks the statement up from it.

So ClickHouse depends heavily on this coordination layer, and it originally used ZooKeeper for it. But there are a few issues with ZooKeeper: not much is happening in the project, and some of its design decisions do not fit ClickHouse's design; it is written in Java while ClickHouse is C++, it does not compress its logs and snapshots, and it has limitations when there are very many nodes. That is why we wrote our own implementation, ClickHouse Keeper. It is wire-compatible with ZooKeeper, so in any existing deployment you can replace ZooKeeper with Keeper, and because it is native code it is very fast.
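As a rough illustration (not from the talk) of the kind of statement that relies on Keeper: the cluster name, table and columns below are invented, and the coordination path follows the usual {shard}/{replica} macro convention.

-- Distributed DDL: the CREATE is written to a DDL queue kept in Keeper and
-- every node in the cluster picks it up and executes it locally
CREATE TABLE events ON CLUSTER main_cluster
(
    event_time DateTime,
    user_id    UInt64,
    event_type String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (user_id, event_time);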
We use it in our cloud service as well, and it needs much less memory. It is tested with Jepsen, a testing tool which injects faults, so you can have reasonable confidence in it. If you are coming from ZooKeeper you do not need to rebuild everything from scratch: there is a converter tool which can bring the existing coordination data over, and there is a basic client as well, so anybody can connect to Keeper and try it. One more thing you can do is run Keeper in embedded mode inside the ClickHouse server, so you do not need to run a separate Keeper process at all.

At a very high level, just like any other coordination system, it follows Raft consensus. We run a three-node cluster: you have a leader and two followers, and the client can connect to any of the nodes; writes have to go through the leader, but the path looks the same whichever node you hit. In this example the request from the client goes to a follower: we are creating a znode, the request reaches a follower, the follower forwards it to the leader, the leader appends it to its log and replicates that log entry to the followers, and once a quorum has acknowledged it the write is committed, so the data is not lost, it stays in the cluster. At a very high level we recommend a dedicated three-node cluster; the reason is that you cannot scale writes within the cluster just by adding more nodes, because every write still has to go through the leader. Another thing we ensure is that writes are linearizable: a write request is only acknowledged once it has been committed. We also recently added a feature where a node with higher priority is preferred as the leader; it is a very simple configuration change. Reads, on the other hand, can be scaled both horizontally and vertically.

Let's quickly look at the challenges. One is writes: in any coordination system, because we need to provide linearizability, all the writes coming from all the servers need to end up in a single queue. Even if you grow the cluster, you do not get away from that problem. One of the things that is coming, if you see where this is going, should give some hope: the idea is to split the work so that it is not all handled in one place, and within each of those parts you can scale much better.

I'd like to say thank you for coming to this session and thank you for the opportunity to speak; it was an impressive adaptation to a compressed schedule. Thank you. We actually have a plan for that. I have a question for you: we have been trying to run some benchmarks, and there are a lot of limitations around the single Raft group and the fsync parameters coming from the server; what is the benefit of Keeper in that respect? Thank you for the question.
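To make the coordination state a bit more tangible, here is a small, hedged example of peeking at it from SQL; the path refers to the hypothetical events table sketched above, and the system.zookeeper table reads through to whichever coordination service is configured, ZooKeeper or ClickHouse Keeper.

-- List the coordination entries ClickHouse keeps for one replicated table;
-- children here typically include things like block_numbers, log and replicas
SELECT name, value
FROM system.zookeeper
WHERE path = '/clickhouse/tables/01/events'
LIMIT 20;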
Basically, this might be a good experience to give a vision of it, but we would need to benchmark those cases properly before I give a definitive comment. Is it published somewhere? Yes; sorry, I forgot to include the blog link here. Thank you again. We are now breaking for lunch and will be back in the middle of the afternoon. Thank you. Thank you.