Facebook's data warehouse, but it's still pretty large. And latency has to be low enough, because we have a large cache in front of MySQL, but if the cache misses, the queries come to MySQL. If MySQL is slow to serve those queries, it hurts people's experience on Facebook, which is pretty bad. So the MySQL database has to be fast enough for the main database, and our operations are automated. As for hardware, we use flash SSDs, and the workload is pretty much space bound. That is the biggest motivation for why we started the MyRocks project. CPU usage is pretty moderate, so it's not that CPU bound. So that's the background.

We are in the process of migrating to MyRocks. The slides are a bit broken on this laptop, but these are the technical challenges. For the initial migration, we needed to find the optimal MyRocks and RocksDB column family options, and we needed to create MyRocks instances with the existing data, because there are already production services deployed on InnoDB. So we needed to deploy MyRocks without stopping existing services. Creating the MyRocks tables within a short period of time is also needed, because our database is petabytes in size. If it takes a long time to migrate even one instance, then it will take years to migrate to MyRocks, which is not realistic. Another important MyRocks requirement is that we don't lose data and don't return wrong results to the client. How do we verify that? By verifying data consistency between the existing InnoDB and the new MyRocks. This verification was pretty important. After deploying, we continuously monitor resource usage like space, I/O, CPU, and memory, as well as query execution plans, stalls, and crashes. This monitoring was pretty important.

Another challenge was running MyRocks on the master side. We started migrating to MyRocks on the slave side, but the master side was also challenging. We require using the row-based binary log for MyRocks. It's the same as MySQL Cluster and some other storage engines, because the statement-based binary log needs gap locks like InnoDB's, or table-level locks like MyISAM's. We don't support the gap lock feature in MyRocks, and that's why we need RBR. Removing queries that rely on InnoDB gap locks is also needed. There are not many queries like that, but for example, when people build queues on top of InnoDB, that usually relies on the InnoDB gap lock, and that can't be used with MyRocks. And we recently added XA two-phase commit support in MyRocks, which is the two-phase commit between the binary log and MyRocks. This is pretty critical for supporting the master side.

Creating MyRocks instances without downtime: this was the first requirement. This is relatively easy in MySQL: just pick one InnoDB slave, stop the slave, dump with mysqldump, then load into MyRocks. That's the basic procedure for converting one instance to MyRocks. Thanks to MySQL's robust replication support, stopping one slave does not affect services. Our MySQL replication environment is relatively simple: a single master with multiple slaves. We don't use chained multi-tier replication environments because they are a bit too hard for us to manage. Another interesting feature is that we use binlog compression during network transfer, because we have a geographically distributed MySQL replication environment.
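Just as a rough illustration (the binlog compression mentioned here lives in Facebook's MySQL branch, so the exact mechanism may differ), stock MySQL has a related built-in option, slave_compressed_protocol, which zlib-compresses the replication stream when both sides support it:

    # my.cnf sketch, illustration only, not the Facebook configuration
    [mysqld]
    slave_compressed_protocol = ON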
Reducing the amount of data transferred from the master to the slaves is therefore pretty important. But the binlog dump thread today is not that efficient: it compresses the binlog with zlib and then sends it to the slaves, and this happens separately for every slave. If there are 10 slaves, the dump thread needs to compress the same binary log 10 times. That is pretty inefficient, and zlib compression uses a lot of CPU, so currently we have a CPU usage issue on the master side. We have work in progress to switch to zstandard, which is a newer and much more CPU-efficient compression format, to save CPU in the binlog dump threads. We also use semi-synchronous replication, lossless semi-sync, to prevent data loss during failover. And we have multiple databases within each instance, and we use the multi-threaded slave feature on the slave side, which applies transactions for different databases in parallel, to reduce replication lag. But for MyRocks, only the single-master, multiple-slaves part really matters; the other pieces are pretty much independent of the storage engine. So the procedure is just: stop one slave, dump, then load into MyRocks.

The second point is creating MyRocks tables with optimal column family options. RocksDB, as Mark said, has lots of parameters, which means MyRocks has lots of parameters, and most of them are per column family. MyRocks has two important features concerning column family options. (I'll show a small example of the syntax in a moment.)

The first is reverse key ordering. Facebook News Feed, for example, is fetched with an ORDER BY time descending, and most of the range scan workload at Facebook is descending scans. This means descending scans have to be fast enough in RocksDB. By default, RocksDB key ordering is optimized for ascending scans, so we needed a different format optimized for descending scans. We call this a reverse-order column family: it stores the data in the opposite order, which makes it optimized for descending scans.

The second is the bloom filter. Bloom filters make point lookups, and range scans with equality predicates, fast. But the bloom filter needs a specified length, which is highly dependent on the index. This means that if you have many different types of indexes, you need to configure different bloom filter options, which means you need more column families. We still have work in progress to make specifying bloom filters per column family more robust and efficient. Anyway, most MyRocks configuration options are per column family, and each MyRocks index belongs to a specified column family. In production, we classified the indexes on our main database into a limited number of similar column family types, like node, link, link in reverse order, and a few others. So we have a limited number of column families and map many indexes onto them.

Next, data loading. We implemented a bulk loading feature in MyRocks to make the initial migration from InnoDB to MyRocks faster. By default, the RocksDB write path is: write to the memtable, then many compactions happen, and eventually the data is written to the bottommost level. That means writing a single megabyte can result in more like ten megabytes of physical writes. The exact amplification varies a bit, but it is still much more than the original data.
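Coming back to column families, here is the small example I mentioned. This is only a sketch: the table and the column family name are made up, and it relies on my understanding that MyRocks assigns an index to a column family through the index comment, with a "rev:" prefix marking a reverse-order column family.

    CREATE TABLE feed_story (
      user_id      BIGINT NOT NULL,
      created_time INT    NOT NULL,
      story_id     BIGINT NOT NULL,
      data         VARBINARY(255),
      PRIMARY KEY (user_id, created_time, story_id) COMMENT 'rev:cf_link_rev'
    ) ENGINE = ROCKSDB;

    -- News-feed style query: a descending range scan over the key above,
    -- which walks the reverse-order column family in its natural direction.
    SELECT story_id, data
    FROM feed_story
    WHERE user_id = 100
    ORDER BY created_time DESC
    LIMIT 10;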
To avoid that, we implemented a bulk loading feature in MyRocks which creates the data files directly in the bottommost level, bypassing all the memtable writes and the compactions. For the primary key this is controlled by a session variable called rocksdb_bulk_load; with that option set, everything is written directly. But the loaded data must be pre-sorted by primary key; that is a requirement. For the initial migration it doesn't matter, because we use mysqldump --order-by-primary, which guarantees the dumped data is sorted by primary key. For secondary keys, we added fast index creation, so ALTER TABLE ... ADD INDEX automatically uses bulk loading for the secondary keys.

So these are our basic data migration steps: create the tables with no data from the mysqldump output, which creates the MyRocks tables; then ALTER TABLE ... DROP INDEX on the secondary indexes, so we can use fast index creation later; then run mysqldump against the InnoDB host and pipe it into the mysql client, with SET sql_log_bin=0 and SET rocksdb_bulk_load=1 (sorry, the slide says 0, that's a typo, it should be 1); then ALTER TABLE ... ADD INDEX, which uses fast index creation; and after finishing that, START SLAVE. On our hardware we could migrate about one terabyte per instance in just 1 to 1.5 hours, which is pretty good from a migration point of view.

Aside from data loading, another very critical requirement for us was that we don't return wrong results to clients: data verification. MyRocks is relatively new database technology, so not just us, many people care about the quality; it might have more bugs than a mature database. InnoDB has over 10 years of history, but RocksDB is fairly new, so people care about the bugs. Data consistency is one of the most important requirements in general, not just for us but for our community. So we wrote verification test tools to check data consistency, both between InnoDB and MyRocks and within a single MyRocks instance.

We have a couple of tests. One is an index count check between the primary key and the secondary keys, counting mismatches. If the row counts read via the primary key and via a secondary key differ for the same table, something is wrong. This is the basic check, and it doesn't need InnoDB; it just runs a statement like SELECT COUNT(*) ... FORCE INDEX (PRIMARY) UNION SELECT COUNT(*) ... FORCE INDEX (secondary key), within the same transaction scope. The row counts have to be the same; if they differ, the data is broken and has to be investigated. We started using the tool, checked all tables with secondary keys, and fixed most of the outstanding issues before deploying in production.

The second test is index statistics: checking the estimated row counts, like SHOW TABLE STATUS, against the actual row counts. If they are far apart, something is wrong. The funny thing is that this found a bug where bulk loading didn't update statistics, so SHOW TABLE STATUS returned a zero row count for a table with one billion records. Fortunately the tool detected the problem before deploying in production.
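To make those two checks concrete, the statements look roughly like this (the table and index names here are made up):

    -- check 1: primary key vs. secondary key row counts; a single statement,
    -- so both counts are read from the same snapshot
    SELECT COUNT(*) FROM link_table FORCE INDEX (PRIMARY)
    UNION ALL
    SELECT COUNT(*) FROM link_table FORCE INDEX (ix_id1_type);
    -- the two counts must match; a mismatch means an index is inconsistent

    -- check 2: estimated vs. actual row counts
    SHOW TABLE STATUS LIKE 'link_table';   -- estimated rows from statistics
    SELECT COUNT(*) FROM link_table;       -- actual rows
    -- a wildly wrong estimate (like 0 for a billion-row table) points to
    -- broken statistics, as with the bulk loading bug mentioned above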
And the third check is a checksum test against InnoDB, comparing an InnoDB instance and a MyRocks instance. This needs a bit of tricky scripting. Basically, we run START TRANSACTION WITH CONSISTENT SNAPSHOT at the same GTID position on the InnoDB and MyRocks instances: stop the slaves, find the GTID position, start a transaction on the same GTID snapshot, then run the same queries, the same table scans. The results have to be consistent. So we run the queries, then compare the checksums. (I'll come back to a small example of this in a minute.)

The shadow correctness check is another important thing for us, so let me talk about the shadow test. We have a shadow test framework: we capture read and write queries from production instances, then replay them onto shadow test master and slave instances. We have a custom audit plugin to capture queries, and a logging framework called Scribe; we capture the queries, store them in the data warehouse, and then replay them as shadow queries on the shadow instances. We did this as a shadow master test before deploying master instances: detecting client errors, and finding queries relying on gap locks. We have several commands to list the queries relying on gap locks, so we could fix them before deploying to production.

Notable issues and how we fixed them: we saw lots of snapshot conflict errors, which come from MyRocks' transaction isolation implementation being a bit different from InnoDB's. The MyRocks implementation is pretty close to the PostgreSQL style. By default the isolation level is repeatable read, and PostgreSQL-style and MyRocks-style repeatable read may return errors called snapshot conflict errors in cases where InnoDB silently updates the conflicting rows. That means users may get more errors. We noticed that and decided to switch to read committed for the major queries, and read committed fixed the problem.

Another interesting issue we found was a slave that stopped with I/O errors on reads. It silently stopped the slave, and it may take time for the automation tools to detect that. So we changed MyRocks to abort on critical I/O errors, for both reads and writes. And index statistics: as I said before, bulk loading produced inaccurate row counts, and we fixed that. We still have work in progress on index cardinality statistics, which have not been as accurate as InnoDB's, and we are working on optimizing that. For now we have a workaround: setting eq_range_index_dive_limit, a MySQL parameter added in 5.6, to zero, so that index dives are used to estimate the statistics. And we fixed some crash bugs in MyRocks and RocksDB.

Compaction stalls under heavy writes are something we often see in production. Heavy writes cause lots of compactions, which may cause stalls. We observed stalls on schema changes. We use an online schema change tool; it's a bit different from GitHub's gh-ost, we have our own, and it basically recreates tables on a schema change. Previously, recreating tables loaded too much into the memtable and caused too many compactions, which stalled replication for about 10 seconds, even though it's called an online schema change tool. We added the bulk loading feature and the fast index creation feature, and we made our schema change tool use them to avoid the compactions. We also use different compression algorithms for the different RocksDB levels. RocksDB has multiple levels, and about 90% of the data is stored in the bottommost level.
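To give a sense of what that looks like, here is a minimal, made-up sketch of a per-level compression setting. The option names are the RocksDB and MyRocks ones as I understand them, but the values are purely illustrative, not our production configuration:

    # my.cnf fragment, illustrative values only
    [mysqld]
    # cheap compression (LZ4) for the upper levels, stronger compression
    # (zstd) for the bottommost level, where most of the data lives
    rocksdb_default_cf_options=compression=kLZ4Compression;bottommost_compression=kZSTD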
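And coming back to the checksum test between InnoDB and MyRocks, the per-instance part of that scripting looks roughly like this. It is heavily simplified, and the table, the columns, and the GTID set are made up:

    -- on each of the two instances: stop replication and note the GTID position
    STOP SLAVE;
    SELECT @@GLOBAL.gtid_executed;

    -- if one instance is behind, let it catch up to the other's GTID set
    START SLAVE UNTIL SQL_AFTER_GTIDS = '<gtid set from the other instance>';

    -- once both are at the same position, open a snapshot and run identical scans
    START TRANSACTION WITH CONSISTENT SNAPSHOT;
    SELECT COUNT(*),
           SUM(CRC32(CONCAT_WS('#', id1, id2, created_time, data))) AS chk
    FROM link_table;
    COMMIT;
    -- the InnoDB and MyRocks instances must return identical results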