 Hello. So, I'm Daniel Vanader. I'm here to give the talk about Ghost. Many of you might have expected Slomy Noach here, but unfortunately he couldn't make it, so you're stuck with me. So, Slomy is the main order of Ghost, but I and some of my colleagues also worked on some of the Ghost source code together with Slomy and we discussed ideas about Ghost. I'm working for Booking.com. What we do is we sell hotel rooms, but we also run on many, many MySQL servers and we do many schema migrations like every day. So, maybe to start off with a question, who of you are using schema migrations other than just running alter table? Okay, I would have expected like a little more people raising their hand. So, who of you are using Perconas online schema chains? Who of you are using another tool? Only a few hands. So, Ghost is the online schema migration tool from GitHub. And to start off, there are a few limitations about things which are currently not working. Of course, there might change in the future because there's ongoing work. For some of these things, there might even be patches, but it's not merged yet. So, foreign keys don't work currently. For us, it's not too much of an issue, but for other people it might. Triggers doesn't really work. It requires row-based replication for now, and it requires a full RBR image, which should not be too much of an issue for most people. Generated columns in 5.7 is not yet supported, same for like the similar feature in MariaDB. Multi-source replication is not supported. Well, it probably works somewhat, but it's not really tested. So, probably don't use Ghost with multi-source. If you use an active multi-master, then Ghost is not the tool you want to use. It might be possible to add that later. So, Ghost has much more code and is much more involved with what's going on than most of the other tools, like PT online schema chains. So, there is more room for error there, because it does a lot more than just monitoring stuff, basically. There's a GitHub paste, and there you can see all the issues which are and all the limitations, because things might change and have changed a bit already. So, schema migration is a well-known problem, because if you just run an alter table, it might lock things on your master, and later on, when it replicates, it might block your replication from continuing, which causes your slave to lag behind. Well, then Percona made the online schema change tool, and there were a few other similar tools to do online schema changes, sometimes with like a multi-master and then switching replication. There were many tricks, and the Percona online schema change tool worked really well for most of the people for a long time. So, it is a tool which served everyone well, but there are also other options. Facebook also made a tool. I think it's written in PHP, which limits its user a bit. So, I don't really think many people are using that outside of Facebook, but they do it in a different way, which is interesting to see. Of course, these slides, they were originally made by Sloamy, so there might be some GitHub related stuff on there about internal operations at GitHub. I will skip over most of it. I might add some information about how we do schema migrations in booking and how we are using ghost in booking. So, this basically is a PT online schema change. You're creating a new table, you're altering the new table, and then to get all the data changes applied to your new table, you put three triggers in place, trigger for the insert, for the deletes, and for the updates. So, all the things happening on the original table will also happen on your new table. And then, of course, it also needs to copy rows from the old table to the new table, because not all rows might be touched while the online schema change is running, of course. This is what Facebook is doing with Facebook online schema change tool. So, all the insert, the delete, and the update are going to a change log table. So, it's like a three-step approach, because then later on all the changes are being applied on the new table, which does help a bit, but it also has its limitations. So, one of the things ghost does is that it's not using triggers anymore. There have been a few issues with using triggers for online schema changes. First one with sort procedures is, well, they're interpreted, not compiled. So, each trigger is running, and it adds a lot of overhead on the master just to execute the trigger. And, of course, there are a lot of issues with logs, because there are logs, of course, on the old table, on the new table. Also, at the eventual cut-over, there are some logs. And the more active your server gets, the higher the risk gets of actually these logs blocking your online schema change, and also blocking inserts and updates on the original table. Because with the triggers, anything which happens on the original table also has to happen on the new table. If anything is blocking the new table, then, of course, the insert cannot happen on the new table, which means that your application gets an error. The other issue is that it's not possible to suspend the triggers. So, you're running your online schema change. Suddenly, there's like a big peak in traffic. You just want to pause the online schema change for a bit. Well, that's not really possible. You can stop the copying of rows from the old table to the new table, but the triggers still have to be in place, and they still add overhead. So, that's a bit of an issue there. And I don't think I've ever seen anyone running multiple online schema changes with the PT online schema change tool in production and actually be happy with it. I also wouldn't recommend doing that. And there's the issue about testing. With the PT online schema change, it's not really easy to actually test if a code change in PT online schema change does the right thing, because, well, it has to put those triggers in place and remove them at the right moment. And if you're testing your code and you're, for example, removing the target table before you're actually cleaning up your triggers, then all the triggers will fail, which will cause a production issue because you cannot insert it in the original table anymore. So, it's difficult to actually test all the online schema change tooling. So, of course, Ghost does everything in a different way because Ghost basically acts as a replica of binary logs, which have to be in row-based format, then parses the information in the binary log and applies it to the master if it's running in production mode. And, of course, it also still has to copy rows from the old table to the new table. So, basically, this is how it looks like. There are inserts, deletes, and updates on the original table. Those go into the binary log. They're read by Ghost. Ghost writes them to the new table. But if anything goes wrong with reading from the binary log and inserting on the new table, the inserts on the original table will still go on. So, it's much more reliable in that way. Does everyone understand how this is working? Any questions? Feel free to stop me if you have any questions later on. So, of course, it's best to use a replica or a slave, however you want to call it. So, if you're reading from the replica, there is even less load on the master. And the only thing the master actually sees is normal inserts in a table, which is, well, normal operation. Nothing special going on the master. So, of course, we can read the binary logs from whatever thing is providing the binary logs. So, that really helps. And Ghost controls the whole data flow because Ghost is reading the data from the binary log and applying it. And with PT online schema chains, the trigger is doing the work. But here we have to actually read the data and write the data again. But also because we are actually touching all the data, we can do interesting things with it. And so, again, the writing to the master, reading from a replica from a binary log, and then eventually we switch the two tables. Also, there's an interesting method about how a cutover is done with Ghost. It's quite difficult to actually log two tables and rename them autonomically. But there is a trick Ghost is using, which actually ensures that when you're switching those two tables, they always exist. There's no gap in which no table exists in that place. So, all the writes which go on will still find a table and work normally. It's best to use a replica, but it's also possible to connect to a master. And the third option here is to test on a replica, which is a really unique feature for Ghost because that really allows you to run Ghost only on a slave, see what it does, see what the end result is, compare the old table, compare the new table, see if there are any changes. So, booking, the first Ghost version we tried, we actually were testing on a slave, and we always test on a slave before we run in production. So, we were testing on a slave, and we noticed that, well, some of the timestamp columns were like off by like one hour, sometimes two hours. Well, one hour or two hour, well, that's daylight saving time or not daylight saving time in Amsterdam. So, that was obviously a time zone issue because Ghost is used at GitHub, Ghost is used at booking, but of course the environment is different. We might use different time zones, we might use different schema layouts, different server versions. So, it's really good to do some testing because your environment might be different. And this really gave us the option to test, see what was happening on the data, work on the code, fix the bugs, the bug is fixed for quite some time already. And then when we were confident that the data was correct after running the migration, we could just run it in production. And we now often run Ghost in production at booking. And I know that at GitHub they're running multiple Ghost migrations a day. So, let me just, yeah, this is the testing. So, of course you could do some more testing because you can just automate and do like a really simple no migration and test every day if your data still looks the same before and after an online schema change. Because if you're adding a new column like a JSON column, well, you want to know if it still works and or that you need to work on the code. There's a Unix socket and Ghost allows you to connect on the Unix socket, ask about the status, change the trodding, change which slaves are being trodded on, and change more parameters. And usually when you're in Ghost, it will add like a delay of less than one second because it's using a heartbeat mechanism. So, it's not really adding any replication delay, which is really good. Also it's possible to delay the eventual cutover. So, you start your Ghost online schema change. Then when it's ready for a cutover, you make sure that you're in the office, that you had your coffee, and then you start a cutover, and then you can monitor it and do anything if something goes wrong. At booking, we never saw any issue with the cutover, but you might want to be in the office anyways because, well, a client might do something strange, or if you're adding an extra column, well, the client might suddenly start to do something differently. Of course, Ghost allows you to run a number of different hooks. So, at GitHub, they're using chat ops. So, it's possible to integrate Ghost with things like chat systems to give you a message when it's ready for cutover, for example. So, one of the last things I want to mention is that now the row copy is done with just an insert into a select from, but eventually there are ideas about decoupling that, and then you can actually read from a table and write to another table, and then there are other things you can do eventually, like a live table migration. So, you're reading from the binary logs, you're reading from the original rows from the server, you're writing into a completely unrelated server, which allows you to do a live migration of a table, which is pretty nifty. There are some other things Slomy's working on, and Slomy's colleagues, and we are also working on. One of those things is resurrection to actually restart an online schema change if it was stopped. And, of course, it's open source. And, of course, it's on GitHub, but that's, well, I guess you guessed it already. Some example. Well, I think that that's it. So, please have a look at ghost, give it a test, so it's really easy to just test it on a slave, see what it does, no production impact whatsoever, and if you're confident, then you can run it in production. Thank you. Any questions? I think that should be possible, not only for ghosts, but probably also for PT Online Schema Chains, because you, so the question is, like, after the cutover, it's really difficult to do a rollback. So both ghost and PT Online Schema Chains will just keep the old table around if you use the right options. Ghost does that by default. PT Online Schema needs an old option for it. But for ghost, it might be just possible to reverse the operation of inserting in the old table. So that would for sure be possible, I think with both of those tools. But someone has to do it. Yes. So all the details about how that's working are on the documentation on GitHub. So basically what happens is that there's a renamed table, and it blocks that by having a lock. And when it's done applying all the binary locks, then it will remove that phantom table and do some other tricks to actually make sure that it's atomic. So the first thing you have to do is do a lock tables, and then you have two tables under our lock. And then you want to do a rename table when once everything is right. But another thread is still applying. So there are like two threads which have to coordinate work. So that's, yes, so one of the, yes. So once you have the lock another, in another thread you're running the renamed tables. And the renamed tables will have a higher priority of already running inserts and the leads and updates. But all the information about how exactly that's working is on GitHub. So the question was, does it support GTID? Yes. We are running with GTID in a lot of places. I would have to think about in which places GTID really matters there. So probably the most important thing when running with GTID is that you want to have the GTID actually matching in the end. But you're inserting on the master. Well, every insert is just a regular insert because that's the things you're applying for the binary lock. And you're doing an insert into select star from, which will also just generate normal GTIDs. So I don't really think we have to do that much special things to allow GTIDs to work properly. Thank you.