Thanks, Russ, for a very nice introduction. Hello, everyone. As Russ mentioned, I am Andrew. I am previously famous, although I still say famous is the wrong word, as the author of South, and of course of the now-released Django 1.7 migrations. I'm also a senior software engineer at Eventbrite, where I'm on our architecture team doing database stuff and general difficult design decisions, things that are very close, I guess, to what I've been doing in Django.
So what this talk is about is essentially the history of South: how it came about, how it developed, and then what became the Django 1.7 migrations. And, more importantly, how those migrations were developed: what the design decisions were, how they're implemented, and the pitfalls and issues that I came across. I want you to come away from this talk with an idea of how that stuff works, and not be scared of migrations. It's one of the biggest injections of new code into Django in the last couple of years, I think. It's mostly very clean, which I'm very proud of. It's a little bit complicated. There's some stuff in there that is very computer science-y, which means it's very difficult to understand sometimes, but I'd love to try and explain it to you and get some of the ideas across.
So as Russ mentioned, I have been to every DjangoCon in the US. This is me at the first one, looking very young and slightly scared of a stage. And in fact, I believe at every one I've been to I've given a presentation on migrations, or nearly thereabouts. I have put the first slide deck I ever did up on Speaker Deck to view. This will be on my website afterwards; you can read the URL. It's a very short, eight-slide deck, but it's interesting how that first slide deck, my first introduction of South, is very similar in some ways to what we have today and different in others.
But after all this time, after the past seven or so years of doing migrations, I've concluded that they're pretty good. This is my only conclusion, really. There's not much more we can say. Some people complain, some people like them, but in the end, people seem to really like migrations. South became an integral part of Django. It felt like every tutorial went: you need to install Django, and then you need to install South, and then you can keep going. And for a long time South had no real competition; it was the only player in the field. Then people were like, why isn't it in the docs? Why isn't it officially supported? And then finally, as you've learned, it's now in 1.7. But how did that come about?
So I want to step through the history of South here. South was initially launched in August of 2008. It was launched, in fact, from the second window from the right of this building here. This is the stable block in Cornbury Park in Oxfordshire, which is a very lovely place to launch code from. I was working at the time for an agency called Torchbox, who are a small development agency in Oxfordshire. And they have, for some reason, offices in a country estate. So this fits the British stereotype. It could be Downton Abbey. I launched my code from this wonderful country manor. It's a fantastic kind of thing.
So the first few versions were launched from there, and they were made for a single project. At the time, we were making a very simple CMS for a customer. I say simple; moderately complicated. Previously at the company, the method was: we have a directory full of SQL files, we run the SQL files, and we just keep the number in our heads. That then improved to: we have a column with a number in it, we run the files with numbers bigger than that number, then update the number.
And then at some point we sat down, and me and Nick Burch, my colleague at the time, thought: we want some more formal system to do this, some kind of better system. And out of that, South was born. Curiously, Simon Willison, who was also working with Nick at the same company the year previously, launched in almost exactly the same month, I think, dmigrations, which was his version of that, and it looks shockingly similar. So I think Nick had influenced both of us on that. But in the end, dmigrations was MySQL only, and that was one of the causes of its demise.
The other competitor at the time was Django Evolution, done by the wonderful, very old Russell Keith-Magee. I certainly used it for a couple of years before I wrote South. It was quite popular at the time. I'm not sure why it fell out of popularity, but it did. It was in some ways sad to lose a competitor because, as you can see if you look at the times, my releases grew further and further apart as the pressure of having others to keep up with went away.
Now, that's all of the early years. In South at this point, before South 0.4, there is no autodetection. You call South and you go: hi, South, dash-dash add field this, dash-dash add model this, dash-dash remove model this. And then I believe it was Migratory that came in with the idea of autodetection and storing state. I summarily nicked that from them and put it into 0.4, and I thank them very much for coming up with the idea, and a decent implementation too, that I could heavily borrow from. So South as you know it today probably appears more around 0.4 than 0.1. Migrations from 0.1, I believe, still run on 1.0. There is still full backwards compatibility, which I'm very proud of. I think only I have migrations from the first few versions that really exist anymore. But with 0.4 and 0.5 you start seeing things like different operations, and the autodetection gets a bit better.
It becomes more of a thing where you can simply change your models, run schemamigration, and then have a migration. That's fantastic. And then as we get to 2009, 2010, we approach stability, and a very optimistic me. This is four years ago: "In the short term, South 1.0 will be released," says Andrew of 2010. He's very wrong. He doesn't realize it yet, but he's very wrong. This is a period where South becomes pretty much what is required of it, mostly by the community. There are a lot of issues, a lot of holes, but it's good enough. Big horrible bug reports are coming in, but people generally work around the issues; it's not perfect, but it does the job.
And then between 2010 and 2013 there are no major releases. I call it the long gap. Part of this is that at that point I had left university, I was in the world of work, I was freelancing, and that's reasonably high stress. And then finally in May of 2013, at DjangoCon Europe or thereabouts, I released 0.8, which is essentially just a collection of bug fixes and a few new features that had accrued over the years.
At roughly the same time I launched the Kickstarter, which funded the remainder of the work until this day. I asked for two and a half thousand pounds. I got, as you can see here, almost 18,000 pounds. I'm very grateful to everyone who contributed. Many of them are here. I've shaken at least a few of them by the hand. If you gave, please come to me. I would love to thank you again. This essentially funded the work in Django. The idea was that I could go down to three days a week at my job at Lanyrd, and then work one or two days a week on Django and get the work done. And then, rather than having it done on some mysterious timescale, I would be almost being paid to work on it. There's a guaranteed time. You can have this fixed time.
It's time not only to write code, but to review bugs and communicate with people. Like many of the things Russ was saying about having a Django Fellow, I experienced a little bit of that: having a set amount of time per week to do this kind of stuff. It's absolutely invaluable.
And then, again, there was a long period of me working on Django migrations, and so I wasn't working on South. And then a couple of months ago, or I think it was last month, South 1.0 was released. And then of course, yesterday, 1.7 was released. And that's the culmination of all this work. Django, I'm sorry, South 1.0 is essentially just 0.8.4 with one extra change, which is a way of having migrations that can exist alongside Django migrations. Basically, South now looks for a south_migrations directory by default, and then falls back to a migrations directory, meaning you can ship a third-party app with both South migrations and Django migrations, which is a big part of our upgrade plan.
But of course, why did this come about? In 2010, in the same blog post where I write, oh, South 1.0 is coming real soon now, I also write that people have been suggesting to me that South should be in Django core. And by people, I mean Jacob Kaplan-Moss and other core developers coming and saying, "Andrew, the thing you're doing should really be in core." And I'm going, mm-hmm, and walking away, just terrified of the workload that might involve. So the idea for this new version of migrations in core has been building since before this point, but certainly since this point. And at this point I have, I think, five years of South bugs and South issues. And as a maintainer, all you ever see from a project like that is the issue tracker, essentially, or the bug report mailing list.
And so your life is pretty much nothing but a continuous stream of issues and bug reports, where you basically become convinced your software is absolutely terrible. And then you come to a conference, and people go, oh, it's great software, we love it, and you go, that's fantastic. And then suddenly you're like, oh wow, people actually like this stuff. So one of the reasons I come to conferences is that it's nice to get a more realistic snapshot of who's using Django and what they're doing than the much more self-selecting thing you find on issue trackers.
So at this point I was thinking: how do I structure this? What's the right thing to do? The initial plan I had in 2010, and the blog post I'm outlining here, was that I would have two different components. I was very keen that the environment South grew up in, that of competition, that of different ideas, would persist. And so I was like, okay, take the common parts of migration frameworks, in particular the schema backend, where you abstract things like adding tables, removing tables and changing fields; you should abstract those out across databases. Django already does this for queries, using the ORM. We should probably extend that and do all the difficult work of, well, how does Oracle add indexes, or how does SQL Server remove unique constraints? And map that into a common part of Django called a schema backend.
And then also add some ORM hooks. The key thing South does across different migrations is that it versions the models. When it does autodetection, it has this big giant dictionary at the bottom of each migration, which you'll see in a bit if you're not familiar with it, though you probably are. And then from that, it makes brand new, well, brand old versions of the models in memory and runs migrations on those. So it reflects how things looked at the time you made the migration.
And Django didn't really support this. There's a module called south.hacks which, as its name suggests, is a load of horrible monkey patches to make this work. And so my goal was: okay, let's remove that module. Let's have Django have proper first-class support for coexisting model versions and have that kind of stuff in there. And then at the same time, separate from this, we'll have South 2. South 2 will use those underlying APIs in Django and then build a better user interface and better migration handling on top of them.
So that sounds good. But of course, at some point the conversation moved to: well, while in Django core we'd like to slim things down, and contrib.localflavor was a good example of something being moved out, we also think that things that are essential to web development should be in there. Django is a batteries-included framework, after all. And so the plan changed to, well, let's just roll it all in. And so the two components there, the migration handling and the user interface, suddenly roll into Django.
And then, so this is the Kickstarter. At this point I very optimistically proposed that I would backport the same kind of migrations to 1.4 through 1.6. The code for this does exist. It is, shockingly, slightly functional. South 2, at least in its nascent state, and I should probably release it just for interest, is an automated source port of Django source code to 1.4. There's a whole load of regular expressions and a whole load of mappings that take the source code of migrations, transform it, and dump it in a South 2 directory with certain helper functions to make it work. And it manages to run migrations on 1.4. I have no idea how it works. It has a giant monkey patch module that makes 1.4 look sufficiently like 1.7 that migrations go, oh, this is probably fine, and just try to run.
It falls over with all the complex stuff. It is fascinating and awful; at some point I'll show it. But of course, as I'm suggesting, that kind of fell by the wayside. So the revised plan was to focus my work on Django 1.7, and unfortunately I couldn't deliver South 2. The aforementioned South 1.0 migration path is the new suggested solution. You can have both proper South migrations, rather than the weird secondary ones that South 2 would have had, and 1.7 ones, and use the full power of both frameworks. So if you want to write raw SQL or raw Python, that option is open to you in both of those. It is a higher burden on maintainers of third-party apps. I realize this; I'm very sorry. If I can help you write these things, please come and ask me, because I am happy to help people port stuff over to 1.7 if that makes their lives easier.
But with this new plan, you have Django containing all these different parts. So it's not really moving South into Django. A common narrative is, oh yes, Andrew is just moving South into Django. This has been a very common misconception over the last couple of years. Instead, I am adding migrations. It is a brand new framework. I'd say about 5% of the new code is from South, if that. A lot of the ideas are from South, but it's pretty much entirely rewritten. And so it's not just a simple source port. It's been a lot of work redesigning how migrations work, improving some of the design decisions that South failed on, and then changing the core structure of how migrations work and making them modular and reusable, as we'll see in a bit. You can take half of these components and use them yourself, separately from Django migrations. I did keep the separation that you saw on that slide of my initial plan: Django still separates the idea of schema changing from migration running.
If you want to, you can just use the schema editor and never touch the migrations library. So, for example, if you're writing an i18n framework that needs to add a column for every language, we have you covered now. You have a supported Django interface for changing schema in a reliable fashion, in transactions. That is there, that is core code, and you can rely on it. If you want to use a full migrations framework, and I suspect you probably do, we have that too. You can swap out parts if you want to. If you have a very custom big-company problem, you can probably swap out one of these parts to do just what you like. That's the adaptable way it's meant to work.
So I'll go through these different parts. I'll run through them briefly, and there's more detail in the next few slides. The schema editor is what I called the abstraction earlier. It's the piece of code that takes the idea of modifying databases, called DDL, data definition language, and adds a schema editor to every backend in Django. You can call connection.schema_editor(), you get one of them, and it has methods like create_model, add_field, remove_field, alter_unique_together, all this kind of stuff. And so you can call those and do things.
The second cool thing, which is separate from that, is that fields can now be deconstructed. I'll show this later, but what this means is that you can take a field, say a CharField or a ForeignKey, and you can ask it: how are you made? You can take that field and get it into a serializable format from which you can recreate it a second time. Before this, you couldn't take a Django field and serialize it. Fields are just instances, and most of the things they were made with were stored in instance variables, but some of them weren't.
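As a rough illustration of that "how are you made?" question, here is a minimal pure-Python sketch. It is not Django's code: the toy `CharField` class, its `toyfields.CharField` path and the `FIELD_CLASSES` lookup table are all hypothetical names invented for this example. It only assumes what the talk describes later, namely that deconstruct returns a four-tuple of name, import path, positional arguments and keyword arguments.

```python
class CharField:
    """Toy stand-in for a field class; not Django's implementation."""

    # Hypothetical import path, used only as a lookup key in this sketch.
    path = "toyfields.CharField"

    def __init__(self, max_length, null=False):
        self.name = None  # would be set once the field is attached to a model
        self.max_length = max_length
        self.null = null

    def deconstruct(self):
        # Report only what is needed to build an equivalent field again.
        kwargs = {"max_length": self.max_length}
        if self.null:  # defaults are left out
            kwargs["null"] = True
        return (self.name, self.path, [], kwargs)


# Hypothetical registry standing in for "import the class from its path".
FIELD_CLASSES = {CharField.path: CharField}


def clone(field):
    """Rebuild an equivalent field from its deconstructed form."""
    name, path, args, kwargs = field.deconstruct()
    return FIELD_CLASSES[path](*args, **kwargs)


original = CharField(max_length=100)
copy = clone(original)
```

The clone is a separate object that deconstructs to the same tuple, which is the property a serializer needs: the tuple contains only plain data, so it can be written to a file and turned back into a field later.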
The way South did this: South 0.1 to 0.5, I think, actually read the source code of your models.py file, parsed the line out of your model, worked out where the field bit was, and took that part. Which is horrific, but works surprisingly well, as many things do in South. They're not well implemented, but they worked. Then later on, South gained a whole load of rules saying, well, if you see a CharField, you do this; if you see this other field, you do that. And if you were a custom third-party field, you could implement a method called south_field_triple. And that's essentially what deconstruct is. It said, okay, you're a field, tell me how to make you. Deconstruct is just that, formalized.
So all the core fields in Django have this. It is now a requirement in 1.7 for third-party fields if you wish to work with migrations. However, there are plenty of docs on how to do it, and there are some examples in those docs. And if you're inheriting from a field inside Django and you haven't made any changes to the keyword arguments, you don't need to do anything at all. Again, if you want help with that, just email me or django-dev and we can help out. But that's one of the few imposed requirements on third-party apps in 1.7.
Finally, the ORM hooks I mentioned on that first slide, back when I was doing the separation, have now turned into model options and apps. You can now make models living in different worlds. In particular, the app loading work in 1.7 changed what we used to call the app cache into more of an app registry. It's a much better formalized concept. And so now you can load models into different versions of app registries. What the migration framework does is make a different app registry for every point in history, as I'll show you later.
And so we can easily have 40 different versions of a user model and address the right one, and have the right foreign keys point to the right places every time.
Migrations themselves are much more complex, of course; there are many more moving parts. They roughly separate into operations, which are the abstractions around the individual schema editor methods, things like add a field or remove a model. Those are all operations; I'll show you those in a bit. The loader and the graph are the way of abstracting out: you have these files on disk, what do they mean? How do the dependencies work? How do I resolve stuff? How do I plan migrations? That's covered in that section. The executor simply takes a plan from the loader and the graph and just runs it. It basically takes a list of things to run, loads them up, and runs the operations. If one fails, it rolls back the transaction if it can. If it works, it prints OK and keeps going.
The autodetector is the most complex part, arguably. This is what takes your current model state and the current migration state, intelligently compares them, and writes out full migrations. That's improved a lot since South, but we'll see that in a second. And then the state is the final part. State is a very clever thing where, rather than working directly on versions of Django models, migrations work on a much stripped-down version of them called state. And because of the way operations work, they can work on state very quickly. We can run from nothing to your end state through hundreds of migrations in a fraction of a second, because we're not building model objects. We're just changing dicts. State is nothing but a couple of classes and dicts. So we can run all that stuff, and that's a much easier way of doing it. Let me show you here. The first key cornerstone concept is operations and state. Now, these two go together.
Operations, as it says here, are a declarative representation of model changes. If you look inside a new migrations file, you'll see a big list of operations. All it's saying is: hello, as a migration, I am this operation, then this operation, then this operation. In South, you had two methods, forwards and backwards. What operations do is abstract away those two ideas, because obviously the backwards is simply the reverse of the forwards. So operations know how to work both ways.
Operations additionally know what changes they represent without doing database calls. You can say to an operation: here's a state, what does your new state look like? So you can do a quick in-memory change of, well, I'm adding a field to a model, so here's the new version of the model you get. And then you can render those out to different model versions, as we'll see in a bit. So what it looks like is that you have a state, and an operation takes it to a new state. Internally, in memory, migrations keeps a state for every individual point between operations.
And in fact, a migration is nothing more than a sequence of operations. You can see here that I've said migration one, migration two; that could be one migration. And in fact, this is how the new squashmigrations command works. It takes all the operations, concatenates them, optimizes them a bit, and then writes a valid migration from those. So you can see that migrations are now more of a generalized enclosure for operations than a specific concept of their own. You could actually do away with them, technically, but they're there for useful reasons, so you can address things by name.
These are the operations we currently ship with. There are quite a few of them, as you can see. The most important ones that you might not know about are RunSQL and RunPython at the bottom.
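To make the "state in, new state out" idea concrete, here is a minimal pure-Python sketch. It is an illustration under assumptions, not Django's implementation: state is modelled as a plain dict of model name to field dict, the `CreateModel` and `AddField` classes here are toy versions, and the `state_forwards` method and `run` helper in this sketch have simplified, hypothetical signatures.

```python
import copy


class CreateModel:
    """Toy operation: adds a model (a dict of field name -> type) to the state."""

    def __init__(self, name, fields):
        self.name = name
        self.fields = dict(fields)

    def state_forwards(self, state):
        new_state = copy.deepcopy(state)  # never touch the caller's state
        new_state[self.name] = dict(self.fields)
        return new_state


class AddField:
    """Toy operation: adds one field to an existing model in the state."""

    def __init__(self, model_name, name, field):
        self.model_name = model_name
        self.name = name
        self.field = field

    def state_forwards(self, state):
        new_state = copy.deepcopy(state)
        new_state[self.model_name][self.name] = self.field
        return new_state


def run(operations, state=None):
    """Apply operations in order, purely in memory, with no database calls."""
    state = {} if state is None else state
    for operation in operations:
        state = operation.state_forwards(state)
    return state


# Two "migrations", each nothing more than a list of operations.
migration_1 = [CreateModel("Author", {"name": "CharField"})]
migration_2 = [AddField("Author", "height", "IntegerField")]

step_by_step = run(migration_2, run(migration_1))
squashed = run(migration_1 + migration_2)  # concatenating gives the same end state
```

Because each operation only manipulates dicts, running a long history costs very little, and concatenating the two lists reaches the same end state as running them separately, which is the basic observation behind squashing.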
Those are operations that you can manually put into a file. Migrations are meant to be writable. Please feel free to edit them; they're good for that. With RunSQL, you give it some SQL and it runs it; not very confusing. And RunPython looks basically like an old South method. You get two arguments: you get apps, which is like the old orm object, and you get a schema editor, and you can do whatever you like. So RunPython is for data migrations, it's for really complicated changes, it's for things like adding stored procedures, things that are too complicated, stuff like that. So you have all the power of the old stuff if you want it, but by default we have a much more simplified migration format that's much more understandable and much less prone to the manual error of missing certain things.
The schema editor, as I mentioned before, is this abstraction over the DDL in databases. This is pretty easy for most things. Oracle has a few niggles, SQL Server has a few more niggles, and unfortunately SQLite, while a fantastic database, does not have support for altering tables. In SQLite, I think you can drop tables and you can add columns, and that's it; you can't alter columns and you can't drop columns. And so Django and South both have a full emulation layer where, if you ask for something SQLite can't do, we make a new copy of the table the way you want it, copy the data over, delete the old table, and rename the new table to the old name, all in one big go. So if your SQLite migrations are mysteriously really long, it's probably because we're having to emulate functionality SQLite doesn't have. In general, you shouldn't be doing heavy migrations on SQLite. It's meant for embedded systems or quick development. Don't run production on it, please; it's a single-access locking database, it's not great. The other thing to mention is that the schema editor takes Django models and fields.
South's version of this, called db, took fields as well, but it took table names and column names, whereas this takes models and fields as objects. So it's a bit more high level, and this actually gives us more power to know: well, they passed a foreign key, so while it's called something, it's actually got a column of something_id. Whereas South has all these things like, if it ends in _id, it's probably a foreign key, and weird exceptions trying to work around this stuff.
Using it is very simple. This is a very simplified example of how to do a couple of changes. It's a context manager, so you can just do "with connection.schema_editor() as ed", then create model with Author, then add field with Book, author, and a foreign key to Author. As you can see here, the first one is making a new model; I just pass in a model. The reason we have versioned models is that the migration framework can give it the correct version of Author for this. And then the second one is adding a field, so you pass in the model, you pass in the field name, you pass in a field instance, and that's it. And again, the field instance here would come from a deconstruction, which is, of course, this bit.
So deconstruction is this new thing where every field has a deconstruct method, and it returns you the arguments you would pass to __init__ to replace it. It doesn't have to return the ones it was made with, as long as the things it returns give you the same kind of field. The requirement is basically that it gives you a clone, without needing things like the copy module. So using it is very simple if you want to. You just call deconstruct on a field. You get a four-tuple. The first thing is the name of the field in the model. This is field.name on most field instances. It's None in this example, because this is a field not parented to a model.
Normally it would be something like author or height. The second one is a fully qualified path to the field class. As you can see here, this is django.db.models.CharField. If you have a custom field, this would probably be something like myapp.fields.CustomField. The third one is the positional arguments. We encourage you not to return these, because they're much less portable across versions of your field as it changes. But if you have to have them, they're there. And the fourth one is the keyword arguments. As you can see here, this CharField spat out the one keyword argument I put into it, so I can take these things, call __init__ on that class with *args and **kwargs, and it reconstructs itself. And in fact, this is also useful in other ways. I think it's being used in Michal Petrucha's not-yet-merged composite fields work, because he needed a good way to clone fields. Django doesn't have one of those, at least not one that works properly, but this is one. And so even his work is using this kind of stuff now.
The graph is this generalized idea. If you know graph theory in mathematics, it's a directed graph. It is acyclic as well, so cycles are bad. It's an in-memory representation of the structure of the graph, along with several methods for traversing it. Most importantly, the leaf nodes are basically the most recent migration for every app. If we see more than one leaf on an app, it means that you have merged two branches. The nice thing is that because Django migrations have explicit dependencies on their parent inside the same app, they know when they've been merged. It says, oh, there are two migrations that point to the same parent; that must be a merge. And then it can immediately quit and tell you. It also has root selection, which gives you the first migration of every app. That's how it decides where to start. It has planning.
So you can say, I want to get to these four nodes, and it will give you an exact in-order set of migrations to run to satisfy dependencies and get there. And of course, loop detection. If you try to load a set of migrations with a circular dependency, it will just say "circular dependency" and quit, showing you the cycle so you can break it. The autodetector by default will not make circular dependencies. This has been most of the release blockers for the last three months. It turns out that foreign keys and proxy objects and custom user models together get very difficult.
But essentially, internally, it looks like this. This is a particularly complex one for a small example, but you've got three apps here, called myapp, otherapp and app3. The leaves are the top ones on each column, and the roots are the bottom ones. And as you can see here, if you wanted to apply otherapp 2, you've got to apply basically all the migrations in that tree first, in a certain order, to get to that point. The loader and the graph will tell you these things and, more importantly, they'll load them from disk. The loader also loads from the database. It goes into a table we have called django_migrations, reads out the applied set of migrations, and uses that to inform this part of the graph.
A useful feature that isn't illustrated here is that there's also support for having migrations that replace other sets of migrations. So if you've got hundreds of migrations and want to shrink down again, you can merge them all into one and then declare that this one migration replaces all of those. And then the loader will intelligently swap it in and out. If you're midway through that set, it won't swap it in, because you can't be halfway through a migration. If you are below or above the set, it will swap it in and reduce the tree down.
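The leaf selection, planning and loop detection described above can be sketched in a few lines of pure Python. This is a toy model written for illustration, not Django's graph class: the `MigrationGraph` and `CircularDependencyError` names, the method names and the `(app, migration)` tuple representation are assumptions of this sketch.

```python
class CircularDependencyError(Exception):
    """Raised with the offending cycle as its argument."""


class MigrationGraph:
    """Toy directed graph; nodes are (app, migration_name) tuples."""

    def __init__(self):
        self.dependencies = {}  # node -> list of nodes it depends on

    def add_node(self, node, depends_on=()):
        self.dependencies[node] = list(depends_on)

    def leaf_nodes(self):
        # A leaf has no child inside its own app: the latest migration per app.
        has_child_in_app = set()
        for node, parents in self.dependencies.items():
            for parent in parents:
                if parent[0] == node[0]:
                    has_child_in_app.add(parent)
        return sorted(n for n in self.dependencies if n not in has_child_in_app)

    def forwards_plan(self, target):
        """Return every migration needed for target, dependencies first."""
        plan = []
        self._visit(target, plan, path=[])
        return plan

    def _visit(self, node, plan, path):
        if node in path:  # we came back to a node we are still resolving
            raise CircularDependencyError(path[path.index(node):] + [node])
        if node in plan:  # already scheduled
            return
        path.append(node)
        for parent in self.dependencies[node]:
            self._visit(parent, plan, path)
        path.pop()
        plan.append(node)


graph = MigrationGraph()
graph.add_node(("myapp", "0001"))
graph.add_node(("myapp", "0002"), [("myapp", "0001")])
graph.add_node(("otherapp", "0001"), [("myapp", "0001")])
graph.add_node(("otherapp", "0002"), [("otherapp", "0001"), ("myapp", "0002")])

plan = graph.forwards_plan(("otherapp", "0002"))
leaves = graph.leaf_nodes()

# A deliberately broken graph: two migrations that depend on each other.
broken = MigrationGraph()
broken.add_node(("a", "0001"), [("b", "0001")])
broken.add_node(("b", "0001"), [("a", "0001")])
try:
    broken.forwards_plan(("a", "0001"))
    cycle = None
except CircularDependencyError as error:
    cycle = error.args[0]
```

The plan is a depth-first, dependencies-first ordering, so applying it front to back never runs a migration before the ones it depends on. Two leaves for the same app in `leaves` would be the signal for an unmerged branch.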
So it's very intelligent about giving you that stuff, and the squashmigrations command makes those files for you, at least to a limited extent. Hopefully in 1.8 we're going to get a better version with better detection, but it's a good start, and it's much better than South's approach of blow everything away, run a fake zero and start again.
The autodetector is the most complex part of Django migrations, and the one that I rewrote at least once during the beta phase. The basic thing here is that it takes two different states. There's your current project state; state has a thing where you can say, hi, state, make yourself from the current Django models. It's very simple. And then it takes the final state of the migration branches. It runs through them all in memory and goes, here's a state. And then it compares them. From that, it outputs a set of operations, a set of migrations and the dependencies between migrations, all that kind of stuff.
One of the problems is that there is a very complex dependency set between migrations. With syncdb in Django 1.6 and below, Django just makes all the tables in one big transaction, and as for the foreign keys, well, it does them at the end. So it can get away with a lot of stuff at that point. Because migrations force you to have these separate componentized things that you can run whenever you like, we can't just leave foreign keys until the end, as it were. The end might not be for another couple of weeks in that case. And so we have this very rigid dependency scheme where we go, okay, this foreign key points to here and this foreign key points to here, so you have to run this migration first, this one second and this one third. And it gets more complex when you have circular rings of foreign keys. Say I have two apps that foreign key to each other. How do I do that?
In the old stuff they'd be made at the same time, but if they're in different apps, one comes before the other. And so the autodetector has to split that into: make this model, make this other model with its foreign key, and then add a foreign key back to that model. So it turns the cycle into a non-cycle thing like this. And so that's the difficulty here. The brief way it works is that we take a very dumb diff. We take the two states and go, okay, here's a set of models that have changed, which is an intersection of the ones that exist. Here's one that's been removed, a set difference. Here's one that's been added, another set difference. Here's some fields that were changed. And then we take all those and we dump out a raw set of operations, not in any order, just a generalised order, onto a big list in the autodetector. And then the operations themselves have individual dependencies. So things like a foreign key will say, I depend on the other end of me being created. Or things like adding a model will say, I depend on, optionally, a model with the same name as me being removed first, and things like that. And a big bit of code goes through and rearranges everything to try and satisfy the constraints and dependencies. It pretty much always works now. There are some very edge cases that are either bugs or cases that aren't supported because of custom models. And you end up with this nice set of migrations, in dependency order with defined dependencies, that it spits out. There is a whole talk on how this works, which I will give at some point. If you really want to find some nasty code, it's beautiful code, but also hard to understand and to work on, that's the place to go and look. The optimizer is another complex part of this. The optimizer takes a set of migrations and returns you a smaller set, hopefully with the same effect. It can just return you the same set and go, I can't do anything with this. It errs on the side of caution.
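The "very dumb diff" step can be sketched with plain set operations. This is only an illustration under simplifying assumptions: a state here is a dict mapping model name to a dict of field definitions, operations are plain tuples, and `diff_states` is a made-up name. The real autodetector then reorders its output to satisfy dependencies, which this sketch does not attempt.

```python
def diff_states(old, new):
    """Compare two simplified states and return a raw list of operations."""
    old_names, new_names = set(old), set(new)
    ops = []
    # Added models: a set difference.
    for name in sorted(new_names - old_names):
        ops.append(("CreateModel", name))
    # Removed models: the other set difference.
    for name in sorted(old_names - new_names):
        ops.append(("DeleteModel", name))
    # Models in both states: the intersection, compared field by field.
    for name in sorted(old_names & new_names):
        old_fields, new_fields = old[name], new[name]
        for field in sorted(set(new_fields) - set(old_fields)):
            ops.append(("AddField", name, field))
        for field in sorted(set(old_fields) - set(new_fields)):
            ops.append(("RemoveField", name, field))
        for field in sorted(set(old_fields) & set(new_fields)):
            if old_fields[field] != new_fields[field]:
                ops.append(("AlterField", name, field))
    return ops


old_state = {
    "Author": {"name": "CharField(50)"},
    "Tag": {"label": "CharField(20)"},
}
new_state = {
    "Author": {"name": "CharField(100)", "email": "EmailField"},
    "Book": {"title": "CharField(200)"},
}
raw_ops = diff_states(old_state, new_state)
```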
But the idea is that it runs when you're squashing migrations and also during autodetection, which is very verbose; autodetection outputs an add field for every foreign key. And then the optimizer comes along and says, well, you've just got add model, add foreign key, add foreign key, add foreign key, and squashes them into one add model with foreign keys. So what it does is it takes your list of operations, a big list of things in order, and it steps over them pairwise. It takes the first one and steps over the rest like this, then the second one and steps over the rest like this. And then it looks for a pair that matches certain patterns. So for example, if it finds an add field and a delete field, it goes, okay, those two I know I could optimize away to nothing if the names match. If it finds an add model and an add field, it goes, oh, those two are on the same model, I could optimize them into one if I try. So here, let's say the green one here is an add model and the red one is an add field. So it's going along, and the middle one is, say, an add field on some other model, we don't know yet. And it's like, oh, okay, this is a pair that we could potentially optimize. So it checks: okay, the models match, the field is not already in there, that's good. And then we have to look at all the intervening operations to make sure we can pass through them. So for example, if my add field is adding a foreign key to a model and the middle one is creating that model, I can't push the add field through the creation of that model; it won't work afterwards. And so there's all these checks, like, well, can I push through the individual operations? And if it can, we can replace those two things with one and then restart the thing again. It keeps looping and looping until a pass makes no change, and then it returns. This is not always perfect; it errs on the side of not reducing, because that's the safer way. We know that what was there before worked.
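The pairwise scan can be sketched as follows. This is a simplified illustration, not Django's optimizer: operations are tuples, a field is reduced to the model its foreign key targets (or `None`), and `reduce_pair`, `blocks` and `optimize` are hypothetical names. Only the two patterns mentioned above are handled.

```python
def reduce_pair(first, second):
    """Return a replacement list if the pair can merge, else None."""
    if first[0] == "CreateModel" and second[0] == "AddField" and first[1] == second[1]:
        _, model, fields = first
        _, _, name, target = second
        if name not in fields:
            return [("CreateModel", model, {**fields, name: target})]
    if first[0] == "AddField" and second[0] == "RemoveField" and first[1:3] == second[1:3]:
        return []  # adding then removing the same field cancels out
    return None


def blocks(middle, later):
    """True if `later` cannot safely be moved in front of `middle`."""
    same_model = middle[1] == later[1]
    creates_fk_target = (
        later[0] == "AddField" and later[3] is not None
        and middle[0] == "CreateModel" and middle[1] == later[3]
    )
    return same_model or creates_fk_target


def optimize(ops):
    """Repeat the pairwise scan until a full pass changes nothing."""
    ops = list(ops)
    changed = True
    while changed:
        changed = False
        for i in range(len(ops)):
            for j in range(i + 1, len(ops)):
                merged = reduce_pair(ops[i], ops[j])
                if merged is None:
                    continue
                if any(blocks(mid, ops[j]) for mid in ops[i + 1:j]):
                    continue  # err on the side of not reducing
                ops = ops[:i] + merged + ops[i + 1:j] + ops[j + 1:]
                changed = True
                break
            if changed:
                break  # restart the scan from the beginning
    return ops


mergeable = [
    ("CreateModel", "Book", {"title": None}),
    ("CreateModel", "Tag", {}),
    ("AddField", "Book", "pages", None),
]
blocked = [
    ("CreateModel", "Book", {}),
    ("CreateModel", "Author", {}),
    ("AddField", "Book", "author", "Author"),
]
```

In `blocked`, the foreign key cannot be folded into the creation of `Book`, because that would move it ahead of the operation that creates `Author`, so the list is returned unchanged.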
It can fall over on very, very, very complex stuff, but I've not seen any bug reports for a couple of months, so it's probably pretty good on that one. But nonetheless, this is somewhere that could do with an improvement. If you're very good at optimization stuff, please help me with this bit. I would love to have it be even more intelligent, even better at optimizing. There are some things that it just misses that it should be catching. The final part of this in terms of design is a brand new format. I'm sure many of you are familiar with this. This is, I'm not sure what that is, like size one font. If you can't see it, which you can't, the top section there is the actual content of the migration. There's a forwards and a backwards method. I think it's just, yeah, it's just a create table. This migration is making one model. That's all it's doing. You have this tiny, tiny bit of actions at the top and then this giant chunk of frozen ORM at the bottom. The reason that's there is that for every single migration, South dumps the entire historical state of the models, serialized into a big dict. When we want to load that stuff back up again, we go, okay, let's read the dict, load it into memory and then go from there. This is what the new format looks like. Notice it's much bigger. You can see it from the back of the hall. What we're doing here is we say, well, create model. That's what we're doing. Don't repeat yourself. Don't have all the stuff at the bottom. The key thing that lets us do this, let's go forward there, is in-memory running. What happens here is that we can take a state and apply the migration operations in memory to that state and then end up with the result. And so we can have models at any point in history and we can run from those. And so we don't need that big dict anymore. We can just derive that dict from all the previous operations in history. And it's almost as fast too. That's a fantastic way of doing it.
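The in-memory idea can be sketched like this. It is an illustration only: the `CreateModel` and `AddField` classes here are tiny stand-ins written for the example (they are not Django's operation classes), the state is a plain dict, and `state_at` is a made-up helper.

```python
class CreateModel:
    """Stand-in operation: adds a model to the in-memory state."""

    def __init__(self, name, fields):
        self.name = name
        self.fields = dict(fields)

    def apply(self, state):
        state[self.name] = dict(self.fields)


class AddField:
    """Stand-in operation: adds one field to an existing model."""

    def __init__(self, model, name, field):
        self.model = model
        self.name = name
        self.field = field

    def apply(self, state):
        state[self.model][self.name] = self.field


def state_at(history, index):
    """Rebuild the models as they were after the first `index` operations."""
    state = {}
    for operation in history[:index]:
        operation.apply(state)
    return state


history = [
    CreateModel("Author", {"name": "CharField"}),
    AddField("Author", "email", "EmailField"),
    CreateModel("Book", {"title": "CharField"}),
]
early = state_at(history, 1)
latest = state_at(history, len(history))
```

Because any historical state can be replayed from the operations, nothing like the frozen dict needs to be stored alongside each migration.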
So I'll quickly go through how the two commands run. The two main commands are makemigrations, which replaces the old schemamigration and datamigration commands, and migrate. The reason it's got the plural is that makemigrations will make a migration and all its dependent migrations. So you can pass it an app name. It will try and limit itself. But if you have a migration that needs another migration in, say, auth or users or something, it will make that migration as well, because it knows that it needs the dependencies. South would give you a migration and go, it's not going to apply, good luck. So it didn't have a dependency resolver. Migrations does. So what it does is it takes your two states and passes them to the autodetector. So basically number two there is going and getting a state from disk. Number three is getting one from the project. And then it takes them and autodetects them into a big list. It optimizes them so it has a much shorter list. And then it passes them to a writer. And the writer is a small class that can take a list of operations and write out a Python file in pretty decent, nicely indented code that you could understand as a person, rather than just a big lump of stuff. Migrations are meant to be human readable and human editable as well. Migrate then takes those different things. It goes to the loader and says, hi, load migrations; the loader goes to the disk and finds the stuff and brings it out again. And then it gets a big list of migrations. It runs through them one at a time, opens a transaction, then runs the operations individually. The operations actually just call the schema editor individually. And then once it's all done, there's a thing called the recorder, which I haven't mentioned here, but it's very simple. It just has methods like, mark this as applied, mark this as unapplied, what's been marked as applied? And so it just says applied, applied, applied, applied.
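The apply-then-record loop at the end of that flow can be sketched as below. This is a toy illustration: the `Recorder` here keeps its records in a Python set rather than a database table, and the `migrate` function and its `run` callback are hypothetical names, not Django's executor.

```python
class Recorder:
    """In-memory stand-in for the table of applied migrations."""

    def __init__(self):
        self._applied = set()

    def record_applied(self, app, name):
        self._applied.add((app, name))

    def record_unapplied(self, app, name):
        self._applied.discard((app, name))

    def applied_migrations(self):
        return set(self._applied)


def migrate(plan, recorder, run):
    """Run each unapplied migration in plan order and record it afterwards."""
    ran = []
    for key in plan:
        if key in recorder.applied_migrations():
            continue  # already marked as applied, nothing to do
        run(key)  # in the real thing: the operations, via the schema editor
        recorder.record_applied(*key)
        ran.append(key)
    return ran


recorder = Recorder()
recorder.record_applied("myapp", "0001")
executed = []
plan = [("myapp", "0001"), ("myapp", "0002")]
first_run = migrate(plan, recorder, executed.append)
second_run = migrate(plan, recorder, executed.append)
```

Running the same plan twice is harmless, because the second pass finds everything already recorded.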
And we store that in the database individually, in multi-DB too. So if you've got more than one database, you have to run migrate on each individual database. On the plus side, the database routers now have allow_migrate, which is honored by the operations. So you can individually turn migrations on and off for different databases. So the thing I want to address here at the end is, what went wrong? The design is fantastic and wonderful, but the real juicy bits here are what had me going "ah" for like three months there. So the first thing, and the thing I really don't like, and no offence to Russ here, who may have been involved in this, is swappable models. Now for those who aren't familiar, though I imagine you are, swappable models are when you can replace a model; in this case, the only such model is auth.User. You say, okay, auth.User is actually now this other model. And inside Django, all the foreign keys repoint, all the constraints repoint, and if you try and get the model, it just turns into your model, which is fantastic. Except for dependency graphs. So normally, this is fantastic. The problem is that this is a setting, and the setting is outside of migrations. So depending on what the setting is, I can just change it to this, and then it just goes crazy, and then I run it like this, and it's just, ah, I don't know. And suddenly there's a giant [inaudible]. And so the problem is that the dependency graph changes at runtime with this setting, and there's no way of telling what it was before. We can't even tell there's an error, because as far as Django is concerned, the previous history is gone. There's no reference to it. We don't know it used to be a different setting. And so in the case of auth.User, we have a few workarounds in Django that support this stuff, and we generally do a pretty good job, and if you point dependencies at it, we try and resolve them.
But you can still get loops out of the autodetector. If you autodetect with one setting and then change it, then there's no guarantee the resulting migrations are cycle-free in their dependencies. Unmigrated apps are a big issue. So the problem here is that we still support syncdb apps, and that code still exists until 2.0. And so manage.py migrate just runs the old syncdb code first, and then runs migrations second. This is fine until you've got foreign keys between apps that are in different sections. So the choice was, do we not allow you to have foreign keys from migrated apps into unmigrated apps, meaning that we'd have to run migrations first so that they were there, or the other way around: do we not allow foreign keys from unmigrated into migrated, meaning we had to run unmigrated first? The decision we made is that you can't have foreign keys from unmigrated apps to migrated apps. This is a little problem because we moved all of the core contrib apps to have migrations. So if you depend on those, you need migrations; sorry about that. But on the plus side, this is much better for the way forward. If it was the other way around, then it would be this horrible thing where bringing in one app could disrupt your whole project, and you couldn't do any changes at all. It's a little annoying, but adding migrations to an existing app is very, very trivial, even easier than it was in South. There is, I think, this much documentation on the page: add a migration, and then Django auto-applies it, because Django reads it and says, oh, there's some create models in here, we've already got those tables, and marks it as applied, so it does that bit for you. So you can just ship a migration; in fact, 1.7 ships with migrations for every core contrib app. When you run migrate, they'll auto-apply themselves the first time, because they know that the tables are already there.
And in fact, when you run it in 1.7, or 1.8 even, which is the current dev trunk, you run migrations, and it auto-applies, and then it increases the length of your email fields on users; it fixes all these low-numbered bugs we couldn't fix for ages. We can now ship migrations to fix stuff in core, so that's fantastic. Test persistence is another one, particularly on MySQL. I'm not a big fan of MySQL. With test persistence in Django, we promise you that between every test, you will have the same starting state. In particular, the data you loaded at the start will always be there. That was great with the old stuff, with syncdb, where all you could do was load stuff in initial data, so we just replayed that. With migrations, you have data migrations, so you can make stuff whenever you like, and we can't even tell what happened. And so on things like Postgres, where we can roll transactions back, we go, okay, run the test, roll back that transaction. Run the test, roll back that transaction. But on MyISAM, and things where you can't do that, and in TransactionTestCases, we can't do that; there aren't any transactions, so we can't roll it back. So what we do is, after we migrate at the start of your test run, we dump a copy of all the data from the migrations into memory, and then we load from that for every test after a flush. So it's a little bit slower. Notably, the tests are three times slower when it's turned on, but it turns itself off in Django core for things that don't need it. But if you're having issues with test persistence, there is now a setting on tests that you can turn on that says, I actually do want this, or I don't want this, to tell Django to speed up or slow down as required. I perhaps was not terribly professional and didn't read the docs when I was doing migrations. I forgot a few Meta options.
I forgot order_with_respect_to existed and had to add it during the beta phase, and a few other ones. Django has a lot of slightly odd Meta options, like, oh, we have this thing where you can just make foreign keys be orderable, or this other thing, you know. So for a lot of the options I had to go through and add them in a second run, like, well, we need to support this, we need to support changing this stuff and different attributes as well. And finally, proxy models were the last thing. So proxy models are this thing where you can have a model, but it's a proxy to another model. And it has no table of its own; it's merely extra code and methods, it's basically an extra sort of object. And so to migrations, they're basically worthless. They don't have a table, so we don't have to do anything with them at all. But you can foreign key to them, and so we need them around to point foreign keys at, and just say, you point at that empty thing and be done with it. And so I had to go through and add a separate set of things that said, well, if this model being added has proxy true, then skip all of this database part and just leave it lying around. And that's another of the gotchas that I forgot about. You know, I had assumed, during the development of this, that proxy models would be useless because there wasn't any database stuff with them, but I forgot you could foreign key to them. So, you know, bear in mind that you should think about all the things Django offers to start with. But we're there. This slide was written before we released 1.7. I'm very glad it's actually in date now that I'm presenting it. 1.7 is out now, and it has migrations. Please use them. Please report any bugs as you find them, but I am pretty confident in the quality of this release.
But we have gone through a very painful three to six months of bug reports and fixing and really hardening that stuff up. Migrations are fantastic. I hope you will use them. I hope that you upgrade your third-party apps to use them. If you need any help, I'm always around; I'm on IRC, you can email me, you can email django-developers. I want to get this painful transition period of going to migrations over with, and then we can all live in the wonderful future of migrations. Thank you very much.