 Welcome, everyone. This is the last talk before the lunch break. So if we just on the way. Introducing Tim Mitchell. He works with Aaron's Geo. Company based in Christchurch. So it says home turf here. And I could see you have quite a turnout. And Aaron is also looking for Python developers. So if you haven't heard about it, talk to these guys in these t-shirts. They're looking. Really interesting stuff. And yeah, Tim is going to talk about database migrations. If you haven't done it, it's a world of pain. So you're in for a treat. So take it away. Thank you. Yeah, as the slide says, I'm Tim Mitchell. I work for Aaron's Geo. I'm a senior software engineer. I've been there for about 10 years now. Let's not do anything. How do I make it go to the next slide? Right. We make geological modeling software. We have a whole family of software with three different products. All under the leapfrog brand. And we do most of our work with Python with some C and Python for the heavy lifting. You can't hear me? I can't do anything about that. Apart from a look down when I talk. Hello. And hello. I've been working with databases since last year. Speak up. You want me to shout? Okay. So we started on a new project last year. And it's when I started working with our real databases. So, overview of the talk. First I'm going to set the scene of my working environment, which I use Alembic for database migrations. Hang on, where's my mouse gone? And give you an example, schema migration and some of the things we've learnt to do. I go beyond the tutorials that you find on the web. Some tips from our experience using Alembic and also the really good news about how the pain has got less. And how our processes have changed since we've been using Alembic. Okay, the situation is we're working on a relatively young product and we're adding features to that product as fast as we can. And so, because of that, there are frequent schema changes and hence many schema versions. So, our test data is at different schema versions because they're written at different times. We have my databases on all the dev machines. You know, they're at their own versions for whatever branch they're working on. We have VMs for the testers running and they are at whatever version they're on. And we need to manage that whole mess of different versions left, right and center. And so, the situation for us was that migrations were giving us migraines. You start up your application and then you find that the database didn't have this table. It was quite a nightmare to manage. Right, a little bit about Alembic. It's obviously a database migration tool. It's written by the same author as Esco Alchemy which is very popular. It's very straightforward to set up. It's got good documentation. The user form is really good to respond to questions really quickly. The development status is only four in beta but we've only had one or two bugs in Alembic in the time we've been using it and I think it's more reflection on the capabilities at this point than the reliability. And the last thing is, if you're going to use Alembic that means the OOM or rather your source code is the source of truth. So if there's a table in your source code and there's no table in your database then it's the database that's wrong. That's not always the case in everyone's setup. Okay, getting Alembic set up is pretty well documented. I'm just going to mention a few things that we need to know for this talk. Alembic installs an executable into the scripts folder and this executable does pretty much everything you need to do. It has an init command that you can see there and you pass to it a folder where you want to keep the files and it will create this folder structure. The Alembic INI file simply points you to the folder that you specified and has your connection string to your database. The inf.py sets up the actual migration environment for Alembic with your OOM tables and other settings. There's a readme file for you to delete, Mako script for templating the creation of your revision scripts and there's a versions folder and this versions folder contains all your revision steps, upgrade steps as you make them. We've missed a slide. Yay, there it goes. So for our example migration today we're going to start with this simple OOM. It's got one table, it's got two columns an id and a name and our table has some data in it. Some test data, first customer's Mickey Mouse, the second customer is Donald Duck. Now we want to change our schema. The first thing you do is you edit your code to say what you want it to be. We've changed the single name column to be a first name and a second name. However, our database still looks like this. So what we need to do is write a script that migrates our database to match our code. This is what Alembic does. So using the executable you use the revision command and you can pass it this wonderful flag and auto generate. And what it does is it looks at what's in your code, looks at what's in the database, figures out what the differences are and writes a script for you. And it also lets you specify a short comment to say what you're doing. And it's going to enter the version folder, put this file starting with a revision identifier and your comment. And there it has everything you need to do to generate your database. It's written in Alembic operations, which means it can work on multiple backends. So if you need to support SQL server and MySQL and PostSquares, then this will generate the correct code, the correct SQL for all those database backends. There's also a downgrade function, which doesn't fit on the slide, if you need to roll back your database. Now, all we need to do is run the upgrade command. And so the upgrade, the head, just specifies upgrade to the latest revision of your database. We could put in that long UID if we wanted. And you get kabong! And this doesn't happen in the tutorial, but it does happen in real life. And to be fair, the data part of the migration is quite hard to document because it's different in every situation. So what do we do? This is reality, but it's not under documentation. We need to split our migration into three steps. First, we need to add the new columns, but they need to be nullable because they're going to start off having nulls in them. Then we can add the data to the new columns, moving it out of the name column. And then we can alter the columns to be not nullable and drop the column name. For those who are asleep, I'm going to say it again with pictures. First thing we're going to do to our database is make it look like this. We're going to add nullable columns. Then we're going to copy the data out of the name column into the first name and last name columns. And then we're going to delete the name columns and change them to be not nullable. Now, SQL Alchemy, however, wrote me add column nullable. What we want... It's not SQL Alchemy, Aliembic. What we want Aliembic to do is to write a nullable column followed by an alter column statement. Otherwise, we have to do it ourselves. Thankfully, Aliembic will do this for us. We just need to rewrite here nullable columns are written into our upgrade script. This has been copied directly from the docs. So Aliembic has the concept of a rewriter. You can override how it writes an add column operation. And you can see in the code there if the column is nullable, then just do it. Otherwise, change the column to be nullable and add an alter column operation. This goes into our m.py file. We need to only do it once. We need to do it after all our nullable columns get rendered with two statements. Rewriters are also very useful for other things such as setting permissions when you add a new table and other tasks like that. So now our migration script looks like this. We have our add columns, our nullable, then followed by our alter columns. And all we need to do is add our data migration. Right. We'll just write some normal code. The first thing you need to do is get a connection to your database so the op object has a get bind which gets you the SQL Alchemy connection. And then you can use the execute method on that to do whatever you like. So this is pretty straightforward. We iterate over all the customers, we split the name, and we update the votes. Now we're going back to migrations. We can upgrade our database to the header vision and it prints you up. It's a useful little message and that's the job done. Quite straightforward. And our database now looks like this. What we also have is an Alembic version table, which Alembic puts in by itself. You can control where it's put. And it stores for you the version of your database. This is really helpful so that your database now knows what version it's at and can know which upgrade steps it needs to run and which ones it doesn't. Now, in some situations, workplaces, you need to get your DBA to migrate your databases. And so you can't pass your script and say, run this Python against the database, please. He's going to say, no way. But that's all right. Alembic knows about database administrators and there's a simple SQL option and so instead of actually migrating your database, it's just kind of like your script. Easy peasy. Not. No type object is not iterable. We don't have a database connection. We're just iterating a script. So we can't iterate over the rows of the table, as you see here. So rule number one when writing data migrations is that you can't actually use the results of a select statement. This is a good thing in reality because otherwise your upgrade step would upgrade the first customer to be Mickey Mouse and your second customer to be Donald Duck and would not migrate the rest of your customers. So it's quite simple in this situation to rewrite your upgrade step to something that doesn't use a for loop. We're using this instra and the substra, SQL commands, an SQLite, and away we go. We can run our upgrade commands. It generates a script. You can give that to your DBA and that's your job done. There's a couple of tips for using Alembic that we have found. If you trawl through the docs, you find you can actually run it programmatically and actually it's really straightforward. You need to create a configuration object from your Alembic Ioni file name. And then there's a command module which has all the commands that you have on the command line and they're very easy to use. And so we hardly ever use the executable. We run our own scripts. And the good thing about being able to customize how you run Alembic is that you don't need to use random UIDs for your identifiers. You set the start of your file names and you get a folder full of UIDs and you don't know what order they're in. But they're really good for uniqueness across branches so I can kind of understand why they've done it. But, you know, sequential is good for us. And so we changed our code to use sequential numbers one to 001, 002, 003 and so on and so forth. And then John on his branch would make 10 and I on my branch would make 10 and then you had two databases with version 10 and then you had a mess on your hands again. So when I get to round to committing my code we're going to start using three-digit numbers followed by a random bit. And this morning I thought to myself you could just use the comment string as well. So yes, do write your own new revision command that puts in a sequential revision identifier because then your upgrade steps are sorted in your IDE and in your file explorer. Yeah, that was all I wanted to talk about when I planned my talk but when I came to actually writing it the things I really wanted to say was actually how it had made my life so much better and easier. So what has changed for us since using Alembic? I used to do this all the time just blow my database away start with a new one while I'm writing my code and then I'd switch to another branch I have no idea what version that database was at because it's been the weekend so now I always stamp the database when you create one. That way you always know what version your database is and it helps you keep control and manage all the different copies of your database you have all over the place. We had, well we have, so integration tests which use SQL dumps for our test data to restore the database to a known state and every time you change the schema you have to recreate that data in the new schema. And that was actually as much work as the feature or bug that you were fixing and it was a real pain. Now we added the version number to our SQL dump so it knows what version of the schema it's on. When we restore the database we then run the upgrade steps. Now when I add a table I don't need to change the test data at all except for perhaps if I need a new test data for what I'm actually doing. That means your test data actually changes a lot less often and makes life a lot easier. It also means that your migration steps get tested as well which is excellent. For our testers they always used to use the restore utility that came with the database. We used Postgres and that was fine except that they had their own... they'd done some experiments, they'd done some stuff, they'd dumped it and they had dumped all these different versions and they'd restore it against your branch which didn't match and that was a pain in the butt. So now whenever we build our software we build a restore tool that comes with it and so when they're running your branch of the application they run your restore tool and it runs your migration steps and migrates their test data up to your version of the schema and it's made life a lot easier for our testers too. As I've already alluded to we had a lot of problems with running up the application it talks to the database it runs along fine for a while until it hits the bit that's not there and that happened quite a lot. Now, whenever we start up the application we check the schema and the database matches the schema in our OAM. We use Alembic for all this it has this lovely compare metadata command which produces a list of diffs which is exactly how the auto-generate command works for generating revision scripts and you just check if that length is zero. Just checking the version number is not enough because otherwise you'll change the table and the OAM go have coffee, come back, run in code the versions are the same but actually the schema is a different and you've just forgotten about it. So yes, Alembic has definitely taken the migraines out of migrations for us and I guess that property is not unique to Alembic but I hope some of these practices will be useful to you guys. That's all I have to say. Thank you for listening and do you have any questions? Yeah, thanks very much, Tim. Questions? You can make me run. Tim, you've obviously used this as you've been developing the software and you've been iterating so fast that the database has been changing but is the intention that it is also part of the release product for the users so that when you go from a version one to a version two that your users actually migrate their databases as well or is it just purely for internal development? Both. So at the moment that when we install our server on a machine if there's a database, when you configure it to a database server and there's a database there it'll upgrade it. So it's envisaged for some of our customers they'll just install the new software and it'll upgrade their database when they install it. We know that some of our other customers will have a DBA and I'll say no way, give us an SQL script, thank you and we'll use Alembic for that to generate the script for them. Yeah, so as you write the actual data migration as a SQL rather than using the IRM I guess there's no way to actually make that back-end neutral. You can use SQL Alchemy Core functions and methods but yes, for anything that's beyond a test example for a talk then you do usually end up having to use back-end specific code but again, you can with the Python switch on the dialect and you know, generate a script for Postgrease and generate a script for MySQL. Great, thank you. Does it do backward migration, handling rollbacks as well? Yes it does. I didn't show any of my slides because of space but yes it writes through this of an upgrade step. So there's downgrade steps in all the migrations file but when it comes to data involved and we have to manually do all that stuff we tend not to use them. Have you done any comparisons with like Django Migrate or South or anything like that? No, I haven't. We looked around we said, oh it's written by the same person let's give it a shot we haven't had reason to not use it. Yes, I haven't actually done any real comparisons. Any more questions? Well, I have a question for the audience. Who has used Alambeg? Hands up, keep your hands up. And we have like two shows and comment any further on the use? Yes, thank you for the presentation. I've only I guess used the outer wrapper, the vanilla flavor of it, so you've certainly opened my eyes to the scripting possibilities inside of it. I can see it being a lot more useful. Well guys, it's almost lunch. Thanks very much Tim. Thank you.