Hello everyone, welcome. Can you hear me okay? Awesome. So a little bit about me, just a little bit: I'm Reb Cwok on GitHub, Twitter and most other places. I'm a software developer at Ecometrica. We're here in Edinburgh; we are hiring, but only in Montreal at the moment, so if you're interested then come talk to me later. I'm also an ex-psychologist. I haven't spoken at a conference since my academic days, which were a long time ago, so bear with me if I'm a little bit rusty.
So this talk is about how we, as an individual company, went about upgrading a large, established Python 2.7 code base to Python 3.6. I'll talk about some of the available tools that are out there to help you, I'll talk a bit about our experience and our general approach, and some of the gotchas and pitfalls that we encountered along the way. This was only our approach. It's certainly not the only way to do it, I'm by no means telling you that it's the right way to go about it, and I'm not going to attempt to give you a single best way of approaching the problem. Instead I just hope to give you a useful case study and share with you some of the lessons that we learnt.
So why did we, and why should you, even want to upgrade to Python 3? Well, the main reason obviously is that the deadline for dropping Python 2.7 support is approaching pretty fast: January 2020, so less than a year and a half away now. Major projects are also either dropping or planning to drop support for Python 2, including Django, NumPy, SciPy and Pandas, but there are lots of others. So in the not too far off future you're going to be stuck with old versions that are only being bug-fixed if you're lucky, and you won't get any new features. So in terms of motivation to upgrade, that's the stick. What's the carrot? Why should you want to embrace Python 3 rather than just grumbling about how annoying it is that you're being pushed into it? That would, and has, filled at least another few talks, so I'm just going to touch on a few highlights here.
I've got some references at the end of the slides that go into more detail if you're interested in that.
So first is that Unicode thing. Python 3 gets rid of the overloaded string type, where string objects can represent either textual or binary data. This link here is a nice description of how it came about and why Python 3 largely exists to fix that. In Python 3, a string is always a text string and it's Unicode by default.
There's better iteration. In Python 2 you have a lot of pairs of functions that do the same thing, except that one's eager and one's lazy. Python 3 eliminates the eager versions and instead makes everything lazy. So everything's an iterator. Iterating over them works exactly the same way, but it no longer creates an intermediate list, so it makes it harder to write code that accidentally uses up lots of memory.
We also have some restrictions on comparisons, so you now can't do nonsense comparisons between different types. Incidentally, 'foo' is greater than 4 according to Python 2.
We get some advanced unpacking. I won't go into detail on this, but in Python 3 you get the nice star notation for unpacking both iterables and, especially, dictionaries. This one I came across quite recently, but it's a nice way of making new dictionaries from existing ones.
We get the option of keyword-only arguments in Python 3. So this is a function with two positional arguments and one keyword argument. This is how we would do it in Python 2, and any of these three methods of calling it would be valid, but they might not do what you expected them to. You can use the same definition in Python 3, but you can optionally add this star argument, and that means that the keyword argument that follows it has to be passed by name. So now only that first method of calling it is valid. When you use keyword-only arguments, you can avoid accidentally passing too many arguments to a function and then having them misinterpreted as the keyword argument.
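
The slide itself is not in the transcript, so the following is only a minimal sketch of the keyword-only idea. The function names `resize` and `resize_kwonly` and their parameters are made up for illustration and are not from the talk.

```python
def resize(width, height, keep_aspect=False):
    """Python 2 style: keep_aspect can be filled positionally by accident."""
    return (width, height, keep_aspect)


def resize_kwonly(width, height, *, keep_aspect=False):
    """Python 3 style: anything after the bare * must be passed by name."""
    return (width, height, keep_aspect)


# A stray third positional argument is silently taken as keep_aspect.
loose = resize(640, 480, 2)

# With the keyword-only definition the same call fails loudly instead.
try:
    resize_kwonly(640, 480, 2)
    error = None
except TypeError as exc:
    error = exc

# Passing the argument by name is the only accepted form.
strict = resize_kwonly(640, 480, keep_aspect=True)
```

The bare `*` costs nothing at the call site for well-behaved callers, but it turns a silent misinterpretation into an immediate `TypeError`.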
F-strings are awesome, and are totally the reason to go all the way to Python 3.6, and the reason that we did go to Python 3.6. As well as just variable substitution, they can contain any Python expressions, including method and function calls. They're more readable, they're more concise, they're less prone to error, and they're also faster than other ways of formatting strings.
And then there's asyncio, which is the new concurrency module that was introduced in Python 3.4. I'm not going to say much more than that about it, because I don't know a lot more than that about it, but I'm told it's very cool.
So a little bit about the project that we're dealing with. Ecometrica's mapping platform is a big Django project that does some cool stuff with GIS data, and some of my colleagues will tell you more about that if you would like to know. It works with GDAL and other underlying GIS libraries to import and transform mapping data sets, display areas of interest on an interactive map interface, and run user-defined queries across multiple data layers. It's about eight years old. It consists of around 70,000 lines of Python code in the core project, and it has a bunch of dependencies, including some of the typically temperamental GIS ones. This is what it looks like. We're looking at national parks here in the Amazon, highlighted in purple. We can upload and show display layers like this one, which shows land cover in 2010. You can explore individual areas of interest and have a look at the results of some user-defined questions based on information from raster data sets, like carbon density, biomass within an area, or how land use has changed over time, and there are a number of ways that those can be displayed.
So there are a bunch of useful tools out there to help you with your Python 3 upgrade. I'll take a look at a few of them, but this isn't by any means an exhaustive list. There's lots of help out there. These are just some of the ones that we used.
So first up, you want to upgrade your project. That's all well and good, but what about all your dependencies? Will they still work when you upgrade? There's a quick way to do a first check, and that's the caniusepython3 package, which does what it says on the tin. You pip install it, and then you just run it on your dependencies from the command line in a variety of different ways. caniusepython3 relies on projects being classified on PyPI as supporting at least one version of Python 3. So it's not perfect. It depends on the maintainers declaring that the package is Python 3 compatible; otherwise it won't find it.
Next, there's a tool called 2to3, which I think most people will have heard about. It's usually installed with the Python interpreter as a script, and it reads Python 2 code and applies a series of fixers to transform it into valid Python 3 code. So here is an example of a little Python 2.7 program that just takes some input from the command line and says hello to you: welcome to EuroPython, whatever year you want. We just run it from the command line with a list of files or directories to transform. So 2to3 welcome.py, and 2to3 outputs a diff of the fixes that it's going to make for Python 3. You can see here it's identified the print statements, and raw_input, which changes to input in Python 3, and it also picked up the change to the exception syntax. So that's a useful first start for us.
Linting can also help you. Pylint has a --py3k flag, which will highlight Python 3 incompatible code. When you run it on our little example program, it prints out a list of identified Python 3 issues. Note that it identified this one, which 2to3 didn't. So neither of them is perfect, and you'd still need to review your code, but they can help.
So, briefly, a side note about supporting Python 2 and 3. Our main project, our mapping project, is end user software.
It doesn't need to support external developers, and we were happy to go Python 3 all the way with it, but we did have external dependencies, and those need to continue to support both Python 2 and 3. There are tools to help you with that too. future and six are libraries that provide utilities for writing Python 2 and 3 compatible code. Modernize is built on top of 2to3. It's used in a very similar way to 2to3, but it's more conservative: it uses six to try and fix up code to be both Python 2 and 3 compatible, rather than just changing it all to Python 3. tox is helpful to let you run your tests in specific environments, so you can make sure that your tests are going to run under every version of Python that you plan to support. And the Python docs and Django docs also have a lot of useful information on porting to Python 3 while still maintaining compatibility with Python 2.
So going on to what we actually did. First things first, we needed Python 3 on our system. We were on Ubuntu 16.04. That ships with Python 3.5, but we wanted Python 3.6 because of f-strings. So there was a little bit more setup involved, but only a little bit. We had to install these additional packages, but that was pretty much it. Other than that, we were using virtualenv with virtualenvwrapper, so we just specify our Python 3 version when we create the virtualenv. There really wasn't that much else that we had to change in our deployment process. That pretty much covered it.
The first thing that we really needed to do in terms of upgrading the code itself was some research. You need to learn about the main differences between Python 2 and 3. The Unicode issue is the one that everyone knows about, but there are lots of others, and it's worth reading up on the differences before you start. Python3porting.com has a free online book. It has guidance on porting to Python 3 and a pretty comprehensive description of the differences.
And the Python-Future project's cheat sheet, which is here, is also a useful reference.
Next we had a look at the project's test coverage. We were going to use our unit tests as a tool to help figure out whether our upgraded code was working, so it was important to have decent test coverage before we started. Ours could have been better, but it was respectable, so we didn't spend a lot of time improving the test coverage specifically for this process.
So, dependencies. This is the one that tends to put people off upgrading. We want to use Python 3, but we rely on external dependencies, and dependencies X, Y and Z don't support Python 3 yet, so we just give up until they do. And for quite a long while, that's as far as we got. We periodically checked the blocking requirements and just put stuff off until the list looked better. But by the end of last year, our list of blocking dependencies was looking kind of manageable. More and more packages are supporting Python 3.
So this is the result when we ran caniusepython3. It doesn't look so good. We've got 11 projects blocking our Python 3 upgrade. We had about 75 in total in our requirements file, so it could have been worse, but still. But things start to look up a bit when we look at the list in a bit more detail. Three of these are things that we didn't use anymore, so we just took them out. Another four showed up because they don't have Python versions identified in the classifiers on PyPI, so they weren't correctly identified by caniusepython3, but they did all have Python 3 supported versions that we could upgrade to. There was Python Scrubber. We took that out too. That wasn't compatible, but it also wasn't a very active package and it's a bit out of date, and there was another, more up-to-date package we could replace it with that did the same thing and was compatible, so we did that instead. Django Migration Test Case.
That one also wasn't compatible at the time, but helpfully someone had already made a PR to upgrade it, so we used that. Then we had yas3fs. yas3fs is a package for syncing your local files with S3. That did cause us some issues. It looked good to start with. There was a pull request supporting Python 3 that had been merged, but it turned out only to address a few fixes, so we added the remaining Python 3 support to that package. Then the last one on here, django-hashedfilenamestorage. That's a library that we maintain at Ecometrica, so our bad for not upgrading it sooner. The same went for a couple of other dependencies that we installed from private repos, so we upgraded those ourselves. We maintained Python 2 compatibility for other users. We added CI and tox to make sure that we're testing under multiple Python versions and that we keep our compatibility. In the end, we whittled this list down to only a few that we really needed to put any significant effort into fixing up.
Next was the exciting bit, fixing the code. Actually updating the code is quite a daunting process to start, because you know that your changes are going to be so widespread throughout the project. In this respect, it was nice to be working with a Django project, where the code was mostly divided up nicely into Django apps. We worked app by app. The first thing we did was to run 2to3 on the entire app, keeping the backup files that 2to3 generates so that we could easily check back on the previous version of the code. Pretty much, we just accepted all the changes that it suggested, and then we ran the tests on just that app, fixed the code as necessary until the tests were passing, and committed the changes. Committing app by app made things a little bit easier for code review. Then we ran Django, and something invariably broke. We fixed it again until it ran properly. We ran the application so we could manually check the functionality of that app, and then we proceeded to the next app.
That got our code mostly working, but the next step was to review it and refactor things. Here again, committing app by app was useful. It helped to keep things together. It also made it easier for other people to code review. My code reviewers are actually in this room and will attest to the fact that it was still pretty horrible to do. In the previous step, we had just fixed up code until the tests worked. Now, what we needed to do was to review the diffs more carefully, in particular to fix up 2to3's over-conservativeness. 2to3 is designed to convert Python 2 code to be valid Python 3 code for any version of Python 3. In some cases, it may add extra code that you don't actually want. The main cases we found of this were converting the new iterators to lists unnecessarily. Whether you need to convert to a list depends on your use case; 2to3 tends to be over-conservative and wrap everything in list() when it isn't necessarily needed. It will also sometimes wrap print statements with extra parentheses, especially if you've got print statements that have been expressed as functions already. There's also the specific case of callable(), which was initially removed in Python 3 and reintroduced in Python 3.2, so it doesn't need to be replaced for newer Python versions, but 2to3 still does it.
Then we did quite a lot of refactoring, especially in places where we'd been doing manual bytes to string conversions. Those had sometimes got a bit convoluted, because we'd just done what was necessary to get tests to pass and get the app running. With a bit more attention, Python 3 generally allowed us to simplify things quite a lot. And then just as a warning: if you use from __future__ import unicode_literals, it helps to keep your Python 2 and 3 compatibility, but it does sometimes introduce some subtle issues. The Python-Future project has quite a good review of the pros and cons of using that for Python 2 and 3 compatibility.
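
To make the list-wrapping point concrete, here is a small sketch of the kind of diff we were reviewing. The `layers` dictionary is invented for illustration, and the "wrapped" line shows roughly the form 2to3 tends to emit rather than output copied from our code base.

```python
# Hypothetical data, standing in for any dictionary in the project.
layers = {"landcover": 2010, "biomass": 2015}

# Roughly what 2to3 produces from Python 2's `for name in layers.keys()`:
# it wraps the view in list() to stay safe in every situation.
names_wrapped = [name for name in list(layers.keys())]

# When you only iterate, the wrapper is unnecessary.
names_direct = [name for name in layers]

# The wrapper does matter when the dictionary changes during the loop,
# because iterating a live view while deleting raises RuntimeError.
for name in list(layers):
    if layers[name] < 2012:
        del layers[name]

# 2to3 also rewrites callable(x) into an isinstance check against
# collections.Callable; on Python 3.2+ the builtin can simply stay.
is_function = callable(len)
```

So the review question for each wrapped call is whether the code needs a real list (indexing, mutation during iteration, reuse) or just walks the values once.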
Next up is linting. We didn't actually do this. I wish we had, but I didn't know about it and didn't discover it until later. It would definitely have avoided a few issues. Once your porting is done, you can run Pylint with the --py3k flag, which will highlight some Python 3 incompatible code that your tests might not have found.
And then user testing. We dedicated quite a lot of time to front-end manual testing. It's tedious, but it did find issues that our unit tests didn't, and it was also useful for us to have our GIS specialists, who are familiar with the platform data, review it and make sure that processed data sets looked like they should and queries generated the expected results.
And then everything was going so well. We thought we were more or less done, and we ran into one final hurdle, which is this library called gdal2mbtiles. gdal2mbtiles is a library that generates mapping tiles from georeferenced files and lets you display them with a mapping library like Mapbox. It has some extra fiddliness around installing, so we install it separately in our deployment steps. It's also minimally used in the mapping project; it's mostly used in a side one. So it slipped under the radar when we were assessing dependencies and during our initial testing, and upgrading it turned out to be a mammoth task that I don't have time to go into, just when we thought we were more or less done. So the moral of the story is: check all your dependencies, no matter where they're coming from.
So there were a bunch of gotchas that we encountered, things that tripped us up along the way. Most of them were a result of a lack of thoroughness in the first step, when we should have been learning about the Python 2 to 3 differences, but some are maybe a little bit less immediately obvious, a little bit more obscure, and not necessarily identified by things like 2to3. One is rounding. The rounding strategy changed in Python 3. In Python 2, it works the way you were taught at school.
Exact halves are rounded away from 0. So rounding 2.5 gives you 3, rounding 3.5 gives you 4. In Python 3, that's changed, and exact halves are rounded to the nearest even number. This is banker's rounding. The advantage is supposed to be that it's unbiased, so it produces better results with operations that involve a lot of rounding, whereas the old way is biased towards the upper value. But now rounding 2.5 will give you 2. 3.5 still gives you 4, but it may introduce bugs that you didn't necessarily expect.
Exceptions. The message attribute on exceptions no longer exists. If your tests don't actually check every exception that you handle, then you may miss these. This doesn't get picked up by 2to3. If you've got custom-defined exceptions, they may have a message attribute. Django has some. So you kind of need to check anything that's not a core exception and find out whether the message attribute is valid or not.
Hash. In Python 3.3 and up, the built-in hash function uses a random seed for each Python process, and that means that hash returns different values in different Python processes. That was introduced to address a security vulnerability, so while you can turn it off, you really shouldn't. In a few places we were using hash on cache keys, which meant that whenever you had a new Python process, you didn't find your cached items anymore.
Pickle also turned out to be a problem. Objects that are pickled in Python 2 can give you Unicode errors when you try to unpickle them in Python 3. The pickle protocols also changed: you have protocols 0 to 2 in Python 2 and 0 to 4 in Python 3. So if you need to load objects pickled in both Python 2 and Python 3, you have to make sure you specify the right protocol. And if you're using Django Redis, that defaults to the latest protocol. So if you're using the default, it won't work in Python 3.
Sorting and comparing things, I don't really have time to go through that much, but you now need to be using the same type in Python 3.
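
A few of these gotchas can be sketched in plain Python 3. The helper names `round_half_away` and `stable_cache_key` are hypothetical, and using `decimal` and `hashlib` here is one possible workaround chosen for illustration, not necessarily what we did in the project.

```python
import hashlib
import pickle
from decimal import ROUND_HALF_UP, Decimal

# Python 3's built-in round() sends exact halves to the nearest even number.
bankers = (round(2.5), round(3.5))


def round_half_away(value):
    """Round exact halves away from zero, as Python 2's round() did."""
    return int(Decimal(str(value)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def stable_cache_key(name):
    """Build a cache key that is identical in every Python process.

    The built-in hash() of a str is seeded per process on Python 3.3+,
    so a digest is used instead.
    """
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


# Protocol 2 is the highest protocol that Python 2 can also read.
payload = pickle.dumps({"layer": "landcover"}, protocol=2)

# encoding="latin1" is a common choice for reading Python 2 str objects
# from old pickles; it is harmless for data written by Python 3.
restored = pickle.loads(payload, encoding="latin1")
```

Each of these trades a little convenience for behaviour that stays the same across interpreter versions and across processes.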
You'll get errors if you try to compare or sort mixed types, and it can sometimes give you some odd bugs that you didn't expect.
So this is just a rough estimate from the git commits on the core mapping project.
So for lessons learned from this: well, upgrading any project to Python 3 is going to be hard work, but it doesn't have to be too painful. It went more smoothly than we expected, really, once we got started. If you're not quite ready to embark on your Python 3 upgrade yet, you can make your Python 2 code Python 3 compatible as much as possible. In the more recent parts of the code base, where we had done this, upgrading was much simpler. Being familiar with the changes is really useful. It also lets you know what new things you can take advantage of. 2to3 is really good. It's a fantastic tool, but it can only do so much, so you really need to review everything that it does. You can't rely on it to find everything. Tests are your friend. If your test suite covers all your major code paths, then you can be reasonably confident that your code is working. Check all your dependencies, not just the ones in your requirements file. And lastly, be prepared to spend some time upgrading third party libraries. Don't give up or justify putting off your upgrade just because the maintainers haven't done it for you yet.
That's it from me. Thank you. There are some resources on things that I didn't go into in much detail, and I'll upload the slides later.