So, good morning, everybody. Stefan Schwarzer has been a Pythonista for 15 years. He has written articles and a book on Python. He's also a regular speaker at Linux and Python conferences, and he maintains the ftputil library, which is quite handy. Thank you. Today, he's going to discuss problems and best practices for maintaining code for Python 2 and 3. It's going to be about a 35-minute talk, and we have five minutes for Q&A afterwards. So please welcome Stefan.

Thank you very much. Wonderful. Good morning. OK, I want to give the talk about supporting Python 2 and 3 with the same code. Maybe, for the introduction, something about me. Some things were already mentioned. I have a degree in chemical engineering, but in 2000, I kind of switched sides. I mean, I'd been programming since I was 15 before. And in 2000, I became a full-time software developer. Since 2005, I have been self-employed. I'm in charge of this ftputil client library. And the starting point for this talk was that I was myself in the situation where users asked for Python 3 support in ftputil. There was one ticket, and then at some point, there was a question on the mailing list. I had shied away a bit from this, but then I went through with it. Last year, I had this ftputil 3.0 release with Python 3 support in addition to 2.6 and 2.7.

OK, Python 2 or Python 3. I think most in the audience will actually have read this. This is from the Python website. When you go to the download section, there's a link, "Python 2 or Python 3". And this is the wiki page you get when you click on that link. So it says Python 2 is legacy. I mean, it doesn't really say you shouldn't use Python 2 anymore, but it points strongly in that direction. Python 3 is the present and future of the language. OK, so is Python 2 obsolete? In a way, it is, yes. On the other hand, it isn't, because it's very widely used. There's lots and lots of legacy code.
And yeah, Python 2 is also the pre-installed version. When you say yum install python or aptitude install python, you get Python 2. Or it's even pre-installed, usually on Ubuntu and also on Fedora and Red Hat. And Python 3 is optional. Red Hat Enterprise Linux is only just getting there. I mean, they have a Python 3 package now. Also, if you buy hosting and don't do the hosting yourself, there are more hosting offers available for Python 2, I think. And many libraries don't have a Python 3 version yet. I guess that's the reason why some of you are here, because you want to change that.

OK, so my recommendation is: use Python 3 if you can. And one thing I find very important is that if you migrate or adapt your library, you ease the transition for others who say, OK, I still need your library to migrate or adapt my own software for Python 3.

OK, there are different approaches. In the beginning, when Python 3 came out, the recommendation from the Python development team was to use 2to3: write Python 2 code and convert it to Python 3 with this command line tool. Later, there was a 3to2 tool. I think it's rarely used. But the idea is that Python 3 code usually looks a bit cleaner, and it's nicer to maintain Python 3 code than to maintain Python 2 code. But what the Python world seems to have ended up doing, or mostly doing, is having the same code for Python 2 and 3. So you don't have to run 2to3. Neither does the user need to run it, nor do you during development if you want to test for Python 2 and 3. It's just always the same source code.

One major problem. I mean, there are many things. If you look at What's New in Python 3.0, there's lots of stuff. But I think the most important thing when it comes to this adaptation of the code for Python 3 is this bytes versus Unicode topic. And so I want to refresh something on this.
OK, in both Python 2 and 3, we have the bytes type, or byte strings. I think the Python 2 terminology is mostly "byte strings". In Python 3, when you read the documentation, it usually says "bytes", not really "byte strings", because the intention is that you don't use it for strings, not for character data, or only for encoded character data. So these are the kind of raw bytes that you store on a disk or send over a socket, when you need to make a decision: I have Unicode, but it needs to go somewhere, and I need to encode it.

On the other hand, Unicode text represents character data where characters have number codes, code points. And this is unrelated to how the characters are stored. So it's not just another character encoding, like Latin-1 or some code page. The characters are numbered with these code points, and at this point, it's unrelated to how this is later represented in bytes. This Unicode text can be encoded to bytes. You can choose an encoding. Of course, the encoding should support the characters you have in your Unicode string. So if you have, for example, Chinese characters, you can't encode them to Latin-1.

And this here at the bottom of the page is just one example. These would be the Unicode code points for the German word "Hören", to hear, and these are the bytes if you encode this Unicode string to UTF-8. So this one character, the ö, becomes these two bytes; otherwise it's unchanged. But this depends completely on the encoding you apply.

Something that might be confusing while you are working on this adaptation for Python 3 is that both Python 2 and 3 call their default string type str. If you write a plain string literal in Python 2, this is the byte string type, the binary type. And in Python 3, it's actually the Unicode type, the text type. The byte string type is named bytes in Python 3, and unicode is the Unicode type in Python 2. OK, yeah, one major change.
And this is, again, often one of the difficulties for Python 3 support: in Python 2, you can just add a byte string and a Unicode string, the one with the u prefix. And mostly it works, unless the byte string contains anything non-ASCII. Then you get a UnicodeDecodeError, depending on the processed data. So one day you run the code and it runs fine. The other day you read a text file which has an umlaut or some other special character, and you get an exception. In Python 3, this has changed, and you don't have these implicit conversions anymore. You get a TypeError every time you want to add a byte string and a Unicode string, which I think is mostly a good thing. But it makes this migration harder, this adaptation for Python 3.

In Python 3, you have to be explicit. For example, if you have a byte string with this b prefix, you need to decode it explicitly. Then it becomes a Unicode string. And in Python 3, a plain literal is a Unicode literal as well. OK, but in actual code, you usually shouldn't do it like this. I will talk about this later.

Also in Python 3, and this is something I find logical, almost everything which took strings or byte strings in Python 2 now requires Unicode strings. So in Python 3, again, by default, this is a Unicode string. This works, but if you pass the constructor of Decimal a byte string, it complains and gives you a TypeError.

There's also a new file API in Python 3. In Python 2, if you open a text file for reading, it always gives you byte strings, whereas in Python 3, it really gives you Unicode strings. In Python 2, whether you choose text or binary mode only changes the line ending conversions. And there's also no single file object type anymore. The type of the object that open returns depends on the arguments of the open call.
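The points about explicit conversion and text versus binary files can be tried out in a few lines. This is only a minimal sketch for Python 3, not code from the slides; the helper name `is_type_error` and the file name `word.txt` are made up for illustration.

```python
import os
import tempfile
from decimal import Decimal

text = "H\u00f6ren"            # "Hören" as a text (Unicode) string
data = text.encode("utf-8")    # explicit encoding step, gives bytes

# One code point for "ö" (0xF6) becomes two bytes (0xC3 0xB6) in UTF-8.
code_points = [ord(char) for char in text]
byte_values = list(data)


def is_type_error(func, *args):
    """Hypothetical helper: True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


# No implicit conversion on Python 3: bytes + text fails for any content.
mixing_fails = is_type_error(lambda: data + text)
# After explicit decoding, the value is text and combines with other text.
combined = data.decode("utf-8") + "!"
# Decimal accepts a text string but rejects a byte string.
decimal_rejects_bytes = is_type_error(Decimal, b"1.5")

# Text mode gives text strings, binary mode gives bytes.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "word.txt")
    with open(path, "wb") as fobj:
        fobj.write(data)
    with open(path, "r", encoding="utf-8") as fobj:
        from_text_mode = fobj.read()
    with open(path, "rb") as fobj:
        from_binary_mode = fobj.read()
```

On Python 2, the same addition of a byte string and a Unicode string would succeed or fail depending on the data, which is exactly the behavior the talk warns about.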
The good thing for this migration, for this adaptation for Python 3, is that this open function, which is the built-in open in Python 3, is also available in the io module in Python 2.6 and 2.7. One thing that tripped me up in another small tool is that standard input and standard output work with Unicode strings, and the argument values, this is sys.argv, are Unicode strings too. I mean, sys.stdin.read gives you Unicode strings, and write requires a Unicode argument. But you can work around this by using the buffer attributes, which give you the file object that works with the raw bytes, with the binary data.

OK, steps. The first part was the introduction; now, how you actually do this migration, some tips or steps. You should have automated tests, if possible unit tests. I liked the keynote from Emily Bache on, I think, Tuesday, where she also mentioned approval testing. I recommend you have unit tests, but if you don't, you should at least have some automated tests that you can run on your code. The unit tests should pass 100% on Python 2, and since you are just starting out supporting Python 3, you can't expect that they work on Python 3. Actually, lots of these tests will fail. If you just do the experiment and run the tests of your Python 2 version with Python 3, you probably get lots of failures and errors. That's completely normal.

Also, I noticed, for example in ftputil, that most of these failures come from string literals in the test code, in the unit tests. Another thing is that sometimes you only need to change a few functions which do the conversions in your code, and maybe even then lots of your tests pass again, even under Python 3. You can also try running the Python 2 version with the option -3. This is supported, I think, from Python 2.6 up. It will give you some information.
It will print some warnings or information on things that need to change for Python 3. While you are adapting your code to Python 3, you have to change the actual code, the production code, and the tests, and keep them in sync, so that, if you can, you don't have that many failed tests.

OK, one tip. I found this nice tool; I can imagine many of you know it already. This is tox. It's a nice tool because you can easily run the tests under both Python 2 and 3. So you can check if they still work on Python 2 and if, or to which extent, they already work for Python 3. I also like that it implicitly checks whether the packaging works for your library or your code.

OK, since we are going to support Python 2 and 3 with the same code: 2to3 is still useful, but I recommend running it once. For example, print became a function, and the exception syntax, the required exception syntax, changed in Python 3. 2to3 does many of these straightforward conversions, so in this way, it's nice. You should have a look at the documentation for 2to3 about the fixers. These are the different steps, the different conversions that can be applied. So it's not just everything or nothing; you can individually turn on certain conversions. You should exclude the future fixer, because it removes all from __future__ import statements, which you don't want. You want to keep the from __future__ imports for your Python 2 code.

Many of the changes you don't want to keep literally. I mean, the print conversions you probably want to keep as they are. But some changes, for example, are changed module names. ConfigParser in Python 2, which was spelled with a capital C and a capital P, is now all lowercase in Python 3. 2to3 expects that Python 3 is your final target, so it just changes the import to the all-lowercase configparser.
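For the renamed module mentioned here, one common way to keep a single source file working on both versions is a guarded import. This is a small sketch of that general pattern, not necessarily what ftputil does; the section and option names are invented for the example.

```python
try:
    # Python 3 spelling, all lowercase
    import configparser
except ImportError:
    # Python 2 spelling; bind it to the Python 3 name so the
    # rest of the module only ever uses `configparser`
    import ConfigParser as configparser

# RawConfigParser exists under this name on both versions.
parser = configparser.RawConfigParser()
parser.add_section("server")                      # hypothetical section
parser.set("server", "host", "ftp.example.com")   # hypothetical option
host = parser.get("server", "host")
```

Trying the Python 3 name first means the fallback branch can simply be deleted once Python 2 support is dropped.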
But if you look at the changes that 2to3 made, you do a diff with your version control system, and you see what it changed. So, for example, you see the changed import statement, and you need to take care of this: have different code or a switch or something for Python 3 later. Really check all the changes. Someone I talked with, I think yesterday, suggested running the fixers individually. So with each diff, you see the changes only from this one fixer. Of course, this requires that you know all the fixers. I mean, you can get this information, but you should make sure that you don't forget any of them. OK, and after you have run 2to3 and made your changes, everything at this point should run again under Python 2.

OK, and if you're lucky, you already have an API that you can keep for Python 3, or that doesn't make you jump through hoops if you want to support Python 2 and 3. I was not so lucky with ftputil, so I had to change APIs to support Python 2 and 3. The new ftputil 3 is not backwards compatible with ftputil 2.8, which was the previous version.

In the standard library in Python 2, almost everything that accepts a string accepts either unicode or bytes. And in Python 3, with rare exceptions, you have to use unicode for strings. My suggestion, therefore, is: use unicode for text data. If you have to decide, you should decide on unicode for text data and not keep the bytes interface that you may have for Python 2. And you need to know or even define what data is text data. Sometimes there are corner cases where you really have to think, or maybe even define, whether something is supposed to be bytes or unicode.

One recommendation is to encode and decode text at system boundaries. In this example, you read a file from the file system and decode it. And later, if, for example, you want to send this over a socket to some other host, you encode it, but only then.
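The boundary rule can be sketched like this. The function names `read_text`, `shout` and `send_text` are hypothetical, and the in-memory `io.BytesIO` objects only stand in for a real file and a real socket; the point is that decoding and encoding each happen exactly once, at the edges.

```python
import io


def read_text(binary_file, encoding="utf-8"):
    """Input boundary: decode the raw bytes to text once."""
    return binary_file.read().decode(encoding)


def shout(text):
    """Core logic: sees and returns text only, never bytes."""
    return text.upper()


def send_text(binary_sink, text, encoding="utf-8"):
    """Output boundary: encode just before the data leaves the program."""
    binary_sink.write(text.encode(encoding))


# Stand-ins for a file on disk and a network connection (assumptions
# made for this example so that it runs without any external resources).
source = io.BytesIO(u"h\u00f6ren".encode("utf-8"))
sink = io.BytesIO()

send_text(sink, shout(read_text(source)))
sent_bytes = sink.getvalue()
```

Because `shout` never has to ask which string type it received, it behaves the same on Python 2 and Python 3.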
So you should try to have most of your code deal with unicode strings, as far as strings are concerned. I mean, of course, the other types are not affected. When you look at code, you don't want to have to think: in Python 2, this will do that thing; in this case, I will have byte strings; and when I run this on Python 3, these strings will, I think, be unicode strings, or something like that. You should avoid this. If you have strings in your code, try to get to the point where you can confidently say, or know, the locations where it's unicode and where it is not.

OK. Sometimes it can get a bit hard. For example, in ftputil, maybe I should show something. This is code you can write with ftputil. It mostly sits on top of, or encapsulates, ftplib.FTP, but it's more high level. The intention is that you can write code as if you were using the os module or the shutil module or os.path. You can use walk and listdir on these host objects. The with statement is supported. You can use this file. And you have these convenience methods like download, for example. And you can also open remote files and read from remote files, or write to remote files.

OK, and the one difficulty with ftputil was that I don't only have to convert between bytes and unicode for file contents, for the remote files; the harder part was dealing with the encoding of file names or directory names when I send them over the socket. And I also checked a bit; these are some sections. What you see on the next slide is this upper right corner, and on the slide after that, you see this part. So I made this diagram and attached notes on how these interfaces behave on both Python 2 and 3, to wrap my mind around this and to get straight how I should deal with it. This is the other part: since I'm using ftplib, how does ftplib handle this in Python 2 and Python 3? Where are the conversions?
Because the string arguments for files that ftplib uses also require unicode. So I was wondering, how do they encode this? How do they know the encoding when I finally want to send this file name to the FTP server? So it really depends on your project and how complicated it is. Sometimes maybe it's straightforward; then you are kind of lucky. I mean, you are even more lucky if you can keep your API and only make internal changes. But especially if you need to change the API, if you can't come up with a clean API that is expected by the user, or that a user can easily work with in Python 2 and 3, then you need to think harder. And that's really the part that can't be automated.

Some more tips. Don't let functions or methods accept both unicode and byte strings. So something like: if I get a byte string, I convert it to unicode, or if I get a unicode string, I convert it to a byte string. This makes the API confusing, and you always have to think about under which conditions this is now a byte string or a unicode string. It also makes the tests more complicated, because you always have to check both unicode and byte strings for the arguments. And imagine you have something that takes two or three strings, and you maybe even want to test all combinations. So you should try to avoid this.

A special case is file-like objects versus strings for paths, because in both Python 2 and 3, you can use all these APIs that accept file names or directory names with either byte strings or unicode strings. Under the hood, they will call different APIs of the operating system. This also gave me some headaches with ftputil, because I do this as well. In this case, I have to accept both unicode and bytes and do different things, because I try to mimic these APIs from, for example, the os module. OK, if you don't accept path strings but instead accept file-like objects, this is handled before your code is called.
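Both tips from this passage can be illustrated briefly. In this sketch, `normalize_name` and `count_lines` are hypothetical functions, not part of ftputil: the first accepts text only and rejects everything else with a clear error, the second takes an already opened file-like object so that path types and encodings are the caller's business.

```python
import io

# The text type on both versions: unicode on Python 2, str on Python 3.
text_type = type(u"")


def normalize_name(name):
    """Accept text only; do not silently convert byte strings."""
    if not isinstance(name, text_type):
        raise TypeError("name must be a text (unicode) string")
    return name.strip()


def count_lines(text_file):
    """Count lines of a file-like object opened in text mode.

    The caller opened the file, so this function never sees a path
    and never has to decide between bytes and unicode file names.
    """
    return sum(1 for _line in text_file)


# io.StringIO stands in for a file opened with an explicit encoding.
line_count = count_lines(io.StringIO(u"first\nsecond\n"))
clean_name = normalize_name(u"  report.txt ")
```

With a single accepted type, the tests for `normalize_name` need one success case and one rejection case instead of every combination of string types.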
So it's not part of your library anymore. If you can get away with this, you should probably use file-like objects instead of accepting file name strings. I mean, you can, of course, still have this for user convenience, but it makes things a bit harder.

Also avoid different APIs for Python 2 and 3. For example, for ftputil, I was thinking at some point about backwards compatibility: when you run on Python 2 and open a text file for reading, it should give you byte strings as before, to be backwards compatible, and on Python 3 give you unicode strings, because you expect this under Python 3. But this would really be a mess. I made up my mind, wrote summaries of the advantages and disadvantages, and posted this to comp.lang.python. What do you think? How should I deal with this? And I got two answers, both saying explicitly: go for the unified API, even if it breaks backwards compatibility. And I think this was a very good decision to make.

Also make a list of changes before actually changing the API, because this will hopefully make sure that you don't forget something when you need to change different parts of your code. And it also helps you write release notes. For example, for ftputil, I wrote "What's new in ftputil 3.0", so I could check this list and make sure I don't forget anything. You could also use commit messages, but commit messages are usually more fine-grained and maybe not so useful for this purpose.

Another tip: if you need to change the API, increase the major version number, the first part. After all, adding Python 3 support is a major change, so you can get away with the API changes. I think it's justified. I mean, it's not just a trick; I think it's fine.

OK. Some other tips, in general, on this Python 3 adaptation. Read What's New in Python 3.0, at least. I actually recommend that you also, not instead but in addition, read Porting to Python 3.
I have links at the end of the slides, and I will put the slides online. But even if you just search on the net for "porting to Python 3", you will probably get this website. This is practically an online book, which is really nice.

If possible, support only Python 2.6 and up, because Python 2.6 and Python 2.7 have some very useful Python 3 features backported. For example, you can say from __future__ import print_function, and you will have print as a function with the same behavior as in Python 3. Also the exception syntax: starting from Python 2.6, you can write except ExceptionClass as exception_object. You can't do this in Python 2.5, and if you need the exception object for exception handling and want to support Python 2.5 and lower versions, it gets really messy. It isn't fun, I guess. Also, Python 2.6 and 2.7 have this io module. So if you want, you can say from io import open, and then you have the open function like the built-in open from Python 3, but available in Python 2. If you need to support Python 2.5, you can use the six library. So, in summary, anything below 2.6 will probably be awkward to support.

OK. There will still be some things that need to be different for Python 2 and 3. And I recommend, and I'm not the only person recommending this, I saw it on the net, using a compat module. For example, for ftputil, it looks like this. You just say: if the Python version is Python 2. By the way, use this indexing if you want to run your code on Python 2.6, because in Python 2.7, this is a named tuple, and you can write .major, but this doesn't work on Python 2.6. And here I have int_types, a tuple of int types, unicode_type, and bytes_type. This is much easier than reading the code and trying to remember: I'm now on Python 2, and the str I'm seeing here is the bytes type, or something. That is error-prone.
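A compat module along the lines described might look roughly like the following. This is a sketch built from the names mentioned in the talk (`int_types`, `unicode_type`, `bytes_type`), not a copy of ftputil's actual file; `python_version` and the `describe` function are additions for illustration.

```python
import sys

# Use index 0 rather than `.major`: on Python 2.6, sys.version_info
# is a plain tuple, not a named tuple.
python_version = sys.version_info[0]

if python_version == 2:
    # These names only exist on Python 2; this branch is never
    # executed on Python 3, so they cause no NameError there.
    int_types = (int, long)  # noqa: F821
    unicode_type = unicode   # noqa: F821
    bytes_type = str
else:
    int_types = (int,)
    unicode_type = str
    bytes_type = bytes


def describe(value):
    """Hypothetical usage example: classify a value by compat types."""
    if isinstance(value, unicode_type):
        return "text"
    if isinstance(value, bytes_type):
        return "bytes"
    if isinstance(value, int_types):
        return "integer"
    return "other"
```

Code elsewhere in the project then writes `isinstance(value, unicode_type)` and never has to remember what `str` means on the running interpreter.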
OK, if you have a larger project, also have a look at the future or six libraries. I think the future library looks a bit more modern. It's actually newer than the six library. For ftputil, I decided not to use them, because ftputil doesn't have any dependencies apart from the standard library, and I didn't want to introduce a dependency just for these few things you saw in my compat.py.

OK, and this goes into every Python file; that's at least what I suggest. These __future__ imports are some changes which make your Python 2 code behave more like Python 3 code. For example, absolute imports are required, and the division, or rather the integer division, has changed from Python 2 to 3. If you use from __future__ import division, you get the Python 3 behavior for integer division. I already mentioned the print function. Unicode literals: when you do from __future__ import unicode_literals, all the literal strings in your code will, even under Python 2, become unicode string literals. An alternative to unicode_literals, in Python 3.3 and up, is the u prefix. It was removed in Python 3.0 but reintroduced in 3.3. But you still have to know what string type your literals are. And it's maybe a matter of taste whether you generally use the unicode_literals import or use the u prefix explicitly.

OK, then the summary. Python 2 is still in wide use, but I recommend using or developing for Python 3 if you can. Using the same source code to support Python 2 and 3 is feasible and makes sense. Larger projects are doing this as well. Django, for example, has gone this way and has the same source code for Python 2 and 3. You need to know the concepts of unicode, bytes and encodings, and the changes from Python 2 to 3. So again, read What's New in Python 3.0 and Porting to Python 3. You should have tests for adapting to Python 3.
Otherwise, it's much more difficult. You should at least have some tests, even if it's a kind of acceptance test or these approval tests that Emily mentioned. You should prefer APIs in Python 3 style. So write modern Python, and plan and implement necessary API changes carefully. I mean, as if you designed the API for your library from scratch, maybe even; do what makes sense for Python 2 and 3. I already mentioned reading What's New in Python 3.0. And if you can, require at least Python 2.6, because this will give you several of these __future__ imports. If you want to support Python 2.5, for example, you have to write some convoluted code, maybe. OK, that's all from me.

Thank you very much, Stefan Schwarzer. So if you have a question, please raise your hand, and we'll come by with the microphone.

Well, this is half question, half remark. I have a project where I also maintain both Python 2 and 3 at the same time with the same code base. But I noticed that while it is feasible and cool to be able to do that, you give something up. You give some of the features in Python 3 up. For instance, I spent a long time struggling to find out if there's an equivalent in Python 3 of the, admittedly, weird construction in Python 2 where you can re-raise: you catch an exception, and you want to raise another exception with the same traceback as the original one. And you don't like to raise exception, comma.

Can you get the microphone, because I can't hear you. OK, in Python 3, what you probably mean is that you already have these chained exceptions. And in Python 2, I think you can use an additional comma-separated argument to give a traceback object or something. But I feel that in the end, you have to do it kind of manually, setting these extra attributes. That's also one thing I thought about. Yeah, this one.

Thank you for the talk. It had really interesting tips.
What about porting C extensions?

C extensions, like for Python? OK, this is more. Sorry, sorry. OK. I haven't migrated or adapted any C extensions so far, so I can't say. From what I heard, changing C extensions is much more complicated.

It is, but there is a document that ships with Python. I think it's in the howto directory, and it gives you guidance on porting C extensions from Python 2 to Python 3.

OK, any more questions? I think, let me check. Last question.

Thank you for the talk. I think right now you're one of the most competent persons for this topic in the world. If you converted your ftputil library again, starting from scratch, with all the knowledge you have now, how long do you estimate it would take, in terms of percentage, to do it again?

With my knowledge now, maybe a week or maybe less. I really don't know. I mean, this was stretched out over several weeks because it is a free-time project.

Yeah, you had to find all these things out.

OK, OK, OK. I have this habit of reading a lot of stuff before I start, so I did some research before. For example, a compat module is also mentioned by Armin Ronacher in some blog post or something, this recommendation. But I think it makes sense. Anyway, I would say it takes several person-days if you just mechanically apply all these algorithms. Some of the mechanical changes can be done by 2to3, so this at least helps. I find it very useful to run 2to3. You should read What's New in Python 3.0, but running 2to3 on your code will also give you some insights, or show you some things you forgot when you read the What's New document. It will change some things, and you might wonder: oh, what did it change there? Obviously, there seems to be something different in Python 3 in comparison to Python 2.

OK, thank you. OK, thanks a lot, Stefan.