Hi everybody. My talk is about Python 3, ten years later. My name is Victor Stinner, and I have been a CPython core developer since 2010. What does it mean to work on CPython? It means, for example, maintaining the CI, fixing regressions, taking care of the bug tracker, reviewing patches, and helping to debug issues, but also taking care of the mailing lists, answering questions, and following some PEPs. In fact, there are many, many things to be done in Python. I work on CPython, but also on OpenStack, for Red Hat, and I am a very happy Vim and Fedora user.
I decided to group my slides into four seasons, so the ten years will be told in four seasons, and we will start with autumn.
The birth of Python 3000, because at the beginning it was called Python 3000 and not Python 3, was in 2006 with PEP 3000. At that time, some people had started to complain about design issues in the Python language, which were called warts. For example, in Python 2 you have the small integer type, int, and also the large integer type, long. If you start with a small integer, then depending on the operation you may get a small or a large integer, so to check the type of a variable you have to check for both types at once. There are also the new-style classes, introduced during the development cycle of Python 2. You get a new-style class when you inherit from the object type; if you don't inherit from object, you get an old-style class, and some features like properties don't work as expected on an old-style class. Having these pairs, small and large integers, new-style and old-style classes, can be very confusing for newcomers, because you have to explain why we didn't have a single thing from the beginning. There is also the question of division: when you start learning a new language like Python, it is surprising that the division of two integers gives an integer and not a floating point number.
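As an illustration of how these three warts look once fixed, here is a minimal sketch, assuming a Python 3 interpreter. The variable names and the Point class are made up for the example.

```python
# Python 3: one integer type, true division, and only new-style classes.
small = 2 ** 10
large = 2 ** 100   # in Python 2 this value would have been a separate "long"

# Both values share the single int type, so one type check is enough.
same_type = type(small) is type(large)

true_div = 7 / 2    # "/" on two integers now gives a float
floor_div = 7 // 2  # "//" is the explicit integer (floor) division


class Point:
    # No explicit "object" base class is needed in Python 3.
    pass


is_new_style = issubclass(Point, object)
```

In Python 2, the same class statement without an explicit object base would have created an old-style class.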
Maybe if you are used to Python today, you understand the reason, but when you start with a new language, again, it can be confusing. About Unicode, I think that Python 2 has good Unicode support if you use Unicode everywhere in your application, which means that you decode all inputs and encode the outputs back. If you use Unicode everywhere, everything is fine. But if you start using a module which is not compatible with Unicode, you may or may not get issues, depending on the content of the text. If you only process English, it will be fine. But if you start to get French names with accents, you may get a hard Unicode error. That is quite annoying, because you get the error at runtime, and not necessarily on the first run: it depends on the content.
Another thing I would call a design issue of Python 2 is what happens when you compare two types which don't implement a comparison with each other: Python 2 falls back to comparing the names of the types. For example, if you compare a number and a string, Python takes the type names, int and str, and compares those strings. That order may not be the order you expected. Part of the Python philosophy is not to guess the intent of the developer, but to let the developer make the choice.
The last issue was imports. In Python 2, if you have a file with the same name as a module of the standard library, you get your local file. For example, if you create this.py in your project, you may get that file instead of the one from the standard library.
But when Guido van Rossum started to design Python 3, he didn't want to break everything. He wanted to control the risk, to reduce the risk of a big failure.
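To illustrate the Unicode and comparison warts described above, here is a small sketch of how Python 3 behaves, assuming a Python 3 interpreter. The helpers try_compare and try_concat are hypothetical names for this example. The point is that both mistakes now fail right away, whatever the content of the text.

```python
def try_compare(a, b):
    """Return a < b, or the marker "TypeError" if the types are unorderable."""
    try:
        return a < b
    except TypeError:
        return "TypeError"


def try_concat(text, data):
    """Return text + data, or the marker "TypeError" if str and bytes are mixed."""
    try:
        return text + data
    except TypeError:
        return "TypeError"


# Python 3 refuses to guess an order between a number and a string.
compare_result = try_compare(1, "1")

# Mixing text and bytes fails for plain English content...
concat_ascii = try_concat("name: ", b"Victor")
# ...and for accented content alike, so the bug shows up on the first run.
concat_accent = try_concat("name: ", "René".encode("utf-8"))
```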
For example, we decided not to break everything, but to really focus on a few known design issues, the Python warts. There was also an open community process for deciding what to change: the PEP process. So we had many PEPs, numbered in the 3000s, to describe the changes made in Python 3. For example, there is a PEP to reorganize the standard library and rename modules. Another choice was not to reimplement the interpreter from scratch. The idea is that if you start from the same code base and make changes, you get better backward compatibility, especially for the C API. These are all tools to reduce the risk. The last one was to announce the end of life of Python 2, to make it very obvious to people that there is a deadline. You must be aware that after that date, you will have to support your project on your own.
And here it comes, the holy grail: Python 3. In 2008, you get it. We did it. The first migration plan was very simple. Python 3 comes with a tool in the standard library called 2to3. It ports your Python 2 code to Python 3 at once. The idea is very simple: you run this tool on your code base and you're done, you are compatible with Python 3.
Maybe it didn't work as expected. Maybe. Because when you port code to Python 3 this way, you drop support for Python 2. And at that time, that basically meant dropping support for all your users. So the authors of the modules on the Cheese Shop decided that this was a no-go. They didn't want to make that change. They didn't want to be the first ones to move, they didn't want to be the early adopters. Another unexpected issue is that in practice, a code base has external dependencies. If all the dependencies of your application are not ported, then even if you port your own code base to Python 3, you are still blocked by the dependencies.
So the problem was that everybody was waiting for someone else to move, and nobody wanted to move. And the last point is that we didn't expect Python 2 to be so popular. We did not expect that many companies had very large code bases written in Python, because there is not only the published code, but also the private code written by companies. So it didn't go as expected.
Another issue with Python 3 is what is called technical debt. To explain the issue, imagine that you have to ask your manager for some time to work on Python 3. The manager: why should I let you work on Python 3 support? The developer: oh, for all the new cool Python 3 features, obviously. OK, but can we use all these features? Well, since we still have to support Python 2... no. So the issue is that even if you go through the whole boring migration process, you don't get any of the new Python 3 features, because you still have to support Python 2. Because of that, it was very difficult to ask your manager for time, but also to motivate yourself to spend time on it. And any migration like that means modifying the code, and you know that any tiny change in the code is likely to introduce regressions. So you have to justify not only spending time on a seemingly useless migration, but also that you are going to introduce bugs. That's not easy to sell to a manager who has customers waiting for new features.
Another choice to make when you port code to Python 3 with 2to3 is whether to keep the Python 2 code base unchanged, to create a single code base, or to have Python 2 and Python 3 in two different branches, or maybe two different projects or two different repositories. Some projects really decided to fork, to have two different repositories or at least two different branches. To give you an example, there is dnspython.
I think that this one decided to use two different branches, but also to distribute the code under two different names, because you were not able to install dnspython on Python 3, and dnspython3 doesn't work on Python 2. For some projects, the company behind the project didn't want to spend time on Python 3, because of the manager problem I just described. So the community decided to fork the project, like PIL, to create a new project called Pillow. And the first goal of Pillow was to add Python 3 support. A more tricky case is when the project is made by contributors, an open source project like MySQL-python, but the maintainer doesn't show up. This case was very annoying, because many people use MySQL, for example many Django users, and in my case it was the database system used by OpenStack, for my company Red Hat. Not being able to talk to the database was really blocking. I think that three different people wrote the whole change, a pull request to port MySQL-python to Python 3. But after two years, we still didn't have any news from the maintainer. So some people decided to fork the project and create a new one with a new name. But a new name brings new issues, because you have to modify your project to change the import name.
When Python 3 was released, the stable version of Python 2 was Python 2.6. Python 2.6 already has a few things to prepare the migration to Python 3. For example, there is a bytes type, which is an alias of the str type. There is also the b prefix to mark which strings are bytes. But you still need many, many tiny changes in your code base if you want the same code base for Python 2 and Python 3. So when you still had to support 2.6, which was the only version available, it was quite difficult to make all these tiny changes. And you also need backports, like unittest2 and others, to get new features from Python 3.
And on the Python 3 side, up to Python 3.2, using the u prefix for Unicode strings was a syntax error. When you write code for Python 2 and Python 3 in the same file, not being able to mark Unicode strings was a blocking issue. There was a trick, which is the u() function of the six module. Using this function, you get a Unicode string, but it means that instead of just writing a string, you have to call a function. It's quite annoying, quite surprising, and not the most straightforward solution. By the way, the six module is a collection of many small tools to write the same code for Python 2 and Python 3. Depending on the Python version, you get a different implementation, but in your code you only write one function call.
After autumn comes the cold winter. It started with the Python 3 Wall of Shame. Shame, shame, shame. In 2011, someone created this website. The intent was not really to blame people; it was more to motivate people to start spending time on Python 3. You can see in this picture that almost none of the most popular modules were compatible with Python 3. So in 2011, we started from very far.
At this time, I identified three big players in the Python community, three big applications. There was Twisted, which is a framework to write clients and servers, networking code. There was also Mercurial, which is a source control management tool similar to Git, and Mercurial is fully written in Python. And there is Django, which is very famous nowadays, but maybe 10 years ago it wasn't. The problem with Twisted is that it only exchanges bytes, because it's networking code. On the wire, there is no Unicode; on the wire, you only have a flow of bytes. So having to use Unicode as a first-class citizen in Python 3 was an issue. It's the same for Mercurial, because Mercurial doesn't really try to understand the content of files. In Mercurial, the content of a file is basically bytes.
And in the case of Django, I think that at least when Python 3 was released, its Unicode support was not so good. Because of all the issues of the migration, all the small warts, the small things, more and more people started to complain that maybe Python 3 wasn't a good idea. Python 3 doesn't bring anything, because, as I explained, even if you port your code, you still don't have access to the new features. They also complained about Unicode, because in a Python 2 application you just process text as bytes. Even if you take two different texts in two different languages and two different encodings, when you combine them, there is no error. In the worst case, you get mojibake, which means that you get strange characters. But for them that's not an issue, because the program does not crash with a hard Python exception. So according to them, bytes is the way to go, the best way to store all text.
Even worse, the trolls started to discuss an idea called Python 2.8. The rationale was that since people are still using Python 2 in production, since it just works and people are very happy with Python 2, maybe we should continue the development of Python 2 and just add new features, or at least backport some features from Python 3 to Python 2. But the CPython core developers disagreed with that, because many of the people working on CPython are volunteers, and they didn't want to duplicate the work between Python 2 and Python 3. The CPython core developers really wanted to focus on the future and help people to migrate. But even five years ago, you could still read something like: I think that Python 3 will never take off, because only 2% of people are using Python 3. Maybe it was a bad idea, just forget it. But the CPython core developers decided that no, there is no Python 2.8.
It's not going to happen, because we own the language, we are the developers, and we don't want to duplicate the work. It doesn't make sense to go backwards, because of the Python warts, the design issues. We wanted to fix all these issues. So a PEP was published in 2011, the PEP Not Found, PEP 404: the Python 2.8 Un-release Schedule. And if you look at the numbers differently, like the top 50 most popular projects on PyPI, in fact we are closer to 80% of projects compatible with Python 3. So it's not like 2%.
In my opinion, the best thing that we did in these years was to extend the support of Python 2. The idea is to make it very clear that we are not going to abandon people. We are not going to kill Python 2 users. We really wanted to help people to migrate and give them time to migrate, rather than force them to do it right now. So Guido van Rossum decided to extend the end of life by five years. In fact, it means doubling the support time from five years to ten years, which is the longest support that we have had in Python. And for your information, the end of life is very close now: in two years.
After the cold winter comes the spring, with flowers. The plants are growing, things are changing. The first very good news is that we fixed the first problem in Python. What is the first problem in Python? It's: how can I install something in Python? How can I install a dependency? The usual answer was: you just have to install setuptools. But I wanted to install something, and now I have to install something else. How can I install setuptools? You have to find an installer on a website, download it to your computer, and run it, which requires administrator privileges, which you may not have. So it was annoying for everybody, because it was very difficult to install setuptools, and very difficult to find documentation. And there was not only setuptools, but also distribute, and maybe other competitors.
So in 2011, pip 1.0 was released, and the huge thing came in 2014: Python 2.7.9 and Python 3.4 now come with a new module called ensurepip. It's not really pip itself, it's an installer to install pip, and it fixes the bootstrap issue of installing pip. And since it's part of the standard library, people stopped asking which option is the best. They just started to use pip, because it's part of Python. So slowly it became the de facto installer, and now the Linux distributions, the Windows installer, and the macOS installer all come with pip. You don't have to worry about that anymore. Today it's much, much easier to install an external dependency.
Maybe the first approach, dropping Python 2, was not a good idea. So slowly a new idea came up, but you have to understand that it takes a lot of time to realize that something was a mistake. It takes time to listen to users, to listen to developers. So it took us a few years to come up with this new approach. Maybe it seems very simple to say today, but it took us time to find this clever idea: instead of promoting 2to3, which was not a good idea, we should stop dropping Python 2 support. A better idea is just to add Python 3 support. By doing that, a lot of things change. For example, the migration is no longer a single shot. You don't have to port your whole application to Python 3 at once. You can do it in small pieces, one by one. You can work on a single directory. You can port a single dependency. And you can also check for regressions on Python 2, if you have tests and a CI to check your code.
And we started to see new tools like modernize. This tool takes your Python 2 code and adds calls to the six module to make it compatible with Python 3. So you keep Python 2 support, and you get Python 3 support for free. And there is another project that I wrote, called sixer.
This one has a different story. I'm working on OpenStack. If you don't know OpenStack, it's a giant pile of code, two million lines of code, so it's huge. For OpenStack, I had to start by evangelizing Python 3, because even four years ago, people were still not convinced that they had to port code to Python 3. So when I started to write giant patches using modernize or other tools, reviewers didn't even want to look at the pull request, because it was a giant pull request. And you have to know that in OpenStack things move very, very quickly, because it's a huge project with, I think, 2,000 people or more working on the same code base. So if you generate a pull request, a few hours later you get a conflict. You have to fix the conflict, push again, and wait for a new review, and it never ends.
For sixer, I took a different approach. Instead of making all changes at once, you can make a single kind of change, like just adding parentheses to print to make the print statements compatible with Python 3. By doing that, I was able to produce very small pull requests which are straightforward to review. And with very small pull requests, I was quickly able to run unit tests on Python 3. Now basically the whole OpenStack project has unit tests working on Python 3, and today we even have functional tests running on Python 3. So we are very close to full support.
When you have a very large code base like OpenStack, you get new issues. For example, in huge companies, people move from one team to another, or quit the company, or leave the project for other reasons. It's called turnover. And when you lose the original authors of the code, it's very difficult to modify code that you don't understand, especially if you don't have unit tests.
So before starting to make changes to add Python 3 support, maybe you had better spend time on testing your code base, just to make sure that you're not breaking anything.
Dropbox decided to take another approach: they wanted to annotate types. If you come from a language like Java, the type of every function parameter is very explicit, and the benefit is that you can run a static analysis to make sure that you pass the right types. By doing that, you catch a lot of bugs at compile time, so your code gets better. But a few years ago, there was no tool. Python 3 just had the ability to annotate functions, and on purpose we had decided not to standardize how to annotate types. It means that you could use a string, you could use an expression, you could write whatever you want, but you were not able to use these custom annotations to validate the code. So what they did first, with the help of Guido van Rossum and others, was to write a module called typing. Typing is a standard way to describe annotations. To say that you have a list of integers, there is a very specific syntax, and there is a different syntax for more complex types. So they started to write a lot of specifications for all these annotations. In parallel, a different team was working on a static analyzer which uses these annotations to check that everything is consistent. And the idea for Python 3 is that they started to annotate types to make sure that they don't introduce regressions, and to catch issues.
Another approach to make the migration easier: when you look at the individual changes in Python 3, they are very, very small. Just add parentheses to print, it's not a big deal. But if you take all the individual changes together, the gap between Python 2 and Python 3 is in fact quite big. So what we tried to do is to reduce the gap by building a bridge between the two versions, to make the migration easier.
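As an illustration of the typing annotations mentioned above, here is a minimal sketch, assuming Python 3.6 or newer. The function names are hypothetical. The annotations are not enforced at runtime; they are meant to be checked by an external static analyzer such as mypy.

```python
from typing import Dict, List


def total_length(names: List[str]) -> int:
    """Sum the lengths of all names; List[str] means "a list of strings"."""
    return sum(len(name) for name in names)


def count_by_length(names: List[str]) -> Dict[int, int]:
    """Map each name length to the number of names having that length."""
    counts: Dict[int, int] = {}   # variable annotation, Python 3.6+
    for name in names:
        counts[len(name)] = counts.get(len(name), 0) + 1
    return counts


lengths = count_by_length(["int", "str", "bytes"])
```

A static analyzer reading these annotations could reject a call such as total_length(42) before the code is ever run.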
For example, in Python 3.3 we reintroduced the u prefix. It was very difficult for the CPython core developers to accept that, because we like the purity of the language. We wanted a language that is very regular, very simple to learn. And it doesn't make sense to mark Unicode strings, because every string is already a Unicode string by default. So it took a few years, four years after the release of Python 3.0, to accept that maybe removing the prefix wasn't the best idea. To write a single code base for Python 2 and Python 3, using the u prefix... OK, maybe it makes sense, let's do that. And trust me, it was a really big change for me, because before Python 3.3 it was very, very painful to write six.u() and call a function around each string. It looks bad, and it was difficult to explain to people why you have to call a function just to get a string. So it was a very good idea.
In 2015, Python 3.5 brought back the formatting of bytes. This specific change is very useful for projects like Twisted, because Twisted only uses bytes for its networking clients and servers. Before this change, to format a string, you had to take your bytes, decode them to get Unicode, format the Unicode string, and encode the result back to bytes. People like the Twisted developers didn't understand why you have to decode and encode. If you only handle bytes, it doesn't make sense to add these two useless operations, and it's also slower to decode and encode than to format the bytes directly. So this tiny change unblocked the migration of Twisted, for example. At least, it made it much easier for them to port the code.
And on the Python 2 side, we also made changes to simplify the migration. What we did is to add warnings.
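The two bridge features described above, the u prefix and bytes formatting, can be sketched as follows, assuming Python 3.5 or newer. The host name and request path are made-up values for the example.

```python
# Since Python 3.3 the u prefix is accepted again; it changes nothing.
same_text = (u"caf\u00e9" == "caf\u00e9")

# Since Python 3.5 the % operator works directly on bytes.
host = b"example.org"   # hypothetical value
path = b"/index.html"   # hypothetical value
request = b"GET %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (path, host)

# The older workaround: decode, format as text, then encode back to bytes.
template = "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n"
roundtrip = (template % (path.decode("ascii"), host.decode("ascii"))).encode("ascii")
```

Both approaches build the same bytes, but the direct form skips the decode and encode steps.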
It means that when you enable these warnings and run your Python 2 code, Python starts to complain that maybe this code is not correct in Python 3, maybe you should look at it and fix it. The good thing is that you don't have to wait until your whole code base and all your dependencies are ported to Python 3. You can start by looking at these warnings and fix them one by one. So you can see that we made changes on both sides to reduce the gap.
Another thing, which doesn't come from CPython directly but from the community, is that more and more people started to backport Python 3 features to Python 2, because sometimes it was technically possible. By doing that, you can start to use new Python 3 features, because they are now available on Python 2. For example, enum34 is a backport of the new enum module of Python 3.4. So it becomes possible to use new features like that.
After the spring comes summertime. It's time to enjoy. To come back to the previous slide, the Python 3 Wall of Shame: we only had 9% of the modules compatible with Python 3. The author built this website to motivate people, and one day he decided to change its name to the Python 3 Wall of Superpowers. Today we are very close to 100% of the 2,000 most popular projects being compatible with Python 3. In practice, something like 10 projects are missing, but it's not really an issue, because these 10 projects are usually deprecated or replaced with a better solution, a new way to fix the issue, or just a fork of the project. For example, MySQL-python is not compatible, but you have mysqlclient, which is compatible.
And you have to know that Python 3.6 is now faster than Python 2.7. Here you can see that the green lines are smaller. In fact, the timing is normalized on Python 2.
So if it's smaller, it means that Python 3 is faster. And these are just the most significant benchmarks, where the difference is the largest. You can see that on many benchmarks, especially the SymPy ones, it's way faster, up to two times faster in fact.
To give you an idea of the performance work that we did, there was a talk at the previous PyCon US by Instagram, because Instagram is working hard on porting their code base to Python 3. They have different reasons: for example, they would like to use asyncio, and asyncio is only usable on Python 3. But Python 3 also has fewer bugs, it's much better. And they didn't want to postpone the technical debt. The very good feedback from Instagram is that they ported a huge code base, because Instagram, for your information, has something like 700 million users. So it's a very, very large project with a lot of users, and it's fully written in Python. It's not a small thing. Because they have a lot of users and because Instagram has competitors, they were not able to restart from scratch in a different language, or restart from scratch on Python 3. So what they did was to port the code piece by piece, and they succeeded. In the end, you can see it on the CPU side; CPU in this case is mostly uWSGI and Django. On this side, they saved 12% of CPU just by moving the code to Python 3. So they are using less hardware, and trust me, for Instagram it means something, because when you have 700 million users, hardware becomes very expensive. On the memory side, they also saved 30% of memory. So again, it's very, very important for them. For memory, the biggest saving was on the Celery side.
To show you another reason to move to Python 3: I started to collect a list of known bugs of Python 2. I'm working on CPython, and we all want Python 2 to be very stable and to work perfectly.
But sometimes we are not able to fix bugs. I would say that Python 2.7 is super stable, because we support it for 10 years and we have large companies relying on Python 2. We didn't want to break the language, we didn't want to break applications. So backward compatibility is even more important in Python 2 than in Python 3. Another issue is technical: for example, Python 2 supports a lot of legacy platforms, and we don't want to lose this support. It supports multiple threading implementations, while Python 3 only supports pthread and Windows threads. For all these reasons, it becomes very difficult to fix bugs.
To give you an example: Unicode. We cannot change Unicode in Python 2. It's just not possible, because people rely on the current behavior. Another example is that the Python dict type has a vulnerability which allows a denial of service on a server. By injecting specific HTTP headers, you are able to crash the server with a very small payload. The countermeasure is to randomize the hash function, because then the attacker is no longer able to generate a specific pattern to attack the dict type. But in Python 2, we were not able to enable the protection by default, again because of backward compatibility.
The subprocess module is not thread safe. Many of you are not aware of that, but when you start to get such an issue, it's very painful, because multithreading is not deterministic. It depends on the timing of the subprocess, so you may or may not get the crash. It's very difficult to find the relationship between the multithreading and the crash, and we cannot fix that. The recursive lock is not signal safe.
A signal is, for example, what happens when you spawn a subprocess and the subprocess completes: you get a notification as a UNIX signal. If you get a signal at a bad moment, the recursive lock may become inconsistent. It's also a very annoying issue, because you are not able to predict signals, you cannot control signals, so it's very painful. The last issue is that even if you modify your code to use a monotonic clock, to avoid issues with winter and summer time, the DST change when you add or remove one hour, even if you make your code safe, Python internally still uses the system clock, which has the issue.
But on the bright side, in Python 3 we fixed these issues. We added the time.monotonic() clock, which is now available on all platforms. Another, more tricky change is that file descriptors are now non-inheritable. It means that when you spawn a process using fork and exec, or subprocess, the child does not inherit the files that are open in the parent process. In Python 2, if you open a file and spawn a process, even if you close the file in the parent, as long as it's still open in the child process, the file is technically not closed in the Linux kernel. So the data may not be flushed, and you may not be able to remove the file on Windows. You get a lot of very annoying issues because of that. Something more critical is that if you open a sensitive file, like a file with passwords or other critical information, and the child process inherits the file descriptor, it is technically able to read this file, or even to write into it. So we decided to make file descriptors non-inheritable by default, to fix the issue for everyone, and you don't have to modify your application for that.
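The two Python 3 fixes described above can be observed with a short sketch, assuming Python 3.4 or newer. The temporary file is only there to have a file descriptor to inspect.

```python
import os
import tempfile
import time

# time.monotonic() never goes backwards, even if the system clock is
# adjusted (DST change, manual update), so it is safe for measuring durations.
start = time.monotonic()
time.sleep(0.01)
elapsed = time.monotonic() - start

with tempfile.TemporaryFile() as tmp:
    fd = tmp.fileno()
    # File descriptors created by Python are non-inheritable by default.
    inheritable_by_default = os.get_inheritable(fd)
    # A program that really wants a child process to inherit it must opt in.
    os.set_inheritable(fd, True)
    inheritable_after_opt_in = os.get_inheritable(fd)
```

Opting in with os.set_inheritable() is an explicit choice by the application, which is the reverse of the Python 2 default.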
Another backward incompatible change is that in Python 3.5 we changed how we handle signals. Previously, you could get an exception in Python when your function was interrupted by a signal, and when you got it, you had to restart your function manually until you got no signal. We decided to handle that at the C level, so you don't have to worry about it anymore. About non-inheritable file descriptors, I wrote the PEP, and it took something like 8 months to convince Guido van Rossum. What he said is: we are aware of the code breakage this is likely to cause, and doing it anyway for the good of mankind.
To give you an idea of Python 3: we have no less than 21 new modules, like the very famous asyncio for asynchronous programming, the new enum, the popular pathlib module for file names, and also the very cool unittest.mock to mock functions when you write unit tests. And since Python 3.6 you have the amazing f-strings. If you are not coming from Python, it may seem very stupid that it's a new thing, but it took us a lot of years to agree that it was maybe a good idea to support that. It's a new way to format strings, with no percent operator and no .format() method. It's just an f-string, and the string is formatted in place. You can pass a variable, but technically it's a Python expression, so you can call a method, you can write an operation.
We added a lot of things to the Python language itself for asynchronous programming, for asyncio. We first added yield from, which delegates from a generator to a sub-generator. The async and await keywords were added to make asynchronous code look more straightforward, more simple. And we also added support for asynchronous generators and asynchronous comprehensions. There are many more new Python 3 features: new syntax like keyword-only arguments, the print function, which now takes arguments like file and end, and * and ** to unpack lists and dictionaries, which is really cool.
And a very small thing, which again seems obvious: you are now allowed to write underscores to make number literals more readable. You can annotate the type of variables. You can write multiple context managers in a single with statement. We got the new matrix multiplication operator, which is very useful when you use NumPy.
So with all these changes and successful migrations, the question now is: is it time to bury Python 2? In fact, people have already started to do it. Fedora 23 and the latest version of Ubuntu already have no Python 2 in the base system. It means that if you install a Python 2 application, you still pull Python 2, but in the base system there is no Python 2. And you have to know that Ubuntu and Fedora are very keen on Python, a lot of things are written in Python, so it means that we already did most of the work. There is the Python 3 Statement, which is a combined timeline of scientific projects showing when their Python 2 support will be dropped. They wanted to coordinate, to make it clear that you have to start working on Python 3. The Python Clock is a countdown until the death of Python 2. And last year, two big players of Python, IPython 6 and Django 2, decided to drop Python 2 support. I think it's a huge thing for Python, because Django is very, very popular, and the fact that even Django, with a large code base and a lot of dependencies, is able to move to Python 3 means something.
OK, I think we are done. Now, seriously, for Python 4: we learned from our mistakes. We understood that it was not the best way to migrate from Python 2 to Python 3, and we will do it very, very differently. I think that Python 4 will be like GTK 4 and other projects: it will be just the next release, not a backward incompatible release as Python 3 was. We will use exactly the same deprecation process that we are using between each minor Python 3 release, like 3.4, 3.5, 3.6. It means that we start by deprecating and generating warnings, and only in the release after, or
sometimes two releases later, we start to remove the code. So we are giving more time to people to update their code, but we also communicate on our changes, in the documentation but also in the code itself. Thank you.