 Hi everybody, I'm here to talk about how to introduce incompatible changes in Python and, if possible, how to mitigate the risk of incompatible changes. My name is Victor Stinner. I'm contributing to Python upstream and downstream for Red Hat. Downstream means that I maintain Python in the Fedora and RHEL operating systems, and upstream means fixing issues in Python upstream. I have been a core developer for 13 years, I'm a happy Fedora user and, sadly, I went through many incompatible changes over those 13 years.

First I would like to come back a little bit to the past, to how we did the migration using the D-Day migration. A long time ago, in a galaxy far, far away, there was Python 2. Let's travel 15 years into the past, before Python 3. For those who don't know, we had a language called Python 2 before, and we had some issues in that language. The first one is that 15 years ago Django became more and more popular and became a good competitor to PHP frameworks. But they wanted to use Unicode for anything related to text, and the problem is that as soon as you use Unicode in Python 2 you get into trouble. It means that if you deploy your application in production, everything is fine until a single user enters a single non-ASCII character. You will get an error, but you don't know exactly where, because you don't exactly control how Python decides between bytes and Unicode. In general, in Python 2, it was a very frequently asked question. So in Python 2 the string ABC is a byte string.
It's not characters, but bytes, and if you concatenate a byte string and a Unicode string which contain non-ASCII characters, you get a Unicode error. In short, getting Unicode right in Python 2 was very troublesome. So what we decided is that in Python 3 we move to the correct solution by default, which is that most people actually want to process text in Python. So we want to use Unicode for everything, and it becomes a first-class citizen in Python 3: if you declare a simple string like ABC, this one is now a string of characters.

But the trouble is that if you have a large application like Django or Zope or Mercurial or anything which is based on Python 2, so written with the assumption that all strings are bytes, moving to Unicode at once is very complicated, because you have to rethink, for each function, each input and each output, what exactly you want: is it more appropriate to use bytes or to use Unicode? And the second issue is that you cannot decide that in an incremental way: when you migrate from Python 2 to Python 3, you have to fix all your technical issues at once.

Python is also a very old language, and we slowly added features one by one. For example, in Python 2.2 we introduced the cool concept of iterators, PEP 234, and also generators, PEP 255. The problem is that in Python 2.0
those things did not exist. So existing functions like the built-in map and zip functions actually return a list. The problem with a list is that, if you have a large data set, it consumes a lot of memory just to create temporary lists that you are likely to only iterate on. So what we came up with is a new module called itertools, which contains many recipes, many functions to process everything as iterators and generators. For example, you have to replace map and zip with imap and izip. But the trouble is that you have to migrate your existing code one by one; it was a long process and the advantage was not always obvious. It's the same for dictionaries: when you have a dictionary and you want to iterate on the key-value pairs, you have the items method, but this one also returns a list. So we added a second method called iteritems.

So we decided to start from a new language which has better defaults. In Python 3 we decided to move to generators by default for the map and zip built-in functions, and the items method of the dictionary also returns a generator by default. If you really want a list instead of that generator, you just have to cast the output to a list and that's it. So the idea of Python 3 is that we collected everything that we didn't like, the bad patterns, and we tried to address them all at once. The idea was to have good, sane default behavior, for example use Unicode by default and create generators by default. I would say that the language became consistent again, because we added many new features in Python 2 which made the language a little bit inconsistent, and now you have the good defaults and it just works. Just a minor issue at the end: oh, it's backward incompatible. Oops.

So why do we have incompatible changes in Python?
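As a rough illustration of the two Python 3 defaults described above (text strings and lazy iteration), here is a minimal sketch. The sample values and the `prices` dictionary are made up for the example. Strictly speaking, `map()` and `zip()` return iterators and `dict.items()` returns a view object rather than a generator, but the effect is the one the talk describes: no temporary list is built unless you ask for one.

```python
# Python 3: a plain string literal is text (Unicode); bytes are a separate type.
text = "caf\u00e9"                   # "cafe" with an acute accent on the e
data = text.encode("utf-8")          # explicit conversion from text to bytes

assert isinstance(text, str)
assert isinstance(data, bytes)
assert len(text) == 4                # four characters
assert len(data) == 5                # the accented letter takes two bytes in UTF-8

# Mixing the two types fails immediately, not only on the day a
# non-ASCII character shows up in production.
try:
    text + data
except TypeError as exc:
    mixing_error = str(exc)

# map() and zip() are lazy: nothing is computed until you iterate.
squares = map(lambda x: x * x, range(5))
pairs = zip("abc", range(3))
first_square = next(squares)         # computes only the first item

# dict.items() does not build a list either (hypothetical sample data).
prices = {"apple": 3, "pear": 5}
items = prices.items()

# Ask for a list explicitly when you really need one.
remaining_squares = list(squares)
pair_list = list(pairs)
item_list = list(items)
```

The explicit `list(...)` call at the end is the "cast the output to a list" step mentioned in the talk.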
We have a PEP called the Zen of Python, PEP 20, which says that there should be one, and preferably only one, obvious way to do it. This principle is very strong in Python, which means that we have a consistent coding style in Python. It's easier to teach and easier to review Python because most people write the same code, so you can compare code between colleagues and even between projects. So what we said for Python 3 is that, to make everything consistent again, we have a very simple plan: everybody has to run a tool which automatically converts everything from Python 2 to Python 3. You do it once and you're good.

Almost. We had some troubles, which are dependencies. This is something that we didn't plan for. Actually, when you have a large application, everything is not in a single code base. You have things called dependencies. Today we are more used to it with PyPI, but there was already something like that before. The problem is: how do you migrate your application if the dependencies are not prepared for Python 3? If you have a single dependency which is not compatible with Python 3, you have to port all dependencies, which also have dependencies, and it can be a very long tree. The second issue is that when you run the 2to3 tool, it's really a one-way path. You go to Python 3, okay, but you cannot come back. When we proposed that to dependency maintainers, they said: no, the majority of our users are on Python 2, I don't see the advantage of migrating to Python 3, because all Linux distributions are using Python 2 and we are fine with Python 2, so we will wait until some other people migrate. For all these reasons, the migration didn't take one day as planned, but ten years.

Then comes the second thing in Python.
It's called the C API. The C API is used by many third-party extensions to extend the Python language, and I think that the C API is a key to Python's success. Thanks to it, if you are limited by the Python language, you can easily plug in your existing 50-year-old Fortran code for NumPy, or plug in your favorite graphical toolkit quite easily. It's very easy to write bindings and very easy to call existing functions. If you have no C API, there is no SciPy, there is no NumPy, there is no scientific stack, there is no psycopg, the driver for PostgreSQL.

Another problem is that in Python 3.11 we did a lot of optimization work in Python. But to be able to optimize Python, we had to make some subtle changes in the C API, especially in objects which are related to code execution: the code object, the frame object and what we call the thread state, which contains the state of all Python internals. The problem is that, as usual, people actually use them, and they use them for many different things. They use these structures directly, and there is no abstraction between the internals and how people use them. So these changes broke a few C extensions. For example, instead of accessing the f_code member of a frame object directly, you now have to call a function, which is PyFrame_GetCode(). To get the previous frame you have to call PyFrame_GetBack(), and to get a frame from a thread state you have to call PyThreadState_GetFrame(). Another problem is that these functions are new in Python 3.11, but you don't have them in 3.10.

Okay, we saw how we did things in the past. Now I would like to see what the present solution is and how we managed to have smoother API updates. First of all, about Python 3 and the migration from Python 2 to Python 3: there was a new module called six, and this one is very helpful because you can have a single code base to port your application.
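To show what a single code base can look like, here is a minimal hand-written sketch of the kind of compatibility layer that six provides. It is not six's actual implementation: the names `PY2`, `text_type`, `binary_type`, `ensure_text` and `iteritems` are modeled on helpers that six offers, but the bodies below are assumptions written for this example.

```python
import sys

# True only when running on Python 2.
PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
    binary_type = str
else:
    text_type = str
    binary_type = bytes


def ensure_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    if isinstance(value, text_type):
        return value
    raise TypeError("expected text or bytes, got %s" % type(value).__name__)


def iteritems(mapping):
    """Iterate on (key, value) pairs without building a list on Python 2."""
    if PY2:
        return mapping.iteritems()
    return iter(mapping.items())
```

Application code then calls `ensure_text()` and `iteritems()` everywhere instead of branching on the Python version itself, which is what allows files to be ported one at a time.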
You use the six module, it works on Python 2 and it works on Python 3. This is a very practical and helpful solution, because previously people started to fork projects and to have two different names. It was very annoying because you had different dependencies depending on the Python version, or some bugs were fixed in one version and not in the other. So having everything in a single code base is a key to a successful migration. The other idea of the six module is that, instead of having to migrate everything at once to Python 3, you can migrate your files one by one using the six module, which is more incremental. We also decided that the D-Day approach should be abandoned, because it didn't work, because of all the issues that I mentioned, and we learned from our mistakes.

About incompatible changes, there is also a new practice: we are trying to make incompatible changes as early as possible in the development cycle. When we see that we break too many things, we open a discussion to say: okay, maybe this change can be reverted, and we can wait maybe one or two years until more people get used to the new API. The problem with Python 3.10 is that when it was released, Python 2.7 support had just ended, and some projects still had to support both Python 2 and Python 3. They didn't want to drop Python 2 support right away. So we made a few reverts to give more time to these projects to migrate. For example, we had removed the U mode of the open function, because this one has no effect on Python 3 and it had been deprecated for 10 years, and also the aliases of the collections module, which had also been deprecated for many years. But keeping the code wasn't a big maintenance burden, so we decided to keep it for one release and to remove it again in the next release. The main idea of this process of making incompatible changes early and making reverts during development is to give more time
to people to adapt their code, because we know that we have users and we try to be respectful to our users. To give some examples, in Python 3.11 we also reverted the removal of unittest aliases because, again, we had many aliases deprecated for many years and people didn't pay attention to the deprecation warnings. So we reverted that change for one more release. There were also aliases and some functions in configparser, and the asyncore module: I expected that nobody would still use it, but in practice it's still used for different reasons, and moving to asyncio or other options is not that easy. These three changes got reverted, but we made them again in the upcoming Python 3.12 release. The problem with these changes is that they affected too many packages and it takes too much time to fix them.

To decide about a change in Python, we have a process. It's called PEP 387, the deprecation process. We had some conflicts between this existing PEP and the new release process in Python, because in Python 3.9 we decided that we are going to release a little bit faster. Instead of having a release every year and a half, we are going to have a release every October, so once a year. The old deprecation process was fitted to the old release cycle, because we had about one year and a half to remove a function. With the new release process, it was only one year to remove a function, and we noticed that one year was too short, because people don't read the documentation.
They don't pay attention to the warnings, or they just have other life duties. So what we did is to update this document to require deprecating something for at least two years. This is a bare minimum, but obviously you can deprecate a function for longer. For example, if you deprecate a function in 3.11, it has to stay deprecated in 3.12, and we are only allowed to remove it in 3.13, so it takes three years in total.

About deprecation warnings: it was decided to hide them by default in 2010, because we noticed that most of our users are actually users and not developers. They don't know how to deal with these warnings, because only people who have access to the code and know how to modify it really care about them. So the idea is to make it more pleasant for users and to give access to these warnings to developers, who can enable them. We made a tiny change in Python 3.7: when you write a script, these warnings are shown by default, but only in the main script, in the main module. To display each warning once, you have to use -W default. To treat every kind of warning as an error, you can use -W error instead. Or you may want to try the development mode of Python, which is -X dev. This one not only shows warnings, but also enables more features, more checks at runtime, which are very helpful for developers. If you get too many warnings, you can dig into the warnings documentation to see how to filter them and only see the ones that you care about.

What we are trying to do in Python now is to have what I would call a smooth deprecation. The first point is to add the new way, the new API. We deprecate the old way, at first only in the documentation, then we start to emit a warning at runtime. Something which is very important for me is that we try to explain how to port existing code which is using the old way without losing support for the old versions. And this is
something new, because in the past we just removed code and you were on your own. Now we are trying to help users by actually proposing a solution which works on the old Python and the new one with a single code base. Doing this exercise helps us to see that, oh, actually it's not that easy to have exactly the same behavior with the old and the new way, so sometimes we have to rethink the change to have an even smoother migration. Once you're done with all these steps, okay, now you can actually remove the old way.

What we also started to do is to run a code search. There is a script to download the source code of the 5,000 most popular projects on the PyPI repository. Once you have all the source code offline, you can use a script to search with a regular expression to see if an API is used in that code. This work helps us to see how the API is used and how many projects are using it. Once we have identified the most popular projects which use it, we try to either report the issue upstream, propose a fix, or come up with a solution for them. You can find the script in my GitHub repository.

For me, the ideal migration would be first to add the new API, document the change and provide a tool to help the migration, then to identify and update all affected projects, or the majority of them, and, if possible, in the ideal case, to wait for a release. If the release with the fix comes after the Python release, there is a delay between the new Python version and the tool which is compatible with the new Python. Once all affected projects get a release, you can deprecate the old API and then remove the old API. The issue with that process is that it's quite slow. It takes between three and five years, and sometimes we want to move faster. So we are trying to fit into that migration path, but sometimes it's too slow.

A very recent change, from two weeks ago: I defined what I call the soft deprecation. The idea of the
soft deprecation is that it doesn't imply removing a function. It's a way to mark a function to say that you should no longer use it in new projects, but it's perfectly fine to continue using it in old projects, because it's still tested, it's still supported and it's still documented. Not only is no removal scheduled, but there is also no warning at runtime. This is also very important, because more and more projects are checking for warnings in their test suites. So the idea of the soft deprecation is to mark something as deprecated without affecting any project, because the deprecation is only in the documentation.

I told you about the code search in the most popular PyPI projects, but sadly we don't have access to every project in the world, and some core Python dependencies also have a single, unavailable maintainer. So even if we find the project which is affected and we propose a fix, sadly it sometimes takes a few months or a few years to get the fix merged. There can be many reasons. The maintainer can be busy with work or with other life duties, can get bored with the project, can get sick, or it can be a friend or someone in their family. So it's not only about the bus factor, about people getting hit by a bus. There can be many, many reasons, like burnout. So how can we update these projects if the maintainer doesn't reply? I have no solution for that.

There is also the problem of funding open source projects. Maybe some of you are aware that big companies rely on key dependencies, but there is no funding for them. Also, maintaining these projects is thankless work. And then there are projects which are developed behind closed doors. We don't have access to the source code.
So we don't know if they are affected or not. They can be short scripts or very large applications, and sometimes they are very old projects which are no longer maintained. There is also turnover in the team, so the people who knew how the code works are no longer in the team. For these projects, there is a tool for Python called pyupgrade: you can run it and get some automated changes to make your code compatible with the new version of Python. For the C API, I wrote a script called upgrade_pythoncapi.py, which adds support for the new Python version without losing support for the old Python versions. Or at least you have one solution, which is not great, but works: you just keep an old version of Python, but be aware of the security issues.

About the C API, what I did is to write a new tool to provide the new functions of the new Python version to the old versions of Python. The idea is that you only use the new names, but you have these new functions on your old Python version. I created this project three years ago, and I created the script to automatically update your C extensions to the new names. I had to add support for Python 2, so Python 2 is still supported, because I needed it for the Mercurial project, which hadn't finished its migration to Python 3. Last year I added many functions for Python 3.11, and in the meantime I added functions for Python 3.12 and even now 3.13. At least 10 projects are using it and you can find the documentation online. The idea is that you update the C extensions to use the new functions and you copy the header file into your project. Once you have made this change, you don't have to update the header file anymore: unless you use new functions, there is no need to update that file. As I said, it still supports Python 2.7.

What we also did for the C API is to define new guidelines to avoid issues that we had in the past and things that we want to get
right, or at least to avoid these issues when we add new functions. For example, functions must not return borrowed references, we should avoid stealing references, and we should define the ownership rules and the lifetime of arguments and structure members. The idea is that if we follow these new guidelines, it should be easier to support the C API on Python implementations other than CPython. We also tried to reorganize the header files into three different categories: what we call the limited API, which is related to the stable ABI, the public C API, and the internal API. The internal one is the one that you should not use, and now it's well separated.

I'm a little bit short of time, so I'm going a little bit quicker. For the future, the idea would be to spend more time thinking about the stable ABI. This one has existed since Python 3.2, and the idea is that you build your C extensions once and you don't have to change your binary anymore: as the ABI is stable, you can distribute a single binary and it's just fine. We made some changes to check the ABI and to better document it. There are two well-known projects which are using it, cryptography and PySide.

There is also a new C API called HPy, and this one is designed to be efficient on PyPy. It also works on CPython, and you can decide if you want the best performance, so get direct access to the C API of Python, or to have something called the universal ABI, which provides a single binary working on all Python versions and all Python implementations. And the idea is that you have a single API.
So it's quite convenient. There is also a work-in-progress port of the NumPy project to use it.

And this is something for you: please test our alpha releases, please test our beta releases, and at least try to test the release candidates, and provide feedback as soon as possible. We need to know about your issues: if we know about your issues earlier, we have more time to fix them and to help you migrate. So it's very important for us to have your feedback as soon as possible. Thank you.

Thank you for your insights on compatibility, on multiple levels actually. We have maybe time for one question. There are two microphones in the room, or you can ask your questions in Discord, which is also possible.

Thank you very much, Victor. One question: do you know if anybody has experimented with deprecation by slowdown? So still provide the API or the function, but slow it down by a factor of whatever.

I know that the Linux kernel is doing that for old APIs to motivate people to migrate. So far we didn't decide to decrease the quality of the old API on purpose. There is no need for that. We just help people to move to the new one and, if possible, we try to support both the old and the new API. But yeah, sometimes we think about introducing memory leaks or crashes, random bugs.

Thank you very much. Please give another round of applause for Victor.
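The smooth deprecation steps described in the talk (add the new API, keep the old one working, emit a warning at runtime) can be sketched with the standard `warnings` module. This is only an illustration: `old_api()` and `new_api()` are hypothetical names invented for the example, and the two `simplefilter()` calls are in-process stand-ins for the `-W default` and `-W error` command line options mentioned in the talk.

```python
import warnings


def new_api(value):
    """The new way: the function to use from now on (hypothetical)."""
    return value * 2


def old_api(value):
    """The old way: still works, but emits a warning at runtime (hypothetical)."""
    warnings.warn(
        "old_api() is deprecated, use new_api() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to old_api()
    )
    return new_api(value)


# Rough equivalent of "python -W default": the warning is reported
# (here it is recorded in a list instead of being printed).
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("default")
    result = old_api(21)

# Rough equivalent of "python -W error": the warning becomes an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        old_api(21)
    except DeprecationWarning as exc:
        error_message = str(exc)
```

A soft deprecation, as defined in the talk, would skip the `warnings.warn()` call entirely and only mark `old_api()` as deprecated in its documentation, so test suites that treat warnings as errors are not affected.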