Hi, welcome back. Please join me in welcoming Francisco Passos from Google. He's a software engineer with 15 years of experience building productive and efficient software. He'll be discussing how to migrate a code base with millions of modules from Python 2 to Python 3, drawing on the work he and his team have done at Google building tooling for exactly that. He has previously worked on Google Maps, Mail, Cloud, and Search infrastructure, and he's also interested in risk detection for international trade. Please welcome Francisco.

Welcome, everyone, and thank you for attending. My name is Francisco, I work for Google, and thank you for the introduction. Today I'll be telling you about how we're migrating to Python 3, and along the way I hope to share some ideas that may be useful to you at your scale. Now, a warning: Google is not normal scale. Some of these tips may generalize well, and I expect they will, but certainly not all of them, so please apply generous amounts of common sense. We all know that sometimes we end up doing things just because Google or some other big company does them, and that's definitely not the point here. But for some things, yeah, maybe. So take your pick.

Right, so first some context. At Google, we code into a monorepo: a very large, colossal monorepo, but a monorepo all the same. We build and test using Blaze. You may know Bazel, the open-source build system we released; Blaze is a superset of it, tied into our internal build infrastructure. Our internal support for Python 2 is limited and going away within a short timeframe, and it's unrealistic to expect to migrate everything centrally, because there are a lot of Python modules and the code of individual teams is moving really fast.
So even if we started migrating a piece of code today and finished within a week or so, maybe that's code receiving a lot of intensive attention, and it may be hard to catch up; we'd have to deal with a lot of conflicts. Code ownership changes too. At this scale, teams sometimes die and their code needs to be inherited, or a team gets moved to another location, or absorbed into another organization, so the ownership of code really does change. No individual piece of code changes hands often, but at this scale there's always something changing, daily. And some of the code can be very quirky: in some places you'll find C++ integrations. That's not the norm, but it does happen. The fundamental point remains that domain-specific code is hard to migrate for someone without intrinsic knowledge of it, so it's always going to be easier for the code owners to migrate it themselves. That's not to say you can't drive things centrally; you should, and we'll talk about that.

So coming into this, we have a problem: way too much work and too little structure. We don't even know whether this is feasible. At some point somebody has to make the call, do we need this? Somebody has to figure out how to plan it and where to start. And "which part do I work on?" isn't exactly clear either. It's not obvious whether something is mine to own, because we all commit to the same repo. Sure, there are folders, but that's not the entire story, because you can contribute code to pretty much any folder as long as the owner accepts it. That cooperative climate makes a change of this scale hard to approach. For example, something may have effectively become unowned: the team that owns it on paper doesn't really maintain it anymore.
In practice, it's primarily maintained by contributors: the users who still use it are the people keeping it alive. So now what happens? At this point it's very easy to get overwhelmed: by the lines of code, by the number of modules, by the number of changes you predict you'll need to make, by the complicated dependency tree, and by the number of people involved. So here's the approach.

Let's talk about mechanical transformations first. Say this is the tiny tip of your dependency tree, reflecting only the Python 2 library dependencies. The key concept here is that you want to avoid forcing a bulk migration: you can't, or at least shouldn't, bulk-migrate a thousand modules at the same time. Instead, you individually pick off each module you can work on and make it Python 2 and Python 3 compatible simultaneously. If you can do that, you unblock further work, and you can also detect regressions: if you break a library on the Python 2 side, you'll break a bunch of binaries that are still executing Python 2; if you break it for Python 3, you'll break a bunch of Python 3 binaries. Either way you detect the regression and keep things going. Now, say you actually know your dependency graph, and lib1, lib2, and lib3 are the leaves of your Python 2 dependency graph. That means they have no Python 2-only dependencies themselves; they do have dependencies, but none of those are Python 2-only libraries. As leaves, they are unblocked, so you can actually go and work on them.
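To make "Python 2 and Python 3 compatible simultaneously" concrete, here is the flavor of mechanical rewrite involved. This is only a sketch with made-up data, not the output of any particular tool, though python-modernize automates most fixes of this kind:

```python
# Python 2-only original (shown as a comment):
#
#   print "totals:"
#   for key, value in counts.iteritems():
#       print key, value / 2
#
# Rewritten so the same file runs under both Python 2 and Python 3:
from __future__ import division, print_function

counts = {"a": 3, "b": 4}  # made-up example data

print("totals:")
rows = [(key, value / 2) for key, value in sorted(counts.items())]
for key, half in rows:
    print(key, half)  # true division on both interpreters: a 1.5, b 2.0
```

On Python 2, `counts.items()` builds a list rather than an iterator; modernize would typically route this through `six.iteritems`, which is left out here to keep the sketch dependency-free.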
So you find these targets you can work on, and then you work on them individually, making whatever changes are needed so that the code works well in both Python 2 and Python 3. Once you've done that, you have unblocked work on the libraries that depend on them. Notice that even if you're still executing binaries running Python 2, this is fine, because lib4 and lib5 import those libraries, and the libraries still work for Python 2. But importantly, at this point lib4 and lib5 have become unblocked in your Python 2 dependency graph; they are now the leaves of that graph, so you can work on them. Imagine that lib3 belongs to your team, lib4 belongs to another team, and lib5 to yet another. Both of those other teams can now work in parallel, because they finally have leaves to work on; until now they didn't. So there's a bit of an unblocking effect here. You keep going until you're done. At some point you reach an entry point, a binary: a module where you have to tell your build system whether it is supposed to run in Python 2 or Python 3. If all of its dependencies are Python 3 compliant, you can switch it, and then that one is done; you don't need to go back. One interesting thing we found: when you have a number of unblocked targets to work on, you can be smart about it, figure out which one is blocking the most people, and work on that one first. That is the one that works best to the advantage of your organization. Here's another idea we wanted to share. It's definitely not for a small code base, but it applies if you have a large code base with a lot of teams.
And there's not a lot of buy-in for actually effecting the migration. In that case, you may benefit from a mechanism to prevent backsliding. We tried one, and it actually worked great for us: we introduced an allowlist, saying your modules are going to stop working under Python 2 from this date, unless you enter this allowlist, in which case you get more time to work on your migration. But to enter the allowlist, you have to commit, via an issue on a bug tracker or whichever mechanism you prefer, to a plan and a timeline that actually get you there. That's a powerful push, and it can also drive people to say: you know what, we've been thinking about dropping this module entirely and replacing its usage with this other newfangled thing, which is already compliant.

Now, regarding mechanical transformations, there are some pieces of open-source tooling that can be incredibly valuable. The first thing to know: you need tests. You definitely should have tests, and if you don't, you should create them, because otherwise it's going to be very hard to be confident that your code base still works. If that sounds really expensive, consider that not being able to verify that hundreds, thousands, tens of thousands of modules actually work is far more expensive if they reach production in a non-working state. It would be absolutely chaotic, untenable; we could not afford that. For the most part we had tests, and where we didn't, we created a bunch. Once you have tests, you can run them in both Python 2 and Python 3, and if they both pass, that's no rock-solid guarantee, but it is some guarantee, and much better than nothing, because with a dynamic language you're not going to know
unless you run it. Definitely consider using modernize before you do anything else; we've found modernize to be a remarkably solid tool, and most of our automated changes begin with running it, after which we hack at the rest. And importantly, pytype, mypy, any kind of type checking you can throw at the code may be very useful even before you migrate, because it may find type violations in your Python 2 code: things that work accidentally in Python 2 but will break in Python 3. Getting your types intentionally right means that by the time you migrate to 3, you're in better shape, and if you need to reverse and go back to 2, that ground is solid. It's just good all around, so if you can get away with it, definitely consider it.

All right, now instrumentation. We've talked about what you need to do mechanically, but the main question remains: where do you start? I'm going to tell you about something that is actually pretty costly. In order to leverage that dependency graph, you need to know it: you need to build it, maintain it, and refresh it often, once a day, maybe more. You need to instrument it. You want to be able to use it to answer questions like: what did we already do? What do we still need to do? Of the things we can do, which are the most relevant right now? And, is this ours? Because then you can restrict your view to the part of the dependency graph you care about, and within that subgraph figure out: what are my project's individual leaves, and what are my project's individual bottlenecks? More questions you want answered from there: is this module Python 2 compatible, Python 3 compatible, or both?
So you want to ask more interesting questions, like: how many things are blocking this module? I want this one done as fast as possible, but it's blocked; how many different things are blocking it, and which ones? Are they mine, or do they sit in other teams' ownership contexts? Should I go talk to them? Should we be in conversation? Might they be planning to do this later, and could I jump ahead and fix it for them? At that point you can also ask the question I mentioned: which are my leaves right now, which are the Python 2 libraries that are unblocked that I can go migrate immediately? Where are the bottlenecks, the libraries that, if fixed right now, will unblock the largest amount of work and increase parallelization across my project or across the company? And ideally, you want to give people enough context to decide: is it simpler to just drop my dependency? If you know that migrating this library is going to be very costly, that it will take months before you can even properly test it because it depends on so many other things, then maybe that other module you've been thinking about migrating to, but never found the time for, is actually the less expensive migration. So you can look at it more critically: you weigh the cost of bringing this dependency along against the cost of dropping it, because there's a cost to both, and you can put them on the scales. That allows you to be strategic. You can figure out which modules are business critical, either because they have really high business value, or because they are on the delivery path for something really critical, or because they have high reputational value. Oh, yes. Oh, perfect. Thank you.
Or because you want to focus on the things that will maximize the amount of work that can be done in parallel by your team or the rest of the company. On the topic of bottleneck modules, it's important to know that there is a fundamental set of libraries that your organization, and pretty much any organization, relies upon for a big fraction of its code. We ended up calling these core libraries, and your progress gates on them. This is especially important for the PM-type people who might be watching, and their leadership: at the beginning of your migration, it's going to look like you're not really making a lot of progress, or that it's definitely very slow. The point is that so many things are blocked by these bottleneck core modules that until you fix the last one of them, it's really tough to make any measurable, visible progress, in tested-and-passing modules, anywhere else in the code base.

So, project tracking. Once you have all of that information, all of that instrumentation, and all of your data flowing and being regenerated, you can make a dashboard, a dynamic dashboard. Bonus points if you make it filterable by individual projects and the like, because then, at a global level, you can see how far you've come: how far has our company come in its global Python 3 migration? Project managers can track progress within their focus areas; say you have a program manager for a specific product area or a specific set of projects, they can go look at exactly that. And within a team or a project, you can track your individual progress, figure out what you need to do there, and see how far you've come.
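The leaf and bottleneck queries described above can be sketched over a plain adjacency mapping. Everything here, the module names, the graph shape, and the ranking heuristic, is a hypothetical illustration, not Google's tooling:

```python
# Sketch: leaf-finding and bottleneck-ranking over a dependency graph.
# `deps` maps each module to its direct dependencies; `py2_only` is the set of
# modules not yet Python 3 compatible. All module names are made up.

def unblocked_leaves(deps, py2_only):
    """Py2-only modules whose own dependencies are all dual-compatible."""
    return {m for m in py2_only
            if not any(d in py2_only for d in deps.get(m, ()))}

def transitive_blockers(target, deps, py2_only):
    """Every py2-only module reachable from `target` through `deps`."""
    seen, stack = set(), list(deps.get(target, ()))
    while stack:
        mod = stack.pop()
        if mod in py2_only and mod not in seen:
            seen.add(mod)
            stack.extend(deps.get(mod, ()))
    return seen

def blocker_counts(targets, deps, py2_only):
    """How many of `targets` each py2-only module blocks; the module with
    the highest count is the bottleneck to migrate first."""
    counts = {}
    for t in targets:
        for b in transitive_blockers(t, deps, py2_only):
            counts[b] = counts.get(b, 0) + 1
    return counts

deps = {"app1": {"libA"}, "app2": {"libA", "libB"},
        "libA": {"libC"}, "libB": set(), "libC": set()}
py2_only = {"libA", "libB", "libC"}

print(sorted(unblocked_leaves(deps, py2_only)))            # ['libB', 'libC']
print(sorted(blocker_counts(["app1", "app2"], deps, py2_only).items()))
# [('libA', 2), ('libB', 1), ('libC', 2)]
```

libA and libC tie in this toy graph; in practice you might break ties by estimated migration cost, and rerun the leaf query after each batch of migrations to find the newly unblocked work.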
And for your engineers, I can state this as a fact: having this graph refreshed at least daily means you have a burndown chart for the progress of your team. The yellow line is the burndown, what's left; the red line is how much you've accomplished. This effectively gamifies the work by providing a positive feedback loop on progress, and we shouldn't underestimate the effect of psychological motivators here. A lot of engineers will be intrinsically motivated to do this because they want to be on the shiny new thing, but a lot of people will not be motivated in the same manner. So that's something to keep in mind.

Cool, more things about instrumentation. Something we do on our dashboards is provide fundamental information about individual modules; we call them targets, but for all intents and purposes they are modules. The idea is that you have two key pieces of information for your own project: which targets you can upgrade right now, and which targets you cannot touch yet, along with what's blocking them. The other two concern the relationship between your code base and the rest of the code base: what is blocked by my code, and what other code is blocking mine? We found these views to be incredibly useful, and we hope they work for you if you can put them together.

Centralizing is still important. Now, we've been talking about how centralizing is not a technique you can rely on to solve the entire problem, but it's still very important, because you need to know where you're going. At Google, we have a permanent dedicated Python team and a dedicated migration team.
But if you don't, which is totally reasonable, consider creating a subset of your people that is dedicated to this. You can have people setting the technical strategy, setting up the tooling, the instrumentation, and the mechanisms for blocking regressions, bulk-converting things where applicable, and, where something is really strategic, hand-migrating it. We managed to migrate 30% of our modules centrally, either with automation or by hand. And that helped a lot, because bulk-migrating 30% of your code base means you're sending code for review to the owners, and the effort gets a lot of visibility: oh, this is moving, we need to get going too.

At Google, we tend to approach program management problems as engineering problems, and everything we've discussed so far is about establishing a process that will work for you. This slide is about change management. If you have a change management team, you're probably going to want to engage them and pull them in to help wrangle this. You'll want to be available to educate the leaders so they can establish a mandate; if this doesn't come from the top, it's not going to happen. Be prepared that they will ask for very deliberate and very detailed cost estimates, pilots, and planning; that's a lot of hard work you need to be prepared to do. In addition, your team will need to document everything and answer a lot of questions; you don't want anyone to stay blocked just because they don't know how to move forward from where they are. And you won't be able to scale that unless you write a lot: common conversion-pattern documentation, an FAQ, lots of pointers to the things that will allow people to unblock themselves.
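One of those regression-blocking mechanisms, the dated allowlist from earlier in the talk, might be enforced roughly like this. All names, dates, and messages below are hypothetical, not Google's actual implementation:

```python
# Sketch of a backsliding guard: past a cutoff date, a module may only run
# under Python 2 if it is on an allowlist, and each allowlist entry is
# supposed to be backed by a filed migration plan. All details are made up.
import datetime
import sys

PY2_CUTOFF = datetime.date(2020, 1, 1)  # hypothetical cutoff
PY2_ALLOWLIST = {
    "legacy.billing",  # e.g. backed by a hypothetical tracked issue
}

def check_py2_allowed(module_name, today=None):
    """Raise if `module_name` tries to run under Python 2 past the cutoff."""
    if sys.version_info[0] >= 3:
        return  # already on Python 3; nothing to enforce
    today = today or datetime.date.today()
    if today >= PY2_CUTOFF and module_name not in PY2_ALLOWLIST:
        raise RuntimeError(
            "%s may no longer run under Python 2; file a migration plan to "
            "join the allowlist." % module_name)
```

Calling `check_py2_allowed("some.module")` from an entry point is a no-op on Python 3, and on Python 2 it raises once the cutoff has passed unless the module is allowlisted.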
There's also some communication engineering. You don't want to over-communicate, and you want to be incredibly motivational when you speak about this: even if a team comes in six months late to the migration, you still want to welcome them and say, this is a great time for you to start, here's everything you need, here are all the materials and all the tooling, go forth, and let us know if you get blocked. Know that victory is possible.

Because we have all of that, we can track our progress globally, and this is it. You may be wondering what those blips are, those big bumps up and down. Those are times when we fixed a bunch of things, introduced breakage that continuous integration could not catch, and then rolled it back. The big jumps are automation runs that bulk-migrated a bunch of things. Along the way, and especially in this last mile here, you should expect to find a lot of very tricky code and a lot of organizational complexity. At that point, dedicated project management and the connections of the centralized team can be very, very useful; by then they are probably the experts.

A quick summary: try to drive as much as you can centrally, but expect that you can't drive everything, so create whatever is necessary for other people to do it too. Create as much instrumentation and tracking as you can. Make sure you have your leadership's support, your communications are settled, and your incentives are in place. Try to prevent backsliding if you can, and make sure people can see what they've done and what they have left to do, and that they have the tooling, the documentation, and the knowledge to know what to do from there. And with that, thank you so much for attending. I don't know exactly whether there are questions, but I'll be happy to take them.
Thanks, Francisco, that was quite a nice talk. Excellent work. We have two questions. The first is from Diego: what are your projections for completely migrating from Python 2 to Python 3?

Let me answer it this way: we have a hard deadline. We had a hard deadline for people to enter the allowlist, and that deadline has passed, so all of the targets that have not been migrated now actually have a plan. And we have a line in the sand, another date by which everything should have been migrated. It's tricky business, and I'll tell you why. Even if you get people to commit to a specific plan and say, by this date we'll be done, what about the other teams that also need to commit but depend on the first team's libraries? If someone commits to finishing one day before the ultimate deadline, the other teams are left in the lurch, and that's not a good experience. We're trying to figure that out. I won't give you the date, because the date is not exactly relevant, but we expect to do it soon, and we're in good shape to do it.

All right, Andy has a question: how did you build your module dependency graph?

Cool. In terms of open-source tooling, there's something called modulegraph that will inspect your dependency tree and do that for you. And as I mentioned, we use Blaze, which is a superset of Bazel, and Bazel has querying and graphing mechanisms; we do use those to build the graph. And then there's Dremel; if you look up Dremel, there's an open paper on it. It's a query engine that Google built.
So basically we export everything into tables that we can then literally query with SQL, with joins and all sorts of things. What you saw in the graphs, and in the tables whose headers I showed, comes from SQL queries we write that join targets against their dependencies and figure out whether they have been seen to pass their Python 2 or Python 3 tests, and things like that. All right, I think that answers the question.

We have another question, from Thomas as well as others here: how do you manage to have such a big code base in a single repository? What are the challenges of managing a code base of this volume?

Yeah, that would be another talk, and I should probably attend that talk. But I will say it is tricky, and for me it was quite an adaptation even when I joined nine years ago, because not only are we in a single repository, we are all working off head. If your module depends on another module that another team created, you don't depend on a version, you depend on head, which means greenness is an important concept for us: you don't want to submit things that are broken. So practices like TDD, where you commit a broken test and then fix it, are fine if you do that locally; just please don't do it to the global repository, because you're going to break somebody else and they're going to have a bad day. Fortunately, we have presubmit triggers that, for the most part, prevent breaking changes from going in. But you are correct, it is a bit of a challenge. However, it's a trade-off.
There are actually some write-ups on this online, on the benefits you get from not having things like branches. Certain cases do benefit a lot from branching, but the way we conduct our project work really does not lend itself to a lot of branching, so staying on head simplifies release management and things like that. So it's a give-and-take, a trade-off, and definitely not a common trade-off, so I take your point: it's not an easy thing.

We have time for more questions if anybody is interested in asking; otherwise, please join us by creating a Python 2-to-3 breakout channel in the brand track. I'm there, so I'll see you there. Thanks a lot for joining, and enjoy the rest of the conference.