Hello everybody, thanks very much for coming. I know the social event was pretty successful, so I'm even more glad that you made it. I will try not to shout too much during the talk. So before I go through the topic of this talk, first I'd like to talk about something more important. My name is Mark Smith. I am a developer advocate at Nexmo. I run the user group for Python in Edinburgh. My handle on most social networks is Judy2k, for reasons that aren't as funny as you might think. I am known for writing silly Python code and then showing it to people for my own amusement. And despite the Viking helmet that I kind of use as my general brand, I'm not a Viking or Norwegian or Scandinavian in any way.
So talking about Nexmo, briefly: we're a sponsor of the conference. You've maybe spoken to me at the booth; if not, you're welcome to come and talk to me at some point later today. We've got REST APIs for sending and receiving SMS messages and making phone calls. Can I just do a quick show of hands? We generally do this because not many people have usually heard of us. So before this conference, who had heard of Nexmo? Yeah, it's getting better. My job at Nexmo is to maintain our client libraries for Python and Java, and in my own time I maintain a library for Go. And so I think a lot about compatibility and maintaining compatibility. My job is really to communicate with developers, find out what they want, and try to feed that back into the product pipeline.
So this talk really ended up not quite being about refactoring. It's related, but it's not quite about refactoring. If you do want to know more about refactoring, I highly recommend this book. It's been around for a while. I think it popularized the concept generally. It contains 72 refactoring recipes for modifying your code in order to improve the quality of the structure and the architecture of the code. It's very Java-centric, but there's lots of experience in the pages of the book.
It's got lots of good advice, but some of the patterns don't really transfer to Python. Java's a very different language to Python. The main theme of the book is to be prepared all the time to modify your code, to keep it clean and to allow new architectures to emerge as you discover them through developing and extending the code. What the book's not really worried about is whether you break your users' code. It doesn't worry so much about interfaces. It's more concerned with code that you're allowed to change at any time. This isn't really true of libraries.
This talk is really about Python library maintenance. It's got this assumption that you have released code with two or more users. What that results in is a one-to-many relationship. If you break backwards compatibility in your one code base, your users, in order to upgrade, are going to have to change their code, and that's going to be many changes. That's something that should be avoided. The other part of this talk is about hiding change from your users. It allows you to change things internally without affecting your users in negative ways. What we're really talking about is keeping our interface stable while changing the code.
The reasons you'd want a stable interface are, I think, pretty obvious, but I'll go through them anyway. It's really to keep your customers happy, to keep your users happy. If you keep breaking backwards compatibility, they're going to get unhappy, because you're making more work for them. Nobody likes being given work that they didn't necessarily want. Also, every time you break backwards compatibility, you're giving your users an opportunity to switch to a competing product or a competing open source library. If they're going to have to rewrite their code in order to keep on using your library, why not just switch to the new hot thing?
This happened in web frameworks, certainly, a long time ago, with frameworks which I don't really want to name because I feel like I'd maybe be demonising them a little bit. There was a big glut of frameworks about 15 years ago, and nobody was really sure what the right way to do things was. As they learned, they often broke backwards compatibility, and the frameworks that ended up successful, the ones that we're still using today, were really the ones that worried a lot about maintaining backwards compatibility for their users. So really, libraries only survive if they're stable, and that's why we want stable interfaces.
We talk about this idea of an interface. What is an interface? I think of it as the boundary of your code. It's the code that your user interfaces with, so it's where your user uses your code. If you change it, it's where your user will be affected, and it consists of various components. The obvious one is a module. They'll be importing your module or package, so if you change the name of that, that's going to break things. The thing we usually think about is that the module consists of classes and functions and methods, and that's our interface, our agreement with the user. To go slightly finer grained, we've got the parameters that our functions and methods take. If you change the names, the expected types, or the return values, this is going to affect your user in negative ways. We've got global variables that are usually static, but potentially you might want to replace them with something dynamic, and that may cause problems. Something that people often don't think about up front is exceptions: if you change the names of your exceptions or the hierarchy of your exceptions, that's going to cause problems for your users. We've got the structure of our code, so this is really where your code lives.
If you start off with a big single namespace and you've got a kind of utility function in there, your user might find it, decide it's useful, and start using it. If you then decide to tuck it away in a sub-package later on, that's going to break your client's code, which is not something you necessarily expected. Finally, and this is a little bit of a silly example, we've got byte code as well, so potentially even changing your byte code could cause problems for your users. I don't know if people saw Sebastian Dax's goto hack in the Lightning Talks a couple of days ago, but he's got a decorator that decompiles a function, modifies the byte code and recompiles it again in order to allow you to use gotos in the code. So potentially, changing the structure of your code might also break compatibility with your users. This isn't really something you have to worry about, because they've broken all the rules if they do that kind of thing.
But this is really the bigger topic of the talk: making an agreement with your customer. Ideally, your interface, your deal with your customer, is as strict as possible. I think of it as one of these kids' toys. You've got the idea of well-defined objects passing through the wall of the interface. They're validated on the way in. You may call functions and methods with these objects, and then ultimately well-defined, stable, well-understood objects are passed back, or exceptions are raised. In the ideal world, it's impossible to refer to classes, methods, functions or variables that aren't exposed by the interface. A couple of days ago, in Hynek's talk, he used the analogy of ravioli, which I think is particularly appropriate given where we are: the idea that your interface is like the pasta that keeps the filling of your ravioli separate from the tasty, tasty sauce. It stops your dish from just being a mess.
But the concept here, really, is that even if you're not publishing your library, even if you don't have external customers, your code should be divided up this way. It's much easier to understand small, well-contained code libraries.
Talking about strict interfaces, here are some technologies that fit that definition of a strict interface, like the toy. There'll be a few sniggers, I suspect, looking at this list, because none of these, apart from maybe the first one, are technologies that we particularly enjoy working with. They're all a little bit painful. If you want to define interfaces with J2EE, you essentially define your client code, your server code and a third module, shared between the two, that is your well-understood interface. With SOAP, you define a WSDL file, which again defines incoming and outgoing objects. So you're defining your interface separately from your implementation, and these ideas have been around for a long time. The dates on there are kind of 16, 17 or 18 years ago. But it goes back to CORBA as well, which is a hideous technology. It's enormous and horrible to deal with, but it's 26 years old. So these are things that are getting towards our ideal world.
Python isn't like that. Let's have a look at a quick example. This is a bit silly, but I've imported requests, and I'm just listing the stuff that's inside the requests library. I'm sure we've all used it. In there, we've got a magic thing that begins and ends with a double underscore. We've got a private thing that starts with a single underscore. We've got some things with interesting names, like get, which is a function, compat, which is a submodule, and models. And with some of these it's not immediately obvious whether they're private or public, whether we're really supposed to be using them.
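The listing on that slide can be approximated with `dir()`. This is only a sketch: it uses the standard-library `json` module as a stand-in for requests so that it runs without extra dependencies, and `classify_names` is a hypothetical helper, not something from the talk.

```python
import json  # stand-in for a third-party package such as requests


def classify_names(module):
    """Group a module's attribute names by naming convention only."""
    groups = {"dunder": [], "private": [], "public": []}
    for name in dir(module):
        if name.startswith("__") and name.endswith("__"):
            groups["dunder"].append(name)    # "magic" names
        elif name.startswith("_"):
            groups["private"].append(name)   # private by convention
        else:
            groups["public"].append(name)    # everything else
    return groups


groups = classify_names(json)
for kind, names in groups.items():
    print(kind, names)
```

Note that the "public" group is decided purely by spelling. It typically mixes documented functions with submodules and helper imports, which is exactly the ambiguity described above.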
And something stupid we can do with it is this: we can import the module, define a function, and then replace get in the requests module that we've imported with our own get function. Then anywhere else in our code base where somebody imports requests, they get our modified requests, and when they call get, it doesn't make an HTTP request, it just prints out a silly message. But it gives you the idea: you can dig as deep as you want into somebody else's Python code and switch out things that they may not want you to.
And so, with another appropriate seaside analogy, we've gone from the idea of our library as having a kind of tough shell that's difficult to break through, that only gives you access to things on its terms, to what Python is, which is really just a box of Lego. You can reach in, you can grab anything you want, you can take it apart, you can put it together in different ways and then put it back in the box. So what we really need to do is kind of build our own. We need to work with the tools that we have in order to build an agreement with our users. We need to make our interface knowable: because we can't stop people from digging into our code, we have to make sure they understand what we expect from them.
And how do we do that? We've got a few conventions. Here I've got a private method. We know it's private because it begins with a single underscore. The single underscore doesn't change the behaviour of the method at all. We can still call it; you can see in the last line of code that I call it, and it works. But we know just by reading the name of the method that it is considered to be an internal detail, and that we're not really supposed to call it from outside this code base. So we should be using these. This is something that you kind of get for free: an agreement with your customer that this is a private implementation detail.
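Both slides described above can be sketched in a few lines. The names here are assumptions made for illustration: `fake_requests` is a throwaway module object standing in for the real requests library, and `Client` with its methods is hypothetical.

```python
import types

# A throwaway module object standing in for a real library such as requests.
fake_requests = types.ModuleType("fake_requests")


def _original_get(url):
    return f"pretend HTTP response from {url}"


fake_requests.get = _original_get


def silly_get(url):
    return "No HTTP for you!"


# Nothing stops a caller from swapping out a library's function at runtime.
# Every other importer of the module now sees the replacement too.
fake_requests.get = silly_get


class Client:
    def fetch(self, url):
        """Public method: part of the documented interface."""
        return self._build_request(url)

    def _build_request(self, url):
        """Leading underscore: internal detail, by convention only."""
        return f"GET {url}"


client = Client()
print(fake_requests.get("https://example.com"))
# The underscore changes nothing at runtime; this call still works.
print(client._build_request("https://example.com"))
```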
What you sometimes see in less experienced developers' code is that they've read a book, they've discovered that there's this double underscore prefix, and then they start to use that, for private methods especially. That isn't what it was designed for; it was designed to avoid namespace conflicts. So this does change the behaviour of the class: the method name has been changed internally, and it's now prefixed with a single underscore and the name of the class. So you can still call it. You haven't stopped anybody from calling it, you've just made it a little bit more difficult. And ultimately, if your user really wants to call that method, they're going to call it anyway. Although by this point they've probably copied and pasted the code into their own code base, so it's a different kind of problem.
A third convention we have in the Python world is just to move the code out of the way. Often you'll find in some quite popular libraries that there's a public module or package at the top, which is the thing that the user is expected to import, and then there's a private sub-module internally that really contains all the code. The public module just imports the public interfaces, the public functions and classes, from the private sub-module. So it puts everything else out of the way: if somebody imports your public module and then does a dir, everything they see is something that they're expected to work with. That's good, so I recommend that too.
But you get these for free. These are conventions in the Python world. Something a little bit more expensive but infinitely valuable is documentation. Documentation is the primary way that you communicate with your users. And the Python ecosystem has excellent documentation, so much so that it's spawned a series of international conferences dedicated to documentation, the Write the Docs conferences, which I would say spawned largely out of the Python community.
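To make the double-underscore behaviour described above concrete, here is a minimal sketch. The `Base` and `Child` classes are hypothetical. It shows both the intended use, avoiding a name clash between a class and its subclass, and the fact that the mangled name is still reachable from outside.

```python
class Base:
    def __init__(self):
        self.__token = "base"      # stored as _Base__token

    def base_token(self):
        return self.__token


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__token = "child"     # stored as _Child__token, no clash

    def child_token(self):
        return self.__token


child = Child()
print(child.base_token(), child.child_token())

# Name mangling is not access control: the attribute is only renamed.
print(child._Base__token)
```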
And the reason documentation is so popular in the Python world is that we've learned that users will use the interface that you document. We usually distribute our libraries as source code, so somebody can read the code, but you don't really want them to. If you don't write documentation, users will read your code. If you don't write documentation, users will guess your interface, and they'll guess it wrong, which is not something that you want. So for documentation I recommend Sphinx or MkDocs. MkDocs is supposed to have a capital D; Dougal, the maintainer of MkDocs, is sitting in front of me and has warned me about this before. I don't recommend auto-generating your documentation. It's a really bad idea. By default everything is going to be published as documentation. That's not what you want. You want to filter your documentation down to the things that you want your customers to use.
Slightly related to this is type hints. They're a recent addition to the language. I think they're still pretty controversial. They don't change the behaviour of the code at runtime by default, but they are used by IDEs to help users use the right types: the incoming types of your functions, and the return types as well. They give the user an idea of what they're working with. So in general I think they're a good idea. Whether they're worth the cost of maintenance is difficult to tell, but I'm trying them out in the Nexmo Python library to see whether that's something we want to expand and take forward. Once again, we're sharing an understanding with our users, which is a good thing.
So in answer to our original question, what is our interface, the answer really is: code that's documented. Because that really is what the users will see. It gives them no reason to dig into your code and get confused about what they're supposed to work with. So this is a good place to be.
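As a small illustration of the type hints point, the sketch below annotates a function and then calls it with the wrong type. `send_sms` is a hypothetical function invented for this example; it is not the Nexmo client API.

```python
from typing import Optional


def send_sms(to: str, text: str, sender: Optional[str] = None) -> dict:
    """Hypothetical function; the hints document the expected types."""
    return {"to": to, "text": text, "from": sender or "default"}


# Hints are not enforced at runtime by default: passing an int for `to`
# still runs. An IDE or a static type checker is what would flag it.
result = send_sms(447700900000, "hello")

# The hints are stored on the function, where tools can read them.
print(send_sms.__annotations__)
```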
So once we've created an agreement with our users, how do we stop ourselves from breaking the promises that we've made? The answer is testing. This is a famous quote which, although I've attributed it to Jacob Kaplan-Moss, he denies ever having said. So really it should say anonymous at the bottom there. I would say there's a corollary to this, which is that if you're not testing your interface definition, then you don't have an interface definition.
To look at this standard testing pyramid: the idea is that at the bottom we've got unit tests. We want proportionally more of those, because they're quick to run, they're cheap to write and they're easy to maintain. At the top you've got manual tests, which are heavy and slow, and you don't want to do those. Somewhere in the middle are your integration tests; you want fewer of them, but you definitely want to have your interface tests separated out. That means that when you find yourself changing your interface tests, you know you've changed your interface, which means you then need to work out what you're going to do about it. You need to communicate these changes to your users.
The way you communicate changes to your users is through versioning. I recommend semantic versioning. We've all seen this, but the major number, which is the first number, indicates backwards compatibility. Whenever you break backwards compatibility, that first number changes, which tells your users that it's going to cost them some effort to upgrade. Minor and patch numbers indicate less aggressive changes that usually won't change compatibility for the user. This is documented in detail on semver.org, and it's there as version 2.0.0 of the specification, so they're eating their own dog food. Along with versioning, we've got release notes. I recommend you check out keepachangelog.com.
They offer guidelines for maintaining your release notes. They say that for every release you should divide your changes into lists of the things that were added, changed, deprecated, removed or fixed, plus security changes. The two that we're really interested in here are changed and removed. Those are the things that break backwards compatibility, so they're the things that we really, really need to make sure our users know about, because they're going to break their code.
Although we're talking a lot about not breaking code, it is okay to break backwards compatibility, but you want to do it only occasionally. I recommend keeping a branch of breaking changes and, unless they're essential, pushing them out in a block, so that you're not drip-feeding changes to your users. That's a pretty negative thing to do. In general, add new functionality alongside the existing functionality and deprecate the existing functionality, so that users know they should be migrating their code. Have a deprecation policy. I quite like Django's deprecation policy. Anything that they mark as deprecated gets removed, apart from a couple of caveats, in the next major release, but you could leave it two releases, so that it gives the user more time to adjust to the changes that they need to make.
So the real theme of this so far is that good engineering practices allow you to protect your users from changes to your library. Documentation, testing and versioning are things you should do. I'm going to skim through some technical solutions now. The last time I ran this talk it ran over 40 minutes, so I'm really just going to be giving you some ideas, but I'll give you a link at the end of the talk to some code that demonstrates all the examples that I'm about to give. The first one is pretty common. I suspect everybody in the room has used it at some point or another.
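Before the technical examples, the "add the new thing alongside the old one and deprecate the old one" advice can be sketched with the standard `warnings` module. The function names `send` and `send_message` are hypothetical.

```python
import warnings


def send_message(to, text):
    """New, preferred entry point (hypothetical name)."""
    return f"to={to} text={text}"


def send(to, text):
    """Old entry point, kept so that existing callers do not break."""
    warnings.warn(
        "send() is deprecated; use send_message() instead",
        DeprecationWarning,
        stacklevel=2,  # report the caller's line, not this one
    )
    return send_message(to, text)


print(send_message("alice", "hello"))
```

Old callers keep working and see a warning that tells them what to migrate to; the removal itself then waits for a release that the deprecation policy allows.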
You've got a class with a static attribute, so it's pre-generated, and you want to change it to a dynamically generated value. The way to do this is to use the @property decorator, which makes something look like a static attribute lookup but in fact calls a function instead, so you can generate the value at that point. You can have it generated based on a network value, or the time, or a value calculated inside the class. That's pretty straightforward, but it's a good example of one of the things you can do in Python that you can't do in many other languages. The way that it works is down to something that you can read about in the Python documentation called the descriptor protocol, which I highly recommend. It gave me a real understanding of what's going on beneath the surface of classes and attribute lookup.
The next kind of refactoring you might want to do is to change how a class is constructed. This might be so that you only ever have one instance of a class, essentially a singleton, or so that you have a factory, where objects that may be expensive to produce are pooled when they're not in use. Then you return something from the pool if there's one available, or you generate a new one on demand if there isn't, or possibly block until one's put back in the pool. In the refactoring book this is actually called Replace Constructor with Factory Method, but the nice thing about Python is that we can implement the lesser-used __new__ method, and inside there we can either return a new instance by calling __new__ on the superclass, or we can look something up and just return that value. There's no requirement that you have to create a new instance when you call a constructor, which is kind of neat. So we can make these changes transparently to the user: if you've switched from instantiating each time to returning stuff from a pool, you've done that under the hood, which is good. The third example is replacing functions with methods.
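The first two techniques can be sketched as follows. Both classes are hypothetical, and the second one uses a cache keyed by name, which is a simplification of the pooling described above rather than a full pool with blocking.

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    # Previously this might have been a stored attribute; callers still
    # write `rect.area`, but the value is now computed on each lookup.
    @property
    def area(self):
        return self.width * self.height


class Connection:
    """Callers keep writing Connection(name); reuse happens in __new__."""

    _cache = {}

    def __new__(cls, name):
        if name not in cls._cache:
            instance = super().__new__(cls)   # build a genuinely new object
            instance.name = name
            cls._cache[name] = instance
        return cls._cache[name]               # otherwise hand back the old one


rect = Rectangle(2, 3)
print(rect.area)
print(Connection("db") is Connection("db"))
```

One thing to watch for in a real class: if `__init__` is defined, Python runs it on every call, including calls where `__new__` returned an existing instance.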
So when I start a new module design, I tend to start with functions. I prefer those to classes generally. But sometimes you find you end up with some global state in your module. If you change that in the module, it's essentially a singleton value, so you can't have multiple differently configured modules. The way around that is to define a class inside your module and then instantiate it for the different cases that you have. If you've published your library already with these top-level functions, that's kind of a problem, but we can get around it because Python's got some pretty cool features.
So this is the end result. The idea is that you had a module with the do_the_thing top-level function, and what we want to do is put it inside a class. We've taken the top-level function and we've moved it under the App class, so this is great. We can now instantiate an App and we can call the method, but this has broken backwards compatibility, so we need to fix that. The way that we fix it is to create a default instance of our class. We can either make that a public thing or a private thing. I think it's quite nice to make it public, so that people can start to use module.default to refer to it. Then what we do is a bit of magic: a method can just be assigned to a variable, so that's what we've done. We've assigned it to the global variable do_the_thing, and now it looks like a function. It's not a function, it's a bound method on your instance, and it's quite a nice approach. You would do this with each one of the top-level functions that you've put inside the class.
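A minimal sketch of that end result, written as the body of a single module. The names `App`, `default` and `do_the_thing` follow the description above; the `greeting` parameter is an assumption added to stand in for per-instance configuration.

```python
class App:
    def __init__(self, greeting="hello"):
        # State that used to be a module-level global now lives here.
        self.greeting = greeting

    def do_the_thing(self, name):
        return f"{self.greeting}, {name}"


# Default instance, so that existing module-level calls keep working.
default = App()

# A bound method assigned to a module-level name looks like a function
# to callers, but it carries `default` along with it.
do_the_thing = default.do_the_thing

print(do_the_thing("world"))            # old style, unchanged for users
print(App("hi").do_the_thing("world"))  # new style, separately configured
```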
And finally, and this is a slightly silly example, but it's neat, and again it shows how flexible Python is about being able to switch out certain types for other types when necessary. In this case I have a module which I really want to replace with an object. So this is the module, and it's got a single global variable in there called salt. You can see it's static: it's the number four. But somebody comes to us and tells us that actually that shouldn't be four, it should be generated, it should be derived from the time. That's awkward, because people are already using our module, and they're calling module.salt, or in this case important.salt, to get access to that value, and you need to generate it dynamically. You really want to replace it with a function, but you can't do that, because that breaks your interface with your user. If it was inside a class you could use the property decorator, but you can't do that either, because properties don't work at the top level of a module, they only work inside classes.
So what we need to do is replace the module with an object instance. This is crazy stuff. You don't want to do this in your code, but I just want to show how you might do it. Here we've got a class called FakeModule, because it's going to pretend to be a module, and then we've defined our property on it. So if we had an instance of this, we could refer to instance.salt and get back our value as if it was an attribute rather than a function. Then here we instantiate our silly object and assign it in place of the module. Every module you've loaded inside your application is put into this dictionary, sys.modules. You can list it at any time you want to and see the modules that have been loaded. But like many things in Python, it's mutable, so you can replace the module that's been loaded with an instance. So now anywhere in the code that refers to this... oops, I don't actually have a slide for this, excuse me. So anything that refers to that code will continue to
work, but it will get back a dynamic value from this point on.
So just to run through some further techniques. Really, what this comes down to is understanding the Python execution model. There are various magic methods and variables inside Python that you can use to swap one thing for another, essentially to have objects pretending to be other types of objects. We've already seen using __new__ to change the construction of a class. We can make objects behave like iterators; behind the scenes this is how iterators and iterables work. We can make method calls look like attribute access using __get__; this is what property uses behind the scenes. We can make classes that act like functions by implementing __call__, so you can replace global state with essentially an instance that's callable, which is kind of neat.
And then there's manually extracting parameters. We can be extremely flexible about how we extract parameters inside a function. Normally you list the parameters that you want. Sometimes you have *args to collect random stuff that people might send into your function, if you're particularly flexible. He's just shown me a number, and now it's distracted me and completely thrown me off. But we can be much more flexible than that. We can define the rules for what's accepted inside a function, and almost deal with different sets of parameters that are provided. If you were to look at the low-level documentation for Flask, for example, almost everything in there takes *args or **kwargs, and then it just pulls out the values that it needs to make the call behind the scenes. If you don't provide the right set, you'll get an exception of some kind to tell you that you didn't follow the rules that have been defined in the documentation. Many of these special methods are described in Dive into Python, which is free and available online, and I highly recommend having a
look through the list to see what's possible.
When using these tricks, especially the last trick, you have to be aware that every change you make has risks attached to it, and the more magic you use, the more risk you're taking that you're going to break compatibility with your users accidentally. Sometimes, often even, it's better to break code than to use magic. I would say generally default to the straightforward changes, but if you have a lot of users, that might stop you from making changes at all. So just be aware that there are opportunities sometimes to use a little bit of magic to help you in your relationship with your users.
The use of exceptions in code varies wildly between different code bases. It's very difficult to know what your user is going to do in their code base to handle the exceptions that you throw, so I don't really have any advice for this. Just be aware that any changes you make to the set of exceptions that you throw may cause problems for your users. It's another issue for documentation: make sure you document clearly what the exception conditions are and what will happen in those cases.
Type assumptions are slightly scary in Python, because it's a dynamically typed language. Say you switch out one type for a supposedly compatible type: you change a string for a list of characters, or you change a string for an iterable of characters, or something like that. They often look compatible, and your tests might continue to pass, but suddenly you can't loop more than once, because you're dealing with an iterator rather than a data type that you can loop through multiple times. So just be aware that these do have different properties, and make sure that either your documentation changes to match or you don't make the change at all.
Monkey patching is always a risk. People do crazy stuff, especially if they find a bug in your code that you don't respond to quickly enough. They will inject their code into your code base. There's not really
anything you can do about this. They've broken the deal with you, so it's not something to worry about; it's just something to be aware will happen.
There's no silver bullet. You will break client code. Python is such a dynamic language, and so open and transparent, that any change you make carries some risk. But ultimately it's a question of documenting for your users what the assumptions are, and trying as much as possible to keep to that agreement.
So to sum up: if you want your interface to be strong and stable, you need to know what it is. You need to document it, so your users know what it is. You need to test it, so that you're keeping the promises that you've made. When you need to break backwards compatibility, you need to communicate that to your users; it's better to do that up front than after the fact. And it's helpful to know some tricks for swapping out code in a dynamic language like Python.
There's a GitHub repository at that bit.ly address which contains more low-level details of some of the tricks that I was showing, and the PDF of these slides if you're interested. You can contact me on that email address, or feel free to follow me on Twitter. If you see me around the conference, I'm usually wearing a pink hat. Feel free to talk to me about Nexmo or this talk or anything else. Thank you very much.
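As a sketch of the module-replacement trick from the talk: in a real library the assignment would live inside the module itself and use `sys.modules[__name__]`. Here the name "important" is registered by hand so that the example runs as a single file, and the body of the property is an assumption.

```python
import sys
import time


class FakeModule:
    """Pretends to be a module so that `salt` can be computed on access."""

    @property
    def salt(self):
        # Previously a static module-level value: salt = 4
        return int(time.time())


# Register the instance under the module's name. The import system hands
# back whatever object it finds in sys.modules for that name.
sys.modules["important"] = FakeModule()

import important  # noqa: E402  (resolved from sys.modules, not from disk)

# Callers keep writing important.salt and now get a dynamic value.
print(important.salt)
```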