 All right, hello, everybody. I'd like to start off by saying that as a North Bay native, I am very glad that we finally have a conference in the North Bay and that we're getting some programming attention here. Could I get a round of applause for the organizers for putting this together? All right, so the title of my talk today is Import Madness. This talk stems from an experience that I had about a year ago when I decided it was finally time for me to seriously try upgrading to Python 3. So I made a little toy Python 3 project. Pretty soon, it wasn't so little. I needed to clean it up, so I moved some files around, and boom, something broke, and I got an import error. Now at the time, I had a foolproof method for dealing with import errors, which was to randomly rearrange keywords and package names until magically something worked, as one does. But it didn't work this time. This was a particularly stubborn import error. Not knowing what else to do, I sat down and I read the entire documentation of the Python import system. Not only did that teach me how to fix my particular problem, but it also taught me that the Python import system is a lot more powerful and interesting than I had realized. So today, I'm going to first walk you through how the Python import system actually works. Then I'm going to show you how to use that knowledge to solve some of the more common import errors that you might get. Then I'm going to do some stuff that makes building a Fortran interpreter seem like a good idea. Then I'm going to close by warning you about why the Python import system should really scare you. So let's go. You may have noticed the title of my talk, Import Madness, is actually executable Python code. So let's go ahead and run it and see what happens. Oh no, an import error. How could this happen to us? Well, we've got to do something about it, so let's start by reading the text of the error message. It's telling us that there is no module named Madness. 
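To see the same kind of error outside the slides, here is a minimal sketch. The module name below is made up for illustration and is assumed not to exist on your machine; any name Python cannot find should behave the same way.

```python
import importlib


def try_import(name):
    """Return the imported module, or the error text if Python cannot find it."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as error:
        # ModuleNotFoundError is the specific kind of ImportError raised
        # when no module with this name can be located.
        return str(error)


# Hypothetical module name, chosen so that it is very unlikely to exist.
message = try_import("madness_that_does_not_exist")
print(message)
```

The helper returns the message rather than crashing, so you can compare what Python reports with the name you actually asked for.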
Okay, what does that mean? More specifically, what even is a module? Now this may be obvious to a lot of you, but I have to admit that it really wasn't clear to me until I finally read the documentation. According to the official Python tutorial, a module is a file containing Python definitions and statements. So your Python source code file with its .py extension is a module. That's pretty straightforward, but there's a twist, because it turns out the word module has another usage in Python. A module is also an object in a running Python program that contains variables that were defined in, wait for it, a module. That's a little confusing. To understand why we need this double usage of the term module, we need to understand what the Python import system is actually doing. So what does the import keyword do? Well, before I tell you, let me tell you what it doesn't do. I think for a long time I had a vague and incorrect impression, probably based on an analogy to some other languages. In some languages, like C, if you want to include some code that's not in your main source file, you do it using something like this #include directive, as you can see in the screenshot, which is taken from the CPython interpreter, which is written in C. This looks an awful lot like a Python import statement, but it behaves very differently. What this actually is is an instruction to the compiler that tells the compiler, when you're compiling my program, first go find these files and glue the text of them together into one great big source file and then compile that all at once. That gluing together process is called linking. And that is not what Python does. Python does not textually link your code. Instead, it does something a little more special. And what exactly is that? Well, it turns out that the import statement in Python is actually syntactic sugar. Syntactic sugar means a simpler way of writing something more complicated. 
And in this case, the something more complicated has two parts. First of all, the import statement will call a built-in internal function called dunder import, passing in the name of the module that you want. Dunder import will find, load, initialize, and return a module object. Then the import statement will take that module object and use it to bind any local variable names as appropriate. So if I do from foo import spam, dunder import will return the foo module. The import statement will look inside the module to find the spam variable and will set the local name spam equal to that variable. All right, so what is this dunder import function? Why do we need another function with basically the same name? Well, as I mentioned, dunder import has a very important job, which is to find, load, and initialize your module. And it does that in a five-step process. First of all, Python caches the modules that have been imported so far. That cache is pretty much just a dictionary. It's held in a variable called sys.modules. So when you try to import a module, Python will first check in the cache and see if the module you want is already there. If it is, it will just return it and not do any more work. If the module you want is not in the cache, then Python needs to find it. There's actually some pretty complicated machinery available for finding module code, but most of the time it's going to come down to looking at this variable called sys.path, which is just a list of locations on your file system where you've told Python to look for your code. In my personal experience, most of the import errors that I've gotten have had something to do with a problem with my sys.path, meaning that either I had not told Python where to find my source code or my source code wasn't where I thought it was. So when you have an import error, start off by looking at your sys.path, making sure that Python actually knows where to find your source code. All right, so Python has found your code. 
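The two-part desugaring and the cache check described above can be sketched roughly as follows. This is an approximation for illustration, not the interpreter's actual code path, and it uses the standard library's json module as a stand-in for foo, with dumps standing in for spam.

```python
import sys

# Roughly what `from json import dumps` does, spelled out by hand.
# Part one: ask __import__ to find, load, initialize, and return the module.
module = __import__("json", globals(), locals(), ["dumps"], 0)
# Part two: bind the local name to the attribute found inside the module.
dumps = module.dumps

# The module cache is a plain dictionary keyed by module name.
cached = sys.modules["json"]
same_object = cached is module

# Importing the same name again hands back the cached object.
import json
still_same = json is module

# On a cache miss, the search mostly comes down to the entries in sys.path.
search_locations = list(sys.path)
print(search_locations[:3])
```

Printing the first few entries of sys.path is usually the quickest way to check whether Python is even looking in the folder where your code lives.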
Now it's time to make the module object. Python has a built-in type called ModuleType. It will instantiate an instance of ModuleType, passing in the source code that it just found. Now this partially initialized module object will be put in the module cache for later use. Finally, the exciting part. Python will initialize your module by executing the entire source code of the module. Any variables or functions that get defined during that execution will be retained inside of a special dictionary called a namespace. You can see what's in that namespace in the future by looking at the dunder dict attribute on a reference to that module. Let me take a second here to talk about namespaces, because as the Zen of Python teaches us, namespaces are one honking great idea, and we should do more of them. Namespaces are how Python manages scope. Anytime you reference a variable in Python, the interpreter has to figure out what you're talking about, and it does that with the help of namespaces. Namespaces can take many forms in Python, but in this context, they're pretty much just a dictionary that contains the names and values of any variables that are defined and accessible at a particular point in your program. They nest inside of each other like Russian dolls. The outermost namespace contains the keywords that are built into Python itself. Inside of that is a namespace for the module where your code is currently running. We also call that the global namespace. Then if your code is inside the body of a particular function or class, there'll be a special local namespace attached to that function. So let's say I'm inside of a function and I reference a variable called spam. The interpreter will check and see if spam has been defined inside of the namespace of the local function. If it hasn't, it will check and see if spam is defined inside of the module. 
If it's not defined there, it will check and see if spam is a keyword built into Python, which it's not yet, so Python will raise a name error telling us that it doesn't know what we mean by the name spam. Namespaces are really helpful for the import system because they let modules smoothly coexist. As you know, it's no problem at all in Python to have two functions that have the same name as long as they live inside of different modules. I can define my own function called post and still use the post function from the requests module by simply defining my version inside of my main module and then importing the requests version. Because they live inside of different namespaces, they don't clash, they live together nicely. All right, so in a nutshell, this is how the Python import system works. Now let's use this knowledge to debug what is probably the second most common kind of import error that you might see in Python, which is a circular import. Here's an example. I have two modules, a person and a cafe. The person wants to order some food from the cafe, but what she orders depends on what the cafe is serving today. The cafe is gonna decide what food to serve based on the person's food preferences and specifically whether or not she likes spam. Here's how I can get myself into trouble. I start out inside of my main module. Main imports the person module. Person module imports the cafe. Cafe tries to import a variable from person. Python does not like this and I get an import error. So what do I do when I have a circular import? The first step is to ask yourself whether you really need a circular import in the first place. Most of the time you don't, and it's kind of a code smell when you find yourself with one. So the first thing you should do is ask yourself, is there some common shared code or functionality in these modules that I can refactor and extract into a third module that my first two modules can import? If you can, do that and it will solve your problem. 
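The failing person and cafe example can be reproduced with a small sketch like the one below. The file contents and the names likes_spam and todays_special are assumptions made up for illustration, since the slide code is not part of this transcript. The two modules are written into a temporary folder so nothing is left behind.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical person.py: it imports from cafe before defining likes_spam.
PERSON_SOURCE = '''\
from cafe import todays_special
likes_spam = True
order = todays_special
'''

# Hypothetical cafe.py: it needs a variable that person has not defined yet.
CAFE_SOURCE = '''\
from person import likes_spam
todays_special = "spam" if likes_spam else "eggs"
'''


def import_person():
    """Import person from a temporary folder and report what happened."""
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "person.py").write_text(PERSON_SOURCE)
        Path(folder, "cafe.py").write_text(CAFE_SOURCE)
        sys.path.insert(0, folder)
        importlib.invalidate_caches()
        try:
            importlib.import_module("person")
            return "imported"
        except ImportError as error:
            return f"ImportError: {error}"
        finally:
            # Leave sys.path and the module cache as we found them.
            sys.path.remove(folder)
            sys.modules.pop("person", None)
            sys.modules.pop("cafe", None)


result = import_person()
print(result)
```

The message names likes_spam, the variable that cafe asked for while person was still only partly executed.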
But sometimes this doesn't work and sometimes circular imports are genuinely useful. And you can in fact make them work in Python. You just have to think about the exact order of operations of the import system to make sure that you do things in an acceptable way. Here's that same example I showed you a moment ago. But this time it's going to work. All I had to do was swap the order of two lines. So here again, I'm gonna start inside of my main module and import my person. Before, the person immediately imported the cafe. But now, the person is first going to define the variable that cafe is going to need. Then, it can import cafe. Now cafe can import from person this "likes spam" variable. This time it's actually going to work. And here's why. As soon as I started importing the person module, Python created a module object and put it inside of the module cache. So when cafe tries to import person, it's gonna get that module out of the cache. The reason this failed last time is not that circular imports are specifically forbidden. It's because when the person module came out of the cache here, the variable that cafe needed wasn't defined yet. But now I have defined it by the time it's needed. So that import statement can actually succeed. It can then finish defining cafe, finish defining person, and then my person can place their order with no problem. All right, so this is a lot of really cool stuff about the import system. I learned how to do this. It bubbled around in my head for a while. I thought about it a lot. Then I thought about it a bit too much. Then all of a sudden, a wild question appeared. I had learned how to use the Python import system correctly. I had not yet learned how to use the Python import system deeply incorrectly, which led to the natural next question. Can you implement a merge sort algorithm in Python using only the import keyword? Spoiler alert, yes you can. Now, should. Should you implement a merge sort using the import keyword? 
No, you absolutely should not. Not for anything that'll be used for anything, ever. That said, it can be done, and I'm gonna show you how to do it because I hope the experience of seeing it will sear some permanent knowledge about importing into your brains. So here we go. First, let's talk about merge sort. Merge sort is a recursive sorting algorithm. Give it a list of numbers, it'll give you back a sorted list. Merge sort is cool because it's computationally efficient, meaning it has good computational complexity. If you need to sort a list, the longer the list is, the more time it will take to sort. But merge sort is special because there is no other currently known algorithm that will consistently slow down by less than merge sort does. Here's how it works. Take a list of numbers, you split the list in half. You split the halves in half, keep splitting all the way down recursively until you have individual elements, like you can see here on this middle row. Now we merge by taking adjacent elements and interleaving them in sorted order. On this middle row, we have 38 next to 27, interleave in sorted order to get 27, 38. Move across the row and then move down to the next row. Now my adjacent elements are 27, 38 next to 3, 43. Merge them by interleaving in sorted order to get 3, 27, 38, 43. Repeat that all the way back up your execution stack and you will end up with a fully sorted list that only took you N log N operations to sort, where N is the length of the list. All right, here's a working implementation of merge sort in Python. I've blurred out a lot of the code because I don't want you to focus on it, but I just want you to see the overall structure. You have a function that's called merge sort. It takes an input list to work on. It splits the list in half and it calls itself recursively on the left and right half of that list. 
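Since the slide code is blurred, here is a minimal sketch of a conventional, function-based merge sort with the same overall structure. It is a textbook version written for illustration, not the code from the slide, and the helper name merge is an assumption.

```python
def merge(left, right):
    """Interleave two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One side is exhausted; whatever remains on the other is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(numbers):
    """Split the list in half, sort each half recursively, then merge."""
    if len(numbers) <= 1:
        return list(numbers)
    middle = len(numbers) // 2
    left_sorted = merge_sort(numbers[:middle])
    right_sorted = merge_sort(numbers[middle:])
    return merge(left_sorted, right_sorted)


print(merge_sort([38, 27, 43, 3]))  # prints [3, 27, 38, 43]
```

The input list reuses the four numbers from the walkthrough above, so the intermediate merges are 27, 38 and 3, 43.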
It repeats that recursive splitting and calling until it gets down to the base case, which is when you have just one element, and then it starts returning. After every return, it will merge together the left and right sorted lists that it got from the recursive calls and then return that, repeating all the way back up to get your sorted list. So this works, but there's a problem. The problem is that this uses functions. Functions are boring. We want to use imports. So how are we going to do that? Well, it all hinges on something I said a few minutes ago, which is that in Python, when you import a module, Python executes the entire source code in the top level of the module. Now when you call a function in Python, Python executes the entire source code in the body of the function. And if you think about that a little bit too hard, you realize that those are actually very similar. They're so similar in fact that it turns out that we can emulate function execution by taking the body of the function that we want to call, writing it out into a new module, and importing that module. That will cause the same code to be run. So that is exactly what we're going to do. We're going to emulate that merge sort implementation I just showed you by taking the implementation, writing it out into a new module every time we want to call it, and importing the new module instead of calling the function. That's going to generate a structure that looks like this. At the top, we're going to have a starter module called madness. Madness is going to generate a random list of numbers and then it's going to write out a new module called merge sort.py. It's going to import merge sort.py, which is going to cause it to be executed. During that execution, that module is going to split the list that it was given and then create two new child modules, a left and a right child. It's then going to import its own children. That will cause them to be executed. 
They can then take the list that they were given, split it, create their own children, import those, and so on, all the way down the tree. Until we get to the bottom, when things can start being handed back to their parents, merge together, handed back to the parents, until we get back up to the top. So I don't have time to walk you through this entire implementation. I'm just going to zoom in on one step, which is this intermediate left child node shown here in red. At this point, we were created by this original merge sort module. We handed it half of the original list. We split the list in half, and now to keep this going, we have to create our new children and then import them. So how do we do that exactly? Well, here we go. Now remember, this is a recursive algorithm. So every step is going to run the same code, meaning that the children are going to need to have the same source code as the parents. So if we're in left.py, we want to create a new child, we're going to need to give it our own source code. So let's get our own source code. Python makes that surprisingly easy, because the interpreter gives you a built-in attribute called dunder name, which has the name of the module that you're currently running, which in this case is just left because we're importing left.py. With that name, we can get a reference to the current module out of the module cache. Then we can get our own module's current source code by using a built-in Python library called inspect. So that's the source code for our child. There's a little problem here, which is that we don't want our child to do exactly the same thing we did, because we only want it to operate on half of the list that we were working on. In other words, we need to figure out how to emulate passing an argument into the function that we're emulating. 
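The step of a module fetching its own source code can be sketched like this. The contents of left.py below are an assumption for illustration; the sketch writes the file into a temporary folder, imports it, and lets the module read its own text back while it is being imported.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical contents of left.py.
LEFT_SOURCE = '''\
import inspect
import sys

# __name__ is the name this module is being imported under ("left" here).
current_module = sys.modules[__name__]
# inspect.getsource reads the module's source text back from its file.
own_source = inspect.getsource(current_module)
'''


def load_left():
    """Write left.py to a temporary folder, import it, and return the module."""
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "left.py").write_text(LEFT_SOURCE)
        sys.path.insert(0, folder)
        importlib.invalidate_caches()
        try:
            return importlib.import_module("left")
        finally:
            sys.path.remove(folder)
            sys.modules.pop("left", None)


left = load_left()
print(left.own_source == LEFT_SOURCE)
```

Note that the lookup in sys.modules succeeds even though left has not finished importing, because the module object is placed in the cache before its code runs.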
To do that, we just have to remember that source code is simply text, meaning that we can manipulate it using the full suite of string manipulation tools that Python gives us. Here we use a regular expression to find the specific line in our source code where the list to operate on is defined, and we replace the entire line. That gives us the exact source code for our child. So let's take that code and let's write it out into a new function, or rather a new module, called left.py. Now wait a minute, George, you might say. Aren't you currently importing left.py? Well, yes I am. So isn't it maybe a bad idea to overwrite left.py? Yeah, it's a terrible idea. But it's actually gonna work out just fine. It is gonna create one complication, which is that, remember, Python caches modules. So at this point, if I just try to import my new left module, Python's gonna say, hey, you already have a left module. Let me just give it back to you. We'll have to say, Python, thank you for your help, but no thank you, I know what I'm doing, and invalidate some caches. But now the coast is clear. All we have to do is from left import sorted sublist as left sorted. This will find and import our new child, causing it to be run, causing it to split its list, create its own children, import them all the way down the stack, all the way back up until things get merged together and are made available as this sorted sublist in left's namespace. We can then bind that to left sorted to get the left half of our list sorted. We repeat the exact same thing for the right half, and that's the entire core of the algorithm. Now this is gonna create a lot of extra files. So to be a good citizen, let's go ahead and clean up after ourselves by deleting our newly created left child. So yes, we haven't actually finished importing left.py here, but we've already overwritten it and then deleted it. So I guess we don't have to worry about it anymore. All right, we're really close. 
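The rewrite, invalidate, import, and clean-up sequence above can be sketched as follows. This is a simplified stand-in and not the talk's implementation: the child module just calls sorted() instead of recursing, the names numbers and sort_by_import are assumptions, and exactly which caches the talk invalidates is not stated, so the sketch clears the two it knows it needs.

```python
import importlib
import re
import sys
import tempfile
from pathlib import Path

# Skip .pyc files so a rewritten left.py is never shadowed by stale bytecode.
sys.dont_write_bytecode = True

# Hypothetical child source; sorted() stands in for the split, import, and merge.
TEMPLATE = '''\
numbers = []
sorted_sublist = sorted(numbers)
'''


def sort_by_import(folder, numbers):
    """Emulate calling a function with an argument by rewriting and importing left.py."""
    # "Pass the argument" by replacing the whole line that defines the list.
    source = re.sub(
        r"^numbers = .*$",
        lambda match: f"numbers = {numbers!r}",
        TEMPLATE,
        count=1,
        flags=re.MULTILINE,
    )
    child = Path(folder, "left.py")
    child.write_text(source)
    sys.modules.pop("left", None)   # forget any module already cached as "left"
    importlib.invalidate_caches()   # make the path finders notice the new file
    from left import sorted_sublist as left_sorted
    child.unlink()                  # clean up the generated file
    return left_sorted


def demo():
    with tempfile.TemporaryDirectory() as folder:
        sys.path.insert(0, folder)
        try:
            return sort_by_import(folder, [38, 27]), sort_by_import(folder, [43, 3])
        finally:
            sys.path.remove(folder)
            sys.modules.pop("left", None)


first, second = demo()
print(first, second)  # prints [27, 38] [3, 43]
```

Without the sys.modules.pop line, the second call would get the first call's module back from the cache and return the wrong list.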
All we have to do now is create that top level madness module that's gonna kick this whole thing off. Madness needs access to the source code of this implementation so that it can write out the starter merge sort module. Where's it gonna get that source code? Well, to make things easy, let's follow some common Python advice, which is that it's a good idea to make your code self-documenting. Let's make this code self-documenting by taking the entire implementation of merge sort and jamming it directly into the doc string of madness.py with a helpful comment. This is gonna be really useful because Python gives you runtime access to the doc strings of modules. So madness can read in this implementation out of its own documentation. Then it can generate the random list of numbers and then write that out into our new merge sort module. And now the moment you've all been waiting for. From merge sort, import sorted sublist as sorted list. This will import that new merge sort module, causing it to be run. It will split its list, create its children, import those children, they'll create their own children and import them all the way down. Things will start bubbling up, getting merged until they're all the way back up here as the sorted sublist, which we can then bind locally as our final fully sorted list. And that is how you implement merge sort in Python using only the import statement. To prove to you that this actually works, here's a highly scripted demo. I'm just going to run my madness module. It'll generate a random list of numbers, read in its own doc string, stick the list of numbers into the source code of the doc string, write that out into merge sort.py, import it, causing all this importing to happen, causing the list to be sorted. Did this work correctly? Well, let's compare it to a conventionally sorted list. They are the same, import sort successful. All right, so this is some really crazy stuff you've just witnessed. 
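The docstring trick can be sketched in miniature like this. It is not the talk's code: the docstring here holds a two-line stand-in that calls sorted() rather than the full import-based recursion, and the file name merge_sort.py and the variable names are assumptions made for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical madness.py whose docstring doubles as the child module's source.
STARTER_SOURCE = '''\
"""numbers = []
sorted_sublist = sorted(numbers)
"""
import importlib
import random
from pathlib import Path

numbers = random.sample(range(100), 8)
# A module's own docstring is available at run time as __doc__.
child_source = __doc__.replace("numbers = []", f"numbers = {numbers!r}", 1)
Path(__file__).with_name("merge_sort.py").write_text(child_source)
importlib.invalidate_caches()
from merge_sort import sorted_sublist as sorted_list
'''


def run_madness():
    """Write madness.py to a temporary folder, import it, and return its lists."""
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "madness.py").write_text(STARTER_SOURCE)
        sys.path.insert(0, folder)
        importlib.invalidate_caches()
        try:
            madness = importlib.import_module("madness")
            return madness.numbers, madness.sorted_list
        finally:
            sys.path.remove(folder)
            sys.modules.pop("madness", None)
            sys.modules.pop("merge_sort", None)


numbers, sorted_list = run_madness()
print(sorted_list == sorted(numbers))
```

The final comparison mirrors the demo's check against a conventionally sorted list.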
Why have you been listening to me talk about this? Well, there's a few things I hope you take away from this. One is just an appreciation for how cool and powerful the import system is. Beyond that, I hope this deepens your appreciation for the fact that Python is really, really, really dynamic. We all know that Python has dynamic types, but that's just the start of it. In this example, you've seen that you can get the name of the module that's currently running. You can get its documentation, you can get its source code, you can modify that source code and write it out into a new module, which you can then import, all while your original program is still running. That's pretty dynamic. Now, you should almost never actually do those things, but it's pretty cool to know that you can, right? Number two, a practical thing. You can solve almost all of your import errors by looking at your sys.path and making sure that Python knows where to find your code. You can solve the rest of your import errors by thinking carefully about the order of operations in the import system and making sure you've satisfied the requirements at every step. Number three, I think this serves as an interesting example of the power of computational complexity in action. As you know, an algorithm is just an abstract method. It's not tied to any particular implementation. What you've just seen here is quite possibly the worst ever non-Fortran-based implementation of merge sort. But it turns out that even a horrible implementation of a good algorithm can beat a perfect implementation of a bad algorithm. To prove that, I coded up a bubble sort, which is a sorting algorithm that runs in n squared time as opposed to merge sort's n log n time, and compared the performance of these sorts on lists of different lengths. It turns out the list only has to get up to about 5,000 elements before import sort is actually faster. That's pretty nuts. All right, finally, that warning that I promised you. 
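The crossover claim can be explored with a rough back-of-the-envelope model. It assumes, purely for illustration, that bubble sort costs exactly n squared steps and that import sort costs some fixed overhead times n log2 n steps; real timings will not follow these formulas exactly, and the overhead value of 400 below is hypothetical, not a measurement from the talk.

```python
import math


def crossover(overhead):
    """Smallest n >= 2 where n * n exceeds overhead * n * log2(n)."""
    n = 2
    while n * n <= overhead * n * math.log2(n):
        n += 1
    return n


# Under this model, a crossover near 5,000 elements would correspond to a
# per-step overhead ratio of roughly n / log2(n) evaluated at n = 5000.
implied_overhead = 5000 / math.log2(5000)
print(round(implied_overhead))

# Hypothetical overhead of 400: where would the slower implementation win?
print(crossover(400))
```

In other words, under these assumptions an implementation could be several hundred times slower per step and still come out ahead once the list reaches a few thousand elements.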
If you walk away with nothing else from this talk, I hope you walk away with an appreciation that importing in Python is dangerous. It's probably the most dangerous thing that you do on a regular basis in Python, and here's why. It came out recently that somebody's been uploading fake packages to the Python package index. These packages are very similar to normal packages with a couple of important differences. One, the names are slight misspellings of real names. So if you try to install urllib3, and forget how many L's are in urllib3, you might get the fake version. The fake version is very similar to the real version and has almost the same code. In this case, the difference is that the setup.py does some nasty stuff. So maybe you could just look at the setup.py for the things you're installing, but that's actually not gonna help you all the way, because as soon as you import this code, Python executes all of the top-level code, including all the code that that code imports. And if somewhere, anywhere in that chain of execution, there's a couple lines that say, install a cryptocurrency miner and then modify the source code to remove the line that did that, you'll be completely compromised and you might never even figure out what happened to you. So when you import in Python, you should be afraid, you should be very afraid. You should treat importing with the full respect that arbitrary code execution deserves. So that's all I have for you today. I hope this will inspire you to have some bad ideas of your own and to learn from them. And tweet them at me at my Twitter handle at rogleader. If you'd like to see the full implementation of this code, it's on GitHub and on my blog at this link. Thank you very much. All right. Thanks George very much for some.