So, I'm Reuven. This talk is "What happens when you import a module?" First, a few words about me. I teach Python. Basically every day I'm in a different city, a different country, working with a different company, helping them to improve their Python skills. People who don't know Python, I teach them. People who do, I help them improve. If you use Python, and I'm guessing that if you're here, you do, talk to me if you want to improve your company's Python. I also have books, I have online courses, and I have a free weekly newsletter, Better Developers.

But let's get to the heart of the talk. One of the most important rules in programming is DRY: don't repeat yourself. I said, don't repeat yourself. This is a rule that I learned from the book The Pragmatic Programmer, and we actually apply it an awful lot, even if we don't think about it that much. If I've repeated code one line after the other, then I can DRY it up by using a loop. If I've repeated code in several different places, then I can use a function. And if I have the same code in multiple programs, then I can use a library. This is standard programming practice in just about every language. In fact, it should be in every language you use. And if you ever have the temptation to say, "Oh, come on, I'll just copy and paste that code. What are the chances that I'll have to edit it in all of those places?" I will tell you, the chances are 100%. You will have to change it in all of those places, so you might as well fix the problem at the beginning.

Well, in Python, our libraries are called modules. They're usually, not always, but usually, files that contain Python code. And as an added bonus, they also provide us with namespaces.
And the whole idea of namespaces is that if I'm working on something, and you're working on something, and we want to combine forces, we want to make sure that the variables I have defined and the variables you have defined don't clash with one another. Namespaces allow us to collaborate without having to worry about that sort of name collision.

So let's import a module. Let's do import random. People use import all the time, right? Like, all the time. But what does import really do? Actually, it does two things. It's important to keep track of those, and we're going to talk about both of them here. First of all, import creates a new module object. Now, everything in Python is an object. We love saying that. But here we're creating a new module object. The second thing that import does is assign that module object to a variable. OK. Nice. And by the way, this is kind of similar to def. When you define a function with def, you're doing two separate things: you're creating a function object, and then you're assigning it to a variable.

So, some things to notice here about import. First of all, import is a statement. It's not a function, so we don't use parentheses. People often have this temptation to use parentheses. Don't. Why? Because it won't work. There's a simple reason, right? In addition to that, when we give a name to import, we are giving the name of the variable we want to define. We are not giving a string. We are not giving a file name. We're not giving a file object. We are giving the name of the variable that we want to create. Now, in many other languages, you do say what file you want to load. You say, go load such and such. But in Python, we can't do that, at least not directly. We're going to talk in just a few moments about how we can have some effect on that, how we can push things in that direction.
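As a minimal sketch of the two steps just described (this is an illustration, not code from the talk), the snippet below shows that import and def each produce an object and bind it to a name, and that import expects a bare name rather than a string. The function name greet and the flag string_import_ok are made up for the example.

```python
import types

import random  # creates (or reuses) a module object, then binds it to the name "random"


def greet():  # creates a function object, then binds it to the name "greet"
    return "hello"


print(type(random))  # the module type
print(type(greet))   # the function type
print(isinstance(random, types.ModuleType), isinstance(greet, types.FunctionType))

# import takes a bare name, not a string: this variant does not even compile.
try:
    compile('import "random"', "<example>", "exec")
    string_import_ok = True
except SyntaxError:
    string_import_ok = False

print(string_import_ok)
```

The compile call is only used so that the syntax error can be caught and shown without stopping the script.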
But it's a very common mistake that I see people make in my courses: they say import and then the name of the file, especially in quotes. Don't do that, because it won't work.

Well, let's think about modules as objects. If I say import random, as I did before, and then I ask Python, what is the type of random, what kind of object is it? Well, it's a module object. We have module objects just as we have strings and lists and dicts, and functions for that matter. And module objects are actually really, really simple. Of all the objects in Python, they don't do that much. They're basically containers for other things. They're namespaces, and we use attributes to set the names that are going to be there. They don't really have any methods of their own. So basically, if I have a module object, all I'm using it for is to keep track of the names that are defined in there.

Well, that's very nice. But what if I want to import a module and not use the same name as the module name? (Right now I'm papering over the whole question of how it finds things. We're going to get to that in just a moment.) So if I want to import random, but I don't want to have a variable named random, what do I do? I can use import ... as: I say import random as r. The only thing this changes is the variable that's created. The rest of it still happens. So remember, two things happen when we use import: we create a module object, and we assign it to a variable. Here, we're just assigning it to a different variable. OK, so far, so good, and perhaps even a little obvious to people who've been working in Python for a while.

What happens, though, if module A imports B, and then module B imports A? Now, I know this might come as a surprise to some of you, but there are some programming languages out there where, if you do this and aren't careful, you could enter into an infinite loop.
Luckily, those languages are very esoteric, and no one ever uses them. But it's a point worth bringing up as a theoretical construct. I'm talking, of course, about a language, not to be horrible toward it, whose name is somewhere between B and D.

Anyway, we don't want to have an infinite loop in Python, so what do we do? Python needs to keep track of what we've loaded and make sure that it doesn't load it a second time. In other words, when we import A, it could load it or not, but it'll assign. It could load B or not, and it'll assign. Python only loads a module once per session, so it keeps track of things. It will always define the variable, but it will only load the module once.

Well, that's very nice, but how does Python know? If I say import A in every single one of the files in my program, how does Python know? Who told it? And the answer is sys.modules. Now, sys is sort of like the virtual machine environment for Python. It's where Python keeps track of all sorts of stuff. And sys.modules is actually a dict, a dictionary, in which the keys are strings, the modules' names, and the values are the module objects themselves. By the way, if anyone ever wonders whether we can store anything we want in a Python dictionary, the answer is yes, and this is yet more proof of that. We can store module objects in a dictionary as the values, and the keys here, again, are strings.

So when we import, Python says, wait, do we already have this module in our dictionary? If so, then we don't need to load it again; we can just make the assignment. We're always going to make that assignment, but we're only sometimes going to load the module. And how do we check for that? We check for membership as a key in the dictionary.

Well, how many modules are loaded by default when we start up Python? I decided to use three different interactive shells for Python.
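The cache described above can be inspected directly. The following is a small sketch, using the standard library's json module purely as an arbitrary example, showing that a repeated import reuses the object stored in sys.modules and that the count of loaded modules is just the length of that dictionary. The names cached and same_object are made up for the illustration.

```python
import sys

import json                    # first import: loaded if needed, cached, then bound to "json"
cached = sys.modules["json"]   # the key is the module's name as a string

import json                    # second import: nothing is reloaded, only the assignment happens
same_object = json is cached

print(type(sys.modules))       # a plain dictionary
print("json" in sys.modules, same_object)
print(len(sys.modules))        # the number varies by Python version and by shell
```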
One of them was the standard interactive shell, just python3 at the command line. import sys, len of sys.modules: 79. So when you start up the simplest possible Python shell (and again, it's an interactive shell, so it's not exactly the same as running a program), that's 79 modules. OK. But what if you're a fancy-schmancy programmer, and you don't use just the regular Python interpreter shell? You use IPython, because you have lots of Apple products. If I load up IPython and ask, what's the len of sys.modules? 646 modules. Yeah, but what if you're even fancier than that, and you use this newfangled web thing, Jupyter? Well, then it's over 1,000 modules that are loaded into sys.modules. Again, this doesn't necessarily mean that you're going to be using all of them, but it means that they were loaded by Jupyter, by IPython, by the Python interactive shell, so that you'll have those things available. By the way, this is still a small fraction of what's in the Python standard library. Just because something's in the standard library does not mean that you don't have to import it, because everything that's imported takes up memory, takes time, and so on and so forth.

OK, so this is all very nice, but what's really going on behind the scenes here? Let me first implement what I just described to you in some pseudocode. It's really, really close to Python, but not exactly, as you'll see. Don't really run this code. I mean, you can run it if you want, but it's not going to work, so I'm going to save you some time here. What you should use instead is the importlib module, which does come with the standard library, and that's the way that Python exposes how importing works. What I'm going to be showing you through the rest of this talk is what happens there.

So, for example, my_import: that's the function that does something sort of like import. What's it going to do?
It's going to get a module name as a string. It says, hey, if this module's not in sys.modules, then we're going to load it. I'm going to run this function called get_module, which is made up. Whatever I get back from get_module, I'm going to stick in my dictionary. And then, under all circumstances, no matter what, I'm going to assign sys.modules[mod], meaning the module object, to globals()[mod]. Wait, wait, wait, wait, wait. Whoops, oh no. I didn't mean that when I said wait.

All right, so globals()[mod], what the heck is that? Well, globals is a built-in function that exposes a dictionary of your global variables. You can actually assign to global variables this way. Don't, but you can if you want to, or if you want to enrage all of your colleagues. The idea is that this way we can get the module object, and then we can assign it to the variable. And this get_module, we're going to talk about in just a moment.

What about import ... as, where we're importing and then assigning to a different name? Well, we can do that too. I can say my_import of mod and then take an alias as well. Once again, we check: is our module already imported? If so, we don't load anything. If not, then we load it up. And then we say, if alias is None, meaning I did not get an alias, then we'll just set the alias to be the module name. Either way, we assign the module to that name.

Well, let's consider another example, and this is something I alluded to earlier. What if my top-level program says import A, and then module A says import B, and module B says import A? What happens? Well, actually, whoop, right. Our program says import A. A is not in sys.modules, so it's loaded, and then the global variable A is assigned. Fine. Next, though, module A imports B. B is not in sys.modules, so B is loaded and the global variable B is assigned. And then finally B imports A. Well, what happens?
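The import-like function described above, including the alias variant, might look roughly like the sketch below. It is a reconstruction from the spoken description, not the slide code: the names my_import and get_module follow the talk, get_module is filled in here with importlib.import_module as a stand-in so the sketch runs, and the namespace parameter is an addition made for this example so that it does not have to write into the real globals.

```python
import importlib
import sys


def get_module(name):
    # Hypothetical stand-in for the talk's made-up get_module.
    # Note that importlib.import_module also caches the module in sys.modules itself.
    return importlib.import_module(name)


def my_import(mod, alias=None, namespace=None):
    # The talk's version assigns into globals(); the namespace parameter is an
    # assumption added here so the effect can be observed in a separate dict.
    if namespace is None:
        namespace = globals()
    if mod not in sys.modules:          # load only if the module is not cached yet
        sys.modules[mod] = get_module(mod)
    if alias is None:                   # plain "import mod" binds the module's own name
        alias = mod
    namespace[alias] = sys.modules[mod]  # the assignment always happens


ns = {}
my_import("random", namespace=ns)             # like: import random
my_import("random", alias="r", namespace=ns)  # like: import random as r
print(sorted(ns))
print(ns["r"] is ns["random"])
```

Both names end up referring to the same module object, which is the behaviour the talk attributes to the real import statement.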
Well, A is in sys.modules, so we don't load it again, but we do still assign A to be sys.modules['A']. Meaning, you're always going to assign that variable.

Now, when I talk about global variables here: in other languages, global means it's always available. It's available universally. That's not exactly true in Python. In Python, what we call a global is really global just within one module, just within one file, basically. Or just within one namespace. So we actually have many different sets of global variables in Python, with each namespace having its own. Module A can have a totally different global x than module B, which seems a little weird. Now, you can always keep track of what namespace you're currently in by looking at __name__, "dunder name". __name__ is a variable that's always available, and it says what namespace you're in, and thus what globals are available there. By the way, and you might know this, the first namespace, the namespace that belongs to the first file that we load up, is __main__.

So let's rephrase things quickly, then. A is not in sys.modules, so it's loaded, and we assign A in __main__ to be sys.modules['A']. Then module A imports B, and we assign to A.B. And then in module B we import A, and we assign to B.A. That's why this all works the same way.

But wait a second. You might have also seen something like this: from random import randint. And you might have thought, great, I'm only loading up that one little thing. So does this save memory? No. Does this load the entire module? Yes. Why? Because I told you so. No, why? Because when you say from random import randint, what is it doing? It is actually checking, and this is going to sound familiar, to see if random is in sys.modules. If it is, then we don't load anything. But if it's not, we load the entire module into Python, into that dictionary. And then it defines randint as the global, rather than random. Random is not defined as a global.
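A small check of the from ... import behaviour described above. This sketch performs the import inside a function, which is a choice made here so the bound names can be listed cleanly; inside a function the name is bound locally rather than globally, but the loading mechanism is the same. The function name is made up for the illustration.

```python
import sys


def names_bound_by_from_import():
    from random import randint  # binds only "randint" in this namespace
    return sorted(locals())


bound = names_bound_by_from_import()
print(bound)                                      # only randint was bound, not random
print("random" in sys.modules)                    # yet the whole module is in the cache
print(hasattr(sys.modules["random"], "shuffle"))  # including names that were never asked for
```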
So you're simply switching what global variable you're defining and what it's referring to. And we can write this in a slightly different way. Here's another version of my pseudocode. Here I'm going to take one or more names in splat args (*args). If the module's not loaded, we load it, and then for each name in names, one by one, we assign it. And here's a fancy way, with getattr, to fetch each attribute: I go into that module object and assign one or more variables based on what was defined there. By the way, it's super, super common for people to think that they're saving memory if they use from ... import. So now you don't think that.

What if I want an alias? Well, once again, we can do that sort of thing: from random import randint as ri. What's happening there? We're simply changing the variable that we're defining. So now, in my from-import, here I take double-splat kwargs (**kwargs), and I just go through it one name at a time and assign to this name instead of that one. But it's the same mechanism.

There's one last version of this that you might be familiar with, and that's from ... import *. I can say from random import *, and this imports all of the names from random as globals into your current namespace. Well, not all of them, necessarily. If __all__ is defined in the module, then names that aren't listed in it won't be imported. But I have some very simple advice for you about this: never do it. Please. I will track you down and check. It is a really, really bad idea. It's like saying, you know, the world would be a much simpler place without surnames, so let's get rid of them. Yes, simpler in some ways. Yes, more confusing in others. Don't do it, please.

Let's go back to my pseudocode, though. I keep saying get_module. What does it do? Well, let's think about it. get_module is my made-up pseudocode function. It takes a module name as input. It then does two things.
It finds the file for a module, and then it loads that module based on that file name. And it returns a new module object. We know that because in the pseudocode I kept taking the output from get_module and assigning it to a variable. Well, here's the thing: get_module, maybe the name doesn't exist, but the functionality does. It's called importlib.import_module. Remember, I mentioned that importlib exposes how import does things. So I can just say r = importlib.import_module('random'), and sure enough, I have defined it. It's exactly as if I had said import random, or import random as r. So you can play with this if you want. You probably shouldn't, but you can.

But let's now get to something more interesting and deeper. How does Python even find these modules? Here's the simple story I told myself for a long time, and that I still tell people in my intro classes. There's this variable called sys.path. It's a list of strings, the places where Python will look. It's defined when Python first starts up, and the first match wins. Meaning, the first directory in which we find the module we're looking for, that's the file that will be loaded.

Now, name collisions can lead to surprises. One of the common exercises I give on the first day of class is to have people write a short game where they have to guess a random number. Sort of like the lottery, but with higher chances of winning and zero actual prize. So it's a balancing act there. In any event, inevitably someone calls that file random.py. And in random.py, they say import random, and it loads itself, and then cannot find randint, and it's bad news all around.

Because here's what sys.path looks like, at least on my machine. Basically, it first looks through the Python standard library, and then it looks at site-packages, where pip install puts things. And by the way, there is an empty string there at the beginning, and that is the current directory.
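For reference, here is a short sketch of the two things just mentioned: importlib.import_module returning the same object that a plain import binds, and sys.path as an ordered list of places to search. The entries printed depend entirely on the machine and on how Python was started, so no particular output is assumed.

```python
import importlib
import random
import sys

r = importlib.import_module("random")  # takes the module name as a string
print(r is random)                     # same object as the one bound by "import random"

# sys.path is searched in order, and the first match wins.
for entry in sys.path:
    print(repr(entry))

# Where the random module that actually won came from.
print(random.__spec__.origin)
```

If a file named random.py sat in the first directory on that list, the last line would point at it instead of at the standard library, which is the classroom surprise described above.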
That entry is always going to be there. sys.path[0] is the program's directory. It's not necessarily where we're running from, but it's where the program is located. That way, a program can always import something in its own directory. After that, as I said, it's the Python standard library, broken down into a few different entries, because maybe you'll want to load something from there. It's also in a zip file, and there are some frozen things; we'll get to that in just a moment. But they're always available. The standard library is always available. And then finally, as I said, we've got site-packages. You pip install something (some of us might have done that on occasion) and it goes there.

What if you want to modify sys.path? Well, here's the wrong way to do it: don't append to that list from within your program. It will work. As I like to say, unfortunately it works. But then you've got this program with a funny sort of sys.path, and you have to change the program itself to change it. You don't want to do that. Instead, what you should do is set the environment variable PYTHONPATH outside of Python. When Python starts up, it will add those directories to sys.path. So, for example, if I say (this is at the shell, not in Python) export PYTHONPATH=/hello:/goodbye, and now I look at my sys.path, you can see that we have the current directory, or the program's directory. We have the PYTHONPATH entries, then we have the standard library, and then we have the pip install directory. So Python will look in that order. By the way, /hello and /goodbye, you might not be surprised to hear, do not actually exist on my computer. They're made up. Python doesn't care. It'll look, and if it does not find that directory, it'll go on to the next one.

So what's the pseudocode version of that? It gets a little more complicated, that's true.
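The PYTHONPATH behaviour described above can be observed without touching the current process. The sketch below is an illustration rather than the speaker's demo: it starts a child interpreter with PYTHONPATH set to the same two made-up directories and prints the child's sys.path. The use of subprocess and JSON to pass the list back is an assumption of this example.

```python
import json
import os
import subprocess
import sys

# Copy the current environment and add PYTHONPATH for the child process only.
env = dict(os.environ)
env["PYTHONPATH"] = os.pathsep.join(["/hello", "/goodbye"])

result = subprocess.run(
    [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
child_path = json.loads(result.stdout)

# The made-up directories appear near the front, ahead of the standard library
# and site-packages, even though they do not exist.
for entry in child_path:
    print(repr(entry))
```

Setting the variable in the child's environment mirrors what export does at the shell, and leaves the parent program's sys.path alone.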
First of all, we still check first whether mod is not in sys.modules, but now we're no longer just running this get_module thing. Now we actually have to look. Ha, work. So we go through each directory in sys.path and say, hey, can we find this file, with a .py at the end? If so, then we use a function that I'm calling load_module. load_module is, again, not a real function, but we'll see about it in a moment. If we go through all of the different directories in sys.path and don't find it, you know what we're going to do? We're going to raise ModuleNotFoundError.

So, there are some problems with this story. sys.path doesn't only contain directories, and modules live in different kinds of places. And there's also this weird thing: if I create a new file called time.py, it doesn't get priority, the way I described random.py does. So what gives?

The answer is that we have these things called finders and loaders. Finders say, does the module exist? And if so, let's get what's called a module spec. Loaders are the things that actually take that module spec and load the module into Python, giving us a module object. And you can have something called an importer, which does both of these together.

All right, but wait, where are these finders defined? Oh, that's right, another variable in sys: sys.meta_path. What happens is that Python goes through each of the elements in sys.meta_path and asks, are you the right finder? And are you the right finder? Are you the right finder for this type of module? Each finder knows what it's responsible for. So if I want to look for random, it's going to go through them one at a time. We import sys, we go through meta_path, et cetera, et cetera. And let's see what happens when I run this.
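A sketch of the kind of loop being described, asking each finder on sys.meta_path in turn. It is not the slide code; the helper name first_finder_for is made up, and the set of finders on sys.meta_path differs between environments, so the exact output will vary.

```python
import sys


def first_finder_for(name):
    """Return (finder, spec) for the first finder on sys.meta_path that knows the module."""
    for finder in sys.meta_path:
        find_spec = getattr(finder, "find_spec", None)
        if find_spec is None:        # skip anything that lacks the find_spec method
            continue
        spec = find_spec(name, None)  # None as the path means a top-level module
        if spec is not None:
            return finder, spec
    return None, None


for name in ("random", "time", "sys"):
    finder, spec = first_finder_for(name)
    finder_name = getattr(finder, "__name__", type(finder).__name__)
    print(name, "->", finder_name, spec.origin)
```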
Well, it did not find it in this first meta finder, it did not find it in built-ins, it did not find it in frozen, but it did find it in the path finder, meaning we load up the file. The first one to find the module wins. But if I do it with time, it's a little different. It's not in the first meta finder, but it is in the built-in finder. Meaning that time is treated differently from random. And that's because time is in sys.builtin_module_names. These are modules that are specifically defined not to be loaded from the file system, but rather from the special precompiled stuff that's built into the interpreter. sys, by the way, is in there. When you import sys, no file is harmed.

Let's run through this pretty quickly, because I want to get through a few things. When we create the module object, we actually go to the spec that was returned by the finder and, through its loader, run a method called create_module. This is where the module object is created, and every loader knows how to create modules for its own kind of source. And that's great. We create the module, and it's now empty. So wait a second, we've created a module object, but it doesn't have anything in it. We're going to need to load the code from the module.

And here again, people have a lot of misconceptions. Many people believe that, oh, we load things in Python. We do. You know how we load them? By executing things. That's right: when we load a module, the entire module runs, top to bottom. Every line of code is executed. If you put a long for loop in your module (don't do that), then it will execute it. If it has def, if it has class, anything, it executes. How? exec. One of those functions you're told never to use. By the way, don't use it. But exec takes a string and runs it as a short Python program. So I say s equals a string containing print('hello'), name = 'Reuven', print('goodbye'), and if I run this with exec, it prints hello, it prints goodbye, and what do you know? It assigned 'Reuven' to the variable name. Yeah, that's not going to help us, though.
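A small sketch of the exec example just described. One deliberate difference from the spoken version, and an assumption of this illustration: the string is executed against a separate dictionary instead of the current globals, so that the example stays self-contained and the assigned name can be inspected afterwards.

```python
source = '''
print("hello")
name = "Reuven"
print("goodbye")
'''

namespace = {}
exec(source, namespace)   # runs the string top to bottom, like a short program

print(namespace["name"])  # the assignment made by the executed code
```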
Because what I want to do is exec this file, all the code in the file, and then pop all those values into the module's attributes. Well, guess what? We can pass a second argument to exec, and that's a dictionary, which is then populated. So we're not actually going to set global variables; we're going to set name-value pairs in the dictionary. Yeah, but that's not how a module works. Ah, but it is. Because module objects have a __dict__ attribute, and any key in this dict is an attribute on the module. So what's going to happen? We load up the file and get the source code as a string. We execute it into the __dict__ of our module object, which was created from our spec, which we got from our finder. And then, voilà, all of the global assignments in the module are now attributes. We can even do this with the method in importlib, on the loader, called exec_module. We go to the module object and we say, exec into you, and finally everything is done. In other words, we can break apart what happens with importing and replace each part of it with a function that we can now understand better, and that we could even almost, sort of, kind of write ourselves. Phew.

By the way, all of this is customizable. You can write custom finders, you can write custom path finders, you can write custom loaders. Why would you want to? Most people shouldn't, but there are occasionally times and reasons to do it. For example, I met someone a few weeks ago who has a custom loader that checks the hash of the file, to make sure that it's loading the correct thing and that it's cryptographically safe.

So, to summarize: what happens when you import? A lot. Well, that's a quick summary. Let's go into a little more detail, and then I know my time is up. Python checks to see if the module has been cached, if we've already loaded it. If so, then it returns the module. If not, then it goes through all the finders in sys.meta_path. If none finds the module, then you get the exception.
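Pulling the pieces from this part of the talk together, here is a rough sketch of the sequence: cache check, finders on sys.meta_path, an empty module created from the spec, then the loader executing the code into it. This is a simplification for simple top-level modules only, not how importlib is actually implemented; the function name manual_import is made up, and colorsys is used merely as a small standard-library module that is often not loaded yet.

```python
import importlib.util
import sys


def manual_import(name):
    # 1. Cached already? Then just hand back the existing module object.
    if name in sys.modules:
        return sys.modules[name]

    # 2. Ask each finder on sys.meta_path for a module spec.
    spec = None
    for finder in sys.meta_path:
        find_spec = getattr(finder, "find_spec", None)
        if find_spec is None:
            continue
        spec = find_spec(name, None)
        if spec is not None:
            break
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)

    # 3. Create a new, still empty module object from the spec.
    module = importlib.util.module_from_spec(spec)

    # 4. Cache it, then let the loader execute the module's code into it.
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        sys.modules.pop(name, None)  # do not leave a half-initialised module cached
        raise
    return module


colorsys = manual_import("colorsys")
print(colorsys.__name__, hasattr(colorsys, "rgb_to_hsv"))
print(manual_import("colorsys") is colorsys)  # second call is served from the cache
```

Caching the module before executing it is what lets a circular import find the partially initialised module instead of starting over.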
But if one of them is appropriate, it returns a module spec object. And if that works, then we get a new, empty module object. And if that works, then it uses the loader to exec the source code into the module's __dict__ attribute, which makes all the globals in the module file into attributes on the module object. So that's the complete picture. Oh, wait, no, we didn't talk about packages. Yeah, that'll be a talk for next year. But basically, packages are module objects, just with the itsy-bitsy little issue of lots of directories and heaven knows how they're all resolved. We'll get to that another time. But if we're talking about simple modules, you can see, I hope, how this very simple API, this very simple interface of importing, actually hides a ton of complexity behind the scenes. All right, I think we've got about two, three minutes, maybe, for questions. Thanks very much. And I'd love to hear questions. Thank you.

Folks, if you want to use the mic there for questions.

Thank you for your talk. I'm aware that there's a PEP about lazy imports, or lazily importing things. How does that affect this?

I'll answer when I need to. No, no, no. Actually, the way that lazy imports work, from what I understand, from what I've read and heard, is that it'll basically say, OK, we're not actually going to exec the file now. We're going to have this stub that's a module object, and when we access any of its attributes, then and only then will it go through the actual loading process. So if you use all the modules that you import, you'll still need to have that exec happen at some point. It'll just spread it across your program's running time, as opposed to doing it all at the beginning, which is what happens now.

Well, thanks for your talk. My question could be a bit specific, but we have a distributed control system using very old machines with only 1 gigabyte of RAM.
And one of our headaches when upgrading to Python 3, for example, is that each of the processes we run uses plenty of memory compared to our old Python 2.6 software. I was wondering, can I have a custom module loader that uses shared memory across processes, to reduce the memory used by each of the processes, for example?

I have no idea. I'm guessing the answer is yes, with a lot of very hard work and potential incompatibility with lots of other Python stuff. It might be needed in your specific case, but it sounds like it'll be really, really hard.

Oh, OK. So by incompatibility, you mean unpicklable objects in shared memory?

Like, once you've got things in shared memory, then they are, by definition, shared. And then heaven help you if they're... I mean, if you're only reading from them, that makes it less complicated, but not uncomplicated.

OK, thanks.

So, we are running out of time. Maybe, Reuven, you could take the questions on a one-on-one basis, if you have time after this?

Yeah, I'm super, super happy to answer any questions people have afterwards. Sorry. We'll meet right outside here.

OK, thank you. All right, thanks very much, folks.