Right, thank you very much. I hope you enjoyed the coffee and snacks upstairs; the flapjacks were very nice. To get started again, we have another senior engineer from our Python infrastructure team, Laszlo. Laszlo also contributes to various PyPA projects, including pip and auditwheel, I think you said? Yes. He'll be talking to us a little bit about pip and pacman today. Thank you.

Thank you. Is it working? Yeah, it's working. So, good morning everyone. Yes, I'm Laszlo, I work in the Python infrastructure team at Bloomberg, and I'm here today to tell you a tale of package managers. Let's start with a statement: managing Python packages is actually simple. Many people, mostly seasoned Python users I guess, will say that this is true, so why even give a talk about it? My experience is that novice users especially, people who are new to Python, struggle to work with Python packages. They run into permission denied errors or incompatible dependencies, or they just struggle to figure out what the right tool for installing Python packages is. So in this talk I'm going to go through a few examples and arrive at a few best practices, to see how we can make managing Python packages actually simple, because I do think it is simple. So let's start off by looking at what package managers actually are. They are tools to install, search for, remove, and generally operate on Python packages. When we're talking about Python packaging, we probably all know pip, which is the main and by far the most popular package manager in Python. Does anyone know what pip stands for? No? It stands for "pip installs packages". Right, it's pretty cool. So pip is an open source project. It's very robust and used by a lot of people. It does all of the things I mentioned earlier, plus a lot more.
It's a very advanced tool. By default it installs packages from PyPI, the Cheese Shop, which gives us access to all those amazing open source projects on PyPI, but you can also use it to install local packages, use private indexes, and a lot more. On the other side, we have the operating system package managers on Unix-like systems, like apt, yum, pacman and many others. These are much more general-purpose package managers. They usually take the upstream source packages, potentially apply patches to them, run the test suite, compile any extensions that need to be compiled, and package everything up for the appropriate distribution. This means that these packages generally work better on any given distribution, because they are well tested for that particular distribution. But it also means that it takes more work to get each package in, so the selection is generally much smaller than what you can get on PyPI, and updates are generally slower. So there is a trade-off there. So the question is... actually, before I get to the next one, I just want to quickly mention that this is a short talk and I'm going to completely avoid Windows and Conda, and I'm very sorry about that.
It's a very, very important area, but I just don't have time to talk about it today. If you do have questions, we can talk after the presentation. So yes, sorry, but I'm going to focus on Unix-like operating systems, and mostly Linux. So which one should we use? That's the big question, and the answer is, of course, it depends. It depends on many factors, but I think it makes sense to look at where you want to install these packages, and it's useful to separate installations into categories. The first is system-wide package installations. When we talk about a system-wide package installation, we mean one global package installation location, which is shared across the machine by all users of that particular Python interpreter. Let's see how this works. Say we are brand-new to Python and we just want to try out some fancy package we found on the internet. We Google "how do I install Python packages", and we get the answer on Stack Overflow somewhere: oh, you just pip install the package name. All right, so let's do that: pip install PyYAML. And we get "command not found": pip is actually not installed. Okay, that's interesting. So what do we do with that? pip is actually part of the CPython distribution, so if you download Python from python.org the installer should include it, but on many Linux distributions it's split into a separate package, so you do have to install it separately in this case.
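If you're not sure whether pip is installed for a particular interpreter, here is a minimal sketch that asks the running interpreter directly instead of looking for a "pip" command on PATH. It relies only on the standard library (`importlib.util.find_spec`); the helper name `pip_available` is made up for illustration.

```python
import importlib.util
import sys


def pip_available() -> bool:
    """True if the pip module is importable by the running interpreter.

    This is what "python -m pip" needs, independent of whether a
    "pip" command happens to be on PATH.
    """
    return importlib.util.find_spec("pip") is not None


if __name__ == "__main__":
    if pip_available():
        print(f"pip is available for {sys.executable}")
    else:
        # On Arch Linux, for example, pip comes from the python-pip package.
        print(f"pip is not installed for {sys.executable}; "
              "install it with your OS package manager")
```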
In this example, we use pacman, the Arch Linux package manager, to install python-pip and get pip installed. Depending on your distribution, you'll have to check how exactly to do this, but you can already see that we need the operating system package manager just to get the Python package manager installed. Okay, so we have that. Before we move on, I want to give you a tip: don't run "pip install", run "python3.7 -m pip install" instead. If you run pip install, that's going to invoke pip for whatever version of Python pip was installed for. And unfortunately, still today, there are many people using distributions where pip or python points to Python 2.7, which means you're calling pip for Python 2.7 and installing packages into the Python 2.7 site-packages. You might not actually want that, and it can be very confusing when you then can't import those packages with Python 3.7. So yeah, just explicitly specify the interpreter and use -m pip; it's going to save you headaches. Okay, so now we know all that, and we actually start to install the package: python3.7 -m pip install PyYAML. And what we get is a permission denied error. Why is this happening? What happens here is that pip is trying to install into /usr/lib/python3.7/site-packages, which is the global site on this machine. Like I said, this is shared by all users on the machine, so you can't install into it as a regular user; only root or administrative users can do that. Okay, so what do we do in this case? We run the same thing with sudo, right? What else? Let's do that, and yeah, no errors, it worked. Excellent, right? Everything works. What can go wrong?
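The two points above, tying pip to an explicit interpreter and finding out where it will try to write, can be sketched in a few lines of standard-library Python. This is an illustration, not something pip provides: `pip_command` and `install_target` are hypothetical helpers, and the reported path depends on how your distribution builds Python. To actually run the command you would pass the list to `subprocess.run`.

```python
from __future__ import annotations

import os
import sys
import sysconfig


def pip_command(*args: str) -> list[str]:
    """Build a pip invocation tied to *this* interpreter (the "python -m pip" form),
    so packages land in this interpreter's site-packages, not another Python's."""
    return [sys.executable, "-m", "pip", *args]


def install_target() -> tuple[str, bool]:
    """Return this interpreter's default site-packages directory and whether
    the current user may write to it."""
    purelib = sysconfig.get_paths()["purelib"]
    return purelib, os.access(purelib, os.W_OK)


if __name__ == "__main__":
    target, writable = install_target()
    print("would run:", " ".join(pip_command("install", "PyYAML")))
    print("default install location:", target)
    if not writable:
        print("not writable as this user: expect 'Permission denied' (and don't reach for sudo)")
```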
We have a problem, though. Let's say a few days later we find this cool project called beets, which we don't even realize is written in Python, and we install it on our favorite Linux distribution with pacman. It turns out it is indeed written in Python, and it depends on the PyYAML package, which we've just installed with pip using sudo. Now the package manager says "error: failed to commit transaction (conflicting files)", because it's trying to install files into a location where we've just installed files with pip. So there is an inconsistency between the package manager's internal view of what should be installed and what is actually installed, because we have basically messed up that global site. This is a bad situation to be in, and it can be quite tricky to clean up. So sudo pip is not a good idea; it's going to cause problems. There's another thing here which is important to mention: running setup.py as root is a bad idea. setup.py is by far the most popular way of implementing the build and installation steps in Python projects. This is changing nowadays, but it's still by far the most common way Python packages get installed. So essentially, when you run pip install with sudo, you're executing a Python script with root privileges, and anything can be in that setup.py script. For example, I wrote a nice example here: let's just import shutil, rmtree everything from the root directory, and, just for good measure, ignore the errors to make sure it works. You might think this is silly, no one's going to do this, right? But what if I name this package urllib and upload it to PyPI? All the people who think they're installing the popular urllib library, which is actually spelled urllib3, and who miss that 3, will run this as root with sudo, and that's really bad, right? This is called typosquatting.

In 2017 there was a similar incident on PyPI where some unknown person impersonated several packages, and the setup.py scripts of those packages were just collecting information from the machines they ran on. So nothing malicious happened, but anything could have been done in those scripts. The PSF is actually funding projects on PyPI to make this more difficult and to improve security in general, but it is still a really bad idea to run pip with sudo. Okay, so what have we learned about system-wide package installations? I think the biggest takeaway is: just use the OS package manager. Just stick to that, because that's going to work best. If the package is available, obviously, since the package selection is much smaller, you should just do this and you'll be fine. So apt, pacman, yum, or whatever your favorite Linux distribution uses. What happens, though, if the package is not available, if whatever you want to install just isn't in the package repository? Oops, it's falling apart. Yeah, so in that case we can use non-system-wide installations. These are installations where we restrict the installation to, say, a user's home directory or even just a separate directory, so that we don't interfere with any other package installations, other users on the machine, or anything else. The simplest way of doing this is the user site installation. This is something Python has supported since 2.6, so it's really widely available, and we can use it with pip install --user. Let's try it: let's install this not-so-well-known but very cool project called black. What you see is that everything works fine; it just installs it. There is a warning here, which we're going to ignore for now, but I'll come back to it. The installation just works fine. And then let's see what is in our sys.path.
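A small sketch of that inspection, assuming a Linux-style layout: it lists sys.path and checks whether the user site and the matching scripts directory (the one pip's warning is about) are set up. `user_site_report` is a hypothetical helper, and "posix_user" is the standard-library sysconfig scheme for user installs on Unix-like systems. Inside a virtual environment the user site is normally disabled, so it typically won't appear on sys.path there.

```python
import os
import site
import sys
import sysconfig


def user_site_report() -> dict:
    """Summarize the user site used by "pip install --user" (Unix-like layout assumed)."""
    user_site = site.getusersitepackages()
    # "posix_user" is the sysconfig install scheme for --user installs on Linux.
    scripts_dir = sysconfig.get_path("scripts", "posix_user")
    path_entries = {
        os.path.normpath(p) for p in os.environ.get("PATH", "").split(os.pathsep) if p
    }
    return {
        "user_site": user_site,
        "user_site_on_sys_path": user_site in sys.path,
        "scripts_dir": scripts_dir,
        "scripts_dir_on_PATH": os.path.normpath(scripts_dir) in path_entries,
    }


if __name__ == "__main__":
    for entry in sys.path:
        print(entry)
    report = user_site_report()
    if not report["scripts_dir_on_PATH"]:
        print(f"{report['scripts_dir']} is not on PATH, so --user scripts like black won't be found")
```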
Printing sys.path, you can see that there is a directory on it which is .local under my home directory. This is the user site, and if that directory exists, CPython will put it on sys.path and look for modules there when you try to import something. So this is what the --user argument does in pip. Now if we import black and print where its module file is, you can see that it is indeed installed into .local in my home, under the python3.7 site-packages. So it's great: everything works and it's really simple; you don't have to use sudo or anything. Now, if you want to run the black script, that's not going to work unless we put the .local/bin directory on our PATH, and this is what the warning from pip was about. pip was smart enough to detect that we don't have that directory on the PATH, and it can't really do much other than tell us that we need to add it. It's a one-time setup that's really easy to do, and after that everything just works as expected. So this is cool, but there is a catch: all applications and libraries installed with --user share the same location. This means: what if you have application A, which depends on one particular version of some library, and another application which depends on the same library, but with a different version?
That's not going to work, because you can only have one version of that library available in the user site. To overcome this, we can use virtual environments. Virtual environments take this a step further: they contain an entire virtual Python installation in a single directory. They do some clever tricks to make it seem like an actual full-blown Python installation. It isn't really one, it's really lightweight, but it works really well. It has been built into Python since release 3.3, so you can just use -m venv and it will work; you don't have to install anything. If you happen to use an older, legacy Python for whatever reason, you can use the awesome virtualenv package: just install it with your OS package manager, or from PyPI with --user as we've just learned, and then you can run python -m virtualenv. Okay, so how do we actually use this thing? It takes two steps to get into a virtual environment. First, we run python3.7 -m venv followed by the name of the environment, which is the name of the directory that will be created, in the current directory, for the environment to live in. Then we have to activate it. Activation is basically just sourcing the activate script from the environment itself, and this activate script does a few things. For example, it updates our prompt to show the name of the virtual environment we're inside. It also updates a few environment variables, such as PATH, to make sure that everything is contained within that environment and works as intended, and it saves the previous values of those environment variables. Okay, now let's try to use this. If we now want to install a package, we can just go ahead and run pip install, and in this case that is perfectly fine.
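The "python -m venv myvenv" step described above can also be driven from Python through the standard-library `venv` module. This minimal sketch assumes a Unix-like layout (bin/python, bin/activate) and skips installing pip into the environment (`with_pip=False`) so it runs quickly and offline; the helper name `make_venv` is illustrative.

```python
import pathlib
import tempfile
import venv


def make_venv(target: pathlib.Path) -> pathlib.Path:
    """Create a virtual environment at *target* and return its interpreter path.

    Similar in spirit to "python -m venv <target>"; pip is skipped here
    (with_pip=False) so the example needs no network access.
    """
    # Symlink the interpreter, as the command-line tool does by default on POSIX.
    venv.create(target, with_pip=False, symlinks=True)
    return target / "bin" / "python"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = pathlib.Path(tmp) / "myvenv"
        python = make_venv(env_dir)
        print("interpreter:", python)
        print("activate script to source:", env_dir / "bin" / "activate")
        # pyvenv.cfg is the marker file that makes this directory a virtual environment.
        print((env_dir / "pyvenv.cfg").read_text())
```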
Inside the environment, you don't need to write out the interpreter version for pip explicitly, because there's only one interpreter in this case, self-contained in the environment. And yeah, everything just works as expected: no warnings, it's installed. If we again import black inside this virtual environment and look at its file, we see that it is indeed in the myvenv directory and lives there, self-contained. If we try to run it, that also works. We didn't have to change anything, because the activation script updated our PATH, so it's already available to use. If we want to exit the virtual environment, we call the special deactivate function, which the activation script also made available, and then everything goes back to normal. It restores all the environment variables, and it's total amnesia: nothing works anymore, black is not there, like nothing happened. The directory is obviously still there, so you can just go back, activate it again and use it. We don't actually have to activate it, though. If we just write out the full path to a script in the environment's bin directory and execute it, it will work perfectly fine. That is because scripts installed into the environment have a shebang, the first line of the script, pointing at the Python interpreter inside the environment. So you could potentially put this directory on your PATH and just use the tools from there. You only need to activate the environment if you want to make modifications, install things, or do development where you really want to import things from the virtual environment. Okay, so with all that, what have we learned? Let's recap. Do not install packages system-wide with pip: besides the fact that it's probably not even possible, it's going to cause issues, especially if we do it as root, and it also exposes us to potential security issues.
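Going back to the point that activation is optional: this sketch calls an environment's interpreter by its full path, with no activate script involved, and asks it for `sys.prefix`. It also shows a common check for whether code is running inside a venv-created environment (`sys.prefix != sys.base_prefix`). A Unix-like layout is assumed, and the helper names are hypothetical.

```python
import pathlib
import subprocess
import sys
import tempfile
import venv


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def prefix_of(python: pathlib.Path) -> str:
    """Run an interpreter by its full path (no activation) and return its sys.prefix."""
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix)"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("current interpreter is in a venv:", in_virtualenv())
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = pathlib.Path(tmp) / "myvenv"
        venv.create(env_dir, with_pip=False, symlinks=True)  # Unix-like layout assumed
        python = env_dir / "bin" / "python"
        print("venv interpreter reports prefix:", prefix_of(python))
```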
So just avoid installing packages system-wide with pip. Instead, use apt, pacman, yum, or your favorite package manager to install Python packages system-wide if you can. But prefer non-system-wide installations: user mode installations if you want to keep it simple and just want to make a tool available to your user, or virtual environments if you have a more complicated situation or just want to try something out quickly. And that is all I wanted to say. Okay, we have some time for questions, if anybody has one.

Thank you. What's your experience with maintaining package versions in your environment, think of something like a package lock file? And also, what's your experience with Conda?

Sorry, maintaining package versions in the environment?

Yeah, so let's say you created your environment by pip installing black one month ago, and then you recreate this environment on a different machine. You get slightly different versions, differences in minor versions, but you get different behavior between the two.

Well, I guess it depends. Personally, for development tools like black, I would just install them in user mode on whatever machine I'm working on and use the version available there. If I actually need a specific version for a specific project, I will probably make that known in the requirements of the project itself, and make it possible to install it into the virtual environment, or through tox or some other tool, so that it's part of the development process and requires an explicit version. So I would say it really depends on the situation. I often have black and tools like that installed in the user site, because I use them all the time, and I might just have a simple project where I just want to run pytest and it's available. But you can put it in your virtual environment and require an explicit version. My experience with Conda is very limited; I don't use it.
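One way to "make that known in the requirements" is to record the exact versions of what is installed. This sketch uses `importlib.metadata` to produce name==version lines, similar in spirit to `pip freeze` but not a replacement for it (it doesn't replicate pip's special cases, such as editable installs); `pinned_requirements` is a hypothetical helper. The output could be saved to a requirements file and installed on another machine with `python -m pip install -r requirements.txt`.

```python
from __future__ import annotations

from importlib import metadata


def pinned_requirements() -> list[str]:
    """Return sorted "name==version" lines for the distributions visible to this
    interpreter, similar in spirit to what "pip freeze" prints."""
    pins: dict[str, str] = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip entries with broken or missing metadata
        # The first match on sys.path is the one imports would use; keep it.
        pins.setdefault(name.lower(), f"{name}=={dist.version}")
    return [pins[key] for key in sorted(pins)]


if __name__ == "__main__":
    print("\n".join(pinned_requirements()))
```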
I know mostly what Conda and the Anaconda distribution are. I think for Windows it's amazing; it's probably the best way to get started with Python and Python packages there. I think there's actually a lot of collaboration happening between the Python Packaging Authority and Anaconda, to share ideas and work together. It seems to me that pip is getting more and more features from Conda, and the other way around, which I think is really good. The one big drawback with Conda is that you have a much smaller selection of packages than what's available on PyPI. That's probably fine if you are doing machine learning or data science, because that's really widely covered on Anaconda, but it might not be fine for more niche areas.

Any other questions?

Hello, I had more of a question about package managers in general. It feels like we have a package manager for every day of the week, and there's always a new one on the horizon. Are there any plans in the open source community to try and merge package managers into one universal tool for programming languages?

What do you mean by package managers? I think pip is still the de facto package manager for Python, if you mean additional tools; you might be thinking of pipenv and things like that.

I meant more package managers made for programming languages, so npm, yarn...

Oh, I see.

...we just have so many of them. Are there any plans to try and kind of merge them into one?
I think that would be very difficult, because all of these languages have their own specific quirks and implementation details, which their package managers need to take into account, so creating one would be a very challenging task. What that intersection is today, I think, is the operating system package managers, which are general-purpose package managers, and you can always use them to install npm or Python packages if they're available in the package selection. But I think you're encouraged, like you saw in this presentation, to avoid that if you can: use more self-contained, localized installations and just use your own programming language's package manager.

Thank you.

I think we've got time for one more question.

Is there any command with which we install all the dependencies? Let's say some package depends on SQLite, so when people pip install that package, SQLite is installed with it.

Right, you mean the native dependencies of that Python package? Yeah, so that's actually a problem the PyPA is looking into right now. The fact that you cannot track or even express the native dependencies of a Python package is a problem. You basically just have to know that you need to install them, and what ends up happening is that you get an error while installing the package saying that X is not available, you have to Google it, and then someone on Stack Overflow says, oh, you have to install SQLite or something, which is not a great experience. I have to say that there is no solution to this at the moment.
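As the answer says, pip can't express or install native dependencies, but a project's own tooling could at least fail early with a clearer message. This hypothetical pre-flight helper uses `ctypes.util.find_library`, which only looks for runtime shared libraries; the headers needed to compile an extension are often a separate OS package, so treat this as a rough check rather than a guarantee.

```python
from __future__ import annotations

import ctypes.util


def missing_native_libraries(names: list[str]) -> list[str]:
    """Return the library names that the dynamic loader search can't find.

    find_library takes a bare name such as "sqlite3"
    (no "lib" prefix, no ".so" suffix).
    """
    return [name for name in names if ctypes.util.find_library(name) is None]


if __name__ == "__main__":
    missing = missing_native_libraries(["sqlite3"])
    if missing:
        print("Install these with your OS package manager first:", ", ".join(missing))
    else:
        print("All native libraries found.")
```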
The problem is being looked at, and this is actually an example where I think the PyPA is looking for collaboration with the Conda, or Anaconda, folks, because they do have something similar. The idea is to make this work so that Python packages can express external dependencies, which pip can then somehow hand over to external package managers, like the operating system's package manager. But this is very, very preliminary, and I think there are other people who are much better suited to talk about this. Actually, I see one in the audience. Yeah, Paul. Maybe you can catch him after the talk and he can give you a better answer. Yeah, not today.

Cool, we have to draw it to a close there. Thank you very much, Laszlo.

Yeah, thank you. Very interesting.