In this video, we're going to learn how to install Python packages that do not come with the Python standard library. First of all, let's briefly review where we find the standard library. We said in the last video that in the Python docs, on the main page, there is a link called Library Reference. This is basically a very long list of all kinds of additional functionality we can import into our current Python program or our Jupyter notebook. Usually, the standard library is the first place where we should search for functionality that we think should already exist, because whatever we're working on is probably a problem that other people have already solved for us.

However, the Python standard library is only updated when there is a new Python version, and Python at the moment is on a release schedule of roughly one to one and a half years. As we said before, the standard library belongs to Python itself, so it is always updated together with it. So if new functionality comes up and people want to add it to the standard library, it only becomes available to us every year or every one and a half years. And then we would also have to use the latest Python version, which sometimes is not possible.

So there's a different way to include code other people wrote. The standard way is to go to the so-called PyPI, which stands for Python Package Index; the URL is pypi.org. PyPI is basically a server where anyone can upload Python code, and this code can then be downloaded and installed locally by anyone in the world. That already sounds like a big security hole, and that is true.
So whenever you download something from PyPI and install it on your local machine, you should make sure that the source the code comes from is trustworthy. There are a couple of resources that help with that. Also, we should mention that the code in the Python standard library is very well tested. So we not only know it is there, we also know that the functionality in the standard library is most likely correct. With code on PyPI, which anyone can upload to, we may download something that looks nice but at the end of the day is kind of buggy. In terms of support: when software is published, usually nobody knows yet that there is a bug in it, because otherwise people wouldn't publish it. And software that is not so mature yet may have some bugs that need to be fixed. It would be very weird if we could only get a fix for some module in the standard library every one and a half years, when there's a new Python version. That is one reason why in the standard library you will only find modules with a quite mature code base, where bugs are unlikely. There are bugs in it, but not that many, and usually they are minor. Younger software projects on PyPI, however, often see many bug fixes.

So what we learn from that is that the Python standard library is more like a boring repository: we find code there that doesn't do anything exciting, but it is very well tested and very well understood. PyPI is where we find the latest and greatest software in the Python world. So let's go ahead and search for one of the most popular projects on PyPI, which is NumPy. Some people pronounce the name differently; I just say NumPy. As we see, NumPy is the fundamental package for array computing in Python.
As a data scientist, you have to learn about NumPy and really get good at it. That's very important. So let's click on it, and we get to the download page. But usually this is not where we download the code; we will download it in a different way, which we will see quite soon. What else do we see here? There are a couple of links. For example, the homepage link: let's click here, and that opens numpy.org. If a software library has its own homepage that is maintained and has documentation on it, that is already an indication that we can most likely use the software.

Also, if we go to the source code, we usually get a link to GitHub, or sometimes GitLab or some other code sharing service, and there we see the entire source code. If I go into the numpy folder here, that is the entire source code of the NumPy library. So whenever I want to look up how something is implemented, I can actually do it. Usually you don't do that. But the fact that something is open source is quite helpful, because it enables anyone to learn from the code: how something is done, how a good library is designed. And if we find a bug, we can fix it on our own, because we have access to the source code, and contribute the fix back to the project. Then our fix will be spread to all the people that use NumPy in this case. These are advantages of open source, and that is also why open source is so valued in the Python community.

Then there is also a bug tracker. In the rare circumstance where we find a bug in NumPy, we could go to the bug tracker on GitHub and open a ticket for the maintainers of the NumPy project. And what is also very important: on PyPI, on the left-hand side, we see a couple of statistics.
For example, stars on GitHub and forks; forks are basically copies that people made of the repository. The more stars we see, the higher the chance that many people are actually using the project. So 16,000 stars is already a big repository. Remind yourself that there are not that many programmers in the world, not all programmers use GitHub, and not all programmers who use GitHub star projects. So the numbers are not as high as on Facebook, Instagram, or other social media websites. But 16,000 is a very good number, and it's another indication that NumPy is a project we can use and rely on.

So let's go back to JupyterLab, create a new notebook, and rename it to third-party packages. This is what we usually call these packages, third-party, because they do not come from the official first party, which would be python.org. Whenever you want a package that Python does not come with, you have to go on the internet and somehow download it. Technically speaking, you could do that with a web browser: download some zip archive, unpack it, and move the Python files to a certain folder on your operating system. But the Python community comes with tools that automate this, and the number one tool for that in the Python world is pip. pip is a recursive acronym that stands for "pip installs packages".

So we want to execute it. Now, this cell here, just like any code cell in JupyterLab, expects Python code. For example, as we saw many times before, I could write 1 + 2. But if I want to use a command-line tool from within JupyterLab, there's a way to do that as well. I cannot just go ahead and type pip, which is the name of the utility. If I execute this, I don't get an error here. That is very interesting, but it does not seem to be executed either.
So what we should do is prepend an exclamation mark. Well, it seems that pip actually works without the exclamation mark. That is nice to know; I didn't expect that. What the exclamation mark means is that the code that follows is not Python code. We are basically going to the command line of our machine and executing something in a terminal window, the one that we don't really want to use in this course, because it's a beginner's course. However, it seems that if we don't put an exclamation mark here, JupyterLab is smart enough to figure out that pip is not just some unknown variable. If I wrote pip2, I would get a NameError, and a moment ago I expected to get a NameError for pip as well. But somehow, probably because many people forget the exclamation mark, the maintainers of JupyterLab made this work. Still, whenever you execute a utility that is not Python code from within JupyterLab, remind yourself to put an exclamation mark there to be on the safe side.

The pip tool then shows us a help message on how it is to be used. The way to install something is the install command. So we write pip install and then the name of the package on PyPI, which is simply numpy. We could also install from different servers, but by default pip uses PyPI, which is the most commonly used server for this. So let's say pip install numpy and execute the code cell. This goes on the internet, so it won't work without an internet connection, and it installs something on your machine. However, as you can read, it says the requirement is already satisfied. That means NumPy is already installed.
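What pip reports as "Requirement already satisfied" can also be checked from Python itself. The following is a minimal sketch, not what pip does internally: it uses `importlib.metadata` from the standard library (Python 3.8 or later), and the helper name `installed_version` is made up for this illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is missing.

    `installed_version` is a hypothetical helper for this example only.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# The output depends on the environment: a version string if the package
# is installed (as NumPy is in Anaconda), otherwise None.
for name in ["numpy", "lalib"]:
    print(name, "->", installed_version(name))
```

In a notebook, the installation itself would still be done in a cell with `!pip install numpy`. IPython-based notebooks also offer a `%pip` magic, which is most likely why the bare `pip` worked in the cell without the exclamation mark.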
The reason for this is that in this course, remember, we don't use python.org's Python distribution but the Individual Edition from anaconda.com. In the video where I showed you how to install Anaconda Python on a Windows machine, I already mentioned that the Anaconda version of Python comes with many third-party packages included. Now you understand what I meant by that. Very commonly used third-party packages, like NumPy, are already packaged in the Anaconda Python version, which is why we read "Requirement already satisfied".

However, let me give you a second example of a library that is definitely not installed here. I know that for sure because I wrote this library myself, and I know it's not packaged in Anaconda. So let's pip install a library called lalib, which is short for linear algebra library. That's a hobby project of mine where I basically re-implement NumPy by just using core Python, for didactical purposes. I use it in an advanced course to show how you can build up your own package, and here it just serves as an example. So let's go ahead and install lalib. Now we see that, because it was not installed before, my computer went on the internet, downloaded a file from PyPI, and installed it.

Now that I have pip installed both NumPy and lalib, let's go ahead and import NumPy. So we say import numpy. What you usually write is import numpy as np, so that we refer to NumPy not as numpy but just as np, because it's shorter and we don't want to type numpy all the time. Similarly, I can now say import lalib, and this imports lalib. So that's how we install third-party packages from PyPI. And now that I have introduced NumPy, you may be interested in what it can do, so let's briefly look at that.
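The import step described above can be sketched as follows. The try/except guard is an addition for this illustration, so that the cell also runs on a machine where NumPy is missing; in the video, a plain `import numpy as np` is enough because Anaconda ships NumPy.

```python
try:
    # Conventional alias: refer to NumPy as `np` from here on.
    import numpy as np
except ImportError:
    # Assumption for this sketch: fall back gracefully instead of crashing.
    np = None
    print("NumPy is not installed; run `pip install numpy` first.")
else:
    print("NumPy version:", np.__version__)
```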
Remember that in this course, we often use a list of numbers and refer to it as numbers. Let's give it the numbers we always use: 7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, and 4. These are the numbers from 1 through 12 in no particular order. That is now a list object. Let's check the type of numbers, and indeed it's of type list. What can we do with this? As we've seen before, we can index into a list, so take out an individual number. For example, we get the first number out of the list by using the index operator, the brackets here, with index 0, because Python uses zero-based indexing. We can also loop over the numbers: we say for number in numbers and then print(number), and that prints each number on a line of its own. In contrast, if I say print(numbers), plural, we print the list as a whole. So there is a difference between looping over the numbers and printing each one, and printing without looping.

So that is a list object. But now you may wonder what NumPy has to do with this. NumPy is a so-called array-based library: it introduces a data type that we generally refer to as the array. Let's see how we can use it. We start with np and use the dot operator, the attribute access operator, to get hold of the array constructor. So np.array acts as a constructor, much like the int, float, and list constructors in core Python. We call it and pass it our numbers list as the only argument, and we get back a new object of type array. Let's store it in a variable called vector. The reason is that one-dimensional arrays are often thought of as vectors. Two-dimensional arrays would be matrices, but we keep things simple here.
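Put together, the steps from this part look roughly like the following cell. It assumes NumPy is installed and imported as `np`; the variable names `numbers` and `vector` are the ones used in the video.

```python
import numpy as np

numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]

print(type(numbers))  # <class 'list'>
print(numbers[0])     # 7, because indexing starts at zero

# Looping prints each number on a line of its own ...
for number in numbers:
    print(number)

# ... whereas printing the list shows all numbers at once.
print(numbers)

# Pass the list to np.array to get a one-dimensional array back.
vector = np.array(numbers)
print(type(vector))   # <class 'numpy.ndarray'>
```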
Now let's look at vector and also check its type. It's of type ndarray; nd stands for n-dimensional, and we just say array for short. What can we do with the vector object? One thing is that we can index into it as well, with index 0, for example, which also gives me back the first number. Interestingly, we can also say for entry in vector, where entry is a name I just made up (you can use any name here), and then print(entry). This does the same as the for loop over numbers. So we can loop over a list object, and we can also loop over an array object. In that sense, the array object is what we would call a drop-in replacement for the built-in list object.

I'm not going into too much detail on how arrays differ in memory. In short, they are highly optimized data structures for numerical operations. We will look into that in a future video, which is going to be longer and more in-depth anyway, because NumPy has some big concepts that we need to understand, and that's too much for a beginner. For now, let's do just one thing to compare them, in a situation where the list and the array data types behave differently. Remember from an earlier video that if I multiply a list object by a scalar, for example the integer 2, I get back a new list object in which the elements are repeated, so that every element appears twice because we're multiplying by 2. That's an example of list concatenation: we are concatenating two list objects. Of course, I could also write numbers + numbers, and that gives the same result. But multiplying is the more interesting case. So let's do the same thing with the vector: 2 * vector. And now we will see a difference.
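The comparison between the two data types can be sketched like this, again assuming NumPy is installed. The call to `.tolist()` is only used here to print the array's entries in plain list notation.

```python
import numpy as np

numbers = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]
vector = np.array(numbers)

# Indexing and looping work the same way for both objects.
print(vector[0])  # 7
for entry in vector:
    print(entry)

# Multiplying a list by 2 repeats its elements: 24 entries.
doubled_list = 2 * numbers
print(len(doubled_list))                  # 24
print(doubled_list == numbers + numbers)  # True

# Multiplying an array by 2 multiplies every element: still 12 entries.
doubled_vector = 2 * vector
print(len(doubled_vector))                # 12
print(doubled_vector.tolist())            # [14, 22, 16, 10, 6, 24, 4, 12, 18, 20, 2, 8]
```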
Namely, if we compare the result with the original vector, we see that we get back a vector of the same length, but every element is multiplied by 2. In other words, while 2 * numbers up here is an example of list concatenation, 2 * vector down here is an example of scalar multiplication, which you may know from your linear algebra course. So they behave differently, and that is just one way in which a list object and an array object differ.

And that is it. That is how we can install third-party libraries. We have seen the example of NumPy, and also lalib, and we can then use the functionality that other people wrote. NumPy especially we will look into in a lot more detail in this lecture series. I'll see you in the next video, where we talk about how we can create our own modules and packages. See you then.